Getting ready
You need to install OpenRefine and download a data file to follow this lesson.
Installing and running OpenRefine
OpenRefine is a free, open-source Java application. You can download OpenRefine from http://openrefine.org/download.html. This lesson has been tested with OpenRefine versions up to and including 3.5.2.
Packages are available on https://openrefine.org/download.html for Windows, macOS, and Linux. Please download the latest stable version, choosing the “kit” for your operating system. Current versions of the “Windows kit with embedded Java” and “Mac kit” include everything you need to run OpenRefine. The “Linux kit” and traditional “Windows kit” require a “Java Runtime Environment” (JRE) installed on your system (see notes below).
If you are using an older version of OpenRefine, we recommend upgrading to the latest tested version.
Please follow the installation instructions in the OpenRefine User Manual: Installation Instructions.
Notes:
- When you download OpenRefine for Windows or Linux from the address above, you are downloading an archive file (zip or tar). To install OpenRefine, unzip the downloaded file to a permanent location on your computer. This can be a personal directory or an applications or software directory; OpenRefine should run wherever you put the unzipped folder. The location has to be a “local” drive, as problems have been reported when trying to run OpenRefine from a network drive.
- The options “Windows kit with embedded Java” and “Mac kit” include Java as part of the package. You do not need to install Java if you use one of these kits. This is the preferred method on Windows and Mac systems.
- On Mac, depending on your privacy settings, you may get a safety warning that prevents you from opening OpenRefine. To bypass it, open “System Preferences”, click “Security & Privacy”, then click “General”. Click the lock and enter your password to make changes. Select App Store under the header “Allow apps downloaded from”.
- On Windows, if you use the traditional “Windows kit” without embedded Java, you will need a “Java Runtime Environment” (JRE) on your system. If you do not already have a JRE or JDK installed, you can visit Adopt OpenJDK or Oracle Java to download an installer package. Please note that Oracle significantly changed its license terms in 2019, limiting Oracle Java to “personal use” without a paid license. If you use OpenRefine at work or in research, OpenJDK is preferred.
- On Linux, a “Java Runtime Environment” (JRE) is required to run OpenRefine. If you do not already have a JRE or JDK installed on your system, most distribution repositories will contain OpenJRE / OpenJDK packages. Install the default version available from your distribution. For example, on Ubuntu/Debian:
  sudo apt install default-jre
- OpenRefine does not support Internet Explorer. Please use Firefox, Chrome or Safari instead.
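If you plan to use a kit without embedded Java, it can help to check whether a Java runtime is already on your system before installing one. The following is a minimal sketch for a POSIX shell (Linux, macOS, or a Unix-like shell on Windows). The function name `check_runtime` is a hypothetical helper introduced for illustration and is not part of OpenRefine.

```shell
#!/bin/sh
# Report whether a given command is available on the PATH.
# Prints "found" or "missing". The function name is illustrative only.
check_runtime() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found"
    else
        echo "missing"
    fi
}

# Check for Java and show its version if it is installed.
if [ "$(check_runtime java)" = "found" ]; then
    # java prints its version information to stderr, so merge it into stdout.
    java -version 2>&1
else
    echo "No Java runtime found on PATH; install a JRE first."
fi
```

If the script reports that no Java runtime was found, follow the Windows or Linux note above, or switch to a kit with embedded Java.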
Downloading the datasets
For this workshop we will be using two datasets. Download both CSV files, DOAJ_big and DOAJ_small, and save them on your Desktop or in another directory you can easily locate on your computer.
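To confirm that both files are where you expect them, a short shell check along these lines may help. This is only a sketch: the directory `$HOME/Desktop` and the file names `DOAJ_big.csv` and `DOAJ_small.csv` are assumptions, so replace them with the location and the names the files actually have after downloading. The helper `missing_files` is hypothetical and defined here for illustration.

```shell
#!/bin/sh
# Print the name of each expected file that is NOT present in a directory.
# Usage: missing_files DIRECTORY FILE...
missing_files() {
    dir=$1
    shift
    for name in "$@"; do
        if [ ! -f "$dir/$name" ]; then
            echo "$name"
        fi
    done
}

# Assumed location and file names; adjust them to match your download.
missing_files "$HOME/Desktop" "DOAJ_big.csv" "DOAJ_small.csv"
```

The script prints nothing when every listed file is present; any name it prints is a file that still needs to be downloaded or moved.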
Exiting OpenRefine
To exit OpenRefine, close all the browser tabs or windows, then navigate to the command line window that opened when you started OpenRefine. To close this window and ensure OpenRefine exits properly, hold down [control] and press [c] on your keyboard. This will save all changes to your projects.
Getting help
If you encounter problems installing or running OpenRefine, a good source of support is the OpenRefine mailing list and user forum. Include your operating system when searching to find the most relevant answers for your issue, such as threads related to Windows, macOS, or Linux.
You may also want to check the Stack Overflow OpenRefine tag or the OpenRefine Gitter room.
There are also general and specialist tutorials about using OpenRefine available on the web, including:
- Official wiki List of OpenRefine External Resources
- Getting started with OpenRefine by Thomas Padilla
- Cleaning Data with OpenRefine by Seth van Hooland, Ruben Verborgh and Max De Wilde
- Blog posts on using OpenRefine from Owen Stephens
- Identifying potential headings for Authority work using III Sierra, MS Excel and OpenRefine
- Free your metadata website
- Data Munging Tools in Preparation for RDF: Catmandu and LODRefine by Christina Harlow
- Cleaning Data with OpenRefine by John Little
- OpenRefine Blog