Frequently Asked Questions
This is a collection of frequently asked questions on OpenRefine. Feel free to ask your own question on the OpenRefine mailing list and we'll try to answer to the best of our abilities and add them to this list.
- Ensure that you have a Java Runtime Environment (JRE) installed on your system and at least 1 GB of RAM available for it.
Send your question to the OpenRefine mailing list.
Consider first discussing it on the mailing list. This will likely help characterize the issue for a good quality bug report or feature request which you can file on the issue tracker.
- On Linux, if you run Refine from the terminal, you can point to the workspace directory with the -d parameter, e.g.,
./refine -p 3333 -i 0.0.0.0 -m 6000M -d /where/you/want/the/workspace
- Alternatively, you can add or update a preference at http://127.0.0.1:3333/preferences,
KEY = refine.data_dir VALUE = T:\MyOpenRefineDataFolder
On Linux or Mac, from the command line:
./refine -i 127.0.0.1
On Windows, use a slash character instead:
C:>refine /i 127.0.0.1:8088
On Linux or Mac, from the command line:
./refine -i 127.0.0.1 -p 3334
On Windows, use a slash character instead:
C:>refine /i 127.0.0.1 /p 3334
You can also edit the refine.ini file to permanently set the IP Address and Port.
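If you start OpenRefine from a script, the options above can be assembled programmatically. The following is a minimal sketch in Python; `build_refine_command` is a hypothetical helper (not part of OpenRefine), and it only uses the `-i`, `-p`, `-m` and `-d` flags already shown in this FAQ, in their Linux/Mac form.

```python
def build_refine_command(host, port, memory=None, workspace=None):
    """Assemble an argument list for the ./refine launcher (Linux/Mac form).

    Hypothetical helper for illustration; it only builds the list and does
    not start OpenRefine itself.
    """
    cmd = ["./refine", "-i", host, "-p", str(port)]
    if memory is not None:
        cmd += ["-m", memory]      # e.g. "6000M"
    if workspace is not None:
        cmd += ["-d", workspace]   # workspace directory
    return cmd


print(" ".join(build_refine_command("127.0.0.1", 3334)))
```

The resulting list could be passed to `subprocess.run` from the OpenRefine directory, assuming the `refine` script is present there.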
You might need to double-check your Chrome or Firefox proxy settings. In Firefox, go to options->advanced->network->connection->settings and switch from "use system proxy settings" to "auto-detect proxy settings for this network".
If you get a message "Network Error (tcp_error)" in your browser, you might also try to uncheck "automatically detect settings" and add an exception to your firewall rules to allow 127.0.0.1 (or whatever IP address you decide to run OpenRefine with).
The regular expression syntax for GREL is that of Java regex, not of JavaScript. See GREL Regular Expressions.
You can also use Jython Regex instead of GREL functions and use a Custom Text Facet with something like this:
import re
g = re.search(ur"\u2014 (.*),\s*BWV", value)
return g.group(1)
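To see what this pattern captures, it can be tried outside OpenRefine first. The sketch below uses standalone Python 3 (where the `ur"..."` prefix of Jython/Python 2 is not available); the sample string and the `extract_title` name are made up for illustration. Unlike the snippet above, it returns `None` instead of raising an error when the pattern does not match.

```python
import re

# Same pattern as the Jython snippet: an em dash (U+2014), a space, then
# everything up to the last comma that is followed by optional whitespace
# and "BWV".
PATTERN = re.compile("\u2014 (.*),\\s*BWV")


def extract_title(value):
    """Return the text between the em dash and ', BWV', or None if absent."""
    match = PATTERN.search(value)
    return match.group(1) if match else None


sample = "Bach \u2014 Wachet auf, BWV 140"  # hypothetical cell value
print(extract_title(sample))
```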
What syntax should I use with GREL to construct URLs correctly and avoid HTTP errors and other pitfalls, for instance when working with JSON strings within a URL or creating a HYPERLINK?
A good practice is to use ' single quotes for your Refine expression syntax and reserve " double quotes for the URL syntax parts. Also make sure to escape() the cell values you use, where necessary.
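The same quoting discipline can be sketched in Python for comparison. This is only an analogy to the GREL approach: `build_query_url` and the `example.org` endpoint are hypothetical, and `quote_plus` is used as a rough stand-in for GREL's `escape(value, "url")`.

```python
from urllib.parse import quote_plus


def build_query_url(model_name):
    """Embed a URL-escaped value inside a JSON fragment within a URL."""
    # Single quotes delimit the Python strings; the double quotes belong to
    # the JSON carried in the query string.
    return (
        'https://example.org/read?query={"value":"'
        + quote_plus(model_name)
        + '"}'
    )


print(build_query_url("Model S & X"))
```

Escaping the value keeps characters such as spaces and `&` from being read as part of the URL structure.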
EXAMPLES:
'https://www.googleapis.com/freebase/v1/mqlread?query={"mid":null,"/type/object/key":{"namespace":"/authority/fmd/model","value":"'+escape(cells.ModelName.value, "url")+'"}}'
'=HYPERLINK("http://listings.listhub.net/pages/BHAMMLSAL/' + value + '",' + value + ')'
- Flag (or star) the row(s)
- In the dropdown above the flag you can get a facet, by going to Facet > Facet by flag.
- From the facet that opens select the 'true' option.
- In the dropdown menu above the flag you can go to Edit Rows > Remove all matching rows.
You can go to http://127.0.0.1:3333/preferences and set the facet limit using the preference key "ui.browsing.listFacet.limit".
Several options:
- There is a shortcut for this, Facet → Customized facets → Duplicates facet
- Create a Text Facet on a column and then in the facet click "Sort by: count". Any value with a count of 2 or more is duplicated.
- Use the facetCount() function like facetCount(value, 'value', 'column name') > 1 and select 'true' to show all rows that have duplicates
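The logic behind the facetCount() option can be illustrated outside OpenRefine: count how often each value occurs in the column, then mark every row whose value occurs more than once. Below is a minimal Python sketch of that idea; `flag_duplicates` and the sample values are hypothetical and this is not how OpenRefine implements it internally.

```python
from collections import Counter


def flag_duplicates(values):
    """Return one boolean per row: True if that row's value occurs more than once."""
    counts = Counter(values)  # value -> number of occurrences in the column
    return [counts[v] > 1 for v in values]


rows = ["apple", "pear", "apple", "fig"]  # hypothetical column values
print(flag_duplicates(rows))
```

As with the facet, every occurrence of a repeated value is marked, not only the second and later ones.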
Can OpenRefine be used as a piece of a larger ETL pipeline?
You can use one of the OpenRefine client libraries to automate OpenRefine programmatically.
It's worth pointing out that not all Refine features can work unsupervised and without human interaction (clustering, for example), but some can.
Here is some further discussion and a project: