SQL to Neo4j
All the software involved handles data as UTF-8. If the terminal used to run the scripts is not also set up for UTF-8, errors may occur. To play it safe, you can add
export LC_ALL=es_ES.utf8
to your .bash_profile, assuming you are using bash and the machine supports the es_ES.utf8 locale.
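As a quick sanity check before running the scripts, something like the following could report whether Python sees a UTF-8 locale. This is only a sketch: the helper name is_utf8 is hypothetical and not part of this repository.

```python
import codecs
import locale


def is_utf8(encoding_name: str) -> bool:
    """Return True if the given encoding name is an alias of UTF-8."""
    try:
        # codecs.lookup normalizes aliases such as "UTF8" or "utf_8".
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        # Unknown encoding names are treated as "not UTF-8".
        return False


if __name__ == "__main__":
    # Encoding Python derives from the current locale settings (LC_ALL, LANG, ...).
    current = locale.getpreferredencoding()
    if not is_utf8(current):
        print(f"Warning: locale encoding is {current!r}, not UTF-8")
```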
A table that doesn't fit in memory may be a problem when disambiguating authors and/or organizations: it must be read and written to a CSV file block by block, and an author read in, e.g., the 1000th block may be mapped (disambiguated) to one read in the 1st block. The newly read author should then be discarded (it was already written, and disambiguated_id must be unique), but to know that you need to keep track of the authors already read.
Tables on which disambiguation is to be applied are read in one go.
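Should a table ever be too large to read in one go, the bookkeeping described above could look roughly like this. It is a minimal sketch using only the standard library. The function name write_unique_blocks and the idea of passing blocks as lists of dicts are assumptions made for illustration, not the way the scripts here work.

```python
import csv
import io


def write_unique_blocks(blocks, out, fieldnames, id_field="disambiguated_id"):
    """Write rows block by block, skipping ids that were already written.

    blocks: iterable of blocks, each block a list of dict rows (assumed shape).
    out: a writable text file object.
    Returns the number of rows actually written.
    """
    seen = set()  # ids already written; grows with the number of unique ids
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    written = 0
    for block in blocks:
        for row in block:
            key = row[id_field]
            if key in seen:
                # Already written from an earlier block: discard this row.
                continue
            seen.add(key)
            writer.writerow(row)
            written += 1
    return written


if __name__ == "__main__":
    example_blocks = [
        [{"disambiguated_id": "a1", "name": "Ana"}],
        [{"disambiguated_id": "a1", "name": "A."},
         {"disambiguated_id": "b2", "name": "Luis"}],
    ]
    buffer = io.StringIO()
    write_unique_blocks(example_blocks, buffer, ["disambiguated_id", "name"])
    print(buffer.getvalue())
```

Only the set of ids is kept in memory, not the rows themselves, so the cost grows with the number of distinct authors rather than with the table size.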
When merging authors/organizations into Neo4j, bear in mind that Neo4j only merges a node into an existing one if every property given for the new node exactly matches the existing node's. So if two nodes share the same disambiguated_id but differ in the name property (e.g., Ana in one and A. in the other), they will not be merged; instead, Neo4j will try to create a new node for the newcomer. This raises an error, because disambiguated_id is a unique property and two nodes with the same value are not allowed.
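One common way around this behaviour is to MERGE on the unique key alone and set the remaining properties only when the node is created. The sketch below merely builds such a Cypher statement and its parameters. build_merge_query is a hypothetical helper, the Author label is an assumed example, and nothing here is claimed to be what the import scripts actually do.

```python
def build_merge_query(label, key_value, other_props, key="disambiguated_id"):
    """Build a Cypher MERGE that matches on the unique key only.

    Returns (query, parameters). Labels and property keys cannot be passed
    as Cypher parameters, so they are interpolated after a basic check.
    """
    if not label.isidentifier() or not key.isidentifier():
        raise ValueError("label and key must be plain identifiers")
    query = (
        f"MERGE (n:{label} {{{key}: $key_value}}) "
        "ON CREATE SET n += $other_props"
    )
    return query, {"key_value": key_value, "other_props": other_props}


if __name__ == "__main__":
    query, params = build_merge_query("Author", "a1", {"name": "A."})
    print(query)
    print(params)
```

With this form, a second row carrying the same disambiguated_id but a different name matches the existing node instead of attempting to create a duplicate. The first name written is the one that is kept.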
Properties with the same meaning (e.g., name) can have different names depending on the source table (name_patents, name_projects...). This can be fixed once the import process is over.
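A post-import clean-up of this kind could be expressed as one Cypher statement per table-specific property. The helper below only generates the statement text. build_rename_query and the Author label are hypothetical, and keeping an already present name value (via coalesce) is a design choice assumed for this sketch.

```python
def build_rename_query(label, old_name, new_name):
    """Build a Cypher statement that moves old_name into new_name.

    An existing new_name value is kept; old_name is removed afterwards.
    """
    for part in (label, old_name, new_name):
        if not part.isidentifier():
            raise ValueError(f"not a plain identifier: {part!r}")
    return (
        f"MATCH (n:{label}) WHERE n.{old_name} IS NOT NULL "
        f"SET n.{new_name} = coalesce(n.{new_name}, n.{old_name}) "
        f"REMOVE n.{old_name}"
    )


if __name__ == "__main__":
    for old in ("name_patents", "name_projects"):
        print(build_rename_query("Author", old, "name"))
```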