Dependencies
- Install dependencies with `pip install -r requirements.txt`
- In `ccc/scripts`, put `Blazegraph.jar` (direct link to download)
- Replace the content of `bee/conf.py` with the content of `bee/conf_local.py`
- Replace the content of `ccc/conf_spacin.py` with the content of `ccc/conf_spacin_local.py`
- Shell (from `ccc/scripts`): `python3 -m script.ccc.run_bee`. It creates a folder called `test` inside the same `scripts` folder.
- OUTPUT JSON: `scripts/test/share/ref/todo`
- ERRORS: `scripts/test/index/ref/issue`
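
The two configuration replacements above can also be scripted. The following is a minimal sketch, not part of the project: the function name `apply_local_conf` and its `package_dir` argument are hypothetical, and it assumes that `bee/` and `ccc/` are both found under one package directory (probably `scripts/script`, but check your checkout).

```python
import shutil
from pathlib import Path


def apply_local_conf(package_dir):
    """Overwrite each active config file with its *_local counterpart.

    package_dir is assumed to be the folder containing bee/ and ccc/.
    """
    package_dir = Path(package_dir)
    # (active config, local template) pairs named in the steps above
    pairs = [
        ("bee/conf.py", "bee/conf_local.py"),
        ("ccc/conf_spacin.py", "ccc/conf_spacin_local.py"),
    ]
    copied = []
    for target, source in pairs:
        src = package_dir / source
        dst = package_dir / target
        if not src.is_file():
            raise FileNotFoundError(f"missing local config: {src}")
        shutil.copyfile(src, dst)  # replaces the content of dst
        copied.append(dst)
    return copied
```

Calling `apply_local_conf("script")` from the `scripts` folder would perform both replacements in one step, under the layout assumption stated above.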
- Empty/remove the folder `test/`
- Run: `python3 -m script.ccc.run_bee`
- Include the XML file in the folder `script/ccc/`
- Uncomment lines 39, 40 of `script/ccc/jats2oc.py`
- In `script/ccc/test_bee.py`, change the name of the file to be parsed
- Run: `python3 -m script.ccc.test_bee`
- INPUT JSON: `scripts/test/share/ref/todo`
- OUTPUT RDF (dump): `scripts/ccc/`
- Run Blazegraph: `java -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8 -server -Xmx1g -Djetty.port=9999 -Dbigdata.propertyFile=ccc.properties -jar blazegraph.jar`
- Run: `python3 -m script.ccc.run_spacin`
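
Since SPACIN is run after Blazegraph has been started, it can help to check that Blazegraph is accepting connections before launching it. The sketch below is an illustration only: `wait_for_port` is a hypothetical helper, and it assumes Blazegraph is reachable on `localhost` at port 9999 (the `-Djetty.port` value in the command above).

```python
import socket
import time


def wait_for_port(host="localhost", port=9999, timeout=60.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connection means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

An open port only shows that the server is listening; it does not guarantee that the triplestore has finished loading, so treat a `True` result as a necessary check rather than a sufficient one.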
- Empty `scripts/ccc/` BUT do not remove `scripts/ccc/context.json`
- Remove `scripts/ccc.jnl` (quit the .jar first!)
- If you want to rerun SPACIN on the same JSON files, move the content of `scripts/test/share/ref/done` into `scripts/test/share/ref/todo`
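
The three reset steps above can be sketched as a single helper. This is an assumption-laden illustration rather than project code: `reset_spacin` is a hypothetical name, and `scripts_dir` is assumed to be the `scripts` folder that contains `ccc/`, `ccc.jnl` and `test/`. Stop Blazegraph before calling it.

```python
import shutil
from pathlib import Path


def reset_spacin(scripts_dir):
    """Undo a SPACIN run so it can be repeated on the same JSON files.

    Returns the number of entries moved from done/ back to todo/.
    """
    scripts_dir = Path(scripts_dir)

    # 1. Empty scripts/ccc/ but keep context.json
    dump_dir = scripts_dir / "ccc"
    if dump_dir.is_dir():
        for entry in dump_dir.iterdir():
            if entry.name == "context.json":
                continue
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()

    # 2. Remove the Blazegraph journal (Blazegraph must be stopped first)
    journal = scripts_dir / "ccc.jnl"
    if journal.exists():
        journal.unlink()

    # 3. Move the processed JSON files from done/ back into todo/
    done = scripts_dir / "test" / "share" / "ref" / "done"
    todo = scripts_dir / "test" / "share" / "ref" / "todo"
    moved = 0
    if done.is_dir():
        todo.mkdir(parents=True, exist_ok=True)
        for entry in done.iterdir():
            shutil.move(str(entry), str(todo / entry.name))
            moved += 1
    return moved
```

Note that the helper only touches `scripts/ccc/` (the RDF dump folder), never `script/ccc/` (the source package).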
Other notes:
- do not change the config file `script/ccc/conf_bee.py`
- do not delete the `context.json` included in `scripts/ccc/` when rerunning SPACIN
BEE and SPACIN have been enhanced to exploit, respectively, a CSV dataset generated with the europe-pubmed-central-dataset tool and the papendex tool.
- (BEE) in `scripts/script/bee/conf.py` there are:
  - `PARALLEL_PROCESSING`: set to `True` to enable the improvement made
  - `dataset_reference`: absolute path to the generated CSV
  - `article_path_reference`: absolute path to the directory where all the XML articles are stored
  - `n_process`: the number of processes that will be spawned
  - `doc_for_process`: the CSV will be split into chunks (one for each process), each containing the number of docs specified here
- (SPACIN) in `script/ccc/conf_spacin.py` there are:
  - `crossref_query_interface_type`: set to `'local'` if you want to exploit the local index, otherwise `'remote'`
  - `orcid_query_interface_type`: set to `'local'` if you want to exploit the local index, otherwise `'remote'`
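
To illustrate what `doc_for_process` controls, here is a minimal sketch of splitting a sequence of rows into fixed-size chunks. It is not BEE's actual implementation: `split_into_chunks` is a hypothetical function, and the row values are made up for the example.

```python
from itertools import islice


def split_into_chunks(rows, doc_for_process):
    """Yield successive lists of at most doc_for_process rows."""
    if doc_for_process < 1:
        raise ValueError("doc_for_process must be a positive integer")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, doc_for_process))
        if not chunk:
            return
        yield chunk


# Hypothetical values, for illustration only
doc_for_process = 2
rows = ["doc1", "doc2", "doc3", "doc4", "doc5"]
chunks = list(split_into_chunks(rows, doc_for_process))
# chunks == [["doc1", "doc2"], ["doc3", "doc4"], ["doc5"]]
```

In this sketch the last chunk may be shorter than `doc_for_process`; each chunk would then be handed to one worker process.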