infoqualitylab/text-mining-scholarly-API

Text mining scholarly API

This project downloads the full text of scholarly publications for a given list of DOIs using the Crossref API, then stores the full-text files in a MongoDB database.
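The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in pipeline-habanero.py: the function names, the preferred content types, and the stored document layout are all assumptions made for the example. It relies on the `link` entries that Crossref work records may carry, where `intended-application` is `text-mining` for links meant for mining. Many publishers also require an access token or subscription for these links, which the sketch does not handle.

```python
"""Sketch: look up a DOI in Crossref, pick a text-mining link, store the file in MongoDB."""

# Hypothetical preference order; XML is usually easier to parse than PDF.
PREFERRED_TYPES = ("text/xml", "application/xml", "text/plain", "application/pdf")


def pick_fulltext_links(work, preferred_types=PREFERRED_TYPES):
    """Return text-mining URLs from a Crossref work record, best content type first."""
    links = work.get("link", [])
    mining = [
        link for link in links
        if link.get("intended-application") == "text-mining" and "URL" in link
    ]

    def rank(link):
        ctype = link.get("content-type", "unspecified")
        if ctype in preferred_types:
            return preferred_types.index(ctype)
        return len(preferred_types)  # unknown types sort last

    return [link["URL"] for link in sorted(mining, key=rank)]


def fetch_work(doi):
    """Fetch the Crossref metadata record for one DOI using habanero."""
    from habanero import Crossref  # imported lazily so the helper above has no dependencies

    return Crossref().works(ids=doi)["message"]


def download_and_store(doi, collection, timeout=30):
    """Try each candidate link; store the first successful download keyed by DOI."""
    import requests

    for url in pick_fulltext_links(fetch_work(doi)):
        response = requests.get(url, timeout=timeout)
        if response.ok:
            document = {
                "_id": doi,  # assumed layout: one document per DOI
                "source_url": url,
                "content_type": response.headers.get("Content-Type"),
                "full_text": response.content,  # raw bytes, stored as BSON binary
            }
            collection.replace_one({"_id": doi}, document, upsert=True)
            return True
    return False


# Example use (database and collection names are placeholders):
#   from pymongo import MongoClient
#   collection = MongoClient()["scholarly"]["fulltext"]
#   download_and_store("10.1145/3677389.3702526", collection)
```

Keeping the link selection separate from the network and database calls makes it possible to check the selection logic against a saved Crossref record without contacting any service. Note that a single MongoDB document is limited to 16 MB, so very large files would need a different storage approach.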

This code is used in the following studies:

Zheng, H., Fu, Y., Sarol, J. M., Sarraf, I., Schneider, J. “Addressing Unreliability Propagation in Scientific Digital Libraries.” Accepted to the ACM/IEEE-CS Joint Conference on Digital Libraries 2024, Hong Kong. https://doi.org/10.1145/3677389.3702526

Sarraf, I., Fu, Y., Schneider, J. (2023, October 27). “Text Mining Scholarly Publications using APIs.” METSTI 2023: Workshop on Informetric, Scientometric, and Scientific and Technical Information Research, Association for Information Science and Technology, London. https://doi.org/10.5281/zenodo.10581542


The project runs with Python 3 and requires the following Python libraries:

  1. habanero
  2. pymongo
  3. bson
  4. requests
  5. lxml
  6. io

For more detailed instructions on running the code, refer to the requirements.txt file.

dois.txt : Text file that contains the 286 DOIs

output.txt : Text file produced by running the API pipeline

pipeline-habanero.py : Python script that contains code to run the API pipeline

scopus-api-mining.py : Python script that extracts full text for Elsevier DOIs

requirements.txt : Text file that lists the required Python packages and explains in detail how to run the code

setup.sh : UNIX shell script that downloads all packages needed to run the Python files
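As an illustration of how a DOI list such as dois.txt might be read before it is passed to a pipeline, here is a small sketch. It assumes one DOI per line, which the README does not state, and the function names are hypothetical. It strips common resolver prefixes and drops duplicates, treating DOIs as case-insensitive.

```python
# Prefixes that sometimes appear in front of a bare DOI.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def parse_dois(text):
    """Return unique DOIs from text with one DOI per line, in first-seen order."""
    seen = set()
    dois = []
    for line in text.splitlines():
        doi = line.strip()
        if not doi:
            continue  # skip blank lines
        lowered = doi.lower()
        for prefix in DOI_PREFIXES:
            if lowered.startswith(prefix):
                doi = doi[len(prefix):]
                break
        key = doi.lower()  # DOIs are case-insensitive, so compare in lowercase
        if key not in seen:
            seen.add(key)
            dois.append(doi)
    return dois


def load_dois(path="dois.txt"):
    """Read a DOI list from a text file (assumed to be UTF-8)."""
    with open(path, encoding="utf-8") as handle:
        return parse_dois(handle.read())
```

Normalizing before lookup avoids requesting the same publication twice when a list mixes bare DOIs with doi.org URLs.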
