Text mining scholarly API

This project downloads the full text of scholarly publications for a given list of DOIs using the Crossref API, then stores the full-text files in a MongoDB database.
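As a rough illustration of that flow (not the code in pipeline-habanero.py), the sketch below looks up a DOI with habanero, picks the full-text links that Crossref lists for the work, and upserts a downloaded file into MongoDB. The function names, the `scholarly` database, the `fulltext` collection, and the connection URI are all hypothetical. The preference for links flagged `text-mining` is an assumption about which link is most useful.

```python
from typing import Any


def pick_fulltext_links(message: dict[str, Any]) -> list[dict[str, str]]:
    """Return full-text link entries from one Crossref work record.

    Crossref lists publisher-supplied links under the "link" key; entries
    intended for text mining carry intended-application == "text-mining".
    """
    links = message.get("link", [])
    mining = [
        link for link in links
        if link.get("intended-application") == "text-mining"
    ]
    # Assumption: fall back to every listed link when none is flagged.
    chosen = mining or links
    return [
        {"url": link["URL"],
         "content_type": link.get("content-type", "unspecified")}
        for link in chosen
    ]


def fetch_links_for_doi(doi: str) -> list[dict[str, str]]:
    """Look up one DOI through habanero (requires network access)."""
    # Imported lazily so pick_fulltext_links stays dependency-free.
    from habanero import Crossref

    record = Crossref().works(ids=doi)
    return pick_fulltext_links(record["message"])


def store_fulltext(doi: str, content: bytes, content_type: str,
                   uri: str = "mongodb://localhost:27017") -> None:
    """Upsert one full-text file, keyed by DOI (requires a running MongoDB)."""
    from pymongo import MongoClient

    client = MongoClient(uri)
    # Hypothetical database and collection names.
    collection = client["scholarly"]["fulltext"]
    document = {"_id": doi, "content_type": content_type, "content": content}
    collection.replace_one({"_id": doi}, document, upsert=True)
    client.close()
```

Keying each document by DOI means that rerunning the pipeline replaces an existing record instead of creating a duplicate. A single MongoDB document is limited to 16 MB, so very large files would need a different storage approach such as GridFS.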

This code is used in the following studies:

Zheng, H., Fu, Y., Sarol, J. M., Sarraf, I., Schneider, J. “Addressing Unreliability Propagation in Scientific Digital Libraries.” Accepted to the ACM/IEEE-CS Joint Conference on Digital Libraries 2024, Hong Kong. https://doi.org/10.1145/3677389.3702526

Sarraf, I., Fu, Y., Schneider, J. (2023, October 27). “Text Mining Scholarly Publications using APIs.” METSTI 2023: Workshop on Informetric, Scientometric, and Scientific and Technical Information Research, Association for Information Science and Technology, London. https://doi.org/10.5281/zenodo.10581542

The project runs on Python 3 and requires the following Python libraries:

habanero
pymongo
bson
requests
lxml
io (part of the Python standard library; no installation needed)

For detailed instructions on running the code, please refer to the requirements.txt file.

dois.txt : Text file that contains the 286 DOIs

output.txt : Text file produced by running the API pipeline

pipeline-habanero.py : Python script that contains code to run the API pipeline

scopus-api-mining.py : Python script that extracts full text for Elsevier DOIs

requirements.txt : Text file that lists the required Python packages and explains in detail how to run the code

setup.sh : UNIX shell script that downloads all packages needed to run the Python files
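For readers unfamiliar with the Elsevier step, here is a minimal sketch of requesting full text for one Elsevier DOI. It assumes the Elsevier Article Retrieval endpoint and an API key sent in the `X-ELS-APIKey` header. It is not taken from scopus-api-mining.py, and the function names are hypothetical.

```python
from urllib.parse import quote

# Elsevier Article Retrieval endpoint for DOI lookups.
ELSEVIER_ARTICLE_URL = "https://api.elsevier.com/content/article/doi/"


def build_elsevier_request(doi: str, api_key: str) -> tuple[str, dict[str, str]]:
    """Return the URL and headers for one full-text request."""
    # Keep "/" unescaped because it separates the DOI prefix from the suffix.
    url = ELSEVIER_ARTICLE_URL + quote(doi.strip(), safe="/")
    headers = {"X-ELS-APIKey": api_key, "Accept": "text/xml"}
    return url, headers


def fetch_elsevier_fulltext(doi: str, api_key: str, timeout: float = 30.0) -> bytes:
    """Download the XML full text for one DOI (requires network and a valid key)."""
    # Imported lazily so build_elsevier_request stays dependency-free.
    import requests

    url, headers = build_elsevier_request(doi, api_key)
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # Surface 401/404 responses as exceptions.
    return response.content
```

The returned bytes are XML, so they could be parsed with lxml before storage. Whether full text is actually returned depends on the entitlements attached to the API key.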
