abir0/SJR-Journal-Ranking


Colab Contributors MIT License Linkedin

SJR Journal Ranking Analysis

A web scraping and visualization project on SJR and WoS journal indexes.
View Dashboard

Table of Contents
  1. Problem Statement
  2. Built With
  3. Installation
  4. Results

Problem Statement

This project scrapes, analyzes, and visualizes data on research journals. It has two parts: web scraping, done with Selenium and Python, and data analysis and visualization, done with Tableau. It was completed as the first capstone project of the MasterCourse Data Science Cohort 2 program.

The data is scraped from the following websites:

An external dataset is also used in this project:

From these three sources, the following information is collected:

  • Journal Name or Title
  • Subject Area
  • Open Access Status
  • Publisher
  • Country
  • Coverage Year
  • Journal Rank
  • SJR Index
  • Quartile
  • H-Index
  • CiteScore
  • References Count
  • Citations Count
  • Documents Count ...

The scraped data is then cleaned and analyzed using Python libraries such as Pandas, NumPy, Matplotlib, and Seaborn, and the cleaned data is visualized in Tableau. The final dataset can be found on Kaggle.
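As a rough illustration of the cleaning step, the sketch below joins records from two sources on a normalized journal title. It uses only the standard library so that it runs without dependencies, whereas the project itself works with Pandas. The field names (`title`, `sjr`, `cite_score`) and the choice of the title as the join key are assumptions made for this example, not the notebook's actual logic.

```python
import re


def normalize_title(title):
    """Build a join key: lowercase, replace punctuation with spaces, collapse whitespace.

    Note: this simple rule also drops non-ASCII letters, which a real
    cleaning step may want to preserve.
    """
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def combine(sjr_rows, wos_rows):
    """Left-join WoS fields onto SJR rows by normalized title.

    Both arguments are lists of dicts with a hypothetical "title" field.
    Fields already present in the SJR row are kept as they are.
    """
    wos_by_title = {normalize_title(r["title"]): r for r in wos_rows}
    combined = []
    for row in sjr_rows:
        match = wos_by_title.get(normalize_title(row["title"]), {})
        merged = dict(row)
        for key, value in match.items():
            merged.setdefault(key, value)
        combined.append(merged)
    return combined
```

Journals with no match in the second source are kept with only their original fields, which mirrors a left join. A real pipeline would probably prefer a stable identifier such as the ISSN over the title, if one is available in the scraped data.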

Built With

Python libraries and software used in this project:

  • Selenium
  • Pandas
  • Tableau

Installation

This project was developed with Python 3.11.0. Install Python 3.11 or later before running it.

Below are the steps to run the project:

  1. Clone the repo
git clone https://github.com/abir0/SJR-Journal-Ranking.git
  2. Initialize and activate a virtual environment
virtualenv venv
source venv/bin/activate
  3. Install dependencies
pip install -r requirements.txt
  4. Download ChromeDriver from https://chromedriver.chromium.org/downloads and add the directory containing the chromedriver executable (chromedriver.exe on Windows) to the PATH environment variable.

  5. Run the scraper scripts

python src/sjr_scraper.py
python src/wos_scraper.py
  6. Run all the cells in the data transformation notebook in Google Colab, or download the notebook and run it in Jupyter.

  7. You will get a file named combined_journal_ranking_data.csv. This is the final data.

  8. Open the SJR Journal Ranking Analysis.twb file in Tableau (or open the public Tableau link) and connect the combined_journal_ranking_data.csv file to the workbook.
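Before running the scrapers, it can help to confirm that the Python version and the ChromeDriver setup from the steps above are in place. The following is a minimal pre-flight sketch using only the standard library. The function name `preflight` and its parameters are hypothetical and not part of the repository.

```python
import shutil
import sys


def preflight(min_python=(3, 11), driver_name="chromedriver"):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # The README states the project was developed with Python 3.11.0.
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or later is expected")
    # shutil.which searches the directories listed in the PATH environment variable.
    if shutil.which(driver_name) is None:
        problems.append(f"{driver_name} was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

Running the file directly prints one warning per failed check and nothing when both checks pass.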

Results

The final dashboard can be found here.

Here are the two dashboards:

Key findings from the analysis:

  • Correlation analysis shows a positive correlation between SJR Index and CiteScore, H-index, and Cites per Doc. These metrics are therefore better indicators of journal ranking than raw counts of citations, references, and documents.
  • For lower-ranking journals, however, these metrics are less informative because the relationship is noisier (note that the correlation plots become more scattered toward the right).
  • Open Access journals have a higher average Citations per Document than non-Open Access journals.
  • One interesting observation: MDPI is among the top 5 publishers by number of documents, citations, and references. This is because MDPI publishes a large number of journals. Its poor CiteScore, however, suggests that the quality of those journals is not as high as that of the top 5 publishers by CiteScore.
  • Based on CiteScore, the top 5 publishers are: Wiley, Elsevier, Springer, Nature Portfolio, and Routledge.
  • The top 5 countries with the highest number of journals are: United States, United Kingdom, Netherlands, Germany, and Switzerland.
  • Medicine and Social Sciences are the top 2 subject areas by number of documents, references, and combined H-index.
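The two kinds of computation behind these findings, a correlation between metrics and an average of Citations per Document by Open Access status, can be sketched as follows. This is an illustrative standard-library version, not the project's notebook code. The field names (`citations`, `documents`, `open_access`) are assumptions, and Pearson correlation is assumed because the source does not name the coefficient used.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def mean_cites_per_doc(rows, open_access):
    """Average citations/documents over rows with the given Open Access flag.

    Rows are dicts with hypothetical keys "citations", "documents",
    and "open_access". Rows with zero documents are skipped.
    """
    ratios = [
        r["citations"] / r["documents"]
        for r in rows
        if r["open_access"] is open_access and r["documents"] > 0
    ]
    return mean(ratios)
```

With the combined CSV loaded into a list of dicts, `pearson` could be applied to the SJR Index column against each other metric in turn, and `mean_cites_per_doc` called once with `True` and once with `False` to compare the two groups.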
