📊 Software Metric Extractor

A research-oriented tool to extract static code metrics from trending open-source Python repositories.
Ideal for academic studies, benchmarking, or enriching datasets for training machine learning models.

🌟 Overview

Software Metric Extractor automates the following pipeline:

Fetching trending Python repositories from GitHub.
Cloning and storing them locally.
Extracting static code metrics using Radon (e.g., Cyclomatic Complexity, Maintainability Index).
Storing metrics into a MySQL database using SQLAlchemy.

It is aimed at researchers and developers who study software quality, complexity, and maintainability patterns in open-source projects.

🧾 Project Structure

.
├── cli/
│   └── main.py               # CLI entry point
├── core/
│   ├── analyze_metrics.py    # Radon-based metrics extraction
│   ├── db.py                 # DB models & session manager
│   └── fetch_repos.py        # GitHub scraping logic
├── docker-compose.yml        # MySQL container setup
├── projects/                 # Local clones of repositories
├── requirements.txt          # Python dependencies
├── run.py                    # CLI launcher
└── .env                      # Environment variables

🛠️ Prerequisites

Python 3.11+
Docker & Docker Compose
A valid GitHub Personal Access Token

⚙️ Setup Instructions

1. Clone this repository

git clone https://github.com/yourusername/software-metric-extractor.git
cd software-metric-extractor

2. Create a `.env` file

# .env

# GitHub API Token (for higher rate limits)
GITHUB_TOKEN=ghp_...

# MySQL Database URL (used by SQLAlchemy)
# The user, password, and database name here must match the MYSQL_* values below
DATABASE_URL=mysql+pymysql://metrics_user:your_database_password@localhost/software_metrics
MYSQL_ROOT_PASSWORD=your_root_password
MYSQL_DATABASE=software_metrics
MYSQL_USER=metrics_user
MYSQL_PASSWORD=your_database_password

# Optional defaults
REPO_LIMIT=100
REPO_LANGUAGE=Python
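
As an illustration of how these variables might be consumed, here is a minimal sketch that reads them from the process environment and applies the optional defaults shown above. The `Settings` class and `load_settings` function are hypothetical names, not part of this project, and the sketch assumes the `.env` file has already been loaded into the environment by some other means.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values defined in the .env example."""
    github_token: str
    database_url: str
    repo_limit: int
    repo_language: str


def load_settings(env=os.environ) -> Settings:
    """Read settings from a mapping; fail early if a required one is missing."""
    missing = [key for key in ("GITHUB_TOKEN", "DATABASE_URL") if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return Settings(
        github_token=env["GITHUB_TOKEN"],
        database_url=env["DATABASE_URL"],
        # Optional values fall back to the defaults listed in the .env example.
        repo_limit=int(env.get("REPO_LIMIT", "100")),
        repo_language=env.get("REPO_LANGUAGE", "Python"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise without touching the real environment.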

3. Start MySQL with Docker

docker-compose up -d

This will spin up a MySQL 8.0 container with a database named software_metrics.
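
The DATABASE_URL has to agree with the MYSQL_USER, MYSQL_PASSWORD, and MYSQL_DATABASE values given to the container. One way to avoid a mismatch is to derive the URL from those values, as in the sketch below. The `build_database_url` helper is hypothetical, and the default port 3306 assumes the compose file publishes MySQL on its standard port.

```python
from urllib.parse import quote_plus


def build_database_url(user, password, database, host="localhost", port=3306):
    """Compose a SQLAlchemy URL for the PyMySQL driver from the MYSQL_* values.

    User and password are percent-encoded so that characters such as '@'
    or '/' in a password do not break URL parsing.
    """
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    print(build_database_url("metrics_user", "your_database_password", "software_metrics"))
```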

4. Create a virtual environment and install dependencies

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

5. Run the CLI

python run.py --help

CLI Commands:

🔍 Fetch repositories

python run.py fetch-repos --limit 50 --language Python
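
GitHub's REST API has no official "trending" endpoint, which is presumably why the project scrapes the trending page. For readers who want an API-only approximation, the sketch below builds a request for the repository Search API, sorted by stars. This is a stand-in technique, not the method used by `fetch_repos.py`, and `build_search_request` is a hypothetical helper.

```python
from urllib.parse import urlencode

GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_request(language, limit, token):
    """Return (url, headers) for a star-sorted repository search."""
    # The Search API accepts at most 100 results per page.
    per_page = max(1, min(limit, 100))
    query = urlencode({
        "q": f"language:{language}",
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
    })
    headers = {
        "Accept": "application/vnd.github+json",
        # Authenticated requests get a higher rate limit.
        "Authorization": f"Bearer {token}",
    }
    return f"{GITHUB_SEARCH_URL}?{query}", headers
```

The returned URL and headers can be handed to any HTTP client; fetching more than 100 repositories would additionally need the `page` parameter.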

📈 Analyze and compute metrics

python run.py analyze
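
Before any metric can be computed, the analysis step has to locate the Python files inside each clone under `projects/`. A minimal sketch of that traversal follows; it assumes one subdirectory per repository, and `iter_python_files` is a hypothetical name rather than a function from `analyze_metrics.py`.

```python
from pathlib import Path


def iter_python_files(projects_dir):
    """Yield (repo_name, file_path) for every .py file under each cloned repo."""
    root = Path(projects_dir)
    for repo in sorted(p for p in root.iterdir() if p.is_dir()):
        for path in sorted(repo.rglob("*.py")):
            # Skip anything stored inside the repository's .git directory.
            if ".git" in path.relative_to(repo).parts:
                continue
            yield repo.name, path
```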

🧹 Reset database

python run.py reset-db
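
The three commands above could be wired up along the following lines. This is only a sketch using the standard library's `argparse`; the project's `cli/main.py` may well use a different CLI framework, and the help texts and default values here are assumptions taken from the `.env` example.

```python
import argparse


def build_parser():
    """Build a parser exposing the fetch-repos, analyze, and reset-db commands."""
    parser = argparse.ArgumentParser(prog="run.py", description="Software Metric Extractor")
    sub = parser.add_subparsers(dest="command", required=True)

    fetch = sub.add_parser("fetch-repos", help="Fetch and clone repositories")
    fetch.add_argument("--limit", type=int, default=100, help="Number of repositories")
    fetch.add_argument("--language", default="Python", help="Repository language")

    sub.add_parser("analyze", help="Compute metrics for cloned repositories")
    sub.add_parser("reset-db", help="Reset the database")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```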

📦 Metrics Extracted

Each file and project is analyzed for:

Cyclomatic Complexity
Maintainability Index
Lines of Code (LOC)
Number of Functions
Comment Lines
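
To give a feel for what these numbers mean, the sketch below approximates four of them for a single source string using only the standard library. It is not Radon's implementation: Radon reports complexity per function and class and also computes the Maintainability Index, which is omitted here. `rough_metrics` is a hypothetical name.

```python
import ast
import io
import tokenize

# Node types that each add one decision point to the control flow.
_BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler,
                 ast.IfExp, ast.Assert, ast.comprehension)


def rough_metrics(source):
    """Return approximate whole-module metrics for a Python source string."""
    tree = ast.parse(source)
    functions = sum(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        for node in ast.walk(tree)
    )
    # Start with one path through the code, then add one per decision point.
    complexity = 1
    for node in ast.walk(tree):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra branches.
            complexity += len(node.values) - 1
    # Count comment tokens; a line holds at most one, so this equals the
    # number of lines that contain a comment.
    comments = sum(
        tok.type == tokenize.COMMENT
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    )
    return {
        "loc": len(source.splitlines()),
        "functions": functions,
        "comment_lines": comments,
        "complexity": complexity,
    }
```

For a seven-line module containing one comment and one function with an `if`, an `and`, and a `for`, this returns a complexity of 4.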

🧪 Use Case: ML Research

This project was originally designed for a research study exploring the relationship between code structure and performance in large language models. The resulting dataset can be used for:

Code complexity prediction
Model training for software quality estimations
Empirical software engineering research

🧰 Tech Stack

Python 🐍
SQLAlchemy ORM
MySQL 8
Docker 🐳
Radon (code analysis)
GitHub API & GitPython
Selenium (for trending repo scraping)

gajeshbhat/Software-Metric-Extractor
