ucp4496/data-collection-pipeline

Data Collection Pipeline

This repository contains the Data Collection Pipeline project for Model-Driven Development.


Running the CLI

To fetch commits and export them to a CSV file, run:

python -m src.repo_miner fetch-commits --repo owner/repo [--max 100] --out commits.csv
  • Replace owner/repo with the GitHub repository you want to analyze.
  • The --max flag is optional and limits the number of commits fetched.
  • The --out flag specifies the output file (e.g., commits.csv).
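Once the export finishes, a quick sanity check on the output file can be sketched as follows. This is a minimal illustration using only the standard library. The function name `describe_csv` is hypothetical, and the sketch deliberately makes no assumption about column names, which depend on what the tool actually writes.

```python
import csv
from pathlib import Path


def describe_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Count data rows; the header row is consumed by DictReader itself.
        row_count = sum(1 for _ in reader)
        # fieldnames is None when the file is completely empty.
        columns = list(reader.fieldnames or [])
    return columns, row_count
```

For example, `columns, rows = describe_csv("commits.csv")` returns the header names and the number of exported rows, which can be compared against the value passed to --max.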

To fetch issues and export them to a CSV file, run:

python -m src.repo_miner fetch-issues --repo owner/repo [--state all|open|closed] [--max 50] --out issues.csv
  • Replace owner/repo with the GitHub repository you want to analyze.
  • The --state flag is optional and filters issues by status (all, open, or closed). Default is all.
  • The --max flag is optional and limits the number of issues fetched.
  • The --out flag specifies the output file (e.g., issues.csv).
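The flags of the two fetch commands map naturally onto argparse subcommands. The sketch below is illustrative only and is not taken from this repository's source. The function name `build_parser` is hypothetical, and treating --repo and --out as required is an assumption based on their appearing without brackets in the usage lines above.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented fetch subcommands (sketch)."""
    parser = argparse.ArgumentParser(prog="repo_miner")
    subcommands = parser.add_subparsers(dest="command", required=True)

    commits = subcommands.add_parser("fetch-commits")
    commits.add_argument("--repo", required=True, help="repository as owner/repo")
    commits.add_argument("--max", type=int, default=None, help="optional cap on commits")
    commits.add_argument("--out", required=True, help="output CSV path")

    issues = subcommands.add_parser("fetch-issues")
    issues.add_argument("--repo", required=True, help="repository as owner/repo")
    # The documented default for --state is "all".
    issues.add_argument("--state", choices=["all", "open", "closed"], default="all")
    issues.add_argument("--max", type=int, default=None, help="optional cap on issues")
    issues.add_argument("--out", required=True, help="output CSV path")
    return parser
```

With this shape, omitting --state yields `all` and omitting --max yields `None`, which matches the optional behavior described in the bullets.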

To summarize the fetched commits and issues, run:

python -m src.repo_miner summarize --commits commits.csv --issues issues.csv
  • The --commits flag specifies the path to the CSV file containing commit data.
  • The --issues flag specifies the path to the CSV file containing issue data.

Note: Depending on your configuration, you may need to use python3 instead of python.
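The three commands can also be chained from a small driver script. The following is a minimal sketch, not part of this repository: `build_commands` and `run_pipeline` are hypothetical helper names, and the default file names are simply the ones used in the examples above. Using `sys.executable` sidesteps the python versus python3 question by reusing whichever interpreter runs the script.

```python
import subprocess
import sys


def build_commands(repo, commits_csv="commits.csv", issues_csv="issues.csv"):
    """Return the three documented CLI invocations as argument lists."""
    # sys.executable is the interpreter running this script, so the same
    # code works whether it was started as `python` or `python3`.
    base = [sys.executable, "-m", "src.repo_miner"]
    return [
        base + ["fetch-commits", "--repo", repo, "--out", commits_csv],
        base + ["fetch-issues", "--repo", repo, "--out", issues_csv],
        base + ["summarize", "--commits", commits_csv, "--issues", issues_csv],
    ]


def run_pipeline(repo):
    """Run each step in order, stopping at the first failure."""
    for command in build_commands(repo):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, check=True)
```

Calling `run_pipeline("owner/repo")` runs the steps in sequence. It presumably needs to be started from the repository root, since `src.repo_miner` has to be importable for the `-m` form to work.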


Dependencies

If you encounter missing dependency warnings, install the required packages:

pip install -r requirements.txt

Or, on some systems:

pip3 install -r requirements.txt

Running Tests

Run all tests using:

pytest

For more detailed output, use the verbose flag:

pytest -v
