This repository is dedicated to collecting and scraping KBO (Korea Baseball Organization) data. It includes scripts and processes for gathering player statistics, team data, game results, and other related information.
- Scrape KBO data including game results, schedules, and player statistics
- Supports various output formats: Parquet, JSON, CSV
- Flexible command-line interface with multiple scraping commands
- Filter by year, specific date, and series ID (league/stage type)
- Python 3.12+ is required.
- Clone the repository

      git clone https://github.com/kbo-data-portal/collector.git
      cd collector

- Install dependencies

      pip install -r requirements.txt

This project provides a command-line tool for scraping KBO data. You can specify the target data and output format using commands.

    python run.py <command> [options]

- `<command>`: The target data type (game, schedule, player)
- `[options]`: Additional filters and configurations

| Option | Description | Default |
|---|---|---|
| `-y, --year` | Specify the year | None |
| `-d, --date` | Specific date in YYYYMMDD format | None |
| `-f, --format` | Output format: parquet, json, csv | csv |
| `-s, --series` | Series ID to indicate league/stage type (see Series ID) | 0 (Regular Season) |

If neither --year nor --date is specified, the program will fetch all available data from 1982 to the present.
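
The options above could be modeled with `argparse` roughly as follows. This is a minimal sketch for illustration, not the actual contents of `run.py`: the helper names `yyyymmdd`, `build_parser`, and `describe_scope` are hypothetical, and letting `--date` take precedence over `--year` when both are given is an assumption.

```python
import argparse
from datetime import datetime


def yyyymmdd(value: str) -> str:
    """Accept only real calendar dates written as exactly eight digits."""
    if len(value) != 8 or not value.isdigit():
        raise argparse.ArgumentTypeError(f"expected YYYYMMDD, got {value!r}")
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a valid date: {value!r}")
    return value


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented interface.
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("command", choices=["game", "schedule", "player"])
    parser.add_argument("-y", "--year", type=int, default=None)
    parser.add_argument("-d", "--date", type=yyyymmdd, default=None)
    parser.add_argument("-f", "--format", choices=["parquet", "json", "csv"], default="csv")
    parser.add_argument("-s", "--series", type=int, default=0)
    return parser


def describe_scope(args: argparse.Namespace) -> str:
    """Summarize which time range the given options select."""
    if args.date is not None:
        return f"date {args.date}"  # assumption: the date is the narrower filter
    if args.year is not None:
        return f"season {args.year}"
    # Documented fallback when neither --year nor --date is given.
    return "all seasons from 1982 to the present"
```

For example, `describe_scope(build_parser().parse_args(["game", "-d", "20141111"]))` yields a description of a single-date scrape, while passing only the command falls back to the full 1982-to-present range.
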
- `game`: Scrape game-related data:

      python run.py game -y 2014 -f csv       # Season data
      python run.py game -d 20141111 -f json  # Specific date data

- `schedule`: Scrape schedule of games:

      python run.py schedule -y 2014 -f parquet

- `player`: Scrape player statistics:

      python run.py player -y 2014 -f csv

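
To scrape several seasons in one go, the documented commands can be driven from a small wrapper script. The sketch below only uses the flags listed in the options table; `build_commands` and `run_all` are hypothetical helpers that are not part of this repository, and the script is assumed to be run from the repository root where `run.py` lives.

```python
import subprocess
import sys


def build_commands(command, years, fmt="csv", series=0):
    """Return one argument list per season for the documented CLI."""
    return [
        [sys.executable, "run.py", command,
         "-y", str(year), "-f", fmt, "-s", str(series)]
        for year in years
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling `run_all(build_commands("game", range(2014, 2017), fmt="parquet"))` would then invoke the scraper once each for 2014, 2015, and 2016.
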
For detailed command usage, run:

    python run.py <command> --help

Each game record includes an SR_ID field representing the league/stage type:
| SR_ID | Description |
|---|---|
| 0 | Regular Season |
| 1 | Preseason Game |
| 3 | Semi-Playoffs |
| 4 | Wild Card Round |
| 5 | Playoffs |
| 7 | Korean Series |
| 8 | International Competitions |
| 9 | All-star Game |
You can use this field to filter games based on the competition stage.
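
As an illustration, scraped CSV output could be filtered on this field with the standard library alone. This is a sketch under assumptions: only the `SR_ID` column is taken from this README, while the `GAME_ID` column and the sample rows are made up for the example.

```python
import csv
import io

# SR_ID values as listed in the table above.
SERIES = {
    0: "Regular Season",
    1: "Preseason Game",
    3: "Semi-Playoffs",
    4: "Wild Card Round",
    5: "Playoffs",
    7: "Korean Series",
    8: "International Competitions",
    9: "All-star Game",
}


def filter_by_series(rows, sr_id):
    """Keep only the records whose SR_ID matches the requested stage."""
    return [row for row in rows if int(row["SR_ID"]) == sr_id]


# Made-up sample standing in for a scraped CSV file.
sample = io.StringIO("GAME_ID,SR_ID\nA,0\nB,7\nC,0\n")
rows = list(csv.DictReader(sample))

korean_series = filter_by_series(rows, 7)
for row in korean_series:
    print(row["GAME_ID"], SERIES[int(row["SR_ID"])])
```

With a real file, `io.StringIO(...)` would be replaced by `open(path, newline="")`, where `path` points at the CSV produced by the scraper.
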
This project is licensed under the MIT License. See the LICENSE file for details.