datasets: refresh GameTora event dataset and scraper pipeline #115

Open
ETO111139 wants to merge 3 commits into Magody:main from ETO111139:dataset/gametora-refresh-2026-03-20

@ETO111139

This PR refreshes the Uma Musume dataset from GameTora and includes the scraping pipeline needed to keep it updated.

What is included

  • integrate scrape_skills.py, scrape_events.py, and run_characters_supports_cpu.py into the repo
  • add required Python dependencies for the scraper flow
  • refresh datasets/in_game/skills.json
  • refresh datasets/in_game/events.json
  • rebuild datasets/in_game/event_catalog.json
  • update web/public/events/*, web/public/icons/skills/*, and rebuilt web/dist/*
  • fix a Windows cp1252 console print issue in the build_catalog.py flow so the catalog rebuild also works on Windows
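
For context on the cp1252 item above: a Windows console that defaults to cp1252 raises `UnicodeEncodeError` when a script prints characters outside that code page, such as Japanese card names. The sketch below shows one common way to avoid that, by switching the output stream to UTF-8. It is not necessarily the change made in this PR, and `force_utf8` is a hypothetical helper name.

```python
import io
import sys


def force_utf8(stream):
    """Switch a text stream to UTF-8, replacing anything it cannot encode.

    Hypothetical helper for illustration; the actual fix in this PR may differ.
    """
    # reconfigure() is available on io.TextIOWrapper since Python 3.7.
    # Other stream types (for example a redirected StringIO) are left untouched.
    if isinstance(stream, io.TextIOWrapper):
        stream.reconfigure(encoding="utf-8", errors="replace")
    return stream


# Typical use at the top of a script, before the first print():
#   force_utf8(sys.stdout)
#   force_utf8(sys.stderr)
```

Setting the `PYTHONIOENCODING=utf-8` environment variable before running the script is an alternative that needs no code change.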

Validation performed on March 20, 2026

  • scraped skills successfully from the current GameTora skills JSON
  • scraped supports and characters from the current GameTora sitemap
  • rebuilt catalog successfully
  • ran npm ci
  • ran npm run build
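
The sitemap step in the list above can be sketched roughly as follows. This is a minimal illustration that assumes a standard `urlset` sitemap; the function name and the two path markers (`/supports/`, `/characters/`) are assumptions, not confirmed details of the scripts in this PR.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def split_sitemap_urls(sitemap_xml,
                       support_marker="/supports/",
                       character_marker="/characters/"):
    """Return (support_urls, character_urls) found in a sitemap document.

    The path markers are illustrative assumptions, not verified URL patterns.
    """
    root = ET.fromstring(sitemap_xml)
    supports, characters = [], []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if support_marker in url:
            supports.append(url)
        elif character_marker in url:
            characters.append(url)
    return supports, characters
```

Comparing `len(supports)` and `len(characters)` against the scraped output is one way to arrive at a "missing" count like the one reported in the coverage note.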

Dataset status after refresh

  • supports in events.json: 492
  • trainees in events.json: 245
  • scenarios in events.json: 2
  • total root entries in events.json: 739
  • entries in event_catalog.json: 9313
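
The per-type counts above sum to the reported total (492 + 245 + 2 = 739). A reviewer could re-check them with something like the sketch below. It assumes that `events.json` is a JSON array of objects and that each object carries a `type` field; both are assumptions about the schema, so adjust the field name to match the actual file.

```python
from collections import Counter


def count_root_entries(entries, type_field="type"):
    """Count root entries per type and return (counts, total).

    `type_field` is an assumed field name, not a confirmed part of the schema.
    """
    counts = Counter(entry.get(type_field, "unknown") for entry in entries)
    return counts, sum(counts.values())


# Example use against the refreshed file:
#   import json
#   with open("datasets/in_game/events.json", encoding="utf-8") as f:
#       counts, total = count_root_entries(json.load(f))
#   print(counts, total)
```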

Coverage note

  • current GameTora sitemap resolved to 526 support URLs and 245 character URLs
  • final missing count after scrape: 0 supports, 0 trainees
  • some support URLs in the sitemap are duplicates that point to identical card/event data, so the 526 raw URLs merge into 492 unique support entries
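
The merge described in the coverage note can be illustrated with a small sketch: records scraped from different URLs are keyed by a card identifier, and the first record seen for each identifier is kept. The `id` field and the function name are hypothetical; the scraper in this PR may key on something else.

```python
def merge_duplicate_supports(records, key_field="id"):
    """Collapse records that share the same identifier, keeping the first seen.

    `key_field` is an assumed field name used only for illustration.
    """
    merged = {}
    for record in records:
        # setdefault keeps the existing record when the key was already seen.
        merged.setdefault(record[key_field], record)
    return list(merged.values())
```

Because Python dicts preserve insertion order, the merged list keeps the order in which unique cards were first encountered.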

This PR intentionally contains only the dataset refresh work and the scraper/build support needed for it.
