
Updated gui website images skills and events to current Global+JP #111

Open

l2ofl wants to merge 3 commits into Magody:main from l2ofl:supp_char_skill_scrape

Conversation


@l2ofl l2ofl commented Feb 20, 2026

- Revised datasets/scrape_events.py to give card event JSON more unique `name` and `id` attributes, along with their images. Also added an exception array to handle cases like the 2x Silence Suzuka SPD SSR cards.
- Wrote a wrapper, datasets/run_characters_supports_cpu.py, that scrapes all available character and support data from gametora using lxml and the site's robots.txt. It also uses Playwright to run gametora in headless mode, capture a skills<hash>.json, and run it through scrape_skills.py.
- The latest data as of 2/20/2026 is reflected in \web\dist and \web\public.
- Updated requirements.txt to include beautifulsoup4, lxml, and playwright.
- Detailed features of run_characters_supports_cpu.py:

- Confirm npm works (npm --version)
- Get sitemap URL from robots.txt
- Fetch sitemap(s) and extract all URLs
- Extract support slugs + character slugs
- Merge + dedupe supports → list
- Merge + dedupe characters → list
- Combine both + final dedupe
- Write datasets/in_game/events.json
- Run build_catalog.py
- Run npm run build (retry with install if needed)
- Hardcoded scenario events to append to events.json
- Cleaned up old images in web/public/events folder
- Scrape skills from gametora using Playwright to run headless Chromium
- Probably some other changes I've forgotten; this list is long enough already
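The sitemap steps above (robots.txt → sitemap URLs → slugs → merge + dedupe) could be sketched roughly as follows. This is an illustration only: the URL layout, function names, and sample data are assumptions, not the actual run_characters_supports_cpu.py code.

```python
import re
from urllib.parse import urlparse

def sitemap_urls_from_robots(robots_txt: str) -> list[str]:
    """Extract 'Sitemap:' entries from a robots.txt body."""
    return re.findall(r"(?im)^sitemap:\s*(\S+)", robots_txt)

def extract_slugs(urls: list[str], section: str) -> list[str]:
    """Pull trailing slugs from URLs shaped like /umamusume/<section>/<slug>
    (assumed layout for illustration)."""
    slugs = []
    for url in urls:
        parts = urlparse(url).path.strip("/").split("/")
        if len(parts) >= 2 and parts[-2] == section:
            slugs.append(parts[-1])
    return slugs

def merge_dedupe(*slug_lists: list[str]) -> list[str]:
    """Merge lists and drop duplicates while preserving first-seen order."""
    return list(dict.fromkeys(s for lst in slug_lists for s in lst))

# Sample data standing in for the real network fetches.
robots = "User-agent: *\nSitemap: https://example.com/sitemap.xml\n"
urls = [
    "https://example.com/umamusume/supports/10001-kitasan-black",
    "https://example.com/umamusume/characters/100101-special-week",
    "https://example.com/umamusume/supports/10001-kitasan-black",  # duplicate
]
supports = extract_slugs(urls, "supports")
characters = extract_slugs(urls, "characters")
combined = merge_dedupe(supports, characters)
```

Ordered dedupe via `dict.fromkeys` keeps the output stable across runs, which makes diffs of the generated lists easier to review.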

The run_characters_supports_cpu wrapper creates a folder to hold each individual event, since it multithreads calls to scrape_events.py. The folder does not need to be committed, as the script deletes it on each run anyway; .gitignore was updated to ignore it.
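A minimal sketch of that per-run temp-folder pattern, assuming a thread pool over event slugs; the folder name and worker body are hypothetical, not the real script's:

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

TMP_DIR = Path("datasets/_event_tmp")  # assumed name; would be listed in .gitignore

def scrape_one_event(slug: str) -> Path:
    """Stand-in for a threaded call into scrape_events.py for one event."""
    out = TMP_DIR / f"{slug}.json"
    out.write_text("{}")  # the real worker would write scraped event JSON here
    return out

def run(slugs: list[str]) -> int:
    shutil.rmtree(TMP_DIR, ignore_errors=True)  # start from a fresh folder each run
    TMP_DIR.mkdir(parents=True)
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(scrape_one_event, slugs))
    # ...merge the per-event files into events.json here...
    shutil.rmtree(TMP_DIR)  # script deletes the folder when done
    return len(results)
```

Writing one file per event avoids having multiple threads contend over a single events.json, which is why the scratch folder exists at all.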
Grab new skill icons when available, collect additional data for skills, and update existing skill icons that are out of date.
