@kimlee87 commented Dec 11, 2025

  • Update the function labelstudio_to_png() in groundtruth.py:
    • Draw rectangles at the correct positions (the x and y values from LabelStudio are the top-left corner of the rectangle, not its center).
    • Append the suffix _dup to the 18 duplicate image names, controlled by an overwrite flag, so that we can use either 82 or 100 annotated images (overwrite=True or False, respectively).
  • Prepare data for training by grouping by year and page type before shuffling and splitting.
  • Document the setup steps for training Eynollah.
  • Add a script to download the original files corresponding to the labeled files, as well as unseen data from SDS.
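
For reviewers, the coordinate fix in the first item can be sketched roughly as follows. This is not the code in groundtruth.py; the helper name `ls_rect_to_pixel_box` and its arguments are hypothetical. It assumes the usual Label Studio rectangle export, where `x`, `y`, `width`, and `height` are percentages (0 to 100) of the image size and `x`/`y` mark the top-left corner.

```python
def ls_rect_to_pixel_box(value, image_width, image_height):
    """Convert a Label Studio rectangle (percent units) to a pixel box.

    Hypothetical helper for illustration only. `value` is assumed to be a
    dict with "x", "y", "width", "height" given as percentages of the image
    size, where (x, y) is the TOP-LEFT corner of the rectangle.
    """
    left = value["x"] * image_width / 100.0
    top = value["y"] * image_height / 100.0
    right = left + value["width"] * image_width / 100.0
    bottom = top + value["height"] * image_height / 100.0
    # No half-width/half-height offset is applied here: treating (x, y) as
    # the center would shift every box up and to the left.
    return tuple(round(v) for v in (left, top, right, bottom))


box = ls_rect_to_pixel_box(
    {"x": 10, "y": 20, "width": 30, "height": 40},
    image_width=1000,
    image_height=500,
)
print(box)
```

The returned `(left, top, right, bottom)` tuple is in the order that drawing APIs such as Pillow's `ImageDraw.rectangle` accept.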

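One possible reading of "grouping by year and page type before shuffling and splitting" is a stratified split: shuffle within each (year, page type) group and split each group separately, so both subsets cover every group. The sketch below illustrates that reading only; the function name, the `year` and `page_type` field names, and the 80/20 ratio are assumptions, not taken from this PR.

```python
import random
from collections import defaultdict


def grouped_split(records, train_fraction=0.8, seed=0):
    """Split records into train/validation per (year, page_type) group.

    Illustrative sketch: field names and the default ratio are assumed.
    """
    groups = defaultdict(list)
    for record in records:
        groups[(record["year"], record["page_type"])].append(record)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for key in sorted(groups):  # sorted keys give a deterministic order
        members = groups[key]
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        if len(members) > 1:
            # Keep at least one item on each side of the split.
            cut = min(max(cut, 1), len(members) - 1)
        train.extend(members[:cut])
        val.extend(members[cut:])
    return train, val


pages = (
    [{"id": f"a{i}", "year": 1900, "page_type": "text"} for i in range(10)]
    + [{"id": f"b{i}", "year": 1950, "page_type": "table"} for i in range(5)]
)
train, val = grouped_split(pages)
print(len(train), len(val))
```

Splitting per group rather than over the whole pool avoids a validation set that happens to miss a rare year or page type.
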
@kimlee87 changed the title from "Update labelstudio2png to draw rectangles correctly and to handle duplicate file names" to "Train eynollah on labeled data" on Jan 2, 2026
@kimlee87 marked this pull request as ready for review on January 7, 2026, 10:57