@Sean1572
Collaborator

No description provided.

* Added some experimental inference pipeline

Needed this to test the unseen burrowing owl dataset. Will help a lot with getting data formatted correctly

* Add testing code for loading in models for inference

* feat: add inference pipeline

* Block CSVs from commits

* Move visualization packages to optional dependency

* Clean code (remove comments and remove run-specific code from library)

* Clean up timm_model

* Add BirdMAE Model Training
@Sean1572 Sean1572 marked this pull request as draft October 17, 2025 21:25
@Sean1572
Collaborator Author

Merge only after merging inference pipeline see #88

@Sean1572 Sean1572 marked this pull request as ready for review December 12, 2025 23:40
BUG NOTICE: Using continue broke inference in waveform_preprocessors.py. Need to rethink how corrupted audio files are handled during model training. Maybe replace them with empty data so it doesn't break training?
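
The "replace with empty data" idea could look something like the sketch below. It is only an illustration under assumptions: `load_batch_with_placeholders`, `load_audio`, and `num_samples` are hypothetical names, not functions from this repo, and waveforms are plain lists of floats to keep the example dependency-free.

```python
from typing import Callable, List, Sequence, Tuple


def load_batch_with_placeholders(
    paths: Sequence[str],
    load_audio: Callable[[str], List[float]],  # hypothetical loader, may raise
    num_samples: int,
) -> Tuple[List[List[float]], List[bool]]:
    """Load every path, substituting silence for files that fail to load.

    The batch always has len(paths) entries, so batch size and ordering are
    preserved. The mask records which entries are real audio.
    """
    waveforms: List[List[float]] = []
    is_valid: List[bool] = []
    for path in paths:
        try:
            waveform = load_audio(path)
            is_valid.append(True)
        except Exception as e:  # broad on purpose, mirroring the current handler
            print(f"{path}: {e}; file is likely corrupted, substituting silence")
            waveform = [0.0] * num_samples
            is_valid.append(False)
        waveforms.append(waveform)
    return waveforms, is_valid
```

The mask is there so downstream code can decide what to do with placeholder rows, for example excluding them from the loss during training or flagging their predictions during inference.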
@kgarwoodsdzwa
Member

@Sean1572 was thinking of merging this before doing the push to main, but there are merge conflicts with the dev branch. I believe they are mainly in the inference script, train script, trainer script, raw data extractor, etc. Maybe this could be resolved by putting all the revised files that are needed in a Jupyter notebook in an example folder?

    except Exception as e:
        print(e)
        print("File Likely is corrupted, moving on")
        break
Collaborator Author

@kgarwoodsdzwa I realized there is an issue with this section of the code, which is part of the reason I didn't want it pushed to main.

This was my cheap way of handling file corruption. Previously this was a continue, and that was fine for training (it just meant a smaller batch). In inference, however, it is a huge issue because it can misalign file names and model outputs (I passed in 15 files but only got 14 predictions...). Raising an error fixed this because I could just skip the whole batch. During training we have no such error handling, since the trainer doesn't allow for it.

So we need to tackle error handling as part of this repo.
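
For the inference side, a rough sketch of the "raise and skip the batch" approach is below. Everything named here is an assumption for illustration: `CorruptedAudioError`, `predict_aligned`, and `predict_batch` are hypothetical and do not exist in this repo. The point is that predictions are keyed by file name and a length check catches any silent drop.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


class CorruptedAudioError(Exception):
    """Hypothetical error raised when a file in a batch cannot be decoded."""


def predict_aligned(
    batches: Sequence[Sequence[str]],
    predict_batch: Callable[[Sequence[str]], List[Any]],  # hypothetical model call
) -> Tuple[Dict[str, Any], List[str]]:
    """Run inference batch by batch, keeping file names tied to outputs."""
    predictions: Dict[str, Any] = {}
    skipped: List[str] = []
    for paths in batches:
        try:
            outputs = predict_batch(paths)
        except CorruptedAudioError as e:
            # Skip the whole batch rather than return fewer outputs than inputs.
            print(f"Skipping batch of {len(paths)} files: {e}")
            skipped.extend(paths)
            continue
        if len(outputs) != len(paths):
            # Guards against the "15 files in, 14 predictions out" case.
            raise ValueError(
                f"Got {len(outputs)} predictions for {len(paths)} files"
            )
        predictions.update(zip(paths, outputs))
    return predictions, skipped
```

The skipped list could then be retried one file at a time to find the corrupted file without losing the rest of the batch.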

Linted most of the model_trainer with flake8 and some of the data downloader; realized that the data downloader needs a major code cleanup

Development

Successfully merging this pull request may close these issues.

San Diego Fewshot Dataset and Training
add tools to download from xeno canto for other bird models and species

3 participants