@gAldeia
Contributor

@gAldeia gAldeia commented Sep 3, 2024

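Data comes from two symbolic regression repos:

- Miles Cranmer's PySR: https://github.com/MilesCranmer/PySR
- Etienne Russeil et al.'s MvSR: https://github.com/erusseil/MvSR-analysis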

Each dataset has a first-principles equation derived from data, and each was used in its respective paper to show that symbolic regression can potentially recover the original equation when only observational data is available.

While some of them have just a few samples and others are synthetically generated, they are challenging for symbolic regression methods and can be used to evaluate these algorithms.

The idea of adding them to PMLB is to help other users quickly set up experiments with the data.

I still need to write proper metadata for them. My understanding is that opening a PR will trigger a GitHub Action that pushes some new files to my fork, which I should complete before the new datasets go to review. Please let me know if there is anything I got wrong and need to update!

gAldeia and others added 3 commits September 3, 2024 18:48
Data comes from two symbolic regression repos:
- Miles Cranmer's PySR: https://github.com/MilesCranmer/PySR
- Etienne Russeil et al.'s MvSR: https://github.com/erusseil/MvSR-analysis

CI was failing to parse the contents of these specific ones.
@trangdata
Collaborator

Thank you for this PR, @gAldeia! 💯 🌈

If you could update the metadata.yaml files (e.g. datasets/first_principles_rydberg/metadata.yaml), I will review!

@gAldeia
Contributor Author

gAldeia commented Feb 20, 2025

Done! Let me know if everything is ok or if it needs a major review.
Thanks!

Contributor Author

@gAldeia gAldeia left a comment

This is correct; I accidentally pushed changes from another branch into this one.

@trangdata
Collaborator

Ah, I think one of the dataset names doesn't match (directory name vs. the dataset field in the metadata):

  • first_principles_supernovae_zr
  • first_principles_supernovae_zg

Could you fix it, please, @gAldeia?
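
A mismatch like this can be caught locally before pushing. The following is a minimal sketch, not PMLB's own CI check: it assumes each dataset lives in `datasets/<name>/metadata.yaml` and that the file has a top-level `dataset:` key on a single line with no inline comment. The helper names `read_dataset_field` and `find_name_mismatches` are hypothetical, and the file is scanned as plain text to avoid a YAML dependency.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def read_dataset_field(metadata_path: Path) -> Optional[str]:
    """Return the value of the top-level `dataset:` key, or None if absent.

    Assumption: the key sits on one line, unindented, without inline comments.
    """
    for line in metadata_path.read_text(encoding="utf-8").splitlines():
        if line.startswith("dataset:"):
            value = line.split(":", 1)[1].strip()
            # Drop optional surrounding quotes.
            return value.strip("'\"")
    return None


def find_name_mismatches(datasets_root: Path) -> List[Tuple[str, Optional[str]]]:
    """List (directory name, declared name) pairs that disagree."""
    mismatches = []
    for metadata_path in sorted(datasets_root.glob("*/metadata.yaml")):
        dir_name = metadata_path.parent.name
        declared = read_dataset_field(metadata_path)
        if declared != dir_name:
            mismatches.append((dir_name, declared))
    return mismatches


if __name__ == "__main__":
    # Assumed layout: run from the repository root, which contains `datasets/`.
    for dir_name, declared in find_name_mismatches(Path("datasets")):
        print(f"{dir_name}: metadata declares {declared!r}")
```

An empty result means every directory name agrees with its metadata; a missing `dataset:` key is reported with `None` as the declared name.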

@trangdata trangdata merged commit 7c1f4bd into EpistasisLab:master Feb 25, 2025