First principles datasets #181
Data comes from two symbolic regression repos:
- Miles Cranmer's PySR: https://github.com/MilesCranmer/PySR
- Etienne Russeil et al.'s MvSR: https://github.com/erusseil/MvSR-analysis

Each dataset has a first-principles equation that was derived from data, and the respective papers use them to show how symbolic regression can potentially retrieve the original equation when only observational data is available.

While some of them have just a few samples and others are synthetically generated, they are challenging for symbolic regression methods and can be used to evaluate these algorithms.

The idea of pushing them into PMLB is to help other users quickly set up experiments with the data.

I still need to write proper metadata for them.
CI was failing to parse the contents of these specific ones.
Created by https://github.com/gAldeia/pmlb/actions/runs/11616806556 from f23672c on 2024-10-31
Thank you for this PR, @gAldeia! 💯 🌈 If you could update the metadata.yaml files (e.g. datasets/first_principles_rydberg/metadata.yaml), I will review!
…pmlb into symreg_first_principles
Done! Let me know if everything is ok or if it needs a major review.
gAldeia
left a comment
This is correct; I accidentally pushed changes from another branch into this one.
Ah, I think one of the dataset names does not match (directory name vs. the dataset field in its metadata).
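A mismatch like the one described above can be caught locally before pushing. The following is a minimal sketch, not part of PMLB's own tooling: the function name `find_name_mismatches` is hypothetical, and it assumes each dataset lives at `datasets/<name>/metadata.yaml` with a top-level `dataset: <name>` line. It uses a plain regular expression rather than a YAML parser so it runs with the standard library only.

```python
import re
from pathlib import Path

# Assumption: metadata.yaml contains a top-level line such as
#   dataset: first_principles_rydberg
# Optional quotes around the value are tolerated.
DATASET_LINE = re.compile(
    r"^dataset:[ \t]*['\"]?([^'\"#\s]+)['\"]?", re.MULTILINE
)


def find_name_mismatches(datasets_root):
    """Return (directory_name, metadata_name) pairs that disagree.

    metadata_name is None when no top-level `dataset:` line is found.
    """
    mismatches = []
    for metadata_path in sorted(Path(datasets_root).glob("*/metadata.yaml")):
        directory_name = metadata_path.parent.name
        text = metadata_path.read_text(encoding="utf-8")
        match = DATASET_LINE.search(text)
        metadata_name = match.group(1) if match else None
        if metadata_name != directory_name:
            mismatches.append((directory_name, metadata_name))
    return mismatches
```

Calling `find_name_mismatches("datasets")` from the repository root would return an empty list when every directory name agrees with its metadata, and otherwise list the offending pairs.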
…pmlb into symreg_first_principles
Created by https://github.com/gAldeia/pmlb/actions/runs/13465733857 from a226e6b on 2025-02-21
I still need to write proper metadata for them. My understanding is that opening a PR will trigger a GitHub Action that pushes some new files to my fork, which I should complete before the new datasets go to review. Please let me know if there is anything I got wrong and need to update!