feat: ETL Pipeline Notebooks and Fix LinkML SQLAlchemy generation #18
petercarbsmith merged 10 commits into main
…ade new ETL notebook and modified some of the gsheet extraction notebook
…. Committing before that"
    # --- Import generated models and their metadata ---
    # from ca_biositing.datamodels.schemas.generated.census_survey import metadata as census_metadata
    # from ca_biositing.datamodels.schemas.generated.geography import metadata as geography_metadata
    from ca_biositing.datamodels.database import Base
note: migration only works for me if this is commented out. that's just a me thing though, don't know if this applies to others
avi9664 left a comment
I checked the ETL pipeline part of it and it's good!
Hi! I ran the pipeline notebooks and they couldn't access the database because the DATABASE_URL that was used didn't work with my setup, so I changed some of the code in the latter half of …


📄 Description
Key Changes
🔧 LinkML/SQLAlchemy Model Generation Fixes
- … 'Resource.id') to proper table name references (e.g., 'resource.id') in snake_case format
- … generate_sqla.py to handle any ForeignKey column references, not just .id columns
📊 ETL Pipeline Improvements & New Notebooks
New Interactive ETL Notebooks
- etl_notebook.ipynb: Clean, streamlined version of the Google Sheets extraction workflow with improved error handling and data validation
- gsheet_extraction_notebook.ipynb: Comprehensive notebook containing additional helper functions for:
ETL Pipeline Enhancements
FOR THE MOST USEFUL BIT, @mglbleta and @avi9664, please have a look at the etl_notebook. In essence, it does a couple of operations.
My hope is that this can be used as a model for how to handle everything from extract to transform. It also serves as an example of how to do imports and work within a Jupyter notebook, which will be the most efficient way to see the results of your code without having to rebuild the containers constantly. Eventually, we will transition code out of the notebooks and into .py modules for production.
🏗️ Infrastructure & Database Configuration
- … database.py and config.py to the datamodels package for centralized database management
📝 Development Experience
Technical Implementation
Model Generation Fixes
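As a rough illustration of the foreign-key fix described under Key Changes, here is a minimal sketch of rewriting class-name references such as 'Resource.id' into snake_case table-name references such as 'resource.id', for any referenced column and not only .id. It is not the code in generate_sqla.py. The helper names `camel_to_snake` and `fix_foreign_keys` are hypothetical, and the sketch assumes that each table name is simply the snake_case form of its class name.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a CamelCase class name to snake_case (hypothetical helper)."""
    # Split where a lowercase letter or digit is followed by an uppercase letter.
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Split an acronym run from the capitalized word that follows it.
    s = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", s)
    return s.lower()


# Matches ForeignKey('Target.column' or ForeignKey("Target.column"
# for any column name, not only "id".
FK_PATTERN = re.compile(r"ForeignKey\((['\"])(\w+)\.(\w+)\1")


def fix_foreign_keys(source: str) -> str:
    """Rewrite ForeignKey targets in generated source text (assumed layout)."""

    def _replace(match: re.Match) -> str:
        quote, target, column = match.group(1), match.group(2), match.group(3)
        return f"ForeignKey({quote}{camel_to_snake(target)}.{column}{quote}"

    return FK_PATTERN.sub(_replace, source)
```

For example, `fix_foreign_keys("Column(ForeignKey('Resource.id'))")` would yield `Column(ForeignKey('resource.id'))`. Schema-qualified targets with more than one dot are deliberately left untouched by this pattern.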