4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -39,8 +39,8 @@ Your contributions make this project better—thank you for your support! 🚀
1. Set up your development environment with `pixi install`.
2. Install pre-commit hooks with `pixi run pre-commit-install`.
3. Create a feature branch.
4. Make your changes and ensure tests and pre-commit checks pass. . Submit a
pull request.
4. Make your changes and ensure tests and pre-commit checks pass. Submit a pull
request.

### Configuring Pre-commit

61 changes: 0 additions & 61 deletions ERD_VIEW.md

This file was deleted.

Binary file added anaconda_projects/db/project_filebrowser.db
Binary file not shown.
1 change: 0 additions & 1 deletion docs/ERD_VIEW.md

This file was deleted.

31 changes: 18 additions & 13 deletions docs/architecture.md
@@ -2,12 +2,7 @@

mglbleta marked this conversation as resolved.
## Overview

CA-Biositing is a comprehensive geospatial bioeconomy platform for biodiversity
data management and analysis, specifically focused on California biositing
activities. The project combines ETL data pipelines, REST APIs, geospatial
analysis tools, and web interfaces to support biodiversity research and
conservation efforts. It processes data from Google Sheets into PostgreSQL
databases and provides both programmatic and visual access to the data.
The CA-Biositing system ingests agricultural and geospatial data from multiple external sources to support biomass siting analysis and related decision-making workflows. This architecture document describes how data flows through ETL pipelines, is validated and stored in relational and geospatial databases, and is orchestrated using workflow tooling. The diagram below provides a high-level view of the core services, data stores, and integrations that make up the platform.

## System Architecture Diagram

@@ -95,7 +90,7 @@ end
### Backend Infrastructure

- **Programming Language**: Python 3.12+
- **Database**: PostgreSQL 13+ with PostGIS extension
- **Database**: PostgreSQL 13+ (17 in dev/staging) with PostGIS extension
mglbleta marked this conversation as resolved.
⚠️ Potential issue | 🟡 Minor

Database version inconsistency with line 127-128.

Line 89 states PostgreSQL 17 is used in "dev/staging", but lines 127-128 state "PostgreSQL 17+ with PostGIS (Cloud SQL on GCP for production, local PostGIS for development)". This creates confusion about which environments use which PostgreSQL versions.

Please clarify:

  • Is PostgreSQL 17+ used in production (as line 127 suggests)?
  • Is PostgreSQL 13+ still the minimum for development?
  • What version is used in dev/staging vs production?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/architecture.md` at line 89, Update the inconsistent database version
wording so the "Database" bullet (the line starting with "**Database**:
PostgreSQL 13+ (17 in dev/staging) with PostGIS extension") and the later
sentence that reads "PostgreSQL 17+ with PostGIS (Cloud SQL on GCP for
production, local PostGIS for development)" convey the same environment mapping;
explicitly state which PostgreSQL major version is used for production, staging,
and development (e.g., "Production: PostgreSQL 17+ on Cloud SQL with PostGIS;
Staging: PostgreSQL 17; Development: PostgreSQL 13+ with local PostGIS") and
replace one of the conflicting lines with that definitive mapping so both the
"**Database**" bullet and the "PostgreSQL 17+ with PostGIS..." sentence match.

- **Database Migrations**: Alembic for schema versioning
- **Data Models**: SQLModel (combining SQLAlchemy + Pydantic)
- **API Framework**: FastAPI with automatic OpenAPI documentation
@@ -127,13 +122,15 @@ end

### Cloud Infrastructure & Services

- **Google Cloud Platform**:
- **Google Cloud Platform (GCP):**
- Google Sheets API for data ingestion
- Google Cloud credentials management
- Potential cloud deployment target
- **Database Hosting**: Containerized PostgreSQL (development), cloud SQL
(production)
- **Container Registry**: For Docker image distribution
- Google Cloud Secret Manager for credentials
- **Production deployment:** All core infrastructure (database, application
containers, orchestration, and secrets) runs on GCP using Cloud SQL, Cloud
Run, Artifact Registry, and Secret Manager
- **Database Hosting:** PostgreSQL 17+ with PostGIS (Cloud SQL on GCP for
production, local PostGIS for development)
- **Container Registry:** GCP Artifact Registry for Docker images

## Detailed Project Structure

@@ -319,6 +316,10 @@ subdirectories (91 models total). Four base mixins (`BaseEntity`, `LookupBase`,

#### Resource & Biomass Models (`resource_information/`)

<!--
TODO (2026-03-12): The "Core Domain Models" section below may be outdated. Review for accuracy in the next documentation update.
-->

- **Resource**: Core biomass resource definitions
- **ResourceClass**, **ResourceSubclass**: Hierarchical resource classification
- **ResourceAvailability**: Seasonal and quantitative availability data
@@ -509,6 +510,10 @@ Environments:

## Deployment & Operations

<!--
TODO (2026-03-12): Could change section to be less heavily bullet point reliant and use more descriptive language, with greater explanation of future architecture considerations as well.
-->

### Container Orchestration

- **Development**: Docker Compose for local services
2 changes: 1 addition & 1 deletion docs/notebook_setup.md
@@ -1,6 +1,6 @@
# Notebook Setup Guide for **ca-biositing**

**Purpose** -- Set up Jupyter notebooks with correct imports for the PEP 420
**Purpose**: Set up Jupyter notebooks with correct imports for the PEP 420
namespace packages used in this repository.

---
8 changes: 5 additions & 3 deletions docs/pipeline/ALEMBIC_WORKFLOW.md
@@ -12,9 +12,11 @@ systematic and version-controlled way.
allows you to modify your database schema (e.g., add a new table or column)
and keep a versioned history of those changes.
- **Why use it?** It prevents you from having to manually write SQL
`ALTER TABLE` statements. It automatically compares your SQLModel classes to
the current state of the database and generates the necessary migration
scripts.
`ALTER TABLE` statements which are not tracked in version control. Alembic
generates SQL code from the Python SQLModel schema to prevent manual errors.
It also automatically compares your SQLModel classes to the current state of
the database and generates the necessary migration scripts. This reduces
database drift.

---

33 changes: 21 additions & 12 deletions docs/pipeline/ETL_WORKFLOW.md
@@ -17,8 +17,9 @@ and loads it into the PostgreSQL database.
- `load`: Functions to insert the transformed data into the database using
SQLAlchemy.

- **Hierarchical Pipelines:** Individual pipelines are nested within
subdirectories reflecting the data they handle (e.g., `products`, `biomass`).
- **Hierarchical Pipelines:** Transform and load logic are organized into
subdirectories reflecting the data they handle (e.g., `products`, `usda`,
`analysis`).

---
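
As an illustrative sketch of the extract/transform/load split described above (data and function bodies are hypothetical, and plain functions stand in for Prefect `@task`-decorated functions):

```python
# Hypothetical sketch of a single pipeline's three stages.
# In the real codebase each stage lives in its own module and is a Prefect task.

def extract():
    # Pull raw rows from a source (stubbed here with static data).
    return [{"name": "almond shells", "tons": "120"}]

def transform(raw_rows):
    # Normalize types so rows match the SQLModel schema.
    return [{"name": r["name"], "tons": int(r["tons"])} for r in raw_rows]

def load(rows):
    # Insert rows into the database; here we just return the row count.
    return len(rows)

processed = transform(extract())
print(load(processed))  # prints 1
```

A flow in `flows/` then simply chains these three calls, which is what the flow example later in this file shows.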

@@ -32,41 +33,49 @@ The ETL system runs in a containerized Prefect environment.
pixi run start-services
```

**Step 2: Deploy Flows**
**Step 2: Apply Datamodel**

```bash
pixi run migrate
```

**Step 3: Deploy Flows**

```bash
pixi run deploy
```

**Step 3: Run the Master Pipeline**
**Step 4: Run the Master Pipeline**

```bash
pixi run run-etl
```

**Step 4: Monitor** Access the Prefect UI at
**Step 5: Monitor** Access the Prefect UI at
[http://localhost:4200](http://localhost:4200).

---

### How to Add a New ETL Flow

**Step 1: Create the Task Files** Create the three Python files for your
extract, transform, and load logic in the appropriate subdirectories under
`src/ca_biositing/pipeline/ca_biositing/pipeline/etl/`. Decorate each function
with `@task`.
extract, transform, and load logic under
`src/ca_biositing/pipeline/ca_biositing/pipeline/etl/`. Extract tasks go
directly in `extract/`; transform and load tasks go in appropriately named
subdirectories (e.g., `transform/products/`, `load/products/`). Decorate each
function with `@task`.

**Step 2: Create the Pipeline Flow** Create a new file in
`src/ca_biositing/pipeline/ca_biositing/pipeline/flows/` to define the flow.

```python
from prefect import flow
from ca_biositing.pipeline.etl.extract.samples.new_type import extract
from ca_biositing.pipeline.etl.transform.samples.new_type import transform
from ca_biositing.pipeline.etl.load.samples.new_type import load
from ca_biositing.pipeline.etl.extract.my_source import extract
from ca_biositing.pipeline.etl.transform.products.my_product import transform
from ca_biositing.pipeline.etl.load.products.my_product import load

@flow
def new_type_flow():
def my_product_flow():
raw_data = extract()
transformed_data = transform(raw_data)
load(transformed_data)
```
5 changes: 2 additions & 3 deletions mkdocs.yml
@@ -1,5 +1,5 @@
site_name: CA-BioSiting
repo_url: https://github.com/uw-ssec/ca-biositing
repo_url: https://github.com/sustainability-software-lab/ca-biositing

theme:
name: material
@@ -51,11 +51,11 @@ nav:
- Notebook Setup: notebook_setup.md
- Pipeline:
- Overview: pipeline/README.md
- GCP Setup: pipeline/GCP_SETUP.md
- ETL Workflow: pipeline/ETL_WORKFLOW.md
- Alembic Workflow: pipeline/ALEMBIC_WORKFLOW.md
- Docker Workflow: pipeline/DOCKER_WORKFLOW.md
- Prefect Workflow: pipeline/PREFECT_WORKFLOW.md
- GCP Setup: pipeline/GCP_SETUP.md
- USDA ETL Guide: pipeline/USDA/USDA_ETL_GUIDE.md
- Datamodels:
- Overview: datamodels/README.md
@@ -71,4 +71,3 @@ nav:
- Deployment: deployment/README.md
- Contributing: CONTRIBUTING.md
- Code of Conduct: CODE_OF_CONDUCT.md
- ERD View: ERD_VIEW.md
21 changes: 3 additions & 18 deletions pixi.lock
