Feedstock End-to-End Requirements Discussion #76

nikiburggraf · 2025-12-02T23:20:47Z · Author

The purpose of this thread is to determine what it will take to fully implement feedstock data in the tool, from the backend to the API to the vector tiles to the frontend.

Peter has proposed seeding fake feedstock data into the database to make it easier to work with.

Tyler has suggested that spatial queries can be put on the back burner for now, since some of that functionality could be handled by the frontend instead of the backend.

The following questions are just to get the discussion going. If there's something I've forgotten to ask about that would be worth mentioning, please bring it up! Folks who aren't tagged should also feel empowered to contribute to the discussion :)

Questions for @tylerhuntington

What data would you need to have access to in order for the frontend to work?
What data would be best retrieved from Mapbox, and what would be best retrieved from the API, to make your life easier?
Are there any considerations we should be mindful of that would make frontend <-> API interaction easy for you? (query params vs. path params, etc.)

Questions for @petercarbsmith

What feedstock data is available in the backend and in what form? (you can just link your SQLAlchemy configurations here if that's easiest)

TODOs for @nikiburggraf once we have some data from the above questions
(TODO: create issues to track these)

What will the feedstock APIs look like?
What DB queries do we need from @petercarbsmith to support these APIs?
What should the vector tile pipeline be handling? What do the inputs and outputs look like that we can hand over to @petercarbsmith for implementation?

tylerhuntington · 2025-12-05T21:19:46Z · Maintainer

@nikiburggraf thanks for kicking this off. Here's a breakdown of what I'm thinking:

Q1. What data would you need to have access to in order for the frontend to work?

To render the 450k+ LandIQ polygons performantly, I think we'll want to partition the data into three accessibility tiers:

Tier 1: Tile Payload (Visualization)
- Geometry: Polygon boundaries.
- UUID: feedstock_id (Critical join key).
- Styling Attributes: residue_type (categorical coloring) and total_yield (quantitative scaling/opacity).
Tier 2: Static Lookup (Constants)
- Compositional Constants: ash_content, moisture_content, carbon_intensity, etc.
- Note: Since these are constant per residue_type (e.g., all almonds have ~3.2% ash), they belong in neither the tile nor the DB query. Instead, they should live in a static JSON object keyed by residue_type for O(1) client-side joins.
Tier 3: API Payload (Specifics)
- Non-constant/Frequently Updated Attributes: cost_per_ton (if known), availability_status, etc.
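
A minimal sketch of the three tiers as Python dataclasses, assuming the attribute names listed above map one-to-one onto columns of a flat, denormalized row. The `split_record` helper and the shape of its input row are hypothetical. It shows which fields end up in the tile and which stay behind the API; Tier 2 fields are deliberately dropped because they come from the static lookup.

```python
from dataclasses import dataclass
from typing import Optional

# Field names come from the tier list above; everything else is hypothetical.


@dataclass(frozen=True)
class TileFeature:
    """Tier 1: baked into the vector tiles (geometry lives in the tile itself)."""
    feedstock_id: str   # UUID, the join key for Tiers 2 and 3
    residue_type: str   # categorical coloring
    total_yield: float  # quantitative scaling/opacity


@dataclass(frozen=True)
class ResidueConstants:
    """Tier 2: constant per residue_type, served as static JSON."""
    ash_content: float
    moisture_content: float
    carbon_intensity: float


@dataclass(frozen=True)
class FeedstockDetails:
    """Tier 3: fetched from the API per feedstock_id."""
    feedstock_id: str
    cost_per_ton: Optional[float]  # may be unknown
    availability_status: str


def split_record(record: dict) -> tuple[TileFeature, FeedstockDetails]:
    """Split one flat row into its tile part and its API part.

    Tier 2 columns are intentionally ignored: they come from the static lookup.
    """
    tile = TileFeature(
        feedstock_id=record["feedstock_id"],
        residue_type=record["residue_type"],
        total_yield=record["total_yield"],
    )
    details = FeedstockDetails(
        feedstock_id=record["feedstock_id"],
        cost_per_ton=record.get("cost_per_ton"),
        availability_status=record["availability_status"],
    )
    return tile, details
```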

Q2. What data would be best retrieved from Mapbox, and what would be best retrieved from the API?

I think we should separate "Structural/Static" data from "Transactional/Live" data to minimize tile size and ensure data freshness.

Mapbox (Vector Tiles)

Content: Only the Tier 1 attributes listed above.
Goal: Keep tile sizes small (< 500 KB) for fast rendering.
Updates: Only updated when geometries or yield baselines change.

Static Asset (GCS Bucket/CDN)

Content: A feedstock_definitions.json file keyed by residue_type.
Goal: Provides the heavy chemical metadata (Tier 2) without bloating the tiles or hitting the database.
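
To illustrate the client-side join, here is a sketch in Python (the frontend would do the same thing with a plain object lookup). The JSON contents and the `enrich` helper are hypothetical, and the numbers are illustrative placeholders, apart from the ~3.2% ash figure mentioned above.

```python
import json

# Hypothetical feedstock_definitions.json, keyed by residue_type.
# Values are placeholders, not measurements.
DEFINITIONS_JSON = """
{
  "almond_prunings": {"ash_content": 3.2, "moisture_content": 15.0},
  "almond_shells":   {"ash_content": 2.0, "moisture_content": 10.0}
}
"""

definitions = json.loads(DEFINITIONS_JSON)


def enrich(tile_props: dict, definitions: dict) -> dict:
    """Merge a tile feature's properties with its residue constants.

    A dict lookup is O(1), so this is cheap enough to run on every click or hover.
    """
    residue_type = tile_props["residue_type"]
    constants = definitions.get(residue_type)
    if constants is None:
        # Tile and JSON disagree on the key: fail loudly instead of rendering nothing.
        raise KeyError(f"no definition for residue_type {residue_type!r}")
    return {**tile_props, **constants}
```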

Backend API (FastAPI)

Content: The Tier 3 attributes listed above.
Goal: Single-record retrieval.
Why: We need real-time data here. If cost_per_ton changes in Postgres, the frontend needs to reflect that immediately without waiting for a tile-gen pipeline to run.

Q3. Are there any considerations we should be mindful of?

API Pattern:
- To maintain a consistent interface, let's use Query Parameters for API calls.
- Example: GET /api/feedstocks?id={feedstock_uuid} or GET /api/feedstocks?min_yield=500&type=almond.
- Simplifies the client-side request construction and avoids mixing path and query logic.
Data Consistency:
- The residue_type string in the Mapbox Tiles must strictly match the keys in the static JSON lookup. If the tile says "Almond" and the JSON says "Almond Prunings", the client-side join fails. We need a shared Enum definition here.
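
The query-param pattern and the shared Enum fit together: the same Enum can type the `type` filter on both ends, so a mismatch like "Almond" vs. "Almond Prunings" fails loudly. Below is a stdlib-only sketch of that round trip. `ResidueType`, its members, `build_feedstock_url`, and `parse_feedstock_filters` are hypothetical names. In FastAPI, the parsing half would typically be replaced by declaring typed query parameters; an Enum-annotated parameter is validated automatically.

```python
from enum import Enum
from urllib.parse import parse_qs, urlencode, urlsplit


class ResidueType(str, Enum):
    """Single source of truth for residue_type strings (hypothetical members)."""
    ALMOND_PRUNINGS = "almond_prunings"
    ALMOND_SHELLS = "almond_shells"


BASE_PATH = "/api/feedstocks"


def build_feedstock_url(**params) -> str:
    """Client side: build a query-param URL, skipping filters that are None."""
    query = {}
    for key, value in params.items():
        if value is None:
            continue
        # Serialize Enum members to their plain string value.
        query[key] = value.value if isinstance(value, Enum) else value
    return f"{BASE_PATH}?{urlencode(query)}" if query else BASE_PATH


def parse_feedstock_filters(url: str) -> dict:
    """Server side: read the filters back and coerce them to typed values."""
    raw = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    filters = {}
    if "id" in raw:
        filters["id"] = raw["id"]
    if "type" in raw:
        # Raises ValueError for strings that are not ResidueType values.
        filters["type"] = ResidueType(raw["type"])
    if "min_yield" in raw:
        filters["min_yield"] = float(raw["min_yield"])
    return filters
```

For example, `build_feedstock_url(min_yield=500, type=ResidueType.ALMOND_PRUNINGS)` produces `/api/feedstocks?min_yield=500&type=almond_prunings`.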

TL;DR:
We ultimately want "thin" Mapbox Tilesets containing only feedstock ID/Type/Yield attributes, a static JSON file for compositional constants, and query param endpoints for the higher variability attributes.

Very open to revising this strategy if anyone has feedback.

petercarbsmith · 2025-12-09T22:33:34Z · Maintainer

Hey @nikiburggraf. I'll go ahead and put in a PR that contains the current schemas. You can find the SQLAlchemy versions in the datamodels/schemas/generated directory. The ca_biositing schema contains ALL of the other module schemas in a single .py file, and for now it's the one Alembic imports from. If you want, you can also see our ERD here. It's pretty big (85-ish tables), but don't worry: most of these aren't relevant to the ca-biositing tool and instead support the other data portal we're building. To get oriented, I'd check out these specific tables:

- resource - the biomass feedstock we are trying to map
- resource_availability - where the seasonality information is stored
- primary_crop - the commodity crop associated with a resource
- a few other accessory tables that link between internal biocirv, LandIQ, and USDA crop classifications or bins
- dataset - the dataset
- landiq_record - contains all context surrounding a LandIQ observation
- usda_record (plus its child tables for census and survey records)
- proximate_record - contains all context surrounding a proximate analysis observation
- observation - our big fact table that will contain "results" for both geospatial and lab-analysis tables

I should note that these schemas are essentially just the base normalized tables for the database. I anticipate building denormalized materialized views from these base tables, which, from my understanding, are closer to what the API will want to expose. We are meeting with Tyler next week to discuss general data modeling, and I hope to also get a good sense of which views I should build to accommodate the frontend's data needs.

I would also like to discuss at some point how much data you'd like us to seed into the tables, and how "accurate" it should be. We could put nonsense in, but I think it would be better to seed data for just one resource (e.g., almond shells) to make sure we are getting all the info we want.
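
A rough sketch of seeding a single resource into normalized tables and reading it back through a denormalized view, using an in-memory SQLite database so it runs anywhere. The columns (`from_month`, `to_month`, etc.) and seeded values are hypothetical placeholders, not the real schema. SQLite has no materialized views, so a plain `VIEW` stands in for a Postgres `MATERIALIZED VIEW`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Heavily simplified, hypothetical columns; the real tables are in the
-- generated SQLAlchemy schemas.
CREATE TABLE primary_crop (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE resource (
    id              INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    primary_crop_id INTEGER REFERENCES primary_crop(id)
);
CREATE TABLE resource_availability (
    resource_id INTEGER REFERENCES resource(id),
    from_month  INTEGER,  -- seasonality window (placeholder encoding)
    to_month    INTEGER
);
-- Denormalized view of the kind the API might expose.
CREATE VIEW resource_summary AS
SELECT r.name AS resource, c.name AS primary_crop, a.from_month, a.to_month
FROM resource r
JOIN primary_crop c ON c.id = r.primary_crop_id
LEFT JOIN resource_availability a ON a.resource_id = r.id;
""")

# Seed one resource end to end; values are placeholders, not real data.
conn.execute("INSERT INTO primary_crop (id, name) VALUES (1, 'almond')")
conn.execute("INSERT INTO resource (id, name, primary_crop_id) VALUES (1, 'almond shells', 1)")
conn.execute("INSERT INTO resource_availability VALUES (1, 8, 10)")

rows = conn.execute("SELECT * FROM resource_summary").fetchall()
```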

petercarbsmith · 2026-01-07T18:51:20Z · Maintainer

Hi All,

Here is the ERD we were discussing yesterday. To be honest, I kind of forgot about Tyler's comment above, but re-reading it, I think this is very much aligned with it (and it really clears up a few of the terminology problems we were having yesterday).

Feel free to let me know if you have any questions!

Reply from mglbleta · Jan 14, 2026 · Maintainer

Data types have been added. FYI @nikiburggraf!

mglbleta · 2026-01-15T02:53:11Z · Maintainer

Hey @nikiburggraf & team! Just wanted to check in and make sure you have all the data and clarity you need for what you're currently taking a crack at for the API.

I also wanted to clarify how we designed the ERD above, since I don't think we've written it down anywhere; we've only described it on a call. Our thinking was that each table in the small Lucidchart ERD is essentially an endpoint (though feel free to combine tables into one endpoint if you think it's prudent). Filters for specific rows would correspond to the highlighted/primary key values (e.g., geoid/county, main_crop/usda_crop, resource), and all data in this ERD should be presented as output fields in the API, since it is already the shortlist of information we think is interesting to users (compared with our big database ERD).

I could also write out our vision for the expected denormalized view tables, along with their applicable filters and output fields, in a different form if that would clarify anything: perhaps as an {endpoint --> [filter1, ..., filterN] --> [field1, ..., fieldM]} branching structure chart, or as a table. Let me know if that would help!
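
One way to write the endpoint --> filters --> fields structure down so it can also drive request validation is a small spec dict. This is a hypothetical sketch: the endpoint name `usda_crops` and its filter/field lists are placeholders based on the key examples above, not the agreed design.

```python
from typing import TypedDict


class EndpointSpec(TypedDict):
    filters: list[str]  # query params the endpoint accepts
    fields: list[str]   # columns returned in each record


# Hypothetical spec; names are placeholders drawn from the examples above.
SPEC: dict[str, EndpointSpec] = {
    "usda_crops": {
        "filters": ["geoid", "usda_crop"],
        "fields": ["geoid", "main_crop", "usda_crop"],
    },
}


def validate_request(endpoint: str, params: dict) -> list[str]:
    """Return a list of problems; an empty list means the request is allowed."""
    if endpoint not in SPEC:
        return [f"unknown endpoint {endpoint!r}"]
    allowed = set(SPEC[endpoint]["filters"])
    return [f"unsupported filter {p!r}" for p in sorted(params) if p not in allowed]


def project(endpoint: str, row: dict) -> dict:
    """Keep only the output fields the spec lists for this endpoint."""
    return {f: row[f] for f in SPEC[endpoint]["fields"] if f in row}
```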

nikiburggraf · 2026-01-16T23:59:06Z · Author

Hey all! Thank you so much for the ERD and the types! I've put together a document of proposed APIs here: https://github.com/nikiburggraf/ca-biositing/wiki/API-Design#apis

Happy to hear feedback on this thread or in our meeting on Tuesday!
