2025.10.14 #32

atambay37 · 2025-10-15T14:14:17Z

Onboarding and Team Updates: Don introduced new team member Viraj, who will work on the FastAPI skeleton; Peter Smith and Mei discussed onboarding new research assistants, whose initial training focuses on the ETL pipeline; and Don gave a status update on cloud infrastructure tasks.
- New Team Member Introduction: Don introduced Viraj as a new research assistant with a background in computer science and information management, who will be contributing to the project by working on the FastAPI skeleton.
- Onboarding Student Assistants: Peter Smith shared that several new research assistants are being onboarded, with initial tasks focused on learning the ETL pipeline and performing schema adjustments to ensure data is correctly loaded into database tables.
- Cloud Infrastructure Update: Don reported delays in completing cloud infrastructure tasks due to travel but committed to finishing the work by the following week, noting that the database is not yet needed by the rest of the team.
ETL Pipeline Progress and Prefect Integration: Peter Smith updated the group on the successful integration of Pixie install steps for CI/CD and the ETL pipeline, discussed ongoing data modeling efforts with Mei, and outlined plans to incorporate Prefect for orchestration, while Don provided guidance on precommit checks and repository workflows.
- ETL Pipeline and CI/CD Status: Peter Smith reported that the Pixie install steps for CI/CD have been completed and merged with the ETL pipeline, with all checks passing locally; the team is now focused on onboarding new assistants to use the pipeline and perform schema adjustments.
- Prefect Orchestration Plans: Peter Smith has begun exploring Prefect for global orchestration of the ETL pipeline, noting that it should integrate well with the current setup, and plans to implement it in the upcoming week.
- Precommit and Formatting Issues: Don and Peter Smith discussed issues with precommit checks, clarifying that 'pixie run precommit all' should be used to check all files, and suggested adding a whitelist for domain-specific vocabulary to avoid false positives in spelling checks.
- Data Modeling and Database MVP: Peter Smith and Mei are revisiting data modeling to ensure the ETL pipeline targets the correct database tables, aiming to create a database MVP that meets project needs and incorporates external datasets provided by Tyler.
Repository Access, Collaboration, and Workflow Guidelines: Don and Peter Smith discussed repository access and how to share the Google Sheet, and agreed on a fork-based GitHub workflow, with Don providing resources and guidelines for code review, collaborator roles, and documentation practices; Mei asked how often documentation should be updated, and Don recommended frequent, small pull requests.
- Google Sheet Access and Permissions: Don requested access to the Google Sheet for local pipeline testing, and Peter Smith agreed to share view-only access with Don, Viraj, and others, confirming that editor access is not required for pipeline scripts.
- GitHub Workflow and Forking Model: Don outlined the forking workflow for GitHub collaboration, recommending that each student create a fork and submit pull requests to the main repository, with Berkeley team members conducting the first pass of code review and Don's team available for secondary review if needed.
- Collaborator Roles and Permissions: Don clarified that student assistants will be added as collaborators with triage access (not write access) to assign issues, and discussed the process for adding users to the repository.
- Documentation and Commit Practices: Mei asked how often documentation should be updated, and Don recommended frequent, small pull requests for documentation changes, emphasizing clear commit messages and pointing to the Conventional Commits standard.
- Resource Sharing and Communication: Don shared workflow guidelines and resources via Slack and discussed adding Mei to the Slack channel, suggesting that useful resources also be added to the project README for broader accessibility.
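To make the commit-message convention Don referenced more concrete, the following is a minimal Python sketch of a header check for the Conventional Commits format (`type(scope)!: description`). It is illustrative only. The allowed types beyond `feat` and `fix`, and the idea of running such a check at all, are assumptions and were not agreed in the meeting.

```python
import re

# "feat" and "fix" come from the Conventional Commits spec; the other types are
# common Angular-style extras. The project's actual allowed list is an assumption.
ALLOWED_TYPES = {
    "feat", "fix", "docs", "style", "refactor", "perf",
    "test", "build", "ci", "chore", "revert",
}

# Header shape: type(optional-scope)!: description
# This sketch accepts lowercase types only (the spec itself is case-insensitive).
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)"              # e.g. "docs"
    r"(?:\((?P<scope>[^()\s]+)\))?"   # optional scope, e.g. "(readme)"
    r"(?P<breaking>!)?"               # optional breaking-change marker
    r": (?P<description>\S.*)$"       # colon, one space, non-empty description
)


def is_conventional(message: str) -> bool:
    """Check only the first line (header) of a commit message."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    match = HEADER_RE.match(header)
    return match is not None and match.group("type") in ALLOWED_TYPES


if __name__ == "__main__":
    for msg in ("docs(readme): add links to workflow guidelines", "Updated docs"):
        print(f"{msg!r}: {is_conventional(msg)}")
```

Only the header is validated, so a small documentation PR can still carry a free-form body below the first line.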
Mapbox Tile Set Synchronization and Database Design: Peter Smith and Tyler discussed strategies for synchronizing Mapbox tile sets with the Postgres database, including the use of the Mapbox Python SDK, ETL job design, and database schema considerations, with Don and Tyler providing detailed guidance on tile set generation, attribute modeling, and workflow automation.
- Mapbox Tile Set Update Workflow: Tyler explained that Mapbox tile sets should be updated via ETL jobs whenever the relevant Postgres tables change, using Tippecanoe to generate .mbtiles files and the Mapbox Python SDK to upload them, with Postgres serving as the source of truth.
- Tracking Tile Set Synchronization: Tyler recommended maintaining a Postgres table to track which tile sets are associated with which database tables, including last updated timestamps, to ensure only necessary tile sets are regenerated after data changes.
- Attribute and Feature Modeling: Tyler clarified that both geospatial and tabular data attributes should be included in tile sets, and advised that all data intended for front-end display be stored in the tiles to avoid repeated tile regeneration.
- Database Schema and Fact Tables: Don and Peter Smith discussed the importance of designing normalized database schemas with fact tables that aggregate attributes for tile set generation, ensuring efficient querying and flexibility for both ETL and front-end needs.
- Tile Set Specification and Layering: Tyler described the conceptual model of having separate tile sets for different geometry classes (e.g., counties, crop fields), each with relevant attributes, and suggested documenting front-end requirements in a GitHub discussion to guide database and tile set design.
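The tracking table Tyler suggested can be sketched in Python as follows. The sketch assumes a hypothetical registry in which each row names a tile set, the Postgres tables it is built from, and when it was last generated. It also assumes the per-table last-modified timestamps have already been read from Postgres. All table and tile set names are made up for illustration. The actual regeneration step (Tippecanoe plus the Mapbox upload) is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Fallback for tables with no recorded change: older than any real timestamp.
NEVER = datetime.min.replace(tzinfo=timezone.utc)


@dataclass(frozen=True)
class TilesetSync:
    """One row of a hypothetical tile set tracking table in Postgres."""

    tileset_id: str                 # Mapbox tile set name (hypothetical)
    source_tables: tuple[str, ...]  # Postgres tables the tile set is built from
    last_generated: datetime        # when the .mbtiles file was last uploaded


def stale_tilesets(
    registry: list[TilesetSync],
    table_last_modified: dict[str, datetime],
) -> list[str]:
    """Return IDs of tile sets with a source table changed since the last upload."""
    stale = []
    for entry in registry:
        changed = any(
            table_last_modified.get(table, NEVER) > entry.last_generated
            for table in entry.source_tables
        )
        if changed:
            stale.append(entry.tileset_id)
    return stale


if __name__ == "__main__":
    utc = timezone.utc
    registry = [
        TilesetSync("counties", ("county", "county_crop_fact"),
                    datetime(2025, 10, 1, tzinfo=utc)),
        TilesetSync("crop_fields", ("crop_field",),
                    datetime(2025, 10, 10, tzinfo=utc)),
    ]
    last_modified = {
        "county_crop_fact": datetime(2025, 10, 5, tzinfo=utc),
        "crop_field": datetime(2025, 10, 2, tzinfo=utc),
    }
    # Only "counties" needs regeneration: its fact table changed after Oct 1.
    print(stale_tilesets(registry, last_modified))
```

Tables missing from the change map are treated as unchanged. The comparison is strict, so a table modified at exactly the generation time does not trigger a rebuild. Only tile sets downstream of a changed table are regenerated, which matches the goal of avoiding unnecessary tile set rebuilds.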
Front-End Repository and Integration: Don and Tyler discussed the status of the front-end code repository, agreeing to make it public and integrate it as a submodule in the main project repository to facilitate local testing and full application deployment.
- Front-End Repository Status: Tyler confirmed that the front-end code is currently in a private repository but can be made public under an open source license, removing previous licensing concerns.
- Integration with Main Repository: Don proposed adding the front-end as a git submodule in the main repository, enabling local testing of the full application stack using Docker Compose, with Peter Smith providing the necessary setup.

Follow-up tasks:

Google Sheet Access: Add Don's, Viraj's, and Nikki's email addresses to the Google Sheet with view-only access for pipeline testing. (Peter Smith)
Precommit Configuration: Run 'pixie run precommit all' on a new branch and submit a PR to fix formatting and spelling issues in the repository. (Peter Smith)
Precommit Whitelist: Add a whitelist table of vocabulary for precommit to handle domain-specific terms and avoid false spelling errors. (Don)
Database and Mapbox Tile Set Synchronization: Design a method to track which Mapbox tile sets are associated with which Postgres tables, including last updated timestamps, to ensure synchronicity. (Peter Smith, Tyler)
Tile Set Feature Specification: Create a GitHub discussion to specify the attributes and layers needed in Mapbox tile sets for the front end. (Tyler)
Front End Repository Access: Make the front end repository public and link it as a git submodule in the main project repository. (Tyler, Don)
Slack Channel Access: Add Mei to the relevant Slack channel using her Berkeley email. (Don)
