Onboarding and Team Updates: Peter Smith reintroduced May to the team, and Anshul made May a maintainer on GitHub. The group also discussed onboarding a new student assistant, with Niki and Dom set to assist.
May's Return and GitHub Access: Peter Smith updated the team on May's return, explaining that he has been catching May up on project developments from the period of their absence. Anshul confirmed May's username and assigned them as a maintainer on the project's GitHub repository.
Student Assistant Onboarding: Peter Smith mentioned the addition of a new student assistant tasked with processing CSV datasets from Google Drive, and Anshul noted that the student would likely join the next meeting, with Niki and Dom handling onboarding later in the week.
ETL Pipeline Integration and Orchestration: Peter Smith sought feedback from Anshul and the team regarding the integration of an ETL pipeline developed in a separate GitHub repository, specifically asking about preferred orchestration tools, with Anshul recommending Prefect over Airflow for Python-based workflows.
Pipeline Orchestration Tool Discussion: Peter Smith asked for recommendations on orchestration libraries for the ETL pipeline, considering Airflow and Prefect, and Anshul advised using Prefect due to its Pythonic nature and ease of use, especially for teams already working in Python.
Prefect Features and Implementation: Anshul described Prefect's capabilities, including integration with cloud worker nodes, managed and self-hosted dashboard options, error logging, retry mechanisms, and Python decorator-based configuration, highlighting its advantages over Airflow for the team's use case.
Next Steps for Pipeline Integration: Peter Smith indicated plans to experiment with Prefect for the ETL pipeline after upcoming meetings, aiming to replace the current ad hoc orchestrator with a more robust solution.
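To make the decorator-based configuration and retry behaviour Anshul described more concrete, here is a minimal sketch of what a Prefect-style flow might look like. It is not the team's pipeline: the function names (`extract`, `transform`, `load`, `run_etl`) and their bodies are hypothetical placeholders, and the retry settings are arbitrary. The `try`/`except` fallback exists only so the sketch can run without Prefect installed.

```python
# Sketch only: task bodies are hypothetical stand-ins for the real ETL steps.
try:
    from prefect import flow, task
except ImportError:
    # Fallback so the sketch runs without Prefect: pass-through decorators
    # that accept (and ignore) the same keyword options.
    def _passthrough(fn=None, **_options):
        return fn if fn is not None else (lambda f: f)

    flow = task = _passthrough


@task(retries=3, retry_delay_seconds=10)
def extract(rows):
    """Stand-in for reading a CSV dataset; Prefect retries it on failure."""
    return list(rows)


@task
def transform(rows):
    """Trim surrounding whitespace from every field value."""
    return [{key: value.strip() for key, value in row.items()} for row in rows]


@task
def load(rows):
    """Stand-in for a database write; returns the number of rows handled."""
    return len(rows)


@flow(log_prints=True)
def run_etl(rows):
    """Chain the three steps; under Prefect each call is a tracked task run."""
    cleaned = transform(extract(rows))
    count = load(cleaned)
    print(f"loaded {count} rows")
    return count
```

Calling `run_etl([...])` runs the flow like an ordinary Python function, which is part of what makes this style convenient to try out locally before wiring it to a dashboard or cloud workers.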
Infrastructure as Code and Database Setup: Peter Smith and Niki discussed the current status of infrastructure as code for Google Cloud, with Niki planning to use OpenTofu for database provisioning and advising Peter to continue local development until the cloud infrastructure is ready.
Infrastructure as Code Tool Selection: Niki explained challenges with integrating Pulumi into existing workflows and shared the decision to pivot to OpenTofu, an open-source fork of Terraform, for managing cloud infrastructure.
Development Workflow Guidance: Niki and Anshul advised Peter Smith to focus on local development using Docker and containerized Postgres until the cloud infrastructure is established, emphasizing the importance of avoiding hardcoded paths and ensuring configuration flexibility.
Database Provisioning Timeline: Niki committed to working on the infrastructure setup and aimed to provide the development database by the following week, aligning with the team's project timeline.
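One way to follow the advice about avoiding hardcoded paths is to read connection details and data locations from environment variables, so the same code can point at a local containerized Postgres now and the cloud database later. The sketch below is an assumption about how that could look, not the project's actual configuration. `PGHOST`, `PGPORT`, `PGDATABASE`, and `PGUSER` are the standard libpq variable names; `ETL_DATA_DIR` and the default values are hypothetical.

```python
import os
from pathlib import Path


def load_settings(env=None):
    """Build settings from environment variables, with local-dev defaults.

    ETL_DATA_DIR is a hypothetical variable name used for illustration.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("PGHOST", "localhost"),
        "port": int(env.get("PGPORT", "5432")),
        "database": env.get("PGDATABASE", "postgres"),
        "user": env.get("PGUSER", "postgres"),
        # A relative default keeps machine-specific paths out of the code.
        "data_dir": Path(env.get("ETL_DATA_DIR", "data")),
    }
```

With no variables set, `load_settings()` falls back to the local defaults. Switching to the cloud database would then mean changing the environment, not the code.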
Data Cleaning and Database Modeling: Peter Smith consulted Tyler about whether to normalize CSV datasets before importing them into Postgres. Tyler recommended retaining the original table structure while standardizing fields like state names and zip codes.
Normalization vs. Standardization: Peter Smith asked if the infrastructure datasets should be normalized into separate tables or kept as-is, and Tyler clarified that the tables should mirror the CSVs but with standardized fields for consistency.
Data Cleaning Priorities: Tyler emphasized the importance of cleaning data fields such as state names and zip codes, while avoiding unnecessary normalization that would complicate development and use of the California biositing tool.
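The distinction Tyler drew can be sketched in a few lines: each row keeps the same columns as the CSV, and only the values of selected fields are cleaned. This is a minimal illustration under assumptions, not the project's cleaning code. The column names `state` and `zip`, the helper names, and the deliberately partial state lookup table are all hypothetical.

```python
import re

# Partial lookup for illustration only; a real table would cover all states.
STATE_ABBREVIATIONS = {"california": "CA", "oregon": "OR", "washington": "WA"}


def standardize_state(value):
    """Return a two-letter code where recognised, else the trimmed input."""
    text = str(value).strip()
    if len(text) == 2 and text.isalpha():
        return text.upper()
    return STATE_ABBREVIATIONS.get(text.lower(), text)


def standardize_zip(value):
    """Return a 5-digit zip code string, or None if the value is unusable."""
    digits = re.sub(r"\D", "", str(value))
    if len(digits) in (5, 9):  # plain zip or ZIP+4
        return digits[:5]
    if 0 < len(digits) < 5:  # leading zeros lost when parsed as a number
        return digits.zfill(5)
    return None


def clean_row(row):
    """Standardize field values without adding, dropping, or moving columns."""
    cleaned = dict(row)
    if "state" in cleaned:
        cleaned["state"] = standardize_state(cleaned["state"])
    if "zip" in cleaned:
        cleaned["zip"] = standardize_zip(cleaned["zip"])
    return cleaned
```

Because `clean_row` returns a dictionary with the same keys it received, the resulting Postgres table can still mirror the CSV layout one-to-one.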
Pull Request Review and CI Troubleshooting: Anshul and Peter Smith discussed the status of a pull request for the ETL pipeline, addressing CI failures related to code formatting and pre-commit checks, with Niki providing documentation and guidance on resolving these issues.
Pull Request Status and Testing: Anshul confirmed readiness to merge Peter Smith's pull request after reviewing changes and discussed the need for credentials to test GSheet integration locally.
CI and Pre-commit Issues: Peter Smith reported CI failures, and Anshul identified issues related to code formatting and pre-commit checks, offering to help interpret CI logs and suggesting fixes.
Documentation and Onboarding Support: Niki pointed Peter Smith to the contributing documentation for environment setup and pre-commit configuration, offering further assistance and encouraging feedback to improve onboarding materials.
GitHub for Education and Copilot Tools: Anshul recommended using GitHub for Education to access Copilot and its "Explain error" feature, which can assist with PR reviews and troubleshooting.
Website Attribution and Logo Update: Anshul asked Tyler to update the website's about page with a new logo and revised attribution text, replacing the eScience logo and clarifying the correct institutional credits.
Logo and Attribution Instructions: Anshul sent Tyler a new logo to be added to the website and provided detailed instructions for updating the attribution text, including references to the Virtual Institute for Scientific Software and the Scientific Software Engineering Center at the University of Washington.
Open Source Documentation Goals: Anshul outlined the team's objective to improve open source documentation for third-party contributors, using onboarding experiences as a test for clarity and accessibility.
Documentation Improvement Strategy: Anshul emphasized the importance of making project documentation accessible to external contributors, aiming for clarity so that new users can independently understand and use the repository.
Follow-up tasks:
GitHub Access and Onboarding: Add May as a maintainer to the project GitHub repository and ensure they have access to relevant discussions and resources. (Anshul)
ETL Pipeline Orchestration: Evaluate and experiment with Prefect as an orchestrator for the ETL pipeline and consider integrating it in place of the current ad hoc solution. (Peter Smith)
Cloud Infrastructure Setup: Set up the cloud-hosted Postgres database using OpenTofu and provide access to Peter Smith, aiming for completion by next week's meeting. (Niki)
Data Cleaning for Database Import: Standardize state names and zip codes in the CSV datasets before importing them into Postgres, but do not normalize the tables beyond their current structure. (Peter Smith)
Website Attribution Update: Replace the eScience logo with the new logo and update the attribution on the website's about page as specified by Anshul. (Tyler)
Pre-commit and CI Documentation: Review and improve the contributing documentation for environment setup and pre-commit configuration, and address any questions or feedback from new contributors. (Niki)