Skip to content

feat: integrate Focus SIS for KIPP Miami (Phase A)#3588

Open
cbini wants to merge 15 commits intomainfrom
cbini/claude-/feat/focus-sis-integration
Open

feat: integrate Focus SIS for KIPP Miami (Phase A)#3588
cbini wants to merge 15 commits intomainfrom
cbini/claude-/feat/focus-sis-integration

Conversation

@cbini
Copy link
Copy Markdown
Member

@cbini cbini commented Apr 6, 2026

Pull Request

Summary & Motivation

"When merged, this pull request will..."

Add the extraction pipeline and dbt scaffolding for Focus SIS integration (KIPP Miami). Focus replaces PowerSchool as the SIS for Miami.

Phase A includes:

  • k8s config + dagster-cloud.yaml for Focus DB credentials
  • dlt Focus library (src/teamster/libraries/dlt/focus/) — sql_database source with full_with_precision reflection, PyArrow backend
  • kippmiami dlt code location with 75-table config, daily midnight schedule
  • kippmiami Dagster definitions wiring (dlt module + DLT_RESOURCE)
  • dbt focus source-system project (dbt_project.yml, profiles.yml, packages.yml)
  • dbt source definitions for all 75 Phase 1 tables (sources-bigquery.yml)
  • kippmiami consumes focus as local package
  • CLAUDE.md documentation for all new modules

Phase B (separate PR): Staging models, intermediate models, and kipptaf integration — blocked on first dlt extraction landing data in BigQuery.

Prerequisites before first extraction:

  1. "Focus DB - Miami" item created in the vault
  2. Cloud NAT rule for static egress IP to Focus Postgres host
  3. Focus IP allowlist updated to include GKE static IP

AI Assistance

Claude Code authored the implementation, directed by human-approved design spec and implementation plan. Human reviewed spec, approved plan, and directed execution.

Self-review

General

  • Update due date and assignee on the
    TEAMster Asana Project
  • Review the Claude Code Review comment posted on this PR. Address valid
    feedback; dismiss false positives with a brief reply explaining why.

Dagster (skip if no Dagster changes)

  • Run uv run dagster definitions validate for any modified code location
  • Run uv run pytest tests/test_dagster_definitions.py for any modified
    code location
  • New integrations follow the
    Library + Config pattern
    with the correct asset key format and IO manager

dbt (skip if no dbt changes)

  • No new models yet (Phase B) — source definitions only
  • Breaking change? No — additive only

Docs (skip if no schedule, sensor, or integration changes)

  • If adding or changing a schedule or sensor, regenerate the automations
    catalog: uv run scripts/gen-automations-doc.py
  • If adding a new integration to a code location, update that location's
    CLAUDE.md by running /claude-md-management:revise-claude-md

CI checks

  • Trunk — passes.
  • dbt Cloud — passes.
  • Dagster Cloud — passes or not triggered.

@cbini cbini requested review from a team as code owners April 6, 2026 20:11
@cbini cbini linked an issue Apr 6, 2026 that may be closed by this pull request
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

Claude finished @cbini's task in 6m 57s —— View job


Claude Code Review

  • Gather context and read CLAUDE.md files
  • Get git SHAs for diff range
  • Dispatch code-reviewer subagent
  • Post review results

Code Review: Focus SIS Integration — Phase A

Strengths

Faithful spec-to-code translation. Every Phase A file was implemented. The deviation (using sql_database with table_names=[table_name] instead of sql_table) is correctly explained and properly implemented — dlt_assets requires a DltSource.

Clean library/code-location separation. src/teamster/libraries/dlt/focus/assets.py contains only the reusable factory and translator. All credential assembly and YAML config loading live in the code location, following the established two-layer pattern.

Credential security is correct. All six Focus DB fields are injected as Kubernetes secret environment variables in both server_k8s_config and run_k8s_config blocks in dagster-cloud.yaml. The 1Password item is properly defined. No secrets are hardcoded.

dbt source schema resolution is correct. sources-bigquery.yml uses var('focus_schema', 'dagster_' ~ project_name ~ '_dlt_focus'), and kippmiami/dbt_project.yml overrides focus_schema: dagster_kippmiami_dlt_focus — matching the dlt pipeline's dataset_name.

Table count is correct. 75 entries in YAML config, 75 matching table definitions in sources-bigquery.yml. Domain groupings match the spec exactly.

Consistent with Illuminate reference. The translator, factory, sql_database.with_args(...) call pattern, LogCollector, pool parameter, and write_disposition="replace" all mirror the Illuminate implementation correctly.


Issues

Critical (Must Fix)

1. Only FOCUS_DB_DRIVERNAME is guarded by check.not_none — five other required fields are not.

  • File: src/teamster/code_locations/kippmiami/dlt/focus/assets.py:16-20
  • What's wrong: The database, host, port, username, and pw fields all call .get_value() without check.not_none. If any of these Kubernetes-injected values is absent, ConnectionStringCredentials silently receives None and the error surfaces later as an obscure connection failure during a run rather than a clear startup-time assertion failure. FOCUS_DB_PORT is additionally risky because ConnectionStringCredentials expects an integer for that field.
  • Why it matters: Misconfigured Kubernetes secrets become a runtime mystery instead of a loud startup failure.
  • How to fix: Apply check.not_none(value=EnvVar("FOCUS_DB_...").get_value()) to all six fields, matching the pattern already used for FOCUS_DB_DRIVERNAME.

Important (Should Fix)

2. return super().__init__() in __init__ is semantically wrong.

  • File: src/teamster/libraries/dlt/focus/assets.py:15
  • What's wrong: __init__ must not return a value. return super().__init__() returns None accidentally — the return keyword signals a value is expected, which misleads type checkers. This is copied from the Illuminate translator.
  • How to fix: Assign self.code_location = code_location, call super().__init__() on its own line, no return.

3. get_asset_spec data parameter lacks a type annotation.

  • File: src/teamster/libraries/dlt/focus/assets.py:17
  • What's wrong: def get_asset_spec(self, data) -> AssetSpec:data is untyped, violating the project's library type annotation standard (dignified-python).
  • How to fix: Import DltResourceWrapper from dagster_dlt._translator and annotate data: DltResourceWrapper.

4. data.resource.explicit_args["table_names"][0] has no guard against a missing or empty key.

  • File: src/teamster/libraries/dlt/focus/assets.py:26
  • What's wrong: A KeyError or IndexError here breaks the code server at startup. Low probability since the factory always sets table_names=[table_name], but a broken code server is costly to debug.
  • How to fix: Use .get("table_names", []) with a fallback to data.resource.name.

5. focus_schema variable is not declared in src/dbt/focus/dbt_project.yml.

  • File: src/dbt/focus/dbt_project.yml (vars: block)
  • What's wrong: sources-bigquery.yml uses var('focus_schema', ...) but the variable is undeclared in project-level vars:. Invisible to dbt docs and inconsistent with src/dbt/focus/CLAUDE.md which lists it as a key variable.
  • How to fix: Add focus_schema: "" under vars: in dbt_project.yml.

Minor (Nice to Have)

6. No op_tags with dagster/priority on any table in focus.yaml.

  • File: src/teamster/code_locations/kippmiami/dlt/focus/config/focus.yaml
  • The spec's example YAML shows priority tags. Without them, all 75 assets compete equally in the pool. Tables like students and student_enrollment are likely highest-priority — downstream models depend on them. Suggestion: add priority tags to high-value tables before production launch.

7. Unquoted <2.0.0 in dbt_project.yml.

  • File: src/dbt/focus/dbt_project.yml:4
  • require-dbt-version: [">=1.3.0", <2.0.0] — the second entry is unquoted; other projects in this repo quote both. Should be "<2.0.0".

8. No GitHub issue tracking Phase B work.

  • Per project conventions (CLAUDE.md: "Create a GitHub issue before any planned work"), a Phase B issue should exist before merging so staging models, intermediate models, and kipptaf integration work isn't lost.

Recommendations

Infinity date risk. Before Phase B, verify whether any Focus Postgres columns contain infinity timestamps (common in academic year end-date columns). If found, a filter_infinity_callback analogous to Illuminate's pattern will be needed.

Unbounded numeric risk. Postgres numeric with no declared precision reflects as precision=None even with full_with_precision. The Illuminate CLAUDE.md documents this exact failure mode. Verify during branch deployment that no Focus tables have unbounded numeric columns.

Pool size. Pool dlt_focus_kippmiami — ensure Dagster Cloud has a pool limit configured, otherwise all 75 assets run concurrently and exhaust Focus's Postgres connection limit.


Assessment

Ready to merge? Yes, with fixes — specifically Issue 1 (missing check.not_none guards on five credential fields).

Reasoning: The overall implementation is well-structured, follows established patterns, and correctly wires all infrastructure. Issue 1 is a correctness bug that produces silent None-credential failures during runs and is a two-line fix. Issues 2–5 are code quality concerns that don't block the initial branch deployment. Issue 6 is post-deployment tuning.

Copy link
Copy Markdown
Member Author

cbini commented Apr 6, 2026

Addressed in 91f694c. Here's what we did and didn't do:

Fixed (backported to Illuminate):

  • Issue 1 — All credential fields now wrapped in check.not_none in both Focus (kippmiami/dlt/focus/assets.py) and Illuminate (kipptaf/dlt/illuminate/assets.py). Missing env vars now fail loudly at definition load time.
  • Issue 2 — Removed return from super().__init__() in both FocusDagsterDltTranslator and IlluminateDagsterDltTranslator.
  • Issue 5 — Declared focus_schema: null in src/dbt/focus/dbt_project.yml for discoverability in dbt docs.

Not fixing:

  • Issue 3DltResourceWrapper is a private import (dagster_dlt._translator). Coupling to internal APIs is more fragile than leaving data untyped. Both Focus and Illuminate use the same untyped signature.
  • Issue 4 — The factory always sets table_names=[table_name], so explicit_args["table_names"][0] can't fail. Adding a fallback is dead code for an impossible case. Same pattern in Illuminate.
  • Issue 7 — Unquoted <2.0.0> matches every other source-system dbt_project.yml in the repo. Trunk doesn't flag it.
  • Issues 6 and 8 — Valid suggestions. Priority tags will be tuned during Phase B after we see actual data volumes. Phase B issue will be created when we start that work.

Copy link
Copy Markdown
Member Author

cbini commented Apr 6, 2026

Update (c8e4644): Revised the credential handling approach based on further investigation:

Issue 1 — replaced with dlt native config resolution. Instead of adding check.not_none to all fields, switched Focus to use resolve_configuration(ConnectionStringCredentials(), sections=("FOCUS_DB",)). dlt resolves env vars natively using its SECTION__FIELD convention (double underscore) and raises ConfigFieldMissingException with the exact missing field name — no manual guards needed. Env vars in dagster-cloud.yaml renamed accordingly (FOCUS_DB__HOST, etc.). This will be backported to Illuminate separately.

Issue 2 — reverted. The return super().__init__() fix was also applied to Illuminate in the previous commit. Reverted the Illuminate changes to keep this PR focused on Focus only. Will backport as a follow-up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: integrate Focus SIS for KIPP Miami

1 participant