Job status monitoring for chuck data #36
Conversation
Force-pushed from 7d0e9d3 to afe5103.
Force-pushed from bc5316d to cdc8dc8.
```diff
  )
- if response.status_code == 200 or response.status_code == 201:
+ if response.status_code in (200, 201, 204):
```
This is an old bug. The chuck-api endpoint actually returns a 204 response, hence the change. Maybe someone changed the response code and forgot to update this. Kept the old status codes just to be safe.
```python
payload_str = json.dumps(sanitized_payload)
logging.debug(f"Sending metric: {payload_str[:100]}...")

return self._client.submit_metrics(payload, token)
```
This is an old existing bug in chuck-data: this request always failed because there was some unserialisable data in the JSON. Sanitised it using pydantic and it works now.
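For context, a minimal sketch of what such pydantic-based sanitisation might look like; the helper name `sanitize_payload` and the use of `TypeAdapter` are assumptions, not the actual chuck-data implementation:

```python
from typing import Any

from pydantic import TypeAdapter

# Serialiser for arbitrary data; in "json" mode it coerces datetimes,
# Decimals, sets, etc. into JSON-compatible Python types.
_ANY_ADAPTER = TypeAdapter(Any)


def sanitize_payload(payload: dict) -> dict:
    # Hypothetical helper: returns a version of the payload that
    # json.dumps() can always serialise, avoiding the
    # "unserialisable data" failure described above.
    return _ANY_ADAPTER.dump_python(payload, mode="json")
```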
Force-pushed from 3516cd2 to 11b2322.
chuck_data/commands/job_status.py (outdated)
```python
if (
    fetch_live
    and databricks_run_id
    and databricks_run_id != "UNSET_DATABRICKS_RUN_ID"
```
If the job has finished running in Databricks, the cluster will remove the Databricks run id from its env. So if we try to fetch "live" stats, the AI endpoint will return an error, hence this line.
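A sketch of the guard under discussion, with the sentinel pulled into a module-level constant (as the review below also suggests). `maybe_fetch_live_stats` is a hypothetical wrapper; `_extract_databricks_run_info` is the helper introduced in this PR:

```python
# Sentinel value the backend uses before a Databricks run id is recorded.
UNSET_DATABRICKS_RUN_ID = "UNSET_DATABRICKS_RUN_ID"


def maybe_fetch_live_stats(client, job_data, fetch_live, databricks_run_id):
    # Once the job finishes, the cluster removes the Databricks run id
    # from its env, so a "live" lookup would error out. Only hit the
    # Databricks API when we still have a real run id.
    if (
        fetch_live
        and databricks_run_id
        and databricks_run_id != UNSET_DATABRICKS_RUN_ID
    ):
        databricks_raw = client.get_job_run_status(databricks_run_id)
        job_data["databricks_live"] = _extract_databricks_run_info(databricks_raw)
    return job_data
```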
Force-pushed from 11b2322 to 165ebe6.
```python
return CommandResult(False, message="Either --job-id or --run-id is required")

try:
    # Get the job run status
```
Extracted all this out into a method _extract_databricks_run_info above.
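A hedged sketch of what a helper like `_extract_databricks_run_info` could look like. The field names follow the Databricks Jobs API runs/get response, but the exact selection here is an assumption, not the PR's code:

```python
def _extract_databricks_run_info(raw: dict) -> dict:
    # Reduce a raw Databricks run-status response to the fields the
    # job-status command reports (state, timings, per-task results).
    state = raw.get("state", {})
    info = {
        "life_cycle_state": state.get("life_cycle_state"),
        "result_state": state.get("result_state"),
        "start_time": raw.get("start_time"),
        "end_time": raw.get("end_time"),
    }
    tasks = raw.get("tasks") or []
    if tasks:
        info["tasks"] = [
            {
                "task_key": task.get("task_key"),
                "result_state": task.get("state", {}).get("result_state"),
            }
            for task in tasks
        ]
    return info
```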
```python
databricks_raw = client.get_job_run_status(databricks_run_id)
job_data["databricks_live"] = _extract_databricks_run_info(databricks_raw)

# Format output message - build a comprehensive summary
```
Can be extracted as a separate method _format_job_status_message... with an example of how it would look for a given use case... I think this formatting may change a lot in later iterations.
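For instance, a formatter extracted along these lines might look like the sketch below, producing the kind of one-line summary shown at the end of this thread. All field names here are illustrative:

```python
def _format_job_status_message(job_data: dict) -> str:
    # Build a one-line summary such as:
    #   Job chk-...: succeeded, Records: 7,155,216, Duration: 7.4m
    parts = [f"Job {job_data['job_id']}: {job_data.get('status', 'unknown')}"]
    if job_data.get("record_count") is not None:
        parts.append(f"Records: {job_data['record_count']:,}")
    if job_data.get("duration_minutes") is not None:
        parts.append(f"Duration: {job_data['duration_minutes']:.1f}m")
    return ", ".join(parts)
```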
pragyan-amp left a comment:
LGTM...
Already looks great... just a few nitpicks:
- UNSET_DATABRICKS_RUN_ID: extract into a constant... as it's being compared at two different places.
- Format the message in a _query_by_job_id method, as commented.

Approving the PR as things look fine.
Implement comprehensive job status monitoring in chuck-data CLI using job-id as the primary identifier and Chuck backend as the source of truth.

AmperityAPIClient (chuck_data/clients/amperity.py):
- Add get_job_status(job_id, token) to query Chuck backend for job state
- Add record_job_submission(databricks_run_id, token, job_id) to link identifiers
- Both methods use Amperity API with JWT (CLI token) authentication

Job-status command (chuck_data/commands/job_status.py):
- Refactor into three private helper functions for better testability:
  * _extract_databricks_run_info() for cleaning Databricks API responses
  * _query_by_job_id() for primary Chuck backend queries
  * _query_by_run_id() for legacy Databricks API fallback
- Use job-id as primary parameter, run_id as legacy fallback
- Support --live flag to enrich Chuck data with Databricks telemetry
- Extract and structure task information, durations, and cluster details
- Handle UNSET_DATABRICKS_RUN_ID gracefully (skip Databricks API call)

Stitch tools (chuck_data/commands/stitch_tools.py):
- Extract job-id from /api/job/launch response during prepare phase
- Propagate job-id through metadata (prepare → launch phases)
- Call record_job_submission() after Databricks job launch
- Pass job-id in request body for CLI token authentication
- Return job-id in launch results for monitoring
- Add non-fatal error handling with graceful degradation

The TUI now tracks job-id as the primary identifier and queries status from the Chuck backend, enabling proper state tracking (:pending → :submitted → :running → :succeeded/:failed) with Chuck telemetry (record counts, credits, errors) while maintaining backward compatibility with run_id-based queries.

Related to CHUCK-3
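Based only on the description above, the two new client methods could plausibly take the shape below. The endpoint paths and payload keys are assumptions (the kebab-case keys are mentioned in the test commit that follows), not the real Amperity API:

```python
import requests


class AmperityAPIClient:
    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/")

    def get_job_status(self, job_id: str, token: str) -> dict:
        # Query the Chuck backend (the source of truth) for job state.
        resp = requests.get(
            f"{self.base_url}/api/job/{job_id}",  # hypothetical path
            headers={"Authorization": f"Bearer {token}"},  # JWT (CLI token)
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()

    def record_job_submission(
        self, databricks_run_id: str, token: str, job_id: str
    ) -> None:
        # Link the Databricks run id to the Chuck job id after launch.
        resp = requests.post(
            f"{self.base_url}/api/job/submission",  # hypothetical path
            headers={"Authorization": f"Bearer {token}"},
            json={"databricks-run-id": databricks_run_id, "job-id": job_id},
            timeout=30,
        )
        # chuck-api may answer 204, as noted earlier in the review.
        if resp.status_code not in (200, 201, 204):
            resp.raise_for_status()
```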
Add 100+ tests covering the job status monitoring implementation across all layers: AmperityAPIClient, job-status command, and stitch-tools integration.

AmperityAPIClient tests (tests/unit/clients/test_amperity.py):
- get_job_status() with success (200), not found (404), and network errors
- record_job_submission() with success (200/201), failures (4xx/5xx), network errors
- Payload format validation (kebab-case keys, correct headers)
- Job-id parameter inclusion in request body
- Authentication token handling and error propagation

Job-status command tests (tests/unit/commands/test_job_status.py):
- Integration tests (11 tests):
  * Query by job_id (primary) using Chuck backend with/without live data
  * Query by run_id (legacy) using Databricks API with task information
  * Authentication handling (missing/invalid tokens)
  * Error handling and message formatting (credits, records, errors)
- Unit tests for private functions (9 tests):
  * _extract_databricks_run_info(): basic extraction, with tasks, without tasks
  * _query_by_job_id(): basic query, live data enrichment, missing token
  * _query_by_run_id(): basic query, missing client, not found

Stitch-tools tests (tests/unit/commands/test_stitch_tools.py):
- Job-id extraction from /api/job/launch response during prepare phase
- Job-id propagation through metadata (prepare → launch phases)
- Job-id returned in launch results for monitoring
- record_job_submission() invocation with correct parameters
- Error handling when job-id is missing from API response
- Edge cases: missing token, missing job-id (graceful degradation)

The refactored code structure with private helper functions enables better test isolation and validates both high-level integration flows and low-level data transformation logic. All 27 tests passing.

Related to CHUCK-3
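As a flavour of the unit tests described above, a pytest sketch for _extract_databricks_run_info. The asserted shape matches the hypothetical helper sketched earlier in this thread, not necessarily the real test suite:

```python
from chuck_data.commands.job_status import _extract_databricks_run_info


def test_extract_databricks_run_info_with_tasks():
    raw = {
        "state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"},
        "start_time": 1730221580427,
        "end_time": 1730222026892,
        "tasks": [
            {"task_key": "stitch", "state": {"result_state": "SUCCESS"}},
        ],
    }

    info = _extract_databricks_run_info(raw)

    assert info["life_cycle_state"] == "TERMINATED"
    assert info["result_state"] == "SUCCESS"
    assert info["tasks"] == [{"task_key": "stitch", "result_state": "SUCCESS"}]
```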
Force-pushed from 165ebe6 to e736d20.
Both points addressed 👍
```
chuck > /job-status --job-id chk-20251029-61073-VZdjbQfw66r --live
Job chk-20251029-61073-VZdjbQfw66r: succeeded, Records: 7,155,216, Build: stitch-service-build/7766-6a34055, Created: 2025-10-29T16:57:53.890Z, Started: 2025-10-29T17:06:20.427060788Z, Ended: 2025-10-29T17:13:46.892749542Z, Duration: 7.4m
```

Added a new job status command ⬆️
Will be doing cosmetic changes, i.e. properly displaying the job status in a nice table, in subsequent PRs.