From 24395dcfd9735b330fe05413d087a307dde86c2c Mon Sep 17 00:00:00 2001
From: Tom Booth
Date: Wed, 11 Mar 2026 19:43:36 -0400
Subject: [PATCH 01/22] commit for combined current context use cases draft

---
 docs/context_use_cases_current_pipeline.md | 576 +++++++++++++++++++++
 1 file changed, 576 insertions(+)
 create mode 100644 docs/context_use_cases_current_pipeline.md

diff --git a/docs/context_use_cases_current_pipeline.md b/docs/context_use_cases_current_pipeline.md
new file mode 100644
index 0000000..a810ad6
--- /dev/null
+++ b/docs/context_use_cases_current_pipeline.md
@@ -0,0 +1,576 @@
# Pipeline Context Use Cases

## Overview

The pipeline `Context` class (`pipeline.infrastructure.launcher.Context`) is a single mutable state object used for an entire pipeline execution. It is simultaneously a session state container, a domain metadata container, a cross-stage communication channel, and a persistence unit.

This document catalogues the current use cases of the pipeline Context as determined by examination of the codebase. The goal is to support the design and development of a prototype system that serves a similar role for RADPS.

### Key implementation references

- `Context` / `Pipeline`: `pipeline/infrastructure/launcher.py`
- CLI lifecycle tasks: `pipeline/h/cli/h_init.py`, `pipeline/h/cli/h_save.py`, `pipeline/h/cli/h_resume.py`
- Task dispatch & result acceptance: `pipeline/h/cli/utils.py`, `pipeline/infrastructure/basetask.py`
- PPR-driven execution loops:
  - ALMA: `pipeline/infrastructure/executeppr.py` (used by `pipeline/runpipeline.py`)
  - VLA: `pipeline/infrastructure/executevlappr.py` (used by `pipeline/runvlapipeline.py`)
- XML procedure execution: `pipeline/recipereducer.py`
- MPI distribution: `pipeline/infrastructure/mpihelpers.py`
- QA framework: `pipeline/qa/`
- Weblog renderer: `pipeline/infrastructure/renderer/htmlrenderer.py`

---

## 1. 
Context Lifecycle + +The canonical flow through the context is: + +1. **Create session** — `h_init()` constructs a `launcher.Pipeline(...)` and returns a new `Context`. In PPR-driven execution, `executeppr()` or `executevlappr()` also populates project metadata at this point. +2. **Load data** — Import tasks (`h*_importdata`) attach datasets to the context's domain model (`context.observing_run`, measurement sets, scans, SPWs, etc.). +3. **Execute tasks** — Tasks execute against the in-memory context and return a `Results` object. After each task, `Results.accept(context)` records the outcome and mutates shared state. +4. **Accept results** — Inside `accept()`, results are merged via `Results.merge_with_context(context)`. A `ResultsProxy` is pickled to disk per-stage to keep the in-memory context bounded. The weblog is typically rendered after each top-level stage. +5. **Save / resume** — `h_save()` pickles the context; `h_resume(filename='last')` restores it. Driver-managed breakpoints and developer debugging workflows rely on this cycle. + +--- + +## 2. 
Context Responsibility Overview + +| # | Responsibility | Description | Examples / References | +|---|---|---|---| +| 1 | **Static Observation & Project Data** | Load, store, and provide access to static observation and project data and metadata in memory | `context.observing_run`, `context.project_summary`, `context.project_structure` | +| 2 | **Mutable Observation State** | Load, store, provide in-memory access, and update mutable dynamic observation data or metadata | MS registration, virtual SPW mappings, reference antenna ordering | +| 3 | **Path Management** | Specify and store output paths as part of configuration setup | `output_dir`, `products_dir`, `report_dir`, log paths | +| 4 | **Imaging State Management** | Manage imaging state across pipeline stages | `clean_list_pending`, `imaging_parameters`, masks, thresholds, `synthesized_beams` | +| 5 | **Calibration State Management** | Register, query, and update calibration state | Calibration library (`callibrary`), active/applied cal tables, interval trees | +| 6 | **Image Library Management** | Register and query image products across pipeline stages | `sciimlist`, `calimlist`, `rmsimlist`, `subimlist` | +| 7 | **Session Persistence** | Save and restore the full pipeline session | Pickle serialization, `h_save()`, `h_resume()`, `ResultsProxy` | +| 8 | **MPI / Parallel Distribution** | Pass context to parallel workers and merge results back | Context pickle broadcast to MPI servers; results merged on client | +| 9 | **Inter-Task Data Passing** | Accept task results and merge state back into the context | `merge_with_context()` pattern | +| 10 | **Stage Tracking & Result Accumulation** | Track execution progress, stage numbering, accumulated results | `context.results`, `stage_number`, `task_counter`, result proxies | +| 11 | **Reporting & Export Support** | Provide context data for weblog, QA reports, AQUA XML, and product packaging | `context.observing_run` for weblog, `context.project_structure` for 
archive labels | +| 12 | **QA Score Storage** | Store and provide access to QA scores | QA score objects appended to `result.qa.pool` | +| 13 | **Debuggability / Inspectability** | Context state must be human-readable and inspectable for post-mortem analysis | Per-stage tracebacks, timings, timetracker integration | +| 14 | **Telescope-Specific State** | Sub-context used only by telescope-specific code | `context.evla` (VLA), conditionally created | +| 15 | **Lifecycle Notifications** | Emit events at key lifecycle points | Event bus: `ResultAcceptingEvent`, `ContextCreated`, `TaskStarted`, `TaskComplete` | + +--- + +## 3. Use Cases + +Each use case describes a need that the pipeline's central state management must satisfy. They are written to be implementation-neutral in their core description, with implementation notes appended where the codebase provides important detail. + +--- + +### UC-01 — Load and Provide Access to Observation Metadata +*Responsibilities: 1, 2* + +| Field | Content | +|-------|---------| +| **Actor(s)** | Data import task, any downstream task, heuristics, renderers, QA handlers | +| **Summary** | The system must load observation metadata (datasets, spectral windows, fields, antennas, scans, time ranges) and make it queryable by all subsequent processing steps. It must also provide a unified identifier scheme when multiple datasets use different native numbering. | +| **Postconditions** | All registered datasets are queryable by name, type, or internal identifier without re-reading raw data from disk. | + +**Implementation notes** — `context.observing_run` is the single most heavily queried context facet: + +- `context.observing_run.get_ms(name=vis)` — resolve an MS by filename +- `context.observing_run.measurement_sets` — iterate all registered MS objects +- `context.observing_run.get_measurement_sets_of_type(dtypes)` — filter by data type (RAW, REGCAL_CONTLINE_ALL, BASELINED, etc.) 
+- `context.observing_run.virtual2real_spw_id(vspw, ms)` / `real2virtual_spw_id(...)` — translate between abstract pipeline SPW IDs and CASA-native IDs +- `context.observing_run.virtual_science_spw_ids` — virtual SPW catalog +- `context.observing_run.ms_reduction_group` — per-group reduction metadata (single-dish) +- Provenance fields: `.start_datetime`, `.end_datetime`, `.project_ids`, `.schedblock_ids`, `.execblock_ids`, `.observers` + +MS objects are rich domain objects carrying scans, fields, SPWs, antennas, reference antenna ordering, etc. Tasks read per-MS state like `ms.reference_antenna`, `ms.session`, `ms.start_time`, `ms.origin_ms`. + +--- + +### UC-02 — Store and Provide Project-Level Metadata +*Responsibilities: 1* + +| Field | Content | +|-------|---------| +| **Actor(s)** | Initialization, any task, report generators | +| **Summary** | The system must store project-level metadata (proposal code, PI, telescope, desired sensitivities, processing recipe, OUS identifiers) and make it available to tasks for decision-making and to report generators for labelling outputs. | +| **Postconditions** | Project metadata is available for the lifetime of the processing session. | + +**Implementation notes** — project metadata is typically set once at session start and read many times: + +- `context.project_summary = project.ProjectSummary(...)` — set by `executeppr()` / `executevlappr()` +- `context.project_structure = project.ProjectStructure(...)` — set by PPR executors +- `context.project_performance_parameters` — performance parameters from the PPR +- `context.set_state('ProjectStructure', 'recipe_name', value)` — used by `recipereducer.reduce()` and SD heuristics +- `context.processing_intents` — set by `Pipeline` during initialization + +This is a strong candidate for a separate, immutable-after-init sub-record in any future context schema. 
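
The "immutable-after-init sub-record" suggested above can be sketched with a frozen dataclass. This is an illustrative model only: `ProjectSummaryRecord`, its field names, and the example values are hypothetical, not the pipeline's actual `project.ProjectSummary` class.

```python
from dataclasses import dataclass, FrozenInstanceError

# Hypothetical sketch of an immutable-after-init project metadata record.
# Class, field names, and values are illustrative, not pipeline code.
@dataclass(frozen=True)
class ProjectSummaryRecord:
    proposal_code: str
    piname: str
    telescope: str
    recipe_name: str

# Set once at session start...
summary = ProjectSummaryRecord(
    proposal_code='2024.1.00001.S',  # made-up example value
    piname='A. Observer',
    telescope='ALMA',
    recipe_name='hifa_calimage',
)

# ...then read many times; accidental mutation is rejected at runtime.
try:
    summary.telescope = 'VLA'
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Freezing the record would make the current "set once, read many" convention checkable at runtime rather than enforced only by discipline.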
+ +--- + +### UC-03 — Manage Execution Paths and Output Locations +*Responsibilities: 3* + +| Field | Content | +|-------|---------| +| **Actor(s)** | Initialization, any task, report generators, export code | +| **Summary** | The system must centrally define and provide working directories, report directories, product directories, and logical filenames for logs, scripts, and reports. Tasks resolve file paths through these centrally managed locations. On session restore, paths must be overridable to adapt to a new environment. | +| **Postconditions** | All tasks share a consistent set of paths for inputs and outputs. | + +**Implementation notes:** + +- Path roots: `output_dir`, `report_dir`, `products_dir` +- Context name drives deterministic, named run directories +- Relocation semantics are supported for results proxies (basenames stored) and common output layout +- PPR-driven execution may derive paths from environment variables (e.g., `SCIPIPE_ROOTDIR`) + +--- + +### UC-04 — Register and Query Calibration State +*Responsibilities: 5* + +| Field | Content | +|-------|---------| +| **Actor(s)** | Calibration tasks (bandpass, gaincal, applycal, polcal, selfcal, restoredata), heuristics, importdata, mstransform, uvcontsub | +| **Summary** | The system must allow calibration tasks to register solutions (indexed by data selection: field, spectral window, antenna, intent, time interval), and allow downstream tasks to query for all calibrations applicable to a given data selection. It must distinguish between calibrations pending application and those already applied. Registration must support transactional multi-entry updates — tasks often register multiple calibrations atomically within a single result acceptance. | +| **Postconditions** | Calibration state is queryable and correctly scoped to data selections. 
| + +**Implementation notes** — `context.callibrary` is the primary cross-stage communication channel for calibration workflows: + +- **Write:** `context.callibrary.add(calto, calfrom)` — register a calibration application (cal table + target selection); `context.callibrary.unregister_calibrations(matcher)` — remove by predicate +- **Read:** `context.callibrary.active.get_caltable(caltypes=...)` — list active cal tables; `context.callibrary.get_calstate(calto)` — get full application state for a target selection +- Backed by `CalApplication` → `CalTo` / `CalFrom` objects with interval trees for efficient matching; append-mostly, ordered by registration time + +--- + +### UC-05 — Accumulate Imaging State Across Multiple Steps +*Responsibilities: 4, 6* + +| Field | Content | +|-------|---------| +| **Actor(s)** | Imaging-related tasks (planning, production, quality checking, self-calibration, export) | +| **Summary** | The system must allow imaging state — target lists, imaging parameters, masks, thresholds, sensitivity estimates, and produced image references — to be computed by one step and read or refined by later steps. Multiple steps may contribute to a progressively refined imaging configuration. The system must also maintain typed registries of produced images (science, calibrator, RMS, sub-product) with add/query semantics. | +| **Postconditions** | The accumulated imaging state reflects contributions from all completed imaging-related steps; all produced images are registered and queryable. | + +**Implementation notes** — this is the most fragile part of the current context design. 
Attributes are added ad-hoc, there is no schema, and defensive `hasattr()` checks appear in the code: + +| Attribute | Written by | Read by | +|---|---|---| +| `clean_list_pending` | `editimlist`, `makeimlist`, `findcont`, `makeimages` | `findcont`, `tclean`, `transformimagedata`, `uvcontsub`, `checkproductsize` | +| `clean_list_info` | `makeimlist`, `makeimages` | display/renderer code | +| `imaging_mode` | `editimlist` | `makermsimages`, `makecutoutimages`, `makeimages` | +| `imaging_parameters` | PPR / `editimlist` | `tclean`, `checkproductsize`, heuristics | +| `synthesized_beams` | `imageprecheck`, `tclean`, `checkproductsize`, `makeimlist`, `makeimages` | `checkproductsize`, heuristics | +| `size_mitigation_parameters` | `checkproductsize` | downstream stages | +| `selfcal_targets`, `selfcal_resources` | `selfcal` | `exportdata` | + +Image libraries provide typed registries: + +- `context.sciimlist.add_item(imageitem)` / `.get_imlist()` — science images +- `context.calimlist` — calibrator images +- `context.rmsimlist` — RMS images +- `context.subimlist` — sub-product images (cutouts, cubes) + +A future design should formalize imaging state as a typed state machine or versioned configuration sub-document, and consider separating image *metadata* (tracked in context) from image *data* (stored in artifact store). + +--- + +### UC-06 — Track Execution Progress and Stage History +*Responsibilities: 9, 10* + +| Field | Content | +|-------|---------| +| **Actor(s)** | Workflow engine, any task, report generators, operators | +| **Summary** | The system must track which processing step is currently executing, assign a unique sequential identifier to each step, and maintain an ordered history of all completed steps and their outcomes. This history must be available for reporting, script generation, and resumption after interruption. Per-stage tracebacks and timings must be preserved. Stage numbering must remain coherent across resumes. 
| +| **Postconditions** | The full execution history is retrievable in order; the current step is identifiable. | + +**Implementation notes:** + +- `context.results` holds an ordered list of `ResultsProxy` objects (proxied to disk to bound memory) +- `context.stage_number` and `context.task_counter` track progress +- Timetracker integration provides per-stage timing data +- Results proxies store basenames for portability + +--- + +### UC-07 — Propagate Task Outputs to Downstream Tasks +*Responsibilities: 9, 10* + +| Field | Content | +|-------|---------| +| **Actor(s)** | Any task producing output that subsequent tasks depend on | +| **Summary** | When a task produces outputs that change the processing state (e.g., new calibrations, updated flag summaries, image products, revised parameters), the system must provide a mechanism for those outputs to become visible to all subsequent tasks that need them. The system must also record the output for later retrieval by reports and exports. | +| **Postconditions** | Downstream tasks see an updated view of the processing state; the output is recorded in the execution history. | + +**Implementation notes** — there are two propagation mechanisms: + +1. **Structured state merge** — `Results.merge_with_context(context)` updates calibration library, image libraries, and other typed state. +2. **Results-list walking** — tasks read `context.results` to find outputs from earlier stages. For example: + - VLA tasks compute `stage_number` from `context.results[-1].read().stage_number + 1` + - `vlassmasking` iterates `context.results[::-1]` to find the latest `MakeImagesResult` + - Export/AQUA code reads `context.results[0]` and `context.results[-1]` for timestamps + +The results-list walking pattern is fragile (indices shift if stages are inserted/skipped), slow (requires unpickling), and implicit (no declared dependency). A future design should provide explicit stage-to-stage data dependencies. 
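
The fragility of results-list walking is easy to see in a toy model. The classes and helper below are hypothetical stand-ins, not pipeline code; they mirror the reverse-iteration pattern that `vlassmasking` applies to `context.results[::-1]`.

```python
# Toy model of the "results-list walking" pattern; these classes are
# hypothetical stand-ins, not the pipeline's Results implementations.
class Result:
    def __init__(self, stage_number):
        self.stage_number = stage_number

class MakeImagesResult(Result):
    pass

class FlagResult(Result):
    pass

def latest_result_of_type(results, result_type):
    """Scan the accumulated results newest-first and return the first
    match, or None if no stage of that type has run."""
    for result in reversed(results):
        if isinstance(result, result_type):
            return result
    return None

history = [FlagResult(1), MakeImagesResult(2), FlagResult(3), MakeImagesResult(4)]
latest = latest_result_of_type(history, MakeImagesResult)  # the stage-4 result
```

Note that nothing declares the caller's dependency on a `MakeImagesResult` existing: inserting, skipping, or reordering stages silently changes what the scan finds, which is exactly why explicit stage-to-stage dependencies would be an improvement.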
+ +--- + +### UC-08 — Support Multiple Orchestration Drivers +*Responsibilities: 9, 10* + +| Field | Content | +|-------|---------| +| **Actor(s)** | Operations / automated processing (PPR-driven batch), pipeline developer / power user (interactive), recipe executor | +| **Summary** | The context is created and consumed by multiple front-ends: PPR command lists, XML procedures, or interactive task calls. The system must remain the stable state contract across these drivers. It must be createable and resumable from non-interactive and interactive drivers, support driver-injected run metadata, tolerate partial execution controls (`startstage`, `exitstage`) and breakpoint-driven stop/resume semantics, and provide machine-detectable success/failure signals. | +| **Postconditions** | The same context state is usable regardless of which orchestration driver created or resumed it. | + +**Implementation notes** — two orchestration planes converge on the same task implementations: + +- **Task-driven**: direct task calls via CLI wrappers in `pipeline/h/cli/` +- **Command-list-driven**: PPR and XML procedure commands via `executeppr.py` / `executevlappr.py` and `recipereducer.py` + +They differ in how inputs are marshalled, how session paths are selected, and how resume is initiated, but the persisted context is the same. + +--- + +### UC-09 — Save and Restore a Processing Session +*Responsibilities: 7* + +| Field | Content | +|-------|---------| +| **Actor(s)** | Pipeline operator, workflow engine, developers | +| **Summary** | The system must be able to serialize the complete processing state to disk (all observation data, calibration state, execution history, imaging state, project metadata) and later restore it so that processing can resume from the saved point. The serialization must preserve enough state to resume within a compatible version window. On restore, paths must be adaptable to a new filesystem environment. 
| +| **Postconditions** | After restore, the system is in the same state as when saved; processing can continue. | + +**Implementation notes:** + +- `h_save()` pickles the whole context to `.context` +- `h_resume(filename='last')` loads the most recent `.context` file +- Per-stage results are proxied to disk (`saved_state/result-stageX.pickle`) to keep the in-memory context smaller +- Used by driver-managed breakpoint/resume (`executeppr(..., bpaction='resume')`) and developer debugging workflows + +--- + +### UC-10 — Provide State to Parallel Workers +*Responsibilities: 8* + +| Field | Content | +|-------|---------| +| **Actor(s)** | Workflow engine, MPI worker processes | +| **Summary** | When work is distributed across parallel workers, each worker needs read-only access to the current processing state (observation metadata, calibration state, etc.). The system must provide a mechanism for workers to obtain a consistent snapshot of the state. Workers must not be able to modify the authoritative state directly. The snapshot must be small enough to broadcast efficiently. | +| **Postconditions** | Each worker has a consistent, read-only view of the processing state for the duration of its work. | + +**Implementation notes** — `pipeline/infrastructure/mpihelpers.py`, class `Tier0PipelineTask`: + +1. The MPI client saves the context to disk as a pickle: `context.save(path)`. +2. Task arguments are also pickled to disk alongside the context. +3. On the server, `get_executable()` loads the context, modifies `context.logs['casa_commands']` to a server-local temp path, creates the task's `Inputs(context, **task_args)`, then executes the task. +4. For `Tier0JobRequest` (lower-level distribution), the executor is shallow-copied *excluding* the context reference to stay within the MPI buffer limit (~150 MiB, per PIPE-1337). 
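
The snapshot handoff in the numbered steps above can be modeled with nothing but the standard library. This is a schematic sketch under simplified assumptions (a plain dict standing in for the context, no MPI transport); the function names are illustrative, not `mpihelpers` APIs.

```python
import os
import pickle
import tempfile

# Schematic sketch of the Tier0-style handoff: the client pickles its
# state to a path, each worker unpickles a private copy, and worker-side
# edits (like redirecting a log path) never touch the client's copy.
def save_snapshot(state, directory):
    path = os.path.join(directory, 'context.pickle')
    with open(path, 'wb') as fd:
        pickle.dump(state, fd)
    return path

def worker_run(path):
    with open(path, 'rb') as fd:
        state = pickle.load(fd)         # consistent, private snapshot
    state['logs'] = 'worker-local.log'  # local override, as with casa_commands
    return state

with tempfile.TemporaryDirectory() as workdir:
    client_state = {'logs': 'casa_commands.log', 'mses': ['example.ms']}
    snapshot_path = save_snapshot(client_state, workdir)
    worker_state = worker_run(snapshot_path)
```

Because each worker deserializes its own copy, the read-only requirement of UC-10 holds by construction; the corresponding cost is that the snapshot must stay small enough to ship efficiently, which is the motivation for the size mitigations noted above.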
+ +--- + +### UC-11 — Aggregate Results from Parallel Workers +*Responsibilities: 8, 9* + +| Field | Content | +|-------|---------| +| **Actor(s)** | Workflow engine | +| **Summary** | After parallel workers complete, the system must collect their individual results and incorporate them into the authoritative processing state. The aggregation must be safe (no conflicting concurrent writes) and complete before the next sequential step begins. | +| **Postconditions** | The processing state reflects the combined outcomes of all parallel workers. | + +--- + +### UC-12 — Provide Data for Report Generation +*Responsibilities: 11, 12* + +| Field | Content | +|-------|---------| +| **Actor(s)** | Report generators (weblog, quality reports, reproducibility scripts, AQUA reports, pipeline statistics) | +| **Summary** | The system must provide report generators with read-only access to: observation metadata, project metadata, execution history (including per-step outcomes and QA scores), log references, and path information. Reports include human-readable web pages, machine-readable quality reports, and reproducible processing scripts. | +| **Postconditions** | Reports accurately reflect the processing state at the time of generation. 
| + +**Implementation notes** — `WebLogGenerator.render(context)` in `pipeline/infrastructure/renderer/htmlrenderer.py`: + +- Reads `context.results` — unpickled from `ResultsProxy` objects, iterated for every renderer +- Reads `context.report_dir`, `context.output_dir` — filesystem layout +- Reads `context.observing_run.*` — MS metadata, scheduling blocks, execution blocks, observers, project IDs, start/end times +- Reads `context.project_summary.telescope` — to determine telescope-specific page layouts (ALMA vs VLA vs NRO) +- Reads `context.project_structure.*` — OUS IDs, PPR file, recipe name +- Reads `context.logs['casa_commands']` — CASA command history + +The renderer iterates `context.results` multiple times (assigning to topics, extracting flags, building timelines). The current approach requires unpickling *every* result into memory, then re-proxying when done. A lazy or streaming model would reduce peak memory. + +--- + +### UC-13 — Compute and Store Quality Assessments +*Responsibilities: 12* + +| Field | Content | +|-------|---------| +| **Actor(s)** | QA scoring framework, report generators, downstream decision-making | +| **Summary** | After each processing step completes, the system must support evaluating the outcome against quality thresholds (which may depend on telescope, project parameters, or observation properties) and recording normalized quality scores. These scores must be retrievable for reporting and for downstream decision-making. | +| **Postconditions** | Quality scores are associated with the relevant processing step and accessible to reports and downstream logic. 
| + +**Implementation notes** — after `merge_with_context()`, `accept()` triggers `pipelineqa.qa_registry.do_qa(context, result)`: + +- QA handlers implement `QAPlugin.handle(context, result)` +- They typically call `context.observing_run.get_ms(vis)` to look up metadata for scoring (antenna count, channel count, SPW properties, field intents) +- Some handlers check `context.imaging_mode` to branch on VLASS-specific scoring +- Scores are appended to `result.qa.pool` — they don't mutate the context directly + +QA handlers are *read-only* with respect to context and could operate on a frozen snapshot, making them a good candidate for parallelization. + +--- + +### UC-14 — Support Interactive Inspection and Debugging +*Responsibilities: 13* + +| Field | Content | +|-------|---------| +| **Actor(s)** | Pipeline developer, pipeline operator, CI harnesses | +| **Summary** | The system must allow an operator to inspect the current processing state: which datasets are registered, what calibrations exist, how many steps have completed, what their outcomes were. On failure, a snapshot of the state should be available for post-mortem analysis. The system must provide deterministic paths/outputs that a test harness can validate, and must surface failures beyond raw task exceptions (e.g., weblog rendering failures captured via timetracker). | +| **Postconditions** | The operator can understand the current state of processing and diagnose problems. | + +--- + +### UC-15 — Isolate Telescope-Specific State +*Responsibilities: 14* + +| Field | Content | +|-------|---------| +| **Actor(s)** | Telescope-specific tasks and heuristics | +| **Summary** | The system must support storing instrument-specific state (e.g., VLA-specific solution intervals or gain metadata) in a way that is accessible to telescope-specific tasks but does not pollute the state used by generic or other-telescope tasks. This state is created conditionally based on the instrument. 
| 
| **Postconditions** | Telescope-specific state is available to the tasks that need it; absent when the instrument does not require it. |

**Implementation notes** — `context.evla` is a `collections.defaultdict(dict)`, keyed as `context.evla['msinfo'][ms_name]`:

- **Written by:** `hifv_importdata` (creates + initializes), `testBPdcals` (gain intervals, ignorerefant), `fluxscale/solint`, `fluxboot`
- **Read by:** nearly every VLA calibration task and heuristic
- Accessed fields include: `gain_solint1`, `gain_solint2`, `setjy_results`, `ignorerefant`, various `*_field_select_string` / `*_scan_select_string` values, `fluxscale_sources`, `spindex_results`, and many more

This is a completely untyped, dictionary-of-dictionaries sidecar. A future design should define a typed state object, provide accessor methods rather than raw dict lookups, and separate telescope-specific concerns from the generic context via composition (e.g., `context.get_extension('evla')`).

---

### UC-16 — Package and Export Pipeline Products
*Responsibilities: 3, 11*

| Field | Content |
|-------|---------|
| **Actor(s)** | Export task, archive system |
| **Summary** | The system must provide an export mechanism that reads datasets, calibration state, image products, reports, scripts, and project identifiers from the processing state and assembles them into a deliverable product package. The package must be structured for downstream archive ingestion. |
| **Postconditions** | A self-contained product package exists on disk. 
| 

---

### UC-17 — Emit Lifecycle Notifications
*Responsibilities: 15*

| Field | Content |
|-------|---------|
| **Actor(s)** | Workflow engine, event subscribers (loggers, progress monitors) |
| **Summary** | The system must emit notifications at key lifecycle points (session start, session restore, step start, step completion, result acceptance) so that external observers (logging, progress reporting, live dashboards) can track execution without polling. |
| **Postconditions** | Subscribers are notified of lifecycle transitions as they occur. |

**Implementation notes** — `pipeline.infrastructure.eventbus.send_message(event)`:

- Event types: `ResultAcceptingEvent`, `ContextCreated`, `TaskStarted`, `TaskComplete`
- The event bus exists and fires events, but is lightly used — `merge_with_context` remains the primary data flow mechanism
- A future design could elevate the event bus to the primary state mutation channel (event-sourcing pattern), enabling audit trails, undo, and distributed observation

---

## 4. Responsibility-to-Use-Case Traceability

| # | Responsibility | Use Cases |
|---|---|---|
| 1 | Static Observation & Project Data | UC-01, UC-02 |
| 2 | Mutable Observation State | UC-01 |
| 3 | Path Management | UC-03, UC-16 |
| 4 | Imaging State Management | UC-05 |
| 5 | Calibration State Management | UC-04 |
| 6 | Image Library Management | UC-05 |
| 7 | Session Persistence | UC-09 |
| 8 | MPI / Parallel Distribution | UC-10, UC-11 |
| 9 | Inter-Task Data Passing | UC-06, UC-07, UC-08, UC-11 |
| 10 | Stage Tracking & Result Accumulation | UC-06, UC-07, UC-08 |
| 11 | Reporting & Export Support | UC-12, UC-16 |
| 12 | QA Score Storage | UC-12, UC-13 |
| 13 | Debuggability / Inspectability | UC-14 |
| 14 | Telescope-Specific State | UC-15 |
| 15 | Lifecycle Notifications | UC-17 |

---

## 5. 
Use Cases the Current Design Cannot Handle + +The following describe scenarios that the current context design *does not support* but that could be valuable in a future architecture. + +### FUC-01 — Concurrent / Overlapping Task Execution + +Today all task execution is strictly serial. The context is a mutable, shared-everything singleton with no locking or isolation between stages. Many calibration stages are independent per-MS or per-SPW and could benefit from parallelization. + +**What would be needed:** + +- A context that supports isolated read snapshots (like database transactions or copy-on-write) +- A merge/reconciliation step when concurrent results are accepted +- Explicit declaration of which context fields each task reads and writes + +### FUC-02 — Cloud / Distributed Execution Without Shared Filesystem + +The current context is a pickle file on a local/shared filesystem. MPI distribution requires all nodes to see the same filesystem. + +**What would be needed:** + +- A context store backed by a database or object store (S3, GCS) +- Artifact references rather than filesystem paths for cal tables and images +- Tasks that can operate on remote datasets without requiring local copies + +### FUC-03 — Multi-Language / Multi-Framework Access to Context + +The context is a Python object graph, tightly coupled to CASA's Python runtime. Non-Python clients (C++, Julia, JavaScript dashboards) cannot access it. + +**What would be needed:** + +- A language-neutral serialization format (Protocol Buffers, JSON-Schema, Arrow) +- A query API (REST, gRPC, or GraphQL) +- Type definitions shared across languages + +### FUC-04 — Streaming / Incremental Processing + +The current session model assumes all data is available at session start and cannot process data as it arrives from the correlator or archive. 
+ +**What would be needed:** + +- A context that supports incremental dataset registration (add new scans/EBs to a live session) +- Tasks that can detect "new data available" and re-process incrementally +- A results model that supports versioning (re-run produces a new version rather than overwriting) + +### FUC-05 — Provenance and Reproducibility Guarantees + +There is no formal record of which context state a task observed when it ran. Re-running from a saved context yields the state *after* the last save, not the state live at stage N. + +**What would be needed:** + +- Immutable snapshots of context state per-stage (event sourcing) +- Hashing of all task inputs (context fields + parameters) for cache invalidation / reproducibility tokens +- Ability to replay a run from the event log + +### FUC-06 — Fine-Grained Access Control / Multi-Tenant Context + +The context is an all-or-nothing object with no concept of access control, multi-user sessions, or data isolation between projects. + +**What would be needed:** + +- Per-project context namespacing +- Role-based access to context fields +- Audit logging of all context mutations + +### FUC-07 — Partial Re-Execution / Targeted Stage Re-Run + +Today "resume" means "restart from the last completed stage". There is no way to selectively re-run a single mid-pipeline stage with different parameters while keeping earlier and later stages intact. + +**What would be needed:** + +- Explicit dependency tracking between stages (which context fields does stage N consume?) +- Ability to invalidate downstream stages when a mid-pipeline stage is re-run +- Versioned results per stage (keep old version alongside new version) + +### FUC-08 — External System Integration (Archive, Scheduling, QA Dashboards) + +Today, context is a local, in-process object. Other systems (archive ingest, scheduling database, QA dashboards) interact only via offline products (weblog, manifest, AQUA XML). 
+ +**What would be needed:** + +- A stable, queryable context API (REST/gRPC) that external systems can poll or subscribe to +- Webhook / event notification support for state transitions +- A standard schema for context summaries consumable by external systems + +--- + +## 6. Architectural Observations + +### The context is a "big ball of state", by design + +The current approach is extremely flexible for a long-running, stateful CASA session, but there is no explicit schema boundary between persisted state, ephemeral caches, runtime-only services, and large artifacts. Tasks can (and do) add new fields in an ad-hoc way over time. + +### Persistence is pickle-based + +Pickle works for short-lived resume/debug use cases, but it is fragile across version changes, risky as a long-term archive format, and not friendly to multi-writer or multi-process updates. The codebase mitigates size by proxying stage results to disk, but the context itself remains a potentially large and unstable object graph. + +### Two orchestration planes converge on the same context + +Task-driven (interactive CLI) and command-list-driven (PPR / XML procedures) execution both produce and consume the same context. They differ in how inputs are marshalled, how paths are selected, and how resume is initiated, but the persisted context is the same object. + +--- + +## 7. Improvement Suggestions for Next-Generation Design + +These are phrased as requirements and design directions, not as a call to rewrite everything immediately. + +### 1) Split "context data" from "context services" + +Define a minimal, explicit **ContextData** model that is typed, schema-versioned, and serializable in a stable format (JSON/MsgPack/Arrow). Attach runtime-only services (CASA tool handles, caches, heuristics engines) around it rather than mixing them into the same object. + +### 2) Introduce a ContextStore interface + +Replace "pickle a Python object graph" with a storage abstraction (`get`, `put`, `list_runs`). 
Backends can start simple (SQLite) and grow (Postgres/object store) without changing task logic. + +### 3) Make state transitions explicit (event-sourced or patch-based) + +The existing event bus (`pipeline.infrastructure.eventbus`) could be elevated to record task lifecycle events and key state changes, yielding reproducibility, easier partial rebuilds, and better distributed orchestration. + +### 4) Treat large artifacts as references, not context fields + +Store large arrays/images/tables in an artifact store and carry only references in context data. This avoids "accidentally pickle a GiB array" and makes distribution/cloud execution more realistic. + +### 5) Remove reliance on global interactive stacks for non-interactive execution + +Make tasks accept an explicit context handle. Keep interactive convenience wrappers but do not make them the core contract. + +### 6) Represent the execution plan as context data + +Record the effective execution plan (linear or DAG) alongside run state to support provenance, partial execution, and targeted re-runs. + +### 7) Adopt a versioned compatibility policy + +Define whether operational contexts must be resumable within a supported release window (with schema versioning + migrations) versus best-effort for development contexts. + +--- + +## 8. 
Context Contract Summary + +The following capabilities appear to be **hard requirements** for any replacement system, derived from current behavior and internal usage patterns: + +**System-level requirements:** + +- Run identity: `context_id`, recipe/procedure name, inputs, operator/mode +- Path layout: working/report/products directories with ability to relocate +- Dataset inventory: execution blocks / measurement sets with per-MS metadata +- Stage results timeline: ordered stages, durations, QA outcomes, tracebacks +- Export products: weblog tar, manifest, AQUA report, scripts +- Resume: restart from last known good stage (or after a breakpoint) + +**Internal usage requirements:** + +- Fast MS lookup: random-access by name, filtering by data type, virtual↔real SPW translation +- Calibration library: append-oriented, ordered, with transactional multi-entry updates and predicate-based queries +- Image library: four typed registries (science, calibrator, RMS, sub-product) with add/query semantics +- Imaging state: typed, versioned configuration for the imaging sub-pipeline +- QA scoring: read-only context snapshot for parallel-safe QA handler execution +- Weblog rendering: read-only traversal of full results timeline + MS metadata + project metadata +- MPI/distributed: efficient context snapshot broadcast + results write-back +- Cross-stage data flow: explicit named outputs rather than results-list walking +- Project metadata: immutable-after-init sub-record +- Telescope-specific state: typed, composable extension rather than untyped dict + +--- + +## 9. Open Questions + +- Are there additional use cases not captured by either review? Reviewer input may surface new cases. +- Should the future use cases (Section 5) be prioritized? If so, which are most impactful for RADPS? +- What compatibility guarantees should the next-generation context provide across pipeline releases? + +--- + +## 10. 
Contributors + +- **Kristin Berry** — Worked on this draft +- **Shawn Booth** — Worked on this draft From ae2a245f15913402aec47b2a6c9e8052d0f43fe2 Mon Sep 17 00:00:00 2001 From: Tom Booth Date: Mon, 16 Mar 2026 13:03:06 -0400 Subject: [PATCH 02/22] moved analysis/commentary/design recommendations into a separate appendix file; updated some wording choices for more accurate language and removed deployment-level GAP scenario --- docs/context_current_pipeline_appendix.md | 348 +++++++++++++++++ docs/context_use_cases_current_pipeline.md | 423 +++------------------ 2 files changed, 402 insertions(+), 369 deletions(-) create mode 100644 docs/context_current_pipeline_appendix.md diff --git a/docs/context_current_pipeline_appendix.md b/docs/context_current_pipeline_appendix.md new file mode 100644 index 0000000..cdfd8f4 --- /dev/null +++ b/docs/context_current_pipeline_appendix.md @@ -0,0 +1,348 @@ +# Pipeline Context: Supplementary Analysis + +This document contains architectural observations, design recommendations, and reference material that supplement the use cases in [context_use_cases_current_pipeline.md](context_use_cases_current_pipeline.md). These sections were separated to keep the use-case document focused on requirements. 
+ +--- + +## Context Responsibility Overview + +| # | Responsibility | Description | Examples / References | +|---|---|---|---| +| 1 | **Static Observation & Project Data** | Load, store, and provide access to static observation and project data and metadata in memory | `context.observing_run`, `context.project_summary`, `context.project_structure` | +| 2 | **Mutable Observation State** | Load, store, provide in-memory access, and update mutable dynamic observation data or metadata | MS registration, virtual SPW mappings, reference antenna ordering | +| 3 | **Path Management** | Specify and store output paths as part of configuration setup | `output_dir`, `products_dir`, `report_dir`, log paths | +| 4 | **Imaging State Management** | Manage imaging state across pipeline stages | `clean_list_pending`, `imaging_parameters`, masks, thresholds, `synthesized_beams` | +| 5 | **Calibration State Management** | Register, query, and update calibration state | Calibration library (`callibrary`), active/applied cal tables, interval trees | +| 6 | **Image Library Management** | Register and query image products across pipeline stages | `sciimlist`, `calimlist`, `rmsimlist`, `subimlist` | +| 7 | **Session Persistence** | Save and restore the full pipeline session | Pickle serialization, `h_save()`, `h_resume()`, `ResultsProxy` | +| 8 | **MPI / Parallel Distribution** | Pass context to parallel workers and merge results back | Context pickle broadcast to MPI servers; results merged on client | +| 9 | **Inter-Task Data Passing** | Accept task results and merge state back into the context | `merge_with_context()` pattern | +| 10 | **Stage Tracking & Result Accumulation** | Track execution progress, stage numbering, accumulated results | `context.results`, `stage_number`, `task_counter`, result proxies | +| 11 | **Reporting & Export Support** | Provide context data for weblog, QA reports, AQUA XML, and product packaging | `context.observing_run` for weblog, 
`context.project_structure` for archive labels | +| 12 | **QA Score Storage** | Store and provide access to QA scores | QA score objects appended to `result.qa.pool` | +| 13 | **Debuggability / Inspectability** | Context state must be human-readable and inspectable for post-mortem analysis | Per-stage tracebacks, timings, timetracker integration | +| 14 | **Telescope-Specific State** | Sub-context used only by telescope-specific code | `context.evla` (VLA), conditionally created | +| 15 | **Lifecycle Notifications** | Emit events at key lifecycle points | Event bus: `ResultAcceptingEvent`, `ContextCreated`, `TaskStarted`, `TaskComplete` | + +--- + +## Responsibility-to-Use-Case Traceability + +| # | Responsibility | Use Cases | +|---|---|---| +| 1 | Static Observation & Project Data | UC-01, UC-02 | +| 2 | Mutable Observation State | UC-01 | +| 3 | Path Management | UC-03, UC-16 | +| 4 | Imaging State Management | UC-05 | +| 5 | Calibration State Management | UC-04 | +| 6 | Image Library Management | UC-05 | +| 7 | Session Persistence | UC-09 | +| 8 | MPI / Parallel Distribution | UC-10, UC-11 | +| 9 | Inter-Task Data Passing | UC-06, UC-07, UC-11 | +| 10 | Stage Tracking & Result Accumulation | UC-06, UC-07, UC-08 | +| 11 | Reporting & Export Support | UC-12, UC-16 | +| 12 | QA Score Storage | UC-13 | +| 13 | Debuggability / Inspectability | UC-14 | +| 14 | Telescope-Specific State | UC-15 | +| 15 | Lifecycle Notifications | UC-17 | + +--- + +## Implementation Notes by Use Case + +The following implementation notes describe how each use case is realized in the current pipeline codebase. They were separated from the use-case definitions to keep the requirements document implementation-neutral. 
+ +### UC-01 — Load and Provide Access to Observation Metadata + +**Implementation notes** — `context.observing_run` is the single most heavily queried context facet: + +- `context.observing_run.get_ms(name=vis)` — resolve an MS by filename +- `context.observing_run.measurement_sets` — iterate all registered MS objects +- `context.observing_run.get_measurement_sets_of_type(dtypes)` — filter by data type (RAW, REGCAL_CONTLINE_ALL, BASELINED, etc.) +- `context.observing_run.virtual2real_spw_id(vspw, ms)` / `real2virtual_spw_id(...)` — translate between abstract pipeline SPW IDs and CASA-native IDs +- `context.observing_run.virtual_science_spw_ids` — virtual SPW catalog +- `context.observing_run.ms_reduction_group` — per-group reduction metadata (single-dish) +- Provenance fields: `.start_datetime`, `.end_datetime`, `.project_ids`, `.schedblock_ids`, `.execblock_ids`, `.observers` + +MS objects are rich domain objects carrying scans, fields, SPWs, antennas, reference antenna ordering, etc. Tasks read per-MS state like `ms.reference_antenna`, `ms.session`, `ms.start_time`, `ms.origin_ms`. + +--- + +### UC-02 — Store and Provide Project-Level Metadata + +**Implementation notes** — project metadata is typically set once at session start and read many times: + +- `context.project_summary = project.ProjectSummary(...)` — set by `executeppr()` / `executevlappr()` +- `context.project_structure = project.ProjectStructure(...)` — set by PPR executors +- `context.project_performance_parameters` — performance parameters from the PPR +- `context.set_state('ProjectStructure', 'recipe_name', value)` — used by `recipereducer.reduce()` and SD heuristics +- `context.processing_intents` — set by `Pipeline` during initialization + +This is a strong candidate for a separate, immutable-after-init sub-record in any future context schema. 
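The set-once behavior suggested above can be enforced by the type rather than by convention. The sketch below is illustrative only: the class and field names are hypothetical stand-ins for the kinds of values carried by `context.project_summary` / `context.project_structure`, not the pipeline's actual schema.

```python
from dataclasses import FrozenInstanceError, dataclass


# Hypothetical immutable-after-init project record; the field names are
# illustrative stand-ins for project-level metadata, not the real schema.
@dataclass(frozen=True)
class ProjectRecord:
    proposal_code: str
    piname: str
    telescope: str
    recipe_name: str


record = ProjectRecord(
    proposal_code="unknown",
    piname="unknown",
    telescope="ALMA",
    recipe_name="procedure_hifa_calimage",
)

# frozen=True turns every later assignment into an error, enforcing the
# "set once at session start, read many times" contract described above.
try:
    record.recipe_name = "something_else"
except FrozenInstanceError:
    pass  # mutation rejected after initialization
```

A real design would likely add schema versioning to such a record; the point here is only that set-once semantics can be made structural instead of relying on tasks to refrain from writing.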
+ +--- + +### UC-03 — Manage Execution Paths and Output Locations + +**Implementation notes:** + +- Path roots: `output_dir`, `report_dir`, `products_dir` +- Context name drives deterministic, named run directories +- Relocation semantics are supported for results proxies (basenames stored) and common output layout +- PPR-driven execution may derive paths from environment variables (e.g., `SCIPIPE_ROOTDIR`) + +--- + +### UC-04 — Register and Query Calibration State + +**Implementation notes** — `context.callibrary` is the primary cross-stage communication channel for calibration workflows: + +- **Write:** `context.callibrary.add(calto, calfrom)` — register a calibration application (cal table + target selection); `context.callibrary.unregister_calibrations(matcher)` — remove by predicate +- **Read:** `context.callibrary.active.get_caltable(caltypes=...)` — list active cal tables; `context.callibrary.get_calstate(calto)` — get full application state for a target selection +- Backed by `CalApplication` → `CalTo` / `CalFrom` objects with interval trees for efficient matching; append-mostly, ordered by registration time + +--- + +### UC-05 — Accumulate Imaging State Across Multiple Steps + +**Implementation notes** — this is the most fragile part of the current context design. 
Attributes are added ad-hoc, there is no schema, and defensive `hasattr()` checks appear in the code: + +| Attribute | Written by | Read by | +|---|---|---| +| `clean_list_pending` | `editimlist`, `makeimlist`, `findcont`, `makeimages` | `findcont`, `tclean`, `transformimagedata`, `uvcontsub`, `checkproductsize` | +| `clean_list_info` | `makeimlist`, `makeimages` | display/renderer code | +| `imaging_mode` | `editimlist` | `makermsimages`, `makecutoutimages`, `makeimages` | +| `imaging_parameters` | PPR / `editimlist` | `tclean`, `checkproductsize`, heuristics | +| `synthesized_beams` | `imageprecheck`, `tclean`, `checkproductsize`, `makeimlist`, `makeimages` | `checkproductsize`, heuristics | +| `size_mitigation_parameters` | `checkproductsize` | downstream stages | +| `selfcal_targets`, `selfcal_resources` | `selfcal` | `exportdata` | + +Image libraries provide typed registries: + +- `context.sciimlist.add_item(imageitem)` / `.get_imlist()` — science images +- `context.calimlist` — calibrator images +- `context.rmsimlist` — RMS images +- `context.subimlist` — sub-product images (cutouts, cubes) + +A future design should formalize imaging state as a typed state machine or versioned configuration sub-document, and consider separating image *metadata* (tracked in context) from image *data* (stored in artifact store). + +--- + +### UC-06 — Track Execution Progress and Stage History + +**Implementation notes:** + +- `context.results` holds an ordered list of `ResultsProxy` objects (proxied to disk to bound memory) +- `context.stage_number` and `context.task_counter` track progress +- Timetracker integration provides per-stage timing data +- Results proxies store basenames for portability + +--- + +### UC-07 — Propagate Task Outputs to Downstream Tasks + +**Implementation notes** — there are two propagation mechanisms: + +1. **Structured state merge** — `Results.merge_with_context(context)` updates calibration library, image libraries, and other typed state. +2. 
**Results-list walking** — tasks read `context.results` to find outputs from earlier stages. For example: + - VLA tasks compute `stage_number` from `context.results[-1].read().stage_number + 1` + - `vlassmasking` iterates `context.results[::-1]` to find the latest `MakeImagesResult` + - Export/AQUA code reads `context.results[0]` and `context.results[-1]` for timestamps + +The results-list walking pattern is fragile (indices shift if stages are inserted/skipped), slow (requires unpickling), and implicit (no declared dependency). A future design should provide explicit stage-to-stage data dependencies. + +--- + +### UC-08 — Support Multiple Orchestration Drivers + +**Implementation notes** — two orchestration planes converge on the same task implementations: + +- **Task-driven**: direct task calls via CLI wrappers in `pipeline/h/cli/` +- **Command-list-driven**: PPR and XML procedure commands via `executeppr.py` / `executevlappr.py` and `recipereducer.py` + +They differ in how inputs are marshalled, how session paths are selected, and how resume is initiated, but the persisted context is the same. + +--- + +### UC-09 — Save and Restore a Processing Session + +**Implementation notes:** + +- `h_save()` pickles the whole context to `.context` +- `h_resume(filename='last')` loads the most recent `.context` file +- Per-stage results are proxied to disk (`saved_state/result-stageX.pickle`) to keep the in-memory context smaller +- Used by driver-managed breakpoint/resume (`executeppr(..., bpaction='resume')`) and developer debugging workflows + +--- + +### UC-10 — Provide State to Parallel Workers + +**Implementation notes** — `pipeline/infrastructure/mpihelpers.py`, class `Tier0PipelineTask`: + +1. The MPI client saves the context to disk as a pickle: `context.save(path)`. +2. Task arguments are also pickled to disk alongside the context. +3. 
On the server, `get_executable()` loads the context, modifies `context.logs['casa_commands']` to a server-local temp path, creates the task's `Inputs(context, **task_args)`, then executes the task. +4. For `Tier0JobRequest` (lower-level distribution), the executor is shallow-copied *excluding* the context reference to stay within the MPI buffer limit (~150 MiB, per PIPE-1337). + +--- + +### UC-12 — Provide Data for Report Generation + +**Implementation notes** — `WebLogGenerator.render(context)` in `pipeline/infrastructure/renderer/htmlrenderer.py`: + +- Reads `context.results` — unpickled from `ResultsProxy` objects, iterated for every renderer +- Reads `context.report_dir`, `context.output_dir` — filesystem layout +- Reads `context.observing_run.*` — MS metadata, scheduling blocks, execution blocks, observers, project IDs, start/end times +- Reads `context.project_summary.telescope` — to determine telescope-specific page layouts (ALMA vs VLA vs NRO) +- Reads `context.project_structure.*` — OUS IDs, PPR file, recipe name +- Reads `context.logs['casa_commands']` — CASA command history + +The renderer iterates `context.results` multiple times (assigning to topics, extracting flags, building timelines). The current approach requires unpickling *every* result into memory, then re-proxying when done. A lazy or streaming model would reduce peak memory. 
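A lazy model could be as simple as a generator over the per-stage pickles. This sketch assumes only the `saved_state/result-stageN.pickle` layout described under UC-09; the function name, signature, and sort key are illustrative, not pipeline API.

```python
import pickle
from pathlib import Path
from typing import Any, Iterator


def iter_stage_results(saved_state: Path) -> Iterator[Any]:
    """Lazily unpickle per-stage result files, one at a time.

    Assumes a saved_state/result-stageN.pickle layout; the name and
    signature here are an illustrative sketch, not the pipeline's API.
    """
    # Order numerically by stage so stage10 sorts after stage2.
    stage_files = sorted(
        saved_state.glob("result-stage*.pickle"),
        key=lambda p: int(p.stem.rsplit("stage", 1)[1]),
    )
    for path in stage_files:
        with path.open("rb") as fh:
            # Only one stage result is resident at a time; the caller
            # (e.g. a renderer pass) streams over the timeline.
            yield pickle.load(fh)
```

A renderer written against such an iterator would hold peak memory proportional to one stage result rather than the whole results timeline.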
+ +--- + +### UC-13 — Compute and Store Quality Assessments + +**Implementation notes** — after `merge_with_context()`, `accept()` triggers `pipelineqa.qa_registry.do_qa(context, result)`: + +- QA handlers implement `QAPlugin.handle(context, result)` +- They typically call `context.observing_run.get_ms(vis)` to look up metadata for scoring (antenna count, channel count, SPW properties, field intents) +- Some handlers check `context.imaging_mode` to branch on VLASS-specific scoring +- Scores are appended to `result.qa.pool` — they don't mutate the context directly + +QA handlers are *read-only* with respect to context and could operate on a frozen snapshot, making them a good candidate for parallelization. + +--- + +### UC-15 — Isolate Telescope-Specific State + +**Implementation notes** — `context.evla` is a `collections.defaultdict(dict)`, keyed as `context.evla['msinfo'][ms_name].`: + +- **Written by:** `hifv_importdata` (creates + initializes), `testBPdcals` (gain intervals, ignorerefant), `fluxscale/solint`, `fluxboot` +- **Read by:** nearly every VLA calibration task and heuristic +- Accessed fields include: `gain_solint1`, `gain_solint2`, `setjy_results`, `ignorerefant`, various `*_field_select_string` / `*_scan_select_string` values, `fluxscale_sources`, `spindex_results`, and many more + +This is a completely untyped, dictionary-of-dictionaries sidecar. A future design should define a typed state object, provide accessor methods rather than raw dict lookups, and separate telescope-specific concerns from the generic context via composition (e.g., `context.get_extension('evla')`). 
+ +--- + +### UC-17 — Emit Lifecycle Notifications + +**Implementation notes** — `pipeline.infrastructure.eventbus.send_message(event)`: + +- Event types: `ResultAcceptingEvent`, `ContextCreated`, `TaskStarted`, `TaskComplete` +- The event bus exists and fires events, but is lightly used — `merge_with_context` remains the primary data flow mechanism +- A future design could elevate the event bus to the primary state mutation channel (event-sourcing pattern), enabling audit trails, undo, and distributed observation + +--- + +## Key Implementation References + +- `Context` / `Pipeline`: `pipeline/infrastructure/launcher.py` +- CLI lifecycle tasks: `pipeline/h/cli/h_init.py`, `pipeline/h/cli/h_save.py`, `pipeline/h/cli/h_resume.py` +- Task dispatch & result acceptance: `pipeline/h/cli/utils.py`, `pipeline/infrastructure/basetask.py` +- PPR-driven execution loops: + - ALMA: `pipeline/infrastructure/executeppr.py` (used by `pipeline/runpipeline.py`) + - VLA: `pipeline/infrastructure/executevlappr.py` (used by `pipeline/runvlapipeline.py`) +- XML procedure execution: `pipeline/recipereducer.py` +- MPI distribution: `pipeline/infrastructure/mpihelpers.py` +- QA framework: `pipeline/qa/` +- Weblog renderer: `pipeline/infrastructure/renderer/htmlrenderer.py` + +--- + +## Context Lifecycle + +The canonical flow through the context is: + +1. **Create session** — `h_init()` constructs a `launcher.Pipeline(...)` and returns a new `Context`. In PPR-driven execution, `executeppr()` or `executevlappr()` also populates project metadata at this point. +2. **Load data** — Import tasks (`h*_importdata`) attach datasets to the context's domain model (`context.observing_run`, measurement sets, scans, SPWs, etc.). +3. **Execute tasks** — Tasks execute against the in-memory context and return a `Results` object. After each task, `Results.accept(context)` records the outcome and mutates shared state. +4. 
**Accept results** — Inside `accept()`, results are merged via `Results.merge_with_context(context)`. A `ResultsProxy` is pickled to disk per-stage to keep the in-memory context bounded. The weblog is typically rendered after each top-level stage. +5. **Save / resume** — `h_save()` pickles the context; `h_resume(filename='last')` restores it. Driver-managed breakpoints and developer debugging workflows rely on this cycle. + +--- + +## Architectural Observations + +### The context is a "big ball of state", by design + +The current approach is extremely flexible for a long-running, stateful CASA session, but there is no explicit schema boundary between persisted state, ephemeral caches, runtime-only services, and large artifacts. Tasks can (and do) add new fields in an ad-hoc way over time. + +### Persistence is pickle-based + +Pickle works for short-lived resume/debug use cases, but it is fragile across version changes, risky as a long-term archive format, and not friendly to multi-writer or multi-process updates. The codebase mitigates size by proxying stage results to disk, but the context itself remains a potentially large and unstable object graph. + +### Two orchestration planes converge on the same context + +Task-driven (interactive CLI) and command-list-driven (PPR / XML procedures) execution both produce and consume the same context. They differ in how inputs are marshalled, how paths are selected, and how resume is initiated, but the persisted context is the same object. + +--- + +## Improvement Suggestions for Next-Generation Design + +These are phrased as requirements and design directions, not as a call to rewrite everything immediately. + +### 1) Split "context data" from "context services" + +Define a minimal, explicit **ContextData** model that is typed, schema-versioned, and serializable in a stable format (JSON/MsgPack/Arrow). 
Attach runtime-only services (CASA tool handles, caches, heuristics engines) around it rather than mixing them into the same object. + +### 2) Introduce a ContextStore interface + +Replace "pickle a Python object graph" with a storage abstraction (`get`, `put`, `list_runs`). Backends can start simple (SQLite) and grow (Postgres/object store) without changing task logic. + +### 3) Make state transitions explicit (event-sourced or patch-based) + +The existing event bus (`pipeline.infrastructure.eventbus`) could be elevated to record task lifecycle events and key state changes, yielding reproducibility, easier partial rebuilds, and better distributed orchestration. + +### 4) Treat large artifacts as references, not context fields + +Store large arrays/images/tables in an artifact store and carry only references in context data. This avoids "accidentally pickle a GiB array" and makes distribution/cloud execution more realistic. + +### 5) Remove reliance on global interactive stacks for non-interactive execution + +Make tasks accept an explicit context handle. Keep interactive convenience wrappers but do not make them the core contract. + +### 6) Represent the execution plan as context data + +Record the effective execution plan (linear or DAG) alongside run state to support provenance, partial execution, and targeted re-runs. + +### 7) Adopt a versioned compatibility policy + +Define whether operational contexts must be resumable within a supported release window (with schema versioning + migrations) versus best-effort for development contexts. 
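To make suggestion 2 concrete, here is a minimal sketch of the store abstraction. The `get` / `put` / `list_runs` surface comes from the text above; the SQLite schema, JSON payload, and class names are illustrative assumptions, not a proposed final design.

```python
import json
import sqlite3
from typing import List, Protocol


class ContextStore(Protocol):
    """Storage abstraction from suggestion 2; backends are swappable."""

    def put(self, run_id: str, data: dict) -> None: ...
    def get(self, run_id: str) -> dict: ...
    def list_runs(self) -> List[str]: ...


class SqliteContextStore:
    """Simple starter backend; a Postgres or object-store backend would
    implement the same three methods without any change to task logic."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS runs (run_id TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )

    def put(self, run_id: str, data: dict) -> None:
        # JSON keeps persisted context data language-neutral, unlike pickle.
        self._db.execute(
            "INSERT OR REPLACE INTO runs VALUES (?, ?)", (run_id, json.dumps(data))
        )
        self._db.commit()

    def get(self, run_id: str) -> dict:
        row = self._db.execute(
            "SELECT data FROM runs WHERE run_id = ?", (run_id,)
        ).fetchone()
        if row is None:
            raise KeyError(run_id)
        return json.loads(row[0])

    def list_runs(self) -> List[str]:
        return [r[0] for r in self._db.execute("SELECT run_id FROM runs ORDER BY run_id")]


store = SqliteContextStore()
store.put("run-001", {"recipe": "hifa_calimage", "stage_number": 3})
```

Because tasks would talk only to the interface, swapping the backend is a storage change rather than a task-code change.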
+ +--- + +## Context Contract Summary + +The following capabilities appear to be **hard requirements** for any replacement system, derived from current behavior and internal usage patterns: + +**System-level requirements:** + +- Run identity: `context_id`, recipe/procedure name, inputs, operator/mode +- Path layout: working/report/products directories with ability to relocate +- Dataset inventory: execution blocks / measurement sets with per-MS metadata +- Stage results timeline: ordered stages, durations, QA outcomes, tracebacks +- Export products: weblog tar, manifest, AQUA report, scripts +- Resume: restart from last known good stage (or after a breakpoint) + +**Internal usage requirements:** + +- Fast MS lookup: random-access by name, filtering by data type, virtual↔real SPW translation +- Calibration library: append-oriented, ordered, with transactional multi-entry updates and predicate-based queries +- Image library: four typed registries (science, calibrator, RMS, sub-product) with add/query semantics +- Imaging state: typed, versioned configuration for the imaging sub-pipeline +- QA scoring: read-only context snapshot for parallel-safe QA handler execution +- Weblog rendering: read-only traversal of full results timeline + MS metadata + project metadata +- MPI/distributed: efficient context snapshot broadcast + results write-back +- Cross-stage data flow: explicit named outputs rather than results-list walking +- Project metadata: immutable-after-init sub-record +- Telescope-specific state: typed, composable extension rather than untyped dict + +--- + +## Open Questions + +- Are there additional use cases not captured by either review? Reviewer input may surface new cases. +- Should the future use cases (GAP section) be prioritized? If so, which are most impactful for RADPS? +- What compatibility guarantees should the next-generation context provide across pipeline releases? 
diff --git a/docs/context_use_cases_current_pipeline.md b/docs/context_use_cases_current_pipeline.md index a810ad6..e738da9 100644 --- a/docs/context_use_cases_current_pipeline.md +++ b/docs/context_use_cases_current_pipeline.md @@ -2,67 +2,21 @@ ## Overview -The pipeline `Context` class (`pipeline.infrastructure.launcher.Context`) is a single mutable state object used for an entire pipeline execution. It is simultaneously a session state container, a domain metadata container, a cross-stage communication channel, and a persistence unit. +The pipeline `Context` class (`pipeline.infrastructure.launcher.Context`) is the central mutable state object used for an entire pipeline execution. It is simultaneously a session state container, a domain metadata container, a cross-stage communication channel, and a persistence unit. This document catalogues the current use cases of the pipeline Context as determined by examination of the codebase. The goal is to support design and development of a prototype of a system which serves a similar role to that of the context for RADPS. -### Key implementation references - -- `Context` / `Pipeline`: `pipeline/infrastructure/launcher.py` -- CLI lifecycle tasks: `pipeline/h/cli/h_init.py`, `pipeline/h/cli/h_save.py`, `pipeline/h/cli/h_resume.py` -- Task dispatch & result acceptance: `pipeline/h/cli/utils.py`, `pipeline/infrastructure/basetask.py` -- PPR-driven execution loops: - - ALMA: `pipeline/infrastructure/executeppr.py` (used by `pipeline/runpipeline.py`) - - VLA: `pipeline/infrastructure/executevlappr.py` (used by `pipeline/runvlapipeline.py`) -- XML procedure execution: `pipeline/recipereducer.py` -- MPI distribution: `pipeline/infrastructure/mpihelpers.py` -- QA framework: `pipeline/qa/` -- Weblog renderer: `pipeline/infrastructure/renderer/htmlrenderer.py` +See also: [Supplementary analysis and design recommendations](context_current_pipeline_appendix.md) --- -## 1. 
Context Lifecycle - -The canonical flow through the context is: - -1. **Create session** — `h_init()` constructs a `launcher.Pipeline(...)` and returns a new `Context`. In PPR-driven execution, `executeppr()` or `executevlappr()` also populates project metadata at this point. -2. **Load data** — Import tasks (`h*_importdata`) attach datasets to the context's domain model (`context.observing_run`, measurement sets, scans, SPWs, etc.). -3. **Execute tasks** — Tasks execute against the in-memory context and return a `Results` object. After each task, `Results.accept(context)` records the outcome and mutates shared state. -4. **Accept results** — Inside `accept()`, results are merged via `Results.merge_with_context(context)`. A `ResultsProxy` is pickled to disk per-stage to keep the in-memory context bounded. The weblog is typically rendered after each top-level stage. -5. **Save / resume** — `h_save()` pickles the context; `h_resume(filename='last')` restores it. Driver-managed breakpoints and developer debugging workflows rely on this cycle. - ---- +## 1. Use Cases -## 2. 
Context Responsibility Overview - -| # | Responsibility | Description | Examples / References | -|---|---|---|---| -| 1 | **Static Observation & Project Data** | Load, store, and provide access to static observation and project data and metadata in memory | `context.observing_run`, `context.project_summary`, `context.project_structure` | -| 2 | **Mutable Observation State** | Load, store, provide in-memory access, and update mutable dynamic observation data or metadata | MS registration, virtual SPW mappings, reference antenna ordering | -| 3 | **Path Management** | Specify and store output paths as part of configuration setup | `output_dir`, `products_dir`, `report_dir`, log paths | -| 4 | **Imaging State Management** | Manage imaging state across pipeline stages | `clean_list_pending`, `imaging_parameters`, masks, thresholds, `synthesized_beams` | -| 5 | **Calibration State Management** | Register, query, and update calibration state | Calibration library (`callibrary`), active/applied cal tables, interval trees | -| 6 | **Image Library Management** | Register and query image products across pipeline stages | `sciimlist`, `calimlist`, `rmsimlist`, `subimlist` | -| 7 | **Session Persistence** | Save and restore the full pipeline session | Pickle serialization, `h_save()`, `h_resume()`, `ResultsProxy` | -| 8 | **MPI / Parallel Distribution** | Pass context to parallel workers and merge results back | Context pickle broadcast to MPI servers; results merged on client | -| 9 | **Inter-Task Data Passing** | Accept task results and merge state back into the context | `merge_with_context()` pattern | -| 10 | **Stage Tracking & Result Accumulation** | Track execution progress, stage numbering, accumulated results | `context.results`, `stage_number`, `task_counter`, result proxies | -| 11 | **Reporting & Export Support** | Provide context data for weblog, QA reports, AQUA XML, and product packaging | `context.observing_run` for weblog, `context.project_structure` for 
archive labels | -| 12 | **QA Score Storage** | Store and provide access to QA scores | QA score objects appended to `result.qa.pool` | -| 13 | **Debuggability / Inspectability** | Context state must be human-readable and inspectable for post-mortem analysis | Per-stage tracebacks, timings, timetracker integration | -| 14 | **Telescope-Specific State** | Sub-context used only by telescope-specific code | `context.evla` (VLA), conditionally created | -| 15 | **Lifecycle Notifications** | Emit events at key lifecycle points | Event bus: `ResultAcceptingEvent`, `ContextCreated`, `TaskStarted`, `TaskComplete` | - ---- - -## 3. Use Cases - -Each use case describes a need that the pipeline's central state management must satisfy. They are written to be implementation-neutral in their core description, with implementation notes appended where the codebase provides important detail. +Each use case describes a need that the pipeline's central state management must satisfy. They are written to be implementation-neutral in their core description. For pipeline-specific implementation details per use case, see [Implementation Notes by Use Case](context_current_pipeline_appendix.md#implementation-notes-by-use-case) in the appendix. --- ### UC-01 — Load and Provide Access to Observation Metadata -*Responsibilities: 1, 2* | Field | Content | |-------|---------| @@ -70,22 +24,9 @@ Each use case describes a need that the pipeline's central state management must | **Summary** | The system must load observation metadata (datasets, spectral windows, fields, antennas, scans, time ranges) and make it queryable by all subsequent processing steps. It must also provide a unified identifier scheme when multiple datasets use different native numbering. | | **Postconditions** | All registered datasets are queryable by name, type, or internal identifier without re-reading raw data from disk. 
| -**Implementation notes** — `context.observing_run` is the single most heavily queried context facet: - -- `context.observing_run.get_ms(name=vis)` — resolve an MS by filename -- `context.observing_run.measurement_sets` — iterate all registered MS objects -- `context.observing_run.get_measurement_sets_of_type(dtypes)` — filter by data type (RAW, REGCAL_CONTLINE_ALL, BASELINED, etc.) -- `context.observing_run.virtual2real_spw_id(vspw, ms)` / `real2virtual_spw_id(...)` — translate between abstract pipeline SPW IDs and CASA-native IDs -- `context.observing_run.virtual_science_spw_ids` — virtual SPW catalog -- `context.observing_run.ms_reduction_group` — per-group reduction metadata (single-dish) -- Provenance fields: `.start_datetime`, `.end_datetime`, `.project_ids`, `.schedblock_ids`, `.execblock_ids`, `.observers` - -MS objects are rich domain objects carrying scans, fields, SPWs, antennas, reference antenna ordering, etc. Tasks read per-MS state like `ms.reference_antenna`, `ms.session`, `ms.start_time`, `ms.origin_ms`. - --- ### UC-02 — Store and Provide Project-Level Metadata -*Responsibilities: 1* | Field | Content | |-------|---------| @@ -93,20 +34,9 @@ MS objects are rich domain objects carrying scans, fields, SPWs, antennas, refer | **Summary** | The system must store project-level metadata (proposal code, PI, telescope, desired sensitivities, processing recipe, OUS identifiers) and make it available to tasks for decision-making and to report generators for labelling outputs. | | **Postconditions** | Project metadata is available for the lifetime of the processing session. 
| -**Implementation notes** — project metadata is typically set once at session start and read many times: - -- `context.project_summary = project.ProjectSummary(...)` — set by `executeppr()` / `executevlappr()` -- `context.project_structure = project.ProjectStructure(...)` — set by PPR executors -- `context.project_performance_parameters` — performance parameters from the PPR -- `context.set_state('ProjectStructure', 'recipe_name', value)` — used by `recipereducer.reduce()` and SD heuristics -- `context.processing_intents` — set by `Pipeline` during initialization - -This is a strong candidate for a separate, immutable-after-init sub-record in any future context schema. - --- ### UC-03 — Manage Execution Paths and Output Locations -*Responsibilities: 3* | Field | Content | |-------|---------| @@ -114,17 +44,9 @@ This is a strong candidate for a separate, immutable-after-init sub-record in an | **Summary** | The system must centrally define and provide working directories, report directories, product directories, and logical filenames for logs, scripts, and reports. Tasks resolve file paths through these centrally managed locations. On session restore, paths must be overridable to adapt to a new environment. | | **Postconditions** | All tasks share a consistent set of paths for inputs and outputs. 
| -**Implementation notes:** - -- Path roots: `output_dir`, `report_dir`, `products_dir` -- Context name drives deterministic, named run directories -- Relocation semantics are supported for results proxies (basenames stored) and common output layout -- PPR-driven execution may derive paths from environment variables (e.g., `SCIPIPE_ROOTDIR`) - --- ### UC-04 — Register and Query Calibration State -*Responsibilities: 5* | Field | Content | |-------|---------| @@ -132,16 +54,9 @@ This is a strong candidate for a separate, immutable-after-init sub-record in an | **Summary** | The system must allow calibration tasks to register solutions (indexed by data selection: field, spectral window, antenna, intent, time interval), and allow downstream tasks to query for all calibrations applicable to a given data selection. It must distinguish between calibrations pending application and those already applied. Registration must support transactional multi-entry updates — tasks often register multiple calibrations atomically within a single result acceptance. | | **Postconditions** | Calibration state is queryable and correctly scoped to data selections. 
| -**Implementation notes** — `context.callibrary` is the primary cross-stage communication channel for calibration workflows: - -- **Write:** `context.callibrary.add(calto, calfrom)` — register a calibration application (cal table + target selection); `context.callibrary.unregister_calibrations(matcher)` — remove by predicate -- **Read:** `context.callibrary.active.get_caltable(caltypes=...)` — list active cal tables; `context.callibrary.get_calstate(calto)` — get full application state for a target selection -- Backed by `CalApplication` → `CalTo` / `CalFrom` objects with interval trees for efficient matching; append-mostly, ordered by registration time - --- ### UC-05 — Accumulate Imaging State Across Multiple Steps -*Responsibilities: 4, 6* | Field | Content | |-------|---------| @@ -149,31 +64,9 @@ This is a strong candidate for a separate, immutable-after-init sub-record in an | **Summary** | The system must allow imaging state — target lists, imaging parameters, masks, thresholds, sensitivity estimates, and produced image references — to be computed by one step and read or refined by later steps. Multiple steps may contribute to a progressively refined imaging configuration. The system must also maintain typed registries of produced images (science, calibrator, RMS, sub-product) with add/query semantics. | | **Postconditions** | The accumulated imaging state reflects contributions from all completed imaging-related steps; all produced images are registered and queryable. | -**Implementation notes** — this is the most fragile part of the current context design. 
Attributes are added ad-hoc, there is no schema, and defensive `hasattr()` checks appear in the code: - -| Attribute | Written by | Read by | -|---|---|---| -| `clean_list_pending` | `editimlist`, `makeimlist`, `findcont`, `makeimages` | `findcont`, `tclean`, `transformimagedata`, `uvcontsub`, `checkproductsize` | -| `clean_list_info` | `makeimlist`, `makeimages` | display/renderer code | -| `imaging_mode` | `editimlist` | `makermsimages`, `makecutoutimages`, `makeimages` | -| `imaging_parameters` | PPR / `editimlist` | `tclean`, `checkproductsize`, heuristics | -| `synthesized_beams` | `imageprecheck`, `tclean`, `checkproductsize`, `makeimlist`, `makeimages` | `checkproductsize`, heuristics | -| `size_mitigation_parameters` | `checkproductsize` | downstream stages | -| `selfcal_targets`, `selfcal_resources` | `selfcal` | `exportdata` | - -Image libraries provide typed registries: - -- `context.sciimlist.add_item(imageitem)` / `.get_imlist()` — science images -- `context.calimlist` — calibrator images -- `context.rmsimlist` — RMS images -- `context.subimlist` — sub-product images (cutouts, cubes) - -A future design should formalize imaging state as a typed state machine or versioned configuration sub-document, and consider separating image *metadata* (tracked in context) from image *data* (stored in artifact store). - --- ### UC-06 — Track Execution Progress and Stage History -*Responsibilities: 9, 10* | Field | Content | |-------|---------| @@ -181,17 +74,9 @@ A future design should formalize imaging state as a typed state machine or versi | **Summary** | The system must track which processing step is currently executing, assign a unique sequential identifier to each step, and maintain an ordered history of all completed steps and their outcomes. This history must be available for reporting, script generation, and resumption after interruption. Per-stage tracebacks and timings must be preserved. Stage numbering must remain coherent across resumes. 
| | **Postconditions** | The full execution history is retrievable in order; the current step is identifiable. | -**Implementation notes:** - -- `context.results` holds an ordered list of `ResultsProxy` objects (proxied to disk to bound memory) -- `context.stage_number` and `context.task_counter` track progress -- Timetracker integration provides per-stage timing data -- Results proxies store basenames for portability - --- ### UC-07 — Propagate Task Outputs to Downstream Tasks -*Responsibilities: 9, 10* | Field | Content | |-------|---------| @@ -199,56 +84,29 @@ A future design should formalize imaging state as a typed state machine or versi | **Summary** | When a task produces outputs that change the processing state (e.g., new calibrations, updated flag summaries, image products, revised parameters), the system must provide a mechanism for those outputs to become visible to all subsequent tasks that need them. The system must also record the output for later retrieval by reports and exports. | | **Postconditions** | Downstream tasks see an updated view of the processing state; the output is recorded in the execution history. | -**Implementation notes** — there are two propagation mechanisms: - -1. **Structured state merge** — `Results.merge_with_context(context)` updates calibration library, image libraries, and other typed state. -2. **Results-list walking** — tasks read `context.results` to find outputs from earlier stages. For example: - - VLA tasks compute `stage_number` from `context.results[-1].read().stage_number + 1` - - `vlassmasking` iterates `context.results[::-1]` to find the latest `MakeImagesResult` - - Export/AQUA code reads `context.results[0]` and `context.results[-1]` for timestamps - -The results-list walking pattern is fragile (indices shift if stages are inserted/skipped), slow (requires unpickling), and implicit (no declared dependency). A future design should provide explicit stage-to-stage data dependencies. 
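The structured state-merge mechanism can be sketched as a toy model. The `Context` and `CalibrationResults` classes below are simplified stand-ins invented for illustration, not the pipeline's real `launcher.Context` or `Results` API; only the `merge_with_context` / `accept` call pattern mirrors the actual flow.

```python
# Minimal sketch of the merge_with_context pattern (illustrative classes,
# not the real pipeline implementation).

class Context:
    """Toy shared-state container: a calibration registry plus a results log."""
    def __init__(self):
        self.calapplications = []   # stands in for context.callibrary
        self.results = []           # ordered execution history

class CalibrationResults:
    """Toy task result that knows how to merge its outputs into the context."""
    def __init__(self, stage_number, calapps):
        self.stage_number = stage_number
        self.calapps = calapps

    def merge_with_context(self, context):
        # Structured merge: downstream tasks see the new calibrations
        # through the context, not by inspecting this result directly.
        context.calapplications.extend(self.calapps)

    def accept(self, context):
        self.merge_with_context(context)
        context.results.append(self)  # recorded for reports and exports

context = Context()
result = CalibrationResults(stage_number=3, calapps=["bandpass.tbl"])
result.accept(context)

assert context.calapplications == ["bandpass.tbl"]
assert context.results[-1].stage_number == 3
```

The structured merge makes the dependency explicit: downstream tasks query the context rather than walking the results list, which is the direction a future design should push.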
- --- ### UC-08 — Support Multiple Orchestration Drivers -*Responsibilities: 9, 10* | Field | Content | |-------|---------| | **Actor(s)** | Operations / automated processing (PPR-driven batch), pipeline developer / power user (interactive), recipe executor | -| **Summary** | The context is created and consumed by multiple front-ends: PPR command lists, XML procedures, or interactive task calls. The system must remain the stable state contract across these drivers. It must be createable and resumable from non-interactive and interactive drivers, support driver-injected run metadata, tolerate partial execution controls (`startstage`, `exitstage`) and breakpoint-driven stop/resume semantics, and provide machine-detectable success/failure signals. | +| **Summary** | The context is created and consumed by multiple front-ends: PPR command lists, XML procedures, or interactive task calls. The system must remain the stable state contract across these drivers. It must be creatable and resumable from non-interactive and interactive drivers, support driver-injected run metadata, tolerate partial execution controls (`startstage`, `exitstage`) and breakpoint-driven stop/resume semantics, and provide machine-detectable success/failure signals. | | **Postconditions** | The same context state is usable regardless of which orchestration driver created or resumed it. | -**Implementation notes** — two orchestration planes converge on the same task implementations: - -- **Task-driven**: direct task calls via CLI wrappers in `pipeline/h/cli/` -- **Command-list-driven**: PPR and XML procedure commands via `executeppr.py` / `executevlappr.py` and `recipereducer.py` - -They differ in how inputs are marshalled, how session paths are selected, and how resume is initiated, but the persisted context is the same. 
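The "same state contract across drivers" requirement can be illustrated with a sketch in which a batch command list and an interactive call both go through one dispatch function. `run_task`, `TASK_REGISTRY`, and the `Context` class here are hypothetical names for illustration; the pipeline's actual dispatch lives in its CLI wrappers and PPR executors.

```python
# Sketch: two orchestration drivers sharing one task-execution contract.
# run_task, TASK_REGISTRY, and Context are illustrative placeholders.

class Context:
    def __init__(self):
        self.stage_number = 0
        self.history = []

def flagdata_task(context, mode="auto"):
    # A stand-in task that records its invocation in shared state.
    context.history.append(("flagdata", mode))
    return {"task": "flagdata", "mode": mode}

TASK_REGISTRY = {"flagdata": flagdata_task}

def run_task(context, name, **params):
    """Single entry point used by every driver."""
    context.stage_number += 1
    return TASK_REGISTRY[name](context, **params)

# Driver 1: command-list-driven (PPR-style) execution over a context.
commands = [("flagdata", {"mode": "auto"})]
batch_context = Context()
for name, params in commands:
    run_task(batch_context, name, **params)

# Driver 2: interactive, direct task call against its own context.
interactive_context = Context()
run_task(interactive_context, "flagdata", mode="manual")

assert batch_context.history == [("flagdata", "auto")]
assert interactive_context.history == [("flagdata", "manual")]
```

Because both drivers mutate the context only through the shared entry point, a session started by one driver remains resumable by the other.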
- --- ### UC-09 — Save and Restore a Processing Session -*Responsibilities: 7* | Field | Content | |-------|---------| | **Actor(s)** | Pipeline operator, workflow engine, developers | -| **Summary** | The system must be able to serialize the complete processing state to disk (all observation data, calibration state, execution history, imaging state, project metadata) and later restore it so that processing can resume from the saved point. The serialization must preserve enough state to resume within a compatible version window. On restore, paths must be adaptable to a new filesystem environment. | +| **Summary** | The system must be able to serialize the complete processing state to disk (all observation data, calibration state, execution history, imaging state, project metadata) and later restore it so that processing can resume from the saved point. The serialization must preserve enough state to resume, provided the restoring pipeline maintains serialization compatibility; backward compatibility across pipeline releases is not guaranteed. On restore, paths must be adaptable to a new filesystem environment. | | **Postconditions** | After restore, the system is in the same state as when saved; processing can continue.
| -**Implementation notes:** - -- `h_save()` pickles the whole context to `.context` -- `h_resume(filename='last')` loads the most recent `.context` file -- Per-stage results are proxied to disk (`saved_state/result-stageX.pickle`) to keep the in-memory context smaller -- Used by driver-managed breakpoint/resume (`executeppr(..., bpaction='resume')`) and developer debugging workflows - --- ### UC-10 — Provide State to Parallel Workers -*Responsibilities: 8* | Field | Content | |-------|---------| @@ -256,17 +114,9 @@ They differ in how inputs are marshalled, how session paths are selected, and ho | **Summary** | When work is distributed across parallel workers, each worker needs read-only access to the current processing state (observation metadata, calibration state, etc.). The system must provide a mechanism for workers to obtain a consistent snapshot of the state. Workers must not be able to modify the authoritative state directly. The snapshot must be small enough to broadcast efficiently. | | **Postconditions** | Each worker has a consistent, read-only view of the processing state for the duration of its work. | -**Implementation notes** — `pipeline/infrastructure/mpihelpers.py`, class `Tier0PipelineTask`: - -1. The MPI client saves the context to disk as a pickle: `context.save(path)`. -2. Task arguments are also pickled to disk alongside the context. -3. On the server, `get_executable()` loads the context, modifies `context.logs['casa_commands']` to a server-local temp path, creates the task's `Inputs(context, **task_args)`, then executes the task. -4. For `Tier0JobRequest` (lower-level distribution), the executor is shallow-copied *excluding* the context reference to stay within the MPI buffer limit (~150 MiB, per PIPE-1337). 
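The snapshot-broadcast pattern for workers can be sketched with standard-library pickling: the client serializes the context to disk and each worker deserializes a private copy, so worker-side edits never touch the authoritative state. The `Context` class and helper functions below are illustrative, not the pipeline's MPI code.

```python
import os
import pickle
import tempfile

# Sketch of the snapshot-broadcast pattern: the client pickles the context
# and each worker loads a private copy. Names are illustrative stand-ins.

class Context:
    def __init__(self, output_dir):
        self.output_dir = output_dir
        self.logs = {"casa_commands": "casa_commands.log"}

def save_snapshot(context, path):
    with open(path, "wb") as f:
        pickle.dump(context, f)

def worker_load(path):
    """Each worker deserializes its own copy; mutations stay local."""
    with open(path, "rb") as f:
        ctx = pickle.load(f)
    # Workers may redirect log paths locally, as the MPI servers do.
    ctx.logs["casa_commands"] = "/tmp/worker_casa_commands.log"
    return ctx

client_context = Context(output_dir="/data/run1")
snap = os.path.join(tempfile.mkdtemp(), "context.pickle")
save_snapshot(client_context, snap)

worker_context = worker_load(snap)
assert worker_context.output_dir == "/data/run1"
# The authoritative client copy is unchanged by worker-side edits.
assert client_context.logs["casa_commands"] == "casa_commands.log"
```

Deserializing a fresh copy per worker gives the required read-only semantics for free, at the cost of snapshot size, which is why broadcast payload limits matter in the real system.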
- --- ### UC-11 — Aggregate Results from Parallel Workers -*Responsibilities: 8, 9* | Field | Content | |-------|---------| @@ -277,7 +127,6 @@ They differ in how inputs are marshalled, how session paths are selected, and ho --- ### UC-12 — Provide Data for Report Generation -*Responsibilities: 11, 12* | Field | Content | |-------|---------| @@ -285,41 +134,19 @@ They differ in how inputs are marshalled, how session paths are selected, and ho | **Summary** | The system must provide report generators with read-only access to: observation metadata, project metadata, execution history (including per-step outcomes and QA scores), log references, and path information. Reports include human-readable web pages, machine-readable quality reports, and reproducible processing scripts. | | **Postconditions** | Reports accurately reflect the processing state at the time of generation. | -**Implementation notes** — `WebLogGenerator.render(context)` in `pipeline/infrastructure/renderer/htmlrenderer.py`: - -- Reads `context.results` — unpickled from `ResultsProxy` objects, iterated for every renderer -- Reads `context.report_dir`, `context.output_dir` — filesystem layout -- Reads `context.observing_run.*` — MS metadata, scheduling blocks, execution blocks, observers, project IDs, start/end times -- Reads `context.project_summary.telescope` — to determine telescope-specific page layouts (ALMA vs VLA vs NRO) -- Reads `context.project_structure.*` — OUS IDs, PPR file, recipe name -- Reads `context.logs['casa_commands']` — CASA command history - -The renderer iterates `context.results` multiple times (assigning to topics, extracting flags, building timelines). The current approach requires unpickling *every* result into memory, then re-proxying when done. A lazy or streaming model would reduce peak memory. 
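A disk-backed proxy for per-stage results, of the kind report generators iterate over, might look like the following sketch. The `ResultsProxy` class here is a simplified illustration of the on-demand-unpickling idea, not the pipeline's actual `ResultsProxy` implementation.

```python
import os
import pickle
import tempfile

# Sketch of a disk-backed results proxy: the context holds only a pickle
# path, and the full result is loaded on demand. Illustrative only.

class ResultsProxy:
    """Holds a pickle path; read() unpickles the full result each call."""
    def __init__(self, result, directory):
        self.path = os.path.join(
            directory, "result-stage{}.pickle".format(result["stage"]))
        with open(self.path, "wb") as f:
            pickle.dump(result, f)

    def read(self):
        with open(self.path, "rb") as f:
            return pickle.load(f)

tmp = tempfile.mkdtemp()
proxies = [ResultsProxy({"stage": n, "qa": 0.9}, tmp) for n in (1, 2)]

# A renderer pass: every proxy is unpickled into memory to build the page.
stages = [p.read()["stage"] for p in proxies]
assert stages == [1, 2]
```

Each renderer pass that calls `read()` pays the deserialization cost again, which is why repeated full traversals of the results timeline drive peak memory and runtime during report generation.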
- --- ### UC-13 — Compute and Store Quality Assessments -*Responsibilities: 12* | Field | Content | |-------|---------| -| **Actor(s)** | QA scoring framework, report generators, downstream decision-making | -| **Summary** | After each processing step completes, the system must support evaluating the outcome against quality thresholds (which may depend on telescope, project parameters, or observation properties) and recording normalized quality scores. These scores must be retrievable for reporting and for downstream decision-making. | +| **Actor(s)** | QA scoring framework, report generators, later pipeline logic that consults recorded QA outcomes | +| **Summary** | After each processing step completes, the system must support evaluating the outcome against quality thresholds (which may depend on telescope, project parameters, or observation properties) and recording normalized quality scores. These scores must be retrievable for reporting and for later pipeline logic that explicitly consults recorded QA outcomes. | | **Postconditions** | Quality scores are associated with the relevant processing step and accessible to reports and downstream logic. | -**Implementation notes** — after `merge_with_context()`, `accept()` triggers `pipelineqa.qa_registry.do_qa(context, result)`: - -- QA handlers implement `QAPlugin.handle(context, result)` -- They typically call `context.observing_run.get_ms(vis)` to look up metadata for scoring (antenna count, channel count, SPW properties, field intents) -- Some handlers check `context.imaging_mode` to branch on VLASS-specific scoring -- Scores are appended to `result.qa.pool` — they don't mutate the context directly - -QA handlers are *read-only* with respect to context and could operate on a frozen snapshot, making them a good candidate for parallelization. 
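The post-step scoring flow can be sketched as a registry of handlers that read the processing state, evaluate a result against a threshold, and append normalized scores to the result's score pool. The classes and the flagging heuristic below are simplified stand-ins invented for illustration, not the pipeline's QA framework.

```python
# Sketch of the QA plugin pattern: handlers read shared state, score the
# result, and append to result.qa.pool. Names are illustrative stand-ins.

class QAScorePool:
    def __init__(self):
        self.pool = []

class Result:
    def __init__(self, vis, flagged_fraction):
        self.vis = vis
        self.flagged_fraction = flagged_fraction
        self.qa = QAScorePool()

class FlaggingQAPlugin:
    def handle(self, context, result):
        # Read-only with respect to context: only the result is mutated,
        # which is what makes handlers safe to run against a snapshot.
        threshold = context.get("max_flagged", 0.5)
        score = 1.0 if result.flagged_fraction <= threshold else 0.5
        result.qa.pool.append(("flagging", score))

def do_qa(handlers, context, result):
    for handler in handlers:
        handler.handle(context, result)

context = {"max_flagged": 0.5}
result = Result(vis="uid_A002.ms", flagged_fraction=0.2)
do_qa([FlaggingQAPlugin()], context, result)

assert result.qa.pool == [("flagging", 1.0)]
```

Because handlers only append to the result, several handlers (or several results) could in principle be scored concurrently against a frozen copy of the state.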
- --- ### UC-14 — Support Interactive Inspection and Debugging -*Responsibilities: 13* | Field | Content | |-------|---------| @@ -330,7 +157,6 @@ QA handlers are *read-only* with respect to context and could operate on a froze --- ### UC-15 — Isolate Telescope-Specific State -*Responsibilities: 14* | Field | Content | |-------|---------| @@ -338,18 +164,9 @@ QA handlers are *read-only* with respect to context and could operate on a froze | **Summary** | The system must support storing instrument-specific state (e.g., VLA-specific solution intervals or gain metadata) in a way that is accessible to telescope-specific tasks but does not pollute the state used by generic or other-telescope tasks. This state is created conditionally based on the instrument. | | **Postconditions** | Telescope-specific state is available to the tasks that need it; absent when the instrument does not require it. | -**Implementation notes** — `context.evla` is a `collections.defaultdict(dict)`, keyed as `context.evla['msinfo'][ms_name].`: - -- **Written by:** `hifv_importdata` (creates + initializes), `testBPdcals` (gain intervals, ignorerefant), `fluxscale/solint`, `fluxboot` -- **Read by:** nearly every VLA calibration task and heuristic -- Accessed fields include: `gain_solint1`, `gain_solint2`, `setjy_results`, `ignorerefant`, various `*_field_select_string` / `*_scan_select_string` values, `fluxscale_sources`, `spindex_results`, and many more - -This is a completely untyped, dictionary-of-dictionaries sidecar. A future design should define a typed state object, provide accessor methods rather than raw dict lookups, and separate telescope-specific concerns from the generic context via composition (e.g., `context.get_extension('evla')`). - --- ### UC-16 — Package and Export Pipeline Products -*Responsibilities: 3, 11* | Field | Content | |-------|---------| @@ -360,7 +177,6 @@ This is a completely untyped, dictionary-of-dictionaries sidecar. 
A future desig --- ### UC-17 — Emit Lifecycle Notifications -*Responsibilities: 15* | Field | Content | |-------|---------| @@ -368,209 +184,78 @@ This is a completely untyped, dictionary-of-dictionaries sidecar. A future desig | **Summary** | The system must emit notifications at key lifecycle points (session start, session restore, step start, step completion, result acceptance) so that external observers (logging, progress reporting, live dashboards) can track execution without polling. | | **Postconditions** | Subscribers are notified of lifecycle transitions as they occur. | -**Implementation notes** — `pipeline.infrastructure.eventbus.send_message(event)`: - -- Event types: `ResultAcceptingEvent`, `ContextCreated`, `TaskStarted`, `TaskComplete` -- The event bus exists and fires events, but is lightly used — `merge_with_context` remains the primary data flow mechanism -- A future design could elevate the event bus to the primary state mutation channel (event-sourcing pattern), enabling audit trails, undo, and distributed observation - ---- - -## 4. Responsibility-to-Use-Case Traceability - -| # | Responsibility | Use Cases | -|---|---|---| -| 1 | Static Observation & Project Data | UC-01, UC-02 | -| 2 | Mutable Observation State | UC-01 | -| 3 | Path Management | UC-03, UC-16 | -| 4 | Imaging State Management | UC-05 | -| 5 | Calibration State Management | UC-04 | -| 6 | Image Library Management | UC-05 | -| 7 | Session Persistence | UC-09 | -| 8 | MPI / Parallel Distribution | UC-10, UC-11 | -| 9 | Inter-Task Data Passing | UC-06, UC-07, UC-11 | -| 10 | Stage Tracking & Result Accumulation | UC-06, UC-07, UC-08 | -| 11 | Reporting & Export Support | UC-12, UC-16 | -| 12 | QA Score Storage | UC-13 | -| 13 | Debuggability / Inspectability | UC-14 | -| 14 | Telescope-Specific State | UC-15 | -| 15 | Lifecycle Notifications | UC-17 | - --- -## 5. 
Use Cases the Current Design Cannot Handle - -The following describe scenarios that the current context design *does not support* but that could be valuable in a future architecture. - -### FUC-01 — Concurrent / Overlapping Task Execution - -Today all task execution is strictly serial. The context is a mutable, shared-everything singleton with no locking or isolation between stages. Many calibration stages are independent per-MS or per-SPW and could benefit from parallelization. - -**What would be needed:** - -- A context that supports isolated read snapshots (like database transactions or copy-on-write) -- A merge/reconciliation step when concurrent results are accepted -- Explicit declaration of which context fields each task reads and writes - -### FUC-02 — Cloud / Distributed Execution Without Shared Filesystem - -The current context is a pickle file on a local/shared filesystem. MPI distribution requires all nodes to see the same filesystem. +## 2. Use Cases the Current Design Cannot Handle -**What would be needed:** +The following describe scenarios that the current context design *does not support* but that could be valuable in a future architecture. They are numbered GAP-01 through GAP-07 to indicate gaps in the current design's capabilities. -- A context store backed by a database or object store (S3, GCS) -- Artifact references rather than filesystem paths for cal tables and images -- Tasks that can operate on remote datasets without requiring local copies +> **Design note:** The context design should remain compatible with multi-tenant deployments (no global singletons that leak state between runs, no hardcoded single-user paths), but access control, role-based permissions, and audit logging are concerns of the deployment platform rather than the context itself. 
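The multi-tenant constraint in the design note, no global singletons leaking state between runs, amounts to constructing per-run state through a factory rather than a module-level object. A minimal sketch, with hypothetical `RunContext` and `create_context` names:

```python
# Sketch of per-run state isolation: a factory gives every run its own
# context object, so concurrent runs cannot leak state into each other.
# RunContext and create_context are hypothetical illustration names.

class RunContext:
    def __init__(self, run_id):
        self.run_id = run_id
        self.state = {}

def create_context(run_id):
    """Factory: every run gets isolated state; nothing lives at module scope."""
    return RunContext(run_id)

ctx_a = create_context("run-A")
ctx_b = create_context("run-B")
ctx_a.state["refant"] = "DA41"

# No state leaks between concurrent runs.
assert "refant" not in ctx_b.state
assert (ctx_a.run_id, ctx_b.run_id) == ("run-A", "run-B")
```

Hardcoded single-user paths are the filesystem analogue of the module-level global: both bind the context to one tenant at import time instead of at run-creation time.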
-### FUC-03 — Multi-Language / Multi-Framework Access to Context +### GAP-01 — Concurrent / Overlapping Task Execution -The context is a Python object graph, tightly coupled to CASA's Python runtime. Non-Python clients (C++, Julia, JavaScript dashboards) cannot access it. - -**What would be needed:** - -- A language-neutral serialization format (Protocol Buffers, JSON-Schema, Arrow) -- A query API (REST, gRPC, or GraphQL) -- Type definitions shared across languages - -### FUC-04 — Streaming / Incremental Processing - -The current session model assumes all data is available at session start and cannot process data as it arrives from the correlator or archive. - -**What would be needed:** - -- A context that supports incremental dataset registration (add new scans/EBs to a live session) -- Tasks that can detect "new data available" and re-process incrementally -- A results model that supports versioning (re-run produces a new version rather than overwriting) - -### FUC-05 — Provenance and Reproducibility Guarantees - -There is no formal record of which context state a task observed when it ran. Re-running from a saved context yields the state *after* the last save, not the state live at stage N. - -**What would be needed:** - -- Immutable snapshots of context state per-stage (event sourcing) -- Hashing of all task inputs (context fields + parameters) for cache invalidation / reproducibility tokens -- Ability to replay a run from the event log - -### FUC-06 — Fine-Grained Access Control / Multi-Tenant Context - -The context is an all-or-nothing object with no concept of access control, multi-user sessions, or data isolation between projects. - -**What would be needed:** - -- Per-project context namespacing -- Role-based access to context fields -- Audit logging of all context mutations - -### FUC-07 — Partial Re-Execution / Targeted Stage Re-Run - -Today "resume" means "restart from the last completed stage". 
There is no way to selectively re-run a single mid-pipeline stage with different parameters while keeping earlier and later stages intact. - -**What would be needed:** - -- Explicit dependency tracking between stages (which context fields does stage N consume?) -- Ability to invalidate downstream stages when a mid-pipeline stage is re-run -- Versioned results per stage (keep old version alongside new version) - -### FUC-08 — External System Integration (Archive, Scheduling, QA Dashboards) - -Today, context is a local, in-process object. Other systems (archive ingest, scheduling database, QA dashboards) interact only via offline products (weblog, manifest, AQUA XML). - -**What would be needed:** - -- A stable, queryable context API (REST/gRPC) that external systems can poll or subscribe to -- Webhook / event notification support for state transitions -- A standard schema for context summaries consumable by external systems +| Field | Content | +|-------|---------| +| **Actor(s)** | Workflow engine, parallel task scheduler | +| **Summary** | The system should support concurrent execution of otherwise independent tasks (e.g., per-MS or per-SPW calibration stages) with isolated read snapshots, a merge/reconciliation step when concurrent results are accepted, and explicit declaration of which state fields each task reads and writes. This gap is about overlapping task execution against shared state, not the existing worker fan-out/fan-in pattern used for some parallel work. | +| **Postconditions** | Independent tasks execute in parallel without corrupting shared state; results are merged safely before the next sequential step. | --- -## 6. Architectural Observations - -### The context is a "big ball of state", by design - -The current approach is extremely flexible for a long-running, stateful CASA session, but there is no explicit schema boundary between persisted state, ephemeral caches, runtime-only services, and large artifacts. 
Tasks can (and do) add new fields in an ad-hoc way over time. - -### Persistence is pickle-based - -Pickle works for short-lived resume/debug use cases, but it is fragile across version changes, risky as a long-term archive format, and not friendly to multi-writer or multi-process updates. The codebase mitigates size by proxying stage results to disk, but the context itself remains a potentially large and unstable object graph. - -### Two orchestration planes converge on the same context +### GAP-02 — Cloud / Distributed Execution Without Shared Filesystem -Task-driven (interactive CLI) and command-list-driven (PPR / XML procedures) execution both produce and consume the same context. They differ in how inputs are marshalled, how paths are selected, and how resume is initiated, but the persisted context is the same object. +| Field | Content | +|-------|---------| +| **Actor(s)** | Workflow engine, cloud orchestrator, distributed workers | +| **Summary** | The system should support execution across nodes that do not share a filesystem. This requires a context store backed by a database or object store, artifact references rather than filesystem paths for calibration tables and images, and tasks that can operate on remote datasets without requiring local copies. | +| **Postconditions** | Processing completes successfully across distributed nodes without reliance on a shared filesystem. | --- -## 7. Improvement Suggestions for Next-Generation Design - -These are phrased as requirements and design directions, not as a call to rewrite everything immediately. - -### 1) Split "context data" from "context services" - -Define a minimal, explicit **ContextData** model that is typed, schema-versioned, and serializable in a stable format (JSON/MsgPack/Arrow). Attach runtime-only services (CASA tool handles, caches, heuristics engines) around it rather than mixing them into the same object. 
- -### 2) Introduce a ContextStore interface - -Replace "pickle a Python object graph" with a storage abstraction (`get`, `put`, `list_runs`). Backends can start simple (SQLite) and grow (Postgres/object store) without changing task logic. - -### 3) Make state transitions explicit (event-sourced or patch-based) - -The existing event bus (`pipeline.infrastructure.eventbus`) could be elevated to record task lifecycle events and key state changes, yielding reproducibility, easier partial rebuilds, and better distributed orchestration. - -### 4) Treat large artifacts as references, not context fields - -Store large arrays/images/tables in an artifact store and carry only references in context data. This avoids "accidentally pickle a GiB array" and makes distribution/cloud execution more realistic. - -### 5) Remove reliance on global interactive stacks for non-interactive execution - -Make tasks accept an explicit context handle. Keep interactive convenience wrappers but do not make them the core contract. - -### 6) Represent the execution plan as context data +### GAP-03 — Multi-Language / Multi-Framework Access to Context -Record the effective execution plan (linear or DAG) alongside run state to support provenance, partial execution, and targeted re-runs. - -### 7) Adopt a versioned compatibility policy - -Define whether operational contexts must be resumable within a supported release window (with schema versioning + migrations) versus best-effort for development contexts. +| Field | Content | +|-------|---------| +| **Actor(s)** | Non-Python clients (C++, Julia, JavaScript dashboards), external tools | +| **Summary** | The system should expose context state through a language-neutral interface, using a portable serialization format (Protocol Buffers, JSON-Schema, Arrow), a query API (REST, gRPC, or GraphQL), and type definitions shared across languages. | +| **Postconditions** | Clients in any supported language can read context state through a stable, typed API. 
| --- -## 8. Context Contract Summary +### GAP-04 — Streaming / Incremental Processing -The following capabilities appear to be **hard requirements** for any replacement system, derived from current behavior and internal usage patterns: - -**System-level requirements:** +| Field | Content | +|-------|---------| +| **Actor(s)** | Data ingest system, workflow engine, incremental processing tasks | +| **Summary** | The system should support incremental dataset registration (adding new scans or execution blocks to a live session), tasks that detect new data and re-process incrementally, and a results model that supports versioning so that re-runs produce new versions rather than overwriting previous outputs. | +| **Postconditions** | New data is incorporated into an active session and processed without restarting from scratch. | -- Run identity: `context_id`, recipe/procedure name, inputs, operator/mode -- Path layout: working/report/products directories with ability to relocate -- Dataset inventory: execution blocks / measurement sets with per-MS metadata -- Stage results timeline: ordered stages, durations, QA outcomes, tracebacks -- Export products: weblog tar, manifest, AQUA report, scripts -- Resume: restart from last known good stage (or after a breakpoint) +--- -**Internal usage requirements:** +### GAP-05 — Provenance and Reproducibility Guarantees -- Fast MS lookup: random-access by name, filtering by data type, virtual↔real SPW translation -- Calibration library: append-oriented, ordered, with transactional multi-entry updates and predicate-based queries -- Image library: four typed registries (science, calibrator, RMS, sub-product) with add/query semantics -- Imaging state: typed, versioned configuration for the imaging sub-pipeline -- QA scoring: read-only context snapshot for parallel-safe QA handler execution -- Weblog rendering: read-only traversal of full results timeline + MS metadata + project metadata -- MPI/distributed: efficient context snapshot 
broadcast + results write-back -- Cross-stage data flow: explicit named outputs rather than results-list walking -- Project metadata: immutable-after-init sub-record -- Telescope-specific state: typed, composable extension rather than untyped dict +| Field | Content | +|-------|---------| +| **Actor(s)** | Pipeline operator, auditor, reproducibility tooling | +| **Summary** | The system should maintain immutable per-stage snapshots of context state (event sourcing), hash all task inputs (context fields and parameters) for cache invalidation and reproducibility tokens, record software provenance (pipeline version, framework version, dependency manifest) and input data identity (checksums or URIs for raw datasets) per run, and support replaying a run from the event log. | +| **Postconditions** | Any past processing step can be precisely reproduced or audited from the recorded provenance chain, including the exact software and data versions that produced it. | --- -## 9. Open Questions +### GAP-06 — Partial Re-Execution / Targeted Stage Re-Run -- Are there additional use cases not captured by either review? Reviewer input may surface new cases. -- Should the future use cases (Section 5) be prioritized? If so, which are most impactful for RADPS? -- What compatibility guarantees should the next-generation context provide across pipeline releases? +| Field | Content | +|-------|---------| +| **Actor(s)** | Pipeline operator, developer, workflow engine | +| **Summary** | The system should support selectively re-running a single mid-pipeline stage with different parameters while keeping earlier and later stages intact. This requires explicit dependency tracking between stages, the ability to invalidate downstream stages, and versioned per-stage results. | +| **Postconditions** | A targeted stage is re-executed and downstream state is correctly invalidated or updated; unaffected stages are preserved. | --- -## 10. 
Contributors +### GAP-07 — External System Integration (Archive, Scheduling, QA Dashboards) -- **Kristin Berry** — Worked on this draft -- **Shawn Booth** — Worked on this draft +| Field | Content | +|-------|---------| +| **Actor(s)** | Archive ingest system, scheduling database, QA dashboards, monitoring tools | +| **Summary** | The system could expose a stable, queryable API (REST/gRPC) that external systems can poll or subscribe to, support webhook/event notifications for state transitions, and publish a standard schema for context summaries consumable by external systems. However, this involves a significant trade-off: the current context has no stable API, which gives development teams full flexibility to evolve internal structures without cross-team coordination. A public API would require a formal stability contract, versioning discipline, and potentially a slow-to-change external interface layered over rapidly evolving internals. Whether this gap is in scope — or whether external integration should remain an offline, product-file-based concern — is an open design question. | +| **Postconditions** | External systems receive timely, structured updates about processing state without relying on offline product files. 
| From f5125aef195ce8c0553d3b04d425cbc076fbfc83 Mon Sep 17 00:00:00 2001 From: Tom Booth Date: Wed, 18 Mar 2026 21:48:37 -0400 Subject: [PATCH 03/22] integrated review notes for the draft of the current pipeline context document --- docs/context_current_pipeline_appendix.md | 62 ++++++++------- docs/context_use_cases_current_pipeline.md | 88 ++++++++++++---------- 2 files changed, 83 insertions(+), 67 deletions(-) diff --git a/docs/context_current_pipeline_appendix.md b/docs/context_current_pipeline_appendix.md index cdfd8f4..22e8918 100644 --- a/docs/context_current_pipeline_appendix.md +++ b/docs/context_current_pipeline_appendix.md @@ -32,25 +32,25 @@ This document contains architectural observations, design recommendations, and r |---|---|---| | 1 | Static Observation & Project Data | UC-01, UC-02 | | 2 | Mutable Observation State | UC-01 | -| 3 | Path Management | UC-03, UC-16 | +| 3 | Path Management | UC-03, UC-17 | | 4 | Imaging State Management | UC-05 | | 5 | Calibration State Management | UC-04 | -| 6 | Image Library Management | UC-05 | -| 7 | Session Persistence | UC-09 | -| 8 | MPI / Parallel Distribution | UC-10, UC-11 | -| 9 | Inter-Task Data Passing | UC-06, UC-07, UC-11 | -| 10 | Stage Tracking & Result Accumulation | UC-06, UC-07, UC-08 | -| 11 | Reporting & Export Support | UC-12, UC-16 | -| 12 | QA Score Storage | UC-13 | -| 13 | Debuggability / Inspectability | UC-14 | -| 14 | Telescope-Specific State | UC-15 | -| 15 | Lifecycle Notifications | UC-17 | +| 6 | Image Library Management | UC-06 | +| 7 | Session Persistence | UC-10 | +| 8 | MPI / Parallel Distribution | UC-11, UC-12 | +| 9 | Inter-Task Data Passing | UC-07, UC-08, UC-09, UC-12 | +| 10 | Stage Tracking & Result Accumulation | UC-07, UC-08, UC-09 | +| 11 | Reporting & Export Support | UC-13, UC-17 | +| 12 | QA Score Storage | UC-13, UC-14 | +| 13 | Debuggability / Inspectability | UC-15 | +| 14 | Telescope-Specific State | UC-16 | +| 15 | Lifecycle Notifications | UC-18 | --- ## 
Implementation Notes by Use Case
 
-The following implementation notes describe how each use case is realized in the current pipeline codebase. They were separated from the use-case definitions to keep the requirements document implementation-neutral.
+The following implementation notes describe how each use case is realized in the current pipeline codebase. They were separated from the use-case definitions to keep the use-case document focused on requirements rather than implementation detail.
 
 ### UC-01 — Load and Provide Access to Observation Metadata
@@ -117,18 +117,22 @@ This is a strong candidate for a separate, immutable-after-init sub-record in an
 | `size_mitigation_parameters` | `checkproductsize` | downstream stages |
 | `selfcal_targets`, `selfcal_resources` | `selfcal` | `exportdata` |
 
-Image libraries provide typed registries:
+A future design should formalize imaging state as a typed state machine or versioned configuration sub-document, and consider separating image *metadata* (tracked in context) from image *data* (stored in artifact store).
+
+---
+
+### UC-06 — Register and Query Produced Image Products
+
+**Implementation notes** — image libraries provide typed registries:
 
 - `context.sciimlist.add_item(imageitem)` / `.get_imlist()` — science images
 - `context.calimlist` — calibrator images
 - `context.rmsimlist` — RMS images
 - `context.subimlist` — sub-product images (cutouts, cubes)
 
-A future design should formalize imaging state as a typed state machine or versioned configuration sub-document, and consider separating image *metadata* (tracked in context) from image *data* (stored in artifact store).
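
The add/query semantics of these registries can be sketched as a small append-oriented library. This is a minimal illustration of the pattern, not the pipeline's actual classes; `ImageItem` fields and the `ImageLibrary` shape are assumptions for the sketch:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-ins for the pipeline's image registries (sciimlist,
# calimlist, rmsimlist, subimlist); field names are illustrative only.
@dataclass
class ImageItem:
    imagename: str
    sourcename: str
    imtype: str  # e.g. 'science', 'calibrator', 'rms', 'subimage'

class ImageLibrary:
    """Append-oriented registry with add/query semantics."""

    def __init__(self) -> None:
        self._images: List[ImageItem] = []

    def add_item(self, imageitem: ImageItem) -> None:
        # Items are only ever appended; existing entries are never mutated.
        self._images.append(imageitem)

    def get_imlist(self) -> List[ImageItem]:
        # Return a copy so callers cannot modify the registry in place.
        return list(self._images)

sciimlist = ImageLibrary()
sciimlist.add_item(ImageItem('target.iter1.image', 'J1256-0547', 'science'))
names = [item.imagename for item in sciimlist.get_imlist()]
```

Keeping one registry per product type, as the current pipeline does, lets downstream export and reporting code discover images by type without filtering a mixed list.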
- --- -### UC-06 — Track Execution Progress and Stage History +### UC-07 — Track Execution Progress and Stage History **Implementation notes:** @@ -139,12 +143,12 @@ A future design should formalize imaging state as a typed state machine or versi --- -### UC-07 — Propagate Task Outputs to Downstream Tasks +### UC-08 — Propagate Task Outputs to Downstream Tasks -**Implementation notes** — there are two propagation mechanisms: +**Implementation notes** — the current pipeline satisfies these needs through two different propagation paths: -1. **Structured state merge** — `Results.merge_with_context(context)` updates calibration library, image libraries, and other typed state. -2. **Results-list walking** — tasks read `context.results` to find outputs from earlier stages. For example: +1. **Immediate state propagation** — `Results.merge_with_context(context)` updates calibration library, image libraries, and other typed state so later tasks can access the current processing state directly. +2. **Retained step-result access** — tasks read `context.results` to find outputs from earlier stages when those outputs are needed from the recorded execution history rather than from merged shared state. 
For example: - VLA tasks compute `stage_number` from `context.results[-1].read().stage_number + 1` - `vlassmasking` iterates `context.results[::-1]` to find the latest `MakeImagesResult` - Export/AQUA code reads `context.results[0]` and `context.results[-1]` for timestamps @@ -153,7 +157,7 @@ The results-list walking pattern is fragile (indices shift if stages are inserte --- -### UC-08 — Support Multiple Orchestration Drivers +### UC-09 — Support Multiple Orchestration Drivers **Implementation notes** — two orchestration planes converge on the same task implementations: @@ -164,7 +168,7 @@ They differ in how inputs are marshalled, how session paths are selected, and ho --- -### UC-09 — Save and Restore a Processing Session +### UC-10 — Save and Restore a Processing Session **Implementation notes:** @@ -175,7 +179,7 @@ They differ in how inputs are marshalled, how session paths are selected, and ho --- -### UC-10 — Provide State to Parallel Workers +### UC-11 — Provide State to Parallel Workers **Implementation notes** — `pipeline/infrastructure/mpihelpers.py`, class `Tier0PipelineTask`: @@ -186,7 +190,7 @@ They differ in how inputs are marshalled, how session paths are selected, and ho --- -### UC-12 — Provide Data for Report Generation +### UC-13 — Provide Read-Only Context for Reporting Consumers **Implementation notes** — `WebLogGenerator.render(context)` in `pipeline/infrastructure/renderer/htmlrenderer.py`: @@ -201,20 +205,20 @@ The renderer iterates `context.results` multiple times (assigning to topics, ext --- -### UC-13 — Compute and Store Quality Assessments +### UC-14 — Support QA Evaluation and Store Quality Assessments **Implementation notes** — after `merge_with_context()`, `accept()` triggers `pipelineqa.qa_registry.do_qa(context, result)`: - QA handlers implement `QAPlugin.handle(context, result)` - They typically call `context.observing_run.get_ms(vis)` to look up metadata for scoring (antenna count, channel count, SPW properties, field intents) - 
Some handlers check `context.imaging_mode` to branch on VLASS-specific scoring -- Scores are appended to `result.qa.pool` — they don't mutate the context directly +- Scores are appended to `result.qa.pool` — the context provides inputs to QA evaluation, but the scores are stored on the result rather than as direct context mutations QA handlers are *read-only* with respect to context and could operate on a frozen snapshot, making them a good candidate for parallelization. --- -### UC-15 — Isolate Telescope-Specific State +### UC-16 — Manage Telescope-Specific Context Extensions **Implementation notes** — `context.evla` is a `collections.defaultdict(dict)`, keyed as `context.evla['msinfo'][ms_name].`: @@ -222,11 +226,11 @@ QA handlers are *read-only* with respect to context and could operate on a froze - **Read by:** nearly every VLA calibration task and heuristic - Accessed fields include: `gain_solint1`, `gain_solint2`, `setjy_results`, `ignorerefant`, various `*_field_select_string` / `*_scan_select_string` values, `fluxscale_sources`, `spindex_results`, and many more -This is a completely untyped, dictionary-of-dictionaries sidecar. A future design should define a typed state object, provide accessor methods rather than raw dict lookups, and separate telescope-specific concerns from the generic context via composition (e.g., `context.get_extension('evla')`). +This is a completely untyped, dictionary-of-dictionaries sidecar attached to the top-level context. A future design should define a typed state object, provide accessor methods rather than raw dict lookups, and separate telescope-specific concerns from the generic context via composition (e.g., `context.get_extension('evla')`). 
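
A composition-based accessor of that kind could look like the following minimal sketch. `EvlaMsInfo`, `EvlaExtension`, and `register_extension` are hypothetical names illustrating the recommendation, not existing pipeline code:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Illustrative sketch of composing typed, optional telescope-specific
# extensions onto a generic context; all names here are hypothetical.
@dataclass
class EvlaMsInfo:
    gain_solint1: Optional[str] = None
    gain_solint2: Optional[str] = None
    ignorerefant: List[str] = field(default_factory=list)

@dataclass
class EvlaExtension:
    msinfo: Dict[str, EvlaMsInfo] = field(default_factory=dict)

class Context:
    """Generic context hosting typed, conditionally created extensions."""

    def __init__(self) -> None:
        self._extensions: Dict[str, object] = {}

    def register_extension(self, name: str, extension: object) -> None:
        # Created conditionally, e.g. only when import sees VLA data.
        self._extensions[name] = extension

    def get_extension(self, name: str) -> object:
        # Raises KeyError when absent, so shared code cannot silently
        # assume telescope-specific state exists.
        return self._extensions[name]

ctx = Context()
ctx.register_extension('evla', EvlaExtension())
ctx.get_extension('evla').msinfo['vla_run.ms'] = EvlaMsInfo(gain_solint1='int')
```

Typed fields replace raw nested-dict lookups such as `context.evla['msinfo'][ms_name].gain_solint1`, so misspelled keys fail loudly and the extension's contract is visible in one place.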
--- -### UC-17 — Emit Lifecycle Notifications +### UC-18 — Emit Lifecycle Notifications **Implementation notes** — `pipeline.infrastructure.eventbus.send_message(event)`: diff --git a/docs/context_use_cases_current_pipeline.md b/docs/context_use_cases_current_pipeline.md index e738da9..43bd2b9 100644 --- a/docs/context_use_cases_current_pipeline.md +++ b/docs/context_use_cases_current_pipeline.md @@ -4,7 +4,7 @@ The pipeline `Context` class (`pipeline.infrastructure.launcher.Context`) is the central mutable state object used for an entire pipeline execution. It is simultaneously a session state container, a domain metadata container, a cross-stage communication channel, and a persistence unit. -This document catalogues the current use cases of the pipeline Context as determined by examination of the codebase. The goal is to support design and development of a prototype of a system which serves a similar role to that of the context for RADPS. +This document catalogues the current use cases of the pipeline Context as determined by examination of the codebase. The goal is to support design and development of a prototype of a system which serves a similar role to that of the context for RADPS. It also includes additional use cases and gap scenarios identified by examining the RADPS documentation and requirements, to ensure future extensibility and alignment with next-generation needs. See also: [Supplementary analysis and design recommendations](context_current_pipeline_appendix.md) @@ -14,6 +14,8 @@ See also: [Supplementary analysis and design recommendations](context_current_pi Each use case describes a need that the pipeline's central state management must satisfy. They are written to be implementation-neutral in their core description. For pipeline-specific implementation details per use case, see [Implementation Notes by Use Case](context_current_pipeline_appendix.md#implementation-notes-by-use-case) in the appendix. 
+In the tables below, **Actor(s)** identifies the human or system role that directly creates, updates, consumes, or inspects the context state described by the use case. Actors are role categories, not specific task names or implementations. + --- ### UC-01 — Load and Provide Access to Observation Metadata @@ -22,7 +24,7 @@ Each use case describes a need that the pipeline's central state management must |-------|---------| | **Actor(s)** | Data import task, any downstream task, heuristics, renderers, QA handlers | | **Summary** | The system must load observation metadata (datasets, spectral windows, fields, antennas, scans, time ranges) and make it queryable by all subsequent processing steps. It must also provide a unified identifier scheme when multiple datasets use different native numbering. | -| **Postconditions** | All registered datasets are queryable by name, type, or internal identifier without re-reading raw data from disk. | +| **Postconditions** | All registered datasets remain queryable for the lifetime of the session without repeating the import process. | --- @@ -31,7 +33,7 @@ Each use case describes a need that the pipeline's central state management must | Field | Content | |-------|---------| | **Actor(s)** | Initialization, any task, report generators | -| **Summary** | The system must store project-level metadata (proposal code, PI, telescope, desired sensitivities, processing recipe, OUS identifiers) and make it available to tasks for decision-making and to report generators for labelling outputs. | +| **Summary** | The system must store project-level metadata (proposal code, PI, telescope, desired sensitivities, processing recipe) and make it available to tasks for decision-making and to report generators for labelling outputs. | | **Postconditions** | Project metadata is available for the lifetime of the processing session. 
| --- @@ -50,7 +52,7 @@ Each use case describes a need that the pipeline's central state management must | Field | Content | |-------|---------| -| **Actor(s)** | Calibration tasks (bandpass, gaincal, applycal, polcal, selfcal, restoredata), heuristics, importdata, mstransform, uvcontsub | +| **Actor(s)** | Calibration tasks | | **Summary** | The system must allow calibration tasks to register solutions (indexed by data selection: field, spectral window, antenna, intent, time interval), and allow downstream tasks to query for all calibrations applicable to a given data selection. It must distinguish between calibrations pending application and those already applied. Registration must support transactional multi-entry updates — tasks often register multiple calibrations atomically within a single result acceptance. | | **Postconditions** | Calibration state is queryable and correctly scoped to data selections. | @@ -60,33 +62,43 @@ Each use case describes a need that the pipeline's central state management must | Field | Content | |-------|---------| -| **Actor(s)** | Imaging-related tasks (planning, production, quality checking, self-calibration, export) | -| **Summary** | The system must allow imaging state — target lists, imaging parameters, masks, thresholds, sensitivity estimates, and produced image references — to be computed by one step and read or refined by later steps. Multiple steps may contribute to a progressively refined imaging configuration. The system must also maintain typed registries of produced images (science, calibrator, RMS, sub-product) with add/query semantics. | -| **Postconditions** | The accumulated imaging state reflects contributions from all completed imaging-related steps; all produced images are registered and queryable. 
| +| **Actor(s)** | Imaging tasks, downstream heuristics, and export tasks | +| **Summary** | The system must allow imaging state — target lists, imaging parameters, masks, thresholds, and sensitivity estimates — to be computed by one step and read or refined by later steps. Multiple steps may contribute to a progressively refined imaging configuration. | +| **Postconditions** | The accumulated imaging state reflects contributions from all completed imaging-related steps and is available to later imaging steps. | + +--- + +### UC-06 — Register and Query Produced Image Products + +| Field | Content | +|-------|---------| +| **Actor(s)** | Imaging tasks, export tasks, report generators | +| **Summary** | The system must maintain typed registries of produced image products with add/query semantics. Later tasks must be able to discover previously produced science, calibrator, RMS, and sub-product images through these registries. | +| **Postconditions** | Produced image products are registered by type and remain queryable for downstream processing, reporting, and export. | --- -### UC-06 — Track Execution Progress and Stage History +### UC-07 — Track Execution Progress and Stage History | Field | Content | |-------|---------| -| **Actor(s)** | Workflow engine, any task, report generators, operators | -| **Summary** | The system must track which processing step is currently executing, assign a unique sequential identifier to each step, and maintain an ordered history of all completed steps and their outcomes. This history must be available for reporting, script generation, and resumption after interruption. Per-stage tracebacks and timings must be preserved. Stage numbering must remain coherent across resumes. | -| **Postconditions** | The full execution history is retrievable in order; the current step is identifiable. 
| +| **Actor(s)** | Workflow orchestration layer, tasks, report generators, human operators | +| **Summary** | The system must track which processing step is currently executing and maintain a stable, ordered history of completed steps and their outcomes. This history must support reporting, script generation, and resumption after interruption. Per-stage tracebacks and timings must be preserved, and stage identity and ordering must remain coherent across resumes. | +| **Postconditions** | The full execution history is retrievable in order; each recorded step retains its stage identity, outcome, timing, traceback information, and the arguments or effective parameters used to invoke it. | --- -### UC-07 — Propagate Task Outputs to Downstream Tasks +### UC-08 — Propagate Task Outputs to Downstream Tasks | Field | Content | |-------|---------| | **Actor(s)** | Any task producing output that subsequent tasks depend on | -| **Summary** | When a task produces outputs that change the processing state (e.g., new calibrations, updated flag summaries, image products, revised parameters), the system must provide a mechanism for those outputs to become visible to all subsequent tasks that need them. The system must also record the output for later retrieval by reports and exports. | -| **Postconditions** | Downstream tasks see an updated view of the processing state; the output is recorded in the execution history. | +| **Summary** | When a task produces outputs that change the processing state (e.g., new calibrations, updated flag summaries, image products, revised parameters), the system must provide a mechanism for those outputs to become available to subsequent processing steps. It must also retain those outputs as part of the execution record for later inspection, reporting, and export. These two needs may be satisfied through different access paths. 
| +| **Postconditions** | Downstream tasks can access the propagated processing state they need, and the task outputs are retained in the execution history for later retrieval. | --- -### UC-08 — Support Multiple Orchestration Drivers +### UC-09 — Support Multiple Orchestration Drivers | Field | Content | |-------|---------| @@ -96,91 +108,91 @@ Each use case describes a need that the pipeline's central state management must --- -### UC-09 — Save and Restore a Processing Session +### UC-10 — Save and Restore a Processing Session | Field | Content | |-------|---------| -| **Actor(s)** | Pipeline operator, workflow engine, developers | +| **Actor(s)** | Pipeline operator, workflow orchestration layer, developers | | **Summary** | The system must be able to serialize the complete processing state to disk (all observation data, calibration state, execution history, imaging state, project metadata) and later restore it so that processing can resume from the saved point. The serialization must preserve enough state to resume when serialization compatibility is maintained; backward compatibility across pipeline releases is not guaranteed. On restore, paths must be adaptable to a new filesystem environment. | -| **Postconditions** | After restore, the system is in the same state as when saved; processing can continue. | +| **Postconditions** | After restore, the processing state is operationally equivalent to the saved state for supported resume workflows, and processing can continue. | --- -### UC-10 — Provide State to Parallel Workers +### UC-11 — Provide State to Parallel Workers | Field | Content | |-------|---------| -| **Actor(s)** | Workflow engine, MPI worker processes | -| **Summary** | When work is distributed across parallel workers, each worker needs read-only access to the current processing state (observation metadata, calibration state, etc.). The system must provide a mechanism for workers to obtain a consistent snapshot of the state. 
Workers must not be able to modify the authoritative state directly. The snapshot must be small enough to broadcast efficiently. | +| **Actor(s)** | Workflow orchestration layer, parallel worker processes | +| **Summary** | When work is distributed across parallel workers, each worker needs read-only access to the current processing state (observation metadata, calibration state, etc.). The system must provide a mechanism for workers to obtain a consistent snapshot of that state. Workers must not be able to modify the shared processing state directly. The snapshot mechanism must support efficient distribution to workers. | | **Postconditions** | Each worker has a consistent, read-only view of the processing state for the duration of its work. | --- -### UC-11 — Aggregate Results from Parallel Workers +### UC-12 — Aggregate Results from Parallel Workers | Field | Content | |-------|---------| -| **Actor(s)** | Workflow engine | -| **Summary** | After parallel workers complete, the system must collect their individual results and incorporate them into the authoritative processing state. The aggregation must be safe (no conflicting concurrent writes) and complete before the next sequential step begins. | +| **Actor(s)** | Workflow orchestration layer | +| **Summary** | After parallel workers complete, the system must collect their individual results and incorporate them into the shared processing state. The aggregation must be safe (no conflicting concurrent writes) and complete before the next sequential step begins. | | **Postconditions** | The processing state reflects the combined outcomes of all parallel workers. 
| --- -### UC-12 — Provide Data for Report Generation +### UC-13 — Provide Read-Only Context for Reporting Consumers | Field | Content | |-------|---------| | **Actor(s)** | Report generators (weblog, quality reports, reproducibility scripts, AQUA reports, pipeline statistics) | -| **Summary** | The system must provide report generators with read-only access to: observation metadata, project metadata, execution history (including per-step outcomes and QA scores), log references, and path information. Reports include human-readable web pages, machine-readable quality reports, and reproducible processing scripts. | +| **Summary** | The system must provide reporting consumers with read-only access to the observation metadata, project metadata, execution history, QA outcomes, log references, and path information needed to generate reporting products such as weblogs, quality reports, reproducibility scripts, AQUA reports, and pipeline statistics. | | **Postconditions** | Reports accurately reflect the processing state at the time of generation. | --- -### UC-13 — Compute and Store Quality Assessments +### UC-14 — Support QA Evaluation and Store Quality Assessments | Field | Content | |-------|---------| | **Actor(s)** | QA scoring framework, report generators, later pipeline logic that consults recorded QA outcomes | -| **Summary** | After each processing step completes, the system must support evaluating the outcome against quality thresholds (which may depend on telescope, project parameters, or observation properties) and recording normalized quality scores. These scores must be retrievable for reporting and for later pipeline logic that explicitly consults recorded QA outcomes. | +| **Summary** | After each processing step completes, the system must make the relevant processing state available to QA handlers so they can evaluate the outcome against quality thresholds, which may depend on telescope, project parameters, or observation properties. 
The resulting normalized quality scores must be recorded and remain retrievable for reporting and for later pipeline logic that explicitly consults recorded QA outcomes. | | **Postconditions** | Quality scores are associated with the relevant processing step and accessible to reports and downstream logic. | --- -### UC-14 — Support Interactive Inspection and Debugging +### UC-15 — Support Interactive Inspection and Debugging | Field | Content | |-------|---------| | **Actor(s)** | Pipeline developer, pipeline operator, CI harnesses | -| **Summary** | The system must allow an operator to inspect the current processing state: which datasets are registered, what calibrations exist, how many steps have completed, what their outcomes were. On failure, a snapshot of the state should be available for post-mortem analysis. The system must provide deterministic paths/outputs that a test harness can validate, and must surface failures beyond raw task exceptions (e.g., weblog rendering failures captured via timetracker). | +| **Summary** | The system must allow an operator to inspect the current processing state: which datasets are registered, what calibrations exist, how many steps have completed, and what their outcomes were. On failure, a snapshot of the state should be available for post-mortem analysis. The system must provide deterministic paths and outputs that a test harness can validate, and must surface failures beyond raw task exceptions. | | **Postconditions** | The operator can understand the current state of processing and diagnose problems. 
| --- -### UC-15 — Isolate Telescope-Specific State +### UC-16 — Manage Telescope-Specific Context Extensions | Field | Content | |-------|---------| | **Actor(s)** | Telescope-specific tasks and heuristics | -| **Summary** | The system must support storing instrument-specific state (e.g., VLA-specific solution intervals or gain metadata) in a way that is accessible to telescope-specific tasks but does not pollute the state used by generic or other-telescope tasks. This state is created conditionally based on the instrument. | -| **Postconditions** | Telescope-specific state is available to the tasks that need it; absent when the instrument does not require it. | +| **Summary** | The system must support conditional telescope-specific extensions to the processing state. These extensions must be available to telescope-specific tasks and heuristics while remaining outside the assumed contract of shared pipeline code. Their presence must depend on the instrument being processed rather than being treated as universally available context state. | +| **Postconditions** | Telescope-specific extensions are present only for runs that require them, available to the tasks that need them, and not assumed by shared pipeline code. | --- -### UC-16 — Package and Export Pipeline Products +### UC-17 — Provide Context for Product Export | Field | Content | |-------|---------| | **Actor(s)** | Export task, archive system | -| **Summary** | The system must provide an export mechanism that reads datasets, calibration state, image products, reports, scripts, and project identifiers from the processing state and assembles them into a deliverable product package. The package must be structured for downstream archive ingestion. | -| **Postconditions** | A self-contained product package exists on disk. 
| +| **Summary** | The system must make the datasets, calibration state, image products, reports, scripts, path information, and project identifiers available through the processing state so export tasks can assemble them into a deliverable product package. The package must be structured for downstream archive ingestion. | +| **Postconditions** | The information needed to assemble the product package is accessible through the processing state, and a self-contained product package can be produced. | --- -### UC-17 — Emit Lifecycle Notifications +### UC-18 — Emit Lifecycle Notifications | Field | Content | |-------|---------| -| **Actor(s)** | Workflow engine, event subscribers (loggers, progress monitors) | +| **Actor(s)** | Workflow orchestration layer, event subscribers (loggers, progress monitors) | | **Summary** | The system must emit notifications at key lifecycle points (session start, session restore, step start, step completion, result acceptance) so that external observers (logging, progress reporting, live dashboards) can track execution without polling. | | **Postconditions** | Subscribers are notified of lifecycle transitions as they occur. | From 51e7f38b67649a34148c24369f5cc4c16ffca7f6 Mon Sep 17 00:00:00 2001 From: kberry Date: Tue, 24 Mar 2026 01:14:47 -0400 Subject: [PATCH 04/22] Clean up GAP use cases and move exploratory future use cases to appendix. --- docs/context_current_pipeline_appendix.md | 35 +++++++++++ docs/context_use_cases_current_pipeline.md | 70 ++++++++-------------- 2 files changed, 59 insertions(+), 46 deletions(-) diff --git a/docs/context_current_pipeline_appendix.md b/docs/context_current_pipeline_appendix.md index 22e8918..6612c33 100644 --- a/docs/context_current_pipeline_appendix.md +++ b/docs/context_current_pipeline_appendix.md @@ -265,6 +265,41 @@ The canonical flow through the context is: 4. **Accept results** — Inside `accept()`, results are merged via `Results.merge_with_context(context)`. 
A `ResultsProxy` is pickled to disk per-stage to keep the in-memory context bounded. The weblog is typically rendered after each top-level stage. 5. **Save / resume** — `h_save()` pickles the context; `h_resume(filename='last')` restores it. Driver-managed breakpoints and developer debugging workflows rely on this cycle. +--- +## Exploratory Future Use Cases + +The following are potential future use cases that do not trace to current RADPS architecture or requirements. +They are recorded here for completeness but should not be treated as requirements. + +### Multi-Language / Multi-Framework Access to Context + +| Field | Content | +|-------|---------| +| **Actor(s)** | Non-Python clients (C++, Julia, JavaScript dashboards), external tools | +| **Summary** | The system should expose context state through a language-neutral interface, using a portable serialization format (Protocol Buffers, JSON-Schema, Arrow), a query API (REST, gRPC, or GraphQL), and type definitions shared across languages. | +| **Postconditions** | Clients in any supported language can read context state through a stable, typed API. | + +--- + +### Streaming / Incremental Processing + +| Field | Content | +|-------|---------| +| **Actor(s)** | Data ingest system, workflow engine, incremental processing tasks | +| **Summary** | The system should support incremental dataset registration (adding new scans or execution blocks to a live session), tasks that detect new data and re-process incrementally, and a results model that supports versioning so that re-runs produce new versions rather than overwriting previous outputs. | +| **Postconditions** | New data is incorporated into an active session and processed without restarting from scratch. 
| + +--- + +### External System Integration (Archive, Scheduling, QA Dashboards) + +| Field | Content | +|-------|---------| +| **Actor(s)** | Archive ingest system, scheduling database, QA dashboards, monitoring tools | +| **Summary** | The system could expose a stable, queryable API (REST/gRPC) that external systems can poll or subscribe to, support webhook/event notifications for state transitions, and publish a standard schema for context summaries consumable by external systems. However, this involves a significant trade-off: the current context has no stable API, which gives development teams full flexibility to evolve internal structures without cross-team coordination. A public API would require a formal stability contract, versioning discipline, and potentially a slow-to-change external interface layered over rapidly evolving internals. Whether this gap is in scope — or whether external integration should remain an offline, product-file-based concern — is an open design question. | +| **Postconditions** | External systems receive timely, structured updates about processing state without relying on offline product files. | + + --- ## Architectural Observations diff --git a/docs/context_use_cases_current_pipeline.md b/docs/context_use_cases_current_pipeline.md index 43bd2b9..e5c5933 100644 --- a/docs/context_use_cases_current_pipeline.md +++ b/docs/context_use_cases_current_pipeline.md @@ -2,9 +2,9 @@ ## Overview -The pipeline `Context` class (`pipeline.infrastructure.launcher.Context`) is the central mutable state object used for an entire pipeline execution. It is simultaneously a session state container, a domain metadata container, a cross-stage communication channel, and a persistence unit. +The pipeline `Context` class is the central mutable state object used for an entire pipeline execution. It is simultaneously a session state container, a domain metadata container, a cross-stage communication channel, and a persistence unit. 
-This document catalogues the current use cases of the pipeline Context as determined by examination of the codebase. The goal is to support design and development of a prototype of a system which serves a similar role to that of the context for RADPS. It also includes additional use cases and gap scenarios identified by examining the RADPS documentation and requirements, to ensure future extensibility and alignment with next-generation needs. +This document catalogues the current use cases of the pipeline Context as determined by examination of the codebase. The goal is to support design and development of a prototype of a system which serves a similar role to that of the context for RADPS. It also identifies an initial set of use cases that the current design does not support but that are required or implied by RADPS requirements documentation. The goal is to inform the design of a system serving a similar role to the Context in RADPS. See also: [Supplementary analysis and design recommendations](context_current_pipeline_appendix.md) @@ -200,74 +200,52 @@ In the tables below, **Actor(s)** identifies the human or system role that direc ## 2. Use Cases the Current Design Cannot Handle -The following describe scenarios that the current context design *does not support* but that could be valuable in a future architecture. They are numbered GAP-01 through GAP-07 to indicate gaps in the current design's capabilities. +The following use cases are not supported by the current context design but are required or strongly +implied by RADPS requirement and design documentation. These use cases were identified through a first pass +of the RADPS requirements documentation and are not exhaustive. A full gap analysis mapping current context use cases to RADPS requirements is a separate activity which is underway. These are numbered GAP-01 through GAP-04 to indicate gaps in the current design's capabilities. 
-> **Design note:** The context design should remain compatible with multi-tenant deployments (no global singletons that leak state between runs, no hardcoded single-user paths), but access control, role-based permissions, and audit logging are concerns of the deployment platform rather than the context itself.
+Reviewer input on missing or incorrectly included items is welcome.

-### GAP-01 — Concurrent / Overlapping Task Execution
+### GAP-01 — Concurrent Execution of Independent Work

| Field | Content |
|-------|---------|
-| **Actor(s)** | Workflow engine, parallel task scheduler |
-| **Summary** | The system should support concurrent execution of otherwise independent tasks (e.g., per-MS or per-SPW calibration stages) with isolated read snapshots, a merge/reconciliation step when concurrent results are accepted, and explicit declaration of which state fields each task reads and writes. This gap is about overlapping task execution against shared state, not the existing worker fan-out/fan-in pattern used for some parallel work. |
-| **Postconditions** | Independent tasks execute in parallel without corrupting shared state; results are merged safely before the next sequential step. |
+| **Actor(s)** | Workflow orchestration layer, parallel task scheduler |
+| **Summary** | The system must support concurrent execution of independent work at multiple granularities — both at the stage level, where independent stages execute simultaneously, and within a stage, where work is parallelized across processing axes such as MS or SPW. In both cases the system must ensure results are correctly incorporated into processing state without inconsistency. This is distinct from the existing parallel worker pattern (UC-11, UC-12), which distributes work within a single stage but requires all work to complete before the next stage can begin. |
+| **Invariant** | Independent tasks are executed concurrently without producing inconsistent or incorrect processing state.
|
+| **Postconditions** | Results from concurrently executed work are fully incorporated into processing state before any dependent work begins. |
+| **RADPS Requirements** | CSS9017, CSS9063 |

---

-### GAP-02 — Cloud / Distributed Execution Without Shared Filesystem
+### GAP-02 — Distributed Execution Without Shared Filesystem

| Field | Content |
|-------|---------|
-| **Actor(s)** | Workflow engine, cloud orchestrator, distributed workers |
-| **Summary** | The system should support execution across nodes that do not share a filesystem. This requires a context store backed by a database or object store, artifact references rather than filesystem paths for calibration tables and images, and tasks that can operate on remote datasets without requiring local copies. |
-| **Postconditions** | Processing completes successfully across distributed nodes without reliance on a shared filesystem. |
+| **Actor(s)** | Workflow orchestration layer, distributed workers |
+| **Summary** | The system must support execution across nodes that do not share a filesystem. Processing state, artifacts, and datasets must be accessible to all participating nodes without relying on a shared local filesystem. |
+| **Postconditions** | Processing completes correctly across distributed nodes without reliance on a shared filesystem. |
+| **RADPS Requirements** | CSS9002, CSS9030 |

---

-### GAP-03 — Multi-Language / Multi-Framework Access to Context
-
-| Field | Content |
-|-------|---------|
-| **Actor(s)** | Non-Python clients (C++, Julia, JavaScript dashboards), external tools |
-| **Summary** | The system should expose context state through a language-neutral interface, using a portable serialization format (Protocol Buffers, JSON-Schema, Arrow), a query API (REST, gRPC, or GraphQL), and type definitions shared across languages. |
-| **Postconditions** | Clients in any supported language can read context state through a stable, typed API.
|

---

-### GAP-04 — Streaming / Incremental Processing
-
-| Field | Content |
-|-------|---------|
-| **Actor(s)** | Data ingest system, workflow engine, incremental processing tasks |
-| **Summary** | The system should support incremental dataset registration (adding new scans or execution blocks to a live session), tasks that detect new data and re-process incrementally, and a results model that supports versioning so that re-runs produce new versions rather than overwriting previous outputs. |
-| **Postconditions** | New data is incorporated into an active session and processed without restarting from scratch. |

---

-### GAP-05 — Provenance and Reproducibility Guarantees
+### GAP-03 — Provenance and Reproducibility

| Field | Content |
|-------|---------|
| **Actor(s)** | Pipeline operator, auditor, reproducibility tooling |
-| **Summary** | The system should maintain immutable per-stage snapshots of context state (event sourcing), hash all task inputs (context fields and parameters) for cache invalidation and reproducibility tokens, record software provenance (pipeline version, framework version, dependency manifest) and input data identity (checksums or URIs for raw datasets) per run, and support replaying a run from the event log. |
+| **Summary** | The system must record sufficient provenance information — software versions, input data identity, task parameters, and processing state at each stage — to enable a past processing run to be precisely reproduced or audited. |
| **Postconditions** | Any past processing step can be precisely reproduced or audited from the recorded provenance chain, including the exact software and data versions that produced it.
|
+| **RADPS Requirements** | ALMA-TR103, ALMA-TR104, ALMA-TR105 |

---

-### GAP-06 — Partial Re-Execution / Targeted Stage Re-Run
+### GAP-04 — Partial Re-Execution / Targeted Stage Re-Run

| Field | Content |
|-------|---------|
| **Actor(s)** | Pipeline operator, developer, workflow engine |
-| **Summary** | The system should support selectively re-running a single mid-pipeline stage with different parameters while keeping earlier and later stages intact. This requires explicit dependency tracking between stages, the ability to invalidate downstream stages, and versioned per-stage results. |
-| **Postconditions** | A targeted stage is re-executed and downstream state is correctly invalidated or updated; unaffected stages are preserved. |

---

-### GAP-07 — External System Integration (Archive, Scheduling, QA Dashboards)
+| **Summary** | The system must support selectively re-running a single mid-pipeline stage with different parameters while keeping earlier and later stages intact. Stages that depend on the re-executed stage's outputs must be invalidated or updated; stages that do not must be preserved. Note: CSS9038 explicitly requires re-start at discrete stages; dependency-aware invalidation of downstream stages is implied rather than explicitly stated. |
+| **Postconditions** | After a targeted re-execution, processing state reflects the new outcome for the re-run stage, affected downstream stages are invalidated or updated, and unaffected stages are preserved. |
+| **RADPS Requirements** | CSS9038 |

-| Field | Content |
|-------|---------|
-| **Actor(s)** | Archive ingest system, scheduling database, QA dashboards, monitoring tools |
-| **Summary** | The system could expose a stable, queryable API (REST/gRPC) that external systems can poll or subscribe to, support webhook/event notifications for state transitions, and publish a standard schema for context summaries consumable by external systems.
However, this involves a significant trade-off: the current context has no stable API, which gives development teams full flexibility to evolve internal structures without cross-team coordination. A public API would require a formal stability contract, versioning discipline, and potentially a slow-to-change external interface layered over rapidly evolving internals. Whether this gap is in scope — or whether external integration should remain an offline, product-file-based concern — is an open design question. | -| **Postconditions** | External systems receive timely, structured updates about processing state without relying on offline product files. | From f95849354fc66cf5f30ab78958b4d086ec9e5dc9 Mon Sep 17 00:00:00 2001 From: kberry Date: Tue, 24 Mar 2026 01:55:00 -0400 Subject: [PATCH 05/22] Add descriptions of all use case fields. Update wording for intro paragraphs --- docs/context_use_cases_current_pipeline.md | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/docs/context_use_cases_current_pipeline.md b/docs/context_use_cases_current_pipeline.md index e5c5933..6ae6996 100644 --- a/docs/context_use_cases_current_pipeline.md +++ b/docs/context_use_cases_current_pipeline.md @@ -1,20 +1,29 @@ # Pipeline Context Use Cases ## Overview +The pipeline `Context` is the central state object used for an entire pipeline execution. +It carries observation data, calibration state, imaging state, execution history, +and project metadata, and serves as the primary communication channel between pipeline stages. -The pipeline `Context` class is the central mutable state object used for an entire pipeline execution. It is simultaneously a session state container, a domain metadata container, a cross-stage communication channel, and a persistence unit. +This document catalogues the current use cases of the pipeline Context as determined by examination of the codebase. 
The goal is to inform the design of a system serving a similar role to the current pipeline Context for RADPS. It also identifies an initial set of use cases that the current design does not support but that are required or implied by RADPS requirements documentation. -This document catalogues the current use cases of the pipeline Context as determined by examination of the codebase. The goal is to support design and development of a prototype of a system which serves a similar role to that of the context for RADPS. It also identifies an initial set of use cases that the current design does not support but that are required or implied by RADPS requirements documentation. The goal is to inform the design of a system serving a similar role to the Context in RADPS. - -See also: [Supplementary analysis and design recommendations](context_current_pipeline_appendix.md) +For additional details about the current implementation, reference material, and exploratory future use cases, see [Supplementary Analysis](context_current_pipeline_appendix.md). --- ## 1. Use Cases -Each use case describes a need that the pipeline's central state management must satisfy. They are written to be implementation-neutral in their core description. For pipeline-specific implementation details per use case, see [Implementation Notes by Use Case](context_current_pipeline_appendix.md#implementation-notes-by-use-case) in the appendix. +Each use case describes a need that the pipeline `Context` must satisfy. They are written to be implementation-neutral — the goal is to capture what the system must do, not +how the current pipeline implementation achieves it. For pipeline-specific implementation details by use case, see [Implementation Notes by Use Case](context_current_pipeline_appendix.md#implementation-notes-by-use-case) in the appendix. 
+
+The following fields are used in each use case:

-In the tables below, **Actor(s)** identifies the human or system role that directly creates, updates, consumes, or inspects the context state described by the use case. Actors are role categories, not specific task names or implementations.
+- **Actor(s):** The human or system role that directly creates, updates, consumes, or inspects the
+context state described by the use case. Actors are role categories, not specific task names or
+current implementations.
+- **Summary:** What the system must do to satisfy the use case.
+- **Invariant:** A condition that must always be true while the system is operating. Present only where a meaningful invariant exists.
+- **Postconditions:** A condition that must be true after a specific operation completes. Present only where a meaningful postcondition exists.

---

From 4c27b9a7872c4970d3b3c17d964ab39d6554cc20 Mon Sep 17 00:00:00 2001
From: kberry
Date: Tue, 24 Mar 2026 09:17:39 -0400
Subject: [PATCH 06/22] Update wording and change invariant vs postcondition
 label for some use cases.
---
 docs/context_use_cases_current_pipeline.md | 51 +++++++++++----------
 1 file changed, 26 insertions(+), 25 deletions(-)

diff --git a/docs/context_use_cases_current_pipeline.md b/docs/context_use_cases_current_pipeline.md
index 6ae6996..83d2fd5 100644
--- a/docs/context_use_cases_current_pipeline.md
+++ b/docs/context_use_cases_current_pipeline.md
@@ -33,7 +33,7 @@ current implementations.
 |-------|---------|
 | **Actor(s)** | Data import task, any downstream task, heuristics, renderers, QA handlers |
 | **Summary** | The system must load observation metadata (datasets, spectral windows, fields, antennas, scans, time ranges) and make it queryable by all subsequent processing steps. It must also provide a unified identifier scheme when multiple datasets use different native numbering.
| -| **Postconditions** | All registered datasets remain queryable for the lifetime of the session without repeating the import process. | +| **Invariant** | All registered datasets remain queryable for the lifetime of the session without repeating the import process. | --- @@ -43,7 +43,7 @@ current implementations. |-------|---------| | **Actor(s)** | Initialization, any task, report generators | | **Summary** | The system must store project-level metadata (proposal code, PI, telescope, desired sensitivities, processing recipe) and make it available to tasks for decision-making and to report generators for labelling outputs. | -| **Postconditions** | Project metadata is available for the lifetime of the processing session. | +| **Invariant** | Project metadata is available for the lifetime of the processing session. | --- @@ -53,7 +53,7 @@ current implementations. |-------|---------| | **Actor(s)** | Initialization, any task, report generators, export code | | **Summary** | The system must centrally define and provide working directories, report directories, product directories, and logical filenames for logs, scripts, and reports. Tasks resolve file paths through these centrally managed locations. On session restore, paths must be overridable to adapt to a new environment. | -| **Postconditions** | All tasks share a consistent set of paths for inputs and outputs. | +| **Invariant** | All tasks share a consistent set of paths for inputs and outputs. | --- @@ -63,7 +63,7 @@ current implementations. |-------|---------| | **Actor(s)** | Calibration tasks | | **Summary** | The system must allow calibration tasks to register solutions (indexed by data selection: field, spectral window, antenna, intent, time interval), and allow downstream tasks to query for all calibrations applicable to a given data selection. It must distinguish between calibrations pending application and those already applied. 
Registration must support transactional multi-entry updates — tasks often register multiple calibrations atomically within a single result acceptance. | -| **Postconditions** | Calibration state is queryable and correctly scoped to data selections. | +| **Invariant** | Calibration state is queryable and correctly scoped to data selections. | --- @@ -73,7 +73,7 @@ current implementations. |-------|---------| | **Actor(s)** | Imaging tasks, downstream heuristics, and export tasks | | **Summary** | The system must allow imaging state — target lists, imaging parameters, masks, thresholds, and sensitivity estimates — to be computed by one step and read or refined by later steps. Multiple steps may contribute to a progressively refined imaging configuration. | -| **Postconditions** | The accumulated imaging state reflects contributions from all completed imaging-related steps and is available to later imaging steps. | +| **Invariant** | The accumulated imaging state reflects contributions from all completed imaging-related steps and is available to later imaging steps. | --- @@ -83,7 +83,7 @@ current implementations. |-------|---------| | **Actor(s)** | Imaging tasks, export tasks, report generators | | **Summary** | The system must maintain typed registries of produced image products with add/query semantics. Later tasks must be able to discover previously produced science, calibrator, RMS, and sub-product images through these registries. | -| **Postconditions** | Produced image products are registered by type and remain queryable for downstream processing, reporting, and export. | +| **Invariant** | Produced image products are registered by type and remain queryable for downstream processing, reporting, and export. | --- @@ -93,7 +93,7 @@ current implementations. 
|-------|---------| | **Actor(s)** | Workflow orchestration layer, tasks, report generators, human operators | | **Summary** | The system must track which processing step is currently executing and maintain a stable, ordered history of completed steps and their outcomes. This history must support reporting, script generation, and resumption after interruption. Per-stage tracebacks and timings must be preserved, and stage identity and ordering must remain coherent across resumes. | -| **Postconditions** | The full execution history is retrievable in order; each recorded step retains its stage identity, outcome, timing, traceback information, and the arguments or effective parameters used to invoke it. | +| **Invariant** | The full execution history is retrievable in order; each recorded step retains its stage identity, outcome, timing, traceback information, and the arguments or effective parameters used to invoke it. | --- @@ -112,8 +112,8 @@ current implementations. | Field | Content | |-------|---------| | **Actor(s)** | Operations / automated processing (PPR-driven batch), pipeline developer / power user (interactive), recipe executor | -| **Summary** | The context is created and consumed by multiple front-ends: PPR command lists, XML procedures, or interactive task calls. The system must remain the stable state contract across these drivers. It must be creatable and resumable from non-interactive and interactive drivers, support driver-injected run metadata, tolerate partial execution controls (`startstage`, `exitstage`) and breakpoint-driven stop/resume semantics, and provide machine-detectable success/failure signals. | -| **Postconditions** | The same context state is usable regardless of which orchestration driver created or resumed it. | +| **Summary** | The context is created and consumed by multiple front-ends: PPR command lists, XML procedures, or interactive task calls. 
Processing state must remain consistent and usable regardless of which driver created or resumed it. It must be creatable and resumable from non-interactive and interactive drivers, support driver-injected run metadata, tolerate partial execution controls (`startstage`, `exitstage`) and breakpoint-driven stop/resume semantics, and provide machine-detectable success/failure signals. |
+| **Invariant** | Processing state is consistent and usable regardless of which orchestration driver created or resumed it. |

---

@@ -122,8 +122,8 @@ current implementations.

| Field | Content |
|-------|---------|
| **Actor(s)** | Pipeline operator, workflow orchestration layer, developers |
-| **Summary** | The system must be able to serialize the complete processing state to disk (all observation data, calibration state, execution history, imaging state, project metadata) and later restore it so that processing can resume from the saved point. The serialization must preserve enough state to resume when serialization compatibility is maintained; backward compatibility across pipeline releases is not guaranteed. On restore, paths must be adaptable to a new filesystem environment. |
-| **Postconditions** | After restore, the processing state is operationally equivalent to the saved state for supported resume workflows, and processing can continue. |
+| **Summary** | The system must be able to serialize the complete processing state to disk (all observation data, calibration state, execution history, imaging state, project metadata, etc.) and later restore it so that processing can resume from the saved point. The serialization must preserve enough state to resume; backward compatibility across pipeline releases is not guaranteed. On restore, paths must be adaptable to a new filesystem environment.
|
+| **Postconditions** | After restore, the processing state is operationally equivalent to the saved state for supported resume workflows, and processing can continue from the specified point. |

---

@@ -132,8 +132,9 @@ current implementations.

| Field | Content |
|-------|---------|
| **Actor(s)** | Workflow orchestration layer, parallel worker processes |
-| **Summary** | When work is distributed across parallel workers, each worker needs read-only access to the current processing state (observation metadata, calibration state, etc.). The system must provide a mechanism for workers to obtain a consistent snapshot of that state. Workers must not be able to modify the shared processing state directly. The snapshot mechanism must support efficient distribution to workers. |
-| **Postconditions** | Each worker has a consistent, read-only view of the processing state for the duration of its work. |
+| **Summary** | When work is distributed across parallel workers, each worker needs read-only access to the current processing state (observation metadata, calibration state). The system must provide a mechanism for workers to obtain a consistent snapshot of that state. Workers must not be able to modify the shared processing state directly. The snapshot mechanism must support efficient distribution to workers. |
+| **Invariant** | Worker processes cannot modify shared processing state directly. |
+| **Postconditions** | After distribution, each worker has a consistent, read-only view of the processing state for the duration of its work. |

---

@@ -147,7 +148,7 @@ current implementations.

| Field | Content |
|-------|---------|

---

-### UC-13 — Provide Read-Only Context for Reporting Consumers
+### UC-13 — Provide Read-Only State for Reporting

| Field | Content |
|-------|---------|
| Field | Content | |-------|---------| -| **Actor(s)** | QA scoring framework, report generators, later pipeline logic that consults recorded QA outcomes | -| **Summary** | After each processing step completes, the system must make the relevant processing state available to QA handlers so they can evaluate the outcome against quality thresholds, which may depend on telescope, project parameters, or observation properties. The resulting normalized quality scores must be recorded and remain retrievable for reporting and for later pipeline logic that explicitly consults recorded QA outcomes. | +| **Actor(s)** | QA scoring framework, report generators, tasks that consult recorded QA outcomes | +| **Summary** | After each processing step completes, the system must make the relevant processing state available to QA handlers so they can evaluate the outcome against quality thresholds, which may depend on telescope, project parameters, or observation properties. The resulting quality scores must be recorded and remain retrievable for reporting and for later pipeline logic that consults recorded QA outcomes. | | **Postconditions** | Quality scores are associated with the relevant processing step and accessible to reports and downstream logic. | --- @@ -172,8 +173,8 @@ current implementations. | Field | Content | |-------|---------| | **Actor(s)** | Pipeline developer, pipeline operator, CI harnesses | -| **Summary** | The system must allow an operator to inspect the current processing state: which datasets are registered, what calibrations exist, how many steps have completed, and what their outcomes were. On failure, a snapshot of the state should be available for post-mortem analysis. The system must provide deterministic paths and outputs that a test harness can validate, and must surface failures beyond raw task exceptions. | -| **Postconditions** | The operator can understand the current state of processing and diagnose problems. 
| +| **Summary** | The system must allow an operator to inspect the current processing state, for example: which datasets are registered, what calibrations exist, how many steps have completed, and what their outcomes were. On failure, a snapshot of the state must be available for post-mortem analysis. The system must provide deterministic paths and outputs that a test harness can validate, and must surface failures that happen outside of task execution, e.g. weblog rendering. | +| **Invariant** | The current processing state is inspectable at any point during execution, and sufficient information is retained to diagnose failures after the fact. | --- @@ -182,18 +183,18 @@ current implementations. | Field | Content | |-------|---------| | **Actor(s)** | Telescope-specific tasks and heuristics | -| **Summary** | The system must support conditional telescope-specific extensions to the processing state. These extensions must be available to telescope-specific tasks and heuristics while remaining outside the assumed contract of shared pipeline code. Their presence must depend on the instrument being processed rather than being treated as universally available context state. | -| **Postconditions** | Telescope-specific extensions are present only for runs that require them, available to the tasks that need them, and not assumed by shared pipeline code. | +| **Summary** | The system must support conditional telescope-specific extensions to the processing state. These extensions must be available to telescope-specific tasks and heuristics while remaining outside the assumed contract of shared pipeline code. Generic pipeline components must not depend on or require knowledge of telescope-specific extensions. | +| **Invariant** | Telescope-specific extensions are present only for runs that require them, available to the tasks that need them, and are never assumed by shared pipeline code. 
| --- -### UC-17 — Provide Context for Product Export +### UC-17 — Provide State for Product Export | Field | Content | |-------|---------| -| **Actor(s)** | Export task, archive system | -| **Summary** | The system must make the datasets, calibration state, image products, reports, scripts, path information, and project identifiers available through the processing state so export tasks can assemble them into a deliverable product package. The package must be structured for downstream archive ingestion. | -| **Postconditions** | The information needed to assemble the product package is accessible through the processing state, and a self-contained product package can be produced. | +| **Actor(s)** | Export task | +| **Summary** | The system must make the datasets, calibration state, image products, reports, scripts, path information, and project identifiers available through the processing state so export tasks can assemble them into a deliverable product package. | +| **Invariant** | The information needed to assemble a deliverable product package is accessible through the processing state. | --- @@ -203,7 +204,7 @@ current implementations. |-------|---------| | **Actor(s)** | Workflow orchestration layer, event subscribers (loggers, progress monitors) | | **Summary** | The system must emit notifications at key lifecycle points (session start, session restore, step start, step completion, result acceptance) so that external observers (logging, progress reporting, live dashboards) can track execution without polling. | -| **Postconditions** | Subscribers are notified of lifecycle transitions as they occur. | +| **Invariant** | Subscribers are notified of lifecycle transitions as they occur. 
| --- From f00edab480ada99fc9291c80b03ac5c666e3a148 Mon Sep 17 00:00:00 2001 From: kberry Date: Wed, 25 Mar 2026 14:19:24 -0400 Subject: [PATCH 07/22] Move external systems integration use case back to GAP and update wording --- docs/context_use_cases_current_pipeline.md | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/docs/context_use_cases_current_pipeline.md b/docs/context_use_cases_current_pipeline.md index 83d2fd5..2cd8a3e 100644 --- a/docs/context_use_cases_current_pipeline.md +++ b/docs/context_use_cases_current_pipeline.md @@ -5,7 +5,7 @@ The pipeline `Context` is the central state object used for an entire pipeline e It carries observation data, calibration state, imaging state, execution history, and project metadata, and serves as the primary communication channel between pipeline stages. -This document catalogues the current use cases of the pipeline Context as determined by examination of the codebase. The goal is to inform the design of a system serving a similar role to the current pipeline Context for RADPS. It also identifies an initial set of use cases that the current design does not support but that are required or implied by RADPS requirements documentation. +This document catalogues the current use cases of the pipeline `Context` as determined by examination of the codebase. The goal is to inform the design of a system serving a similar role to the current pipeline Context for RADPS. It also identifies an initial set of use cases that the current design does not support but that are required or implied by RADPS requirements documentation. For additional details about the current implementation, reference material, and exploratory future use cases, see [Supplementary Analysis](context_current_pipeline_appendix.md). @@ -13,8 +13,7 @@ For additional details about the current implementation, reference material, and ## 1. Use Cases -Each use case describes a need that the pipeline `Context` must satisfy. 
They are written to be implementation-neutral — the goal is to capture what the system must do, not -how the current pipeline implementation achieves it. For pipeline-specific implementation details by use case, see [Implementation Notes by Use Case](context_current_pipeline_appendix.md#implementation-notes-by-use-case) in the appendix. +Each use case describes a need that the pipeline `Context` must satisfy. They are written to be implementation-neutral — the goal is to capture what the system must do, not how the current pipeline implementation achieves it. For pipeline-specific implementation details by use case, see [Implementation Notes by Use Case](context_current_pipeline_appendix.md#implementation-notes-by-use-case) in the appendix. The following fields are used in each use case: @@ -203,7 +202,7 @@ current implementations. | Field | Content | |-------|---------| | **Actor(s)** | Workflow orchestration layer, event subscribers (loggers, progress monitors) | -| **Summary** | The system must emit notifications at key lifecycle points (session start, session restore, step start, step completion, result acceptance) so that external observers (logging, progress reporting, live dashboards) can track execution without polling. | +| **Summary** | The system must emit notifications at key lifecycle points (session start, session restore, step start, step completion, result acceptance) so that external observers (logging, progress reporting) can track execution without polling. | | **Invariant** | Subscribers are notified of lifecycle transitions as they occur. | --- @@ -259,3 +258,14 @@ Reviewer input on missing or incorrectly included items is welcome. | **Postconditions** | After a targeted re-execution, processing state reflects the new outcome for the re-run stage, affected downstream stages are invalidated or updated, and unaffected stages are preserved. 
| | **RADPS Requirements** | CSS9038 | +--- + +### GAP-05: External System Integration (Archive, Scheduling, QA Dashboards) + +| Field | Content | +|-------|---------| +| **Actor(s)** | QA dashboards, monitoring tools, archive ingest systems, scheduling systems | +| **Summary** | External systems need access to current processing state — including current stage, processing time, QA results, and lifecycle transitions — without relying on offline product files. The system must expose sufficient state for these consumers to track and respond to processing status in a timely way. | +| **Invariant** | The processing state needed by external consumers is accessible and current throughout execution. | +| **Postconditions** | External systems can access processing state and lifecycle transitions without waiting for offline products to be generated. | +| **RADPS Requirements** | CSS9046, CSS9047, CSS9048, CSS9049, CSS9050, CSS9056 | \ No newline at end of file From a753595055301843f91fcd96e65b1900597f5747 Mon Sep 17 00:00:00 2001 From: kberry Date: Thu, 26 Mar 2026 10:02:21 -0400 Subject: [PATCH 08/22] Update wording for several use cases and merge minor standalone use cases. --- docs/context_use_cases_current_pipeline.md | 91 +++++++++------------- 1 file changed, 37 insertions(+), 54 deletions(-) diff --git a/docs/context_use_cases_current_pipeline.md b/docs/context_use_cases_current_pipeline.md index 2cd8a3e..98e1a5c 100644 --- a/docs/context_use_cases_current_pipeline.md +++ b/docs/context_use_cases_current_pipeline.md @@ -31,7 +31,7 @@ current implementations. | Field | Content | |-------|---------| | **Actor(s)** | Data import task, any downstream task, heuristics, renderers, QA handlers | -| **Summary** | The system must load observation metadata (datasets, spectral windows, fields, antennas, scans, time ranges) and make it queryable by all subsequent processing steps. 
It must also provide a unified identifier scheme when multiple datasets use different native numbering. | +| **Summary** | The context must load observation metadata (datasets, spectral windows, fields, antennas, scans, time ranges) and make it queryable by all subsequent processing steps. It must also provide a unified identifier scheme when multiple datasets use different native numbering. | | **Invariant** | All registered datasets remain queryable for the lifetime of the session without repeating the import process. | --- @@ -41,62 +41,52 @@ current implementations. | Field | Content | |-------|---------| | **Actor(s)** | Initialization, any task, report generators | -| **Summary** | The system must store project-level metadata (proposal code, PI, telescope, desired sensitivities, processing recipe) and make it available to tasks for decision-making and to report generators for labelling outputs. | +| **Summary** | The context must store project-level metadata (proposal code, PI, telescope, desired sensitivities, processing recipe) and make it available to tasks for decision-making and to report generators for labelling outputs. | | **Invariant** | Project metadata is available for the lifetime of the processing session. | --- -### UC-03 — Manage Execution Paths and Output Locations - -| Field | Content | -|-------|---------| -| **Actor(s)** | Initialization, any task, report generators, export code | -| **Summary** | The system must centrally define and provide working directories, report directories, product directories, and logical filenames for logs, scripts, and reports. Tasks resolve file paths through these centrally managed locations. On session restore, paths must be overridable to adapt to a new environment. | -| **Invariant** | All tasks share a consistent set of paths for inputs and outputs. 
| - ---- - -### UC-04 — Register and Query Calibration State +### UC-03 — Register and Query Calibration State | Field | Content | |-------|---------| | **Actor(s)** | Calibration tasks | -| **Summary** | The system must allow calibration tasks to register solutions (indexed by data selection: field, spectral window, antenna, intent, time interval), and allow downstream tasks to query for all calibrations applicable to a given data selection. It must distinguish between calibrations pending application and those already applied. Registration must support transactional multi-entry updates — tasks often register multiple calibrations atomically within a single result acceptance. | +| **Summary** | The context must allow calibration tasks to register solutions (indexed by data selection: field, spectral window, antenna, intent, time interval), and allow downstream tasks to query for all calibrations applicable to a given data selection. It must distinguish between calibrations pending application and those already applied. Registration must support transactional multi-entry updates — tasks often register multiple calibrations atomically within a single result acceptance. | | **Invariant** | Calibration state is queryable and correctly scoped to data selections. | --- -### UC-05 — Accumulate Imaging State Across Multiple Steps +### UC-04 — Accumulate Imaging State Across Multiple Steps | Field | Content | |-------|---------| | **Actor(s)** | Imaging tasks, downstream heuristics, and export tasks | -| **Summary** | The system must allow imaging state — target lists, imaging parameters, masks, thresholds, and sensitivity estimates — to be computed by one step and read or refined by later steps. Multiple steps may contribute to a progressively refined imaging configuration. | +| **Summary** | The context must allow imaging state — target lists, imaging parameters, masks, thresholds, and sensitivity estimates — to be computed by one step and read or refined by later steps. 
Multiple steps may contribute to a progressively refined imaging configuration. | | **Invariant** | The accumulated imaging state reflects contributions from all completed imaging-related steps and is available to later imaging steps. | --- -### UC-06 — Register and Query Produced Image Products +### UC-05 — Register and Query Produced Image Products | Field | Content | |-------|---------| | **Actor(s)** | Imaging tasks, export tasks, report generators | -| **Summary** | The system must maintain typed registries of produced image products with add/query semantics. Later tasks must be able to discover previously produced science, calibrator, RMS, and sub-product images through these registries. | +| **Summary** | The context must maintain typed registries of produced image products with add/query semantics. Later tasks must be able to discover previously produced science, calibrator, RMS, and sub-product images through these registries. | | **Invariant** | Produced image products are registered by type and remain queryable for downstream processing, reporting, and export. | --- -### UC-07 — Track Execution Progress and Stage History +### UC-06 — Track Execution Progress and Stage History | Field | Content | |-------|---------| | **Actor(s)** | Workflow orchestration layer, tasks, report generators, human operators | -| **Summary** | The system must track which processing step is currently executing and maintain a stable, ordered history of completed steps and their outcomes. This history must support reporting, script generation, and resumption after interruption. Per-stage tracebacks and timings must be preserved, and stage identity and ordering must remain coherent across resumes. | +| **Summary** | The context must track which processing step is currently executing and maintain a stable, ordered history of completed steps and their outcomes. This history must support reporting, script generation, and resumption after interruption. 
Per-stage tracebacks and timings must be preserved, and stage identity and ordering must remain coherent across resumes. | | **Invariant** | The full execution history is retrievable in order; each recorded step retains its stage identity, outcome, timing, traceback information, and the arguments or effective parameters used to invoke it. | --- -### UC-08 — Propagate Task Outputs to Downstream Tasks +### UC-07 — Propagate Task Outputs to Downstream Tasks | Field | Content | |-------|---------| @@ -104,109 +94,102 @@ current implementations. | **Summary** | When a task produces outputs that change the processing state (e.g., new calibrations, updated flag summaries, image products, revised parameters), the system must provide a mechanism for those outputs to become available to subsequent processing steps. It must also retain those outputs as part of the execution record for later inspection, reporting, and export. These two needs may be satisfied through different access paths. | | **Postconditions** | Downstream tasks can access the propagated processing state they need, and the task outputs are retained in the execution history for later retrieval. | +# TODO: Tom suggestion +The context must provide explicit, structured mechanisms for accepting task outputs into shared processing state and for retaining those outputs in the execution record. + --- -### UC-09 — Support Multiple Orchestration Drivers +### UC-08 — Support Multiple Orchestration Drivers | Field | Content | |-------|---------| | **Actor(s)** | Operations / automated processing (PPR-driven batch), pipeline developer / power user (interactive), recipe executor | -| **Summary** | The context is created and consumed by multiple front-ends: PPR command lists, XML procedures, or interactive task calls. Processing state must remain consistent and usable regardless of which driver created or resumed it. 
It must be creatable and resumable from non-interactive and interactive drivers, support driver-injected run metadata, tolerate partial execution controls (`startstage`, `exitstage`) and breakpoint-driven stop/resume semantics, and provide machine-detectable success/failure signals. | -| **Invariant** | Processing state is consistent and usable regardless of which orchestration driver created or resumed it. | +| **Summary** | The context is created and consumed by multiple front-ends: PPR command lists, XML procedures, or interactive task calls. The state stored by the context must remain consistent and usable regardless of which driver created or resumed it. It must be creatable and resumable from non-interactive and interactive drivers, support driver-injected run metadata, tolerate partial execution controls (`startstage`, `exitstage`) and breakpoint-driven stop/resume semantics, provide machine-detectable success/failure signals, and emit notifications at key lifecycle points (session start, session restore, step start, step completion).| +| **Invariant** | Processing state is consistent and usable regardless of which orchestration driver created or resumed it, and success/failure signals and lifecycle notifications are produced when appropriate.| --- -### UC-10 — Save and Restore a Processing Session +### UC-09 — Save and Restore a Processing Session | Field | Content | |-------|---------| | **Actor(s)** | Pipeline operator, workflow orchestration layer, developers | -| **Summary** | The system must be able to serialize the complete processing state to disk (all observation data, calibration state, execution history, imaging state, project metadata, etc) and later restore it so that processing can resume from the saved point. The serialization must preserve enough state to resume; backward compatibility across pipeline releases is not guaranteed. On restore, paths must be adaptable to a new filesystem environment. 
| +| **Summary** | The context must be able to serialize the complete processing state to disk (all observation data, calibration state, execution history, imaging state, project metadata, etc.) and later restore it so that processing can resume from the saved point. The serialization must preserve enough state to resume; backward compatibility across pipeline releases is not guaranteed. On restore, paths must be adaptable to a new filesystem environment. | | **Postconditions** | After restore, the processing state is operationally equivalent to the saved state for supported resume workflows, and processing can continue from the specified point. | --- -### UC-11 — Provide State to Parallel Workers +### UC-10 — Provide State to Parallel Workers | Field | Content | |-------|---------| | **Actor(s)** | Workflow orchestration layer, parallel worker processes | -| **Summary** | When work is distributed across parallel workers, each worker needs read-only access to the current processing state (observation metadata, calibration state). The system must provide a mechanism for workers to obtain a consistent snapshot of that state. Workers must not be able to modify the shared processing state directly. The snapshot mechanism must support efficient distribution to workers. | +| **Summary** | When work is distributed across parallel workers, each worker needs read-only access to the current processing state (observation metadata, calibration state). The context must provide a mechanism for workers to obtain a consistent snapshot of that state. Workers must not be able to modify the shared processing state directly. The snapshot mechanism must support efficient distribution to workers. | | **Invariant** | Worker processes cannot modify shared processing state directly. | | **Postconditions** | After distribution, each worker has a consistent, read-only view of the processing state for the duration of its work.
| --- -### UC-12 — Aggregate Results from Parallel Workers +### UC-11 — Aggregate Results from Parallel Workers | Field | Content | |-------|---------| | **Actor(s)** | Workflow orchestration layer | -| **Summary** | After parallel workers complete, the system must collect their individual results and incorporate them into the shared processing state. The aggregation must be safe (no conflicting concurrent writes) and complete before the next sequential step begins. | +| **Summary** | After parallel workers complete, the context must collect their individual results and incorporate them into the shared processing state. The aggregation must be safe (no conflicting concurrent writes) and complete before the next sequential step begins. | | **Postconditions** | The processing state reflects the combined outcomes of all parallel workers. | --- -### UC-13 — Provide Read-Only State for Reporting +### UC-12 — Provide Read-Only State for Reporting | Field | Content | |-------|---------| | **Actor(s)** | Report generators (weblog, quality reports, reproducibility scripts, AQUA reports, pipeline statistics) | -| **Summary** | The system must provide reporting consumers with read-only access to the observation metadata, project metadata, execution history, QA outcomes, log references, and path information needed to generate reporting products such as weblogs, quality reports, reproducibility scripts, AQUA reports, and pipeline statistics. | +| **Summary** | The context must provide read-only access to the observation metadata, project metadata, execution history, QA outcomes, log references, and path information needed to generate reporting products such as weblogs, quality reports, reproducibility scripts, AQUA reports, and pipeline statistics. | | **Postconditions** | Reports accurately reflect the processing state at the time of generation. 
| --- -### UC-14 — Support QA Evaluation and Store Quality Assessments +### UC-13 — Support QA Evaluation and Store Quality Assessments | Field | Content | |-------|---------| | **Actor(s)** | QA scoring framework, report generators, tasks that consult recorded QA outcomes | -| **Summary** | After each processing step completes, the system must make the relevant processing state available to QA handlers so they can evaluate the outcome against quality thresholds, which may depend on telescope, project parameters, or observation properties. The resulting quality scores must be recorded and remain retrievable for reporting and for later pipeline logic that consults recorded QA outcomes. | +| **Summary** | After each processing step completes, the context must make the relevant processing state available to QA handlers so they can evaluate the outcome against quality thresholds, which may depend on telescope, project parameters, or observation properties. The resulting quality scores must be recorded and remain retrievable for reporting and for later pipeline logic that consults recorded QA outcomes. | | **Postconditions** | Quality scores are associated with the relevant processing step and accessible to reports and downstream logic. | --- -### UC-15 — Support Interactive Inspection and Debugging +### UC-14 — Support Interactive Inspection and Debugging | Field | Content | |-------|---------| | **Actor(s)** | Pipeline developer, pipeline operator, CI harnesses | -| **Summary** | The system must allow an operator to inspect the current processing state, for example: which datasets are registered, what calibrations exist, how many steps have completed, and what their outcomes were. On failure, a snapshot of the state must be available for post-mortem analysis. The system must provide deterministic paths and outputs that a test harness can validate, and must surface failures that happen outside of task execution, e.g. weblog rendering. 
| +| **Summary** | The context must allow an operator to inspect the current processing state, for example: which datasets are registered, what calibrations exist, how many steps have completed, and what their outcomes were. On failure, a snapshot of the state must be available for post-mortem analysis. | | **Invariant** | The current processing state is inspectable at any point during execution, and sufficient information is retained to diagnose failures after the fact. | --- -### UC-16 — Manage Telescope-Specific Context Extensions +### UC-15 — Manage Telescope-Specific Context Extensions | Field | Content | |-------|---------| | **Actor(s)** | Telescope-specific tasks and heuristics | -| **Summary** | The system must support conditional telescope-specific extensions to the processing state. These extensions must be available to telescope-specific tasks and heuristics while remaining outside the assumed contract of shared pipeline code. Generic pipeline components must not depend on or require knowledge of telescope-specific extensions. | +| **Summary** | The context must support conditional telescope-specific extensions to the processing state. These extensions must be available to telescope-specific tasks and heuristics while remaining outside the assumed contract of shared pipeline code. Generic pipeline components must not depend on or require knowledge of telescope-specific extensions. | | **Invariant** | Telescope-specific extensions are present only for runs that require them, available to the tasks that need them, and are never assumed by shared pipeline code. 
| --- -### UC-17 — Provide State for Product Export +### UC-16 — Provide State for Product Export | Field | Content | |-------|---------| | **Actor(s)** | Export task | -| **Summary** | The system must make the datasets, calibration state, image products, reports, scripts, path information, and project identifiers available through the processing state so export tasks can assemble them into a deliverable product package. | +| **Summary** | The context must make the datasets, calibration state, image products, reports, scripts, path information, and project identifiers available through the processing state so export tasks can assemble them into a deliverable product package. | | **Invariant** | The information needed to assemble a deliverable product package is accessible through the processing state. | --- -### UC-18 — Emit Lifecycle Notifications - -| Field | Content | -|-------|---------| -| **Actor(s)** | Workflow orchestration layer, event subscribers (loggers, progress monitors) | -| **Summary** | The system must emit notifications at key lifecycle points (session start, session restore, step start, step completion, result acceptance) so that external observers (logging, progress reporting) can track execution without polling. | -| **Invariant** | Subscribers are notified of lifecycle transitions as they occur. | - ---- - ## 2. Use Cases the Current Design Cannot Handle The following use cases are not supported by the current context design but are required or strongly @@ -220,7 +203,7 @@ Reviewer input on missing or incorrectly included items is welcome. | Field | Content | |-------|---------| | **Actor(s)** | Workflow orchestration layer, parallel task scheduler | -| **Summary** | The system must support concurrent execution of independent work at multiple granularities — both at the stage level, where independent stages execute simultaneously, and within a stage, where work is parallelized across processing axes such as MS or SPW. 
In both cases the system must ensure results are correctly incorporated into processing state without inconsistency. This is distinct from the existing parallel worker pattern (UC-11, UC-12), which distributes work within a single stage but requires all work to complete before the next stage can begin.| +| **Summary** | The context must support concurrent execution of independent work at multiple granularities — both at the stage level, where independent stages execute simultaneously, and within a stage, where work is parallelized across processing axes such as MS or SPW. In both cases the context must ensure results are correctly incorporated into processing state without inconsistency. This is distinct from the existing parallel worker pattern (UC-11, UC-12), which distributes work within a single stage but requires all work to complete before the next stage can begin.| | **Invariant** | Independent tasks are executed concurrently without producing inconsistent or incorrect processing state. | | **Postconditions** | Results from concurrently executed work are fully incorporated into processing state before any dependent work begins. | | **RADPS Requirements** | CSS9017, CSS9063 | @@ -232,7 +215,7 @@ Reviewer input on missing or incorrectly included items is welcome. | Field | Content | |-------|---------| | **Actor(s)** | Workflow orchestration layer, distributed workers | -| **Summary** | The system must support execution across nodes that do not share a filesystem. Processing state, artifacts, and datasets must be accessible to all participating nodes without relying on a shared local filesystem.| +| **Summary** | The context must support execution across nodes that do not share a filesystem. Processing state, artifacts, and datasets must be accessible to all participating nodes without relying on a shared local filesystem.| | **Postconditions** | Processing completes correctly across distributed nodes without reliance on a shared filesystem. 
| | **RADPS Requirements** | CSS9002, CSS9030 | @@ -243,7 +226,7 @@ Reviewer input on missing or incorrectly included items is welcome. | Field | Content | |-------|---------| | **Actor(s)** | Pipeline operator, auditor, reproducibility tooling | -| **Summary** | The system must record sufficient provenance information — software versions, input data identity, task parameters, and processing state at each stage — to enable a past processing run to be precisely reproduced or audited.| +| **Summary** | The context must record sufficient provenance information — software versions, input data identity, task parameters, and processing state at each stage — to enable a past processing run to be precisely reproduced or audited.| | **Postconditions** | Any past processing step can be precisely reproduced or audited from the recorded provenance chain, including the exact software and data versions that produced it. | | **RADPS Requirements** | ALMA-TR103, ALMA-TR104, ALMA-TR105 | @@ -254,7 +237,7 @@ Reviewer input on missing or incorrectly included items is welcome. | Field | Content | |-------|---------| | **Actor(s)** | Pipeline operator, developer, workflow engine | -| **Summary** | The system must support selectively re-running a single mid-pipeline stage with different parameters while keeping earlier and later stages intact. Stages that depend on the re-executed stage's outputs must be invalidated or updated; stages that do not must be preserved. Note: CSS9038 explicitly requires re-start at discrete stages; dependency-aware invalidation of downstream stages is implied rather than explicitly stated.| +| **Summary** | The context must support selectively re-running a single mid-pipeline stage with different parameters while keeping earlier and later stages intact. Stages that depend on the re-executed stage's outputs must be invalidated or updated; stages that do not must be preserved. 
Note: CSS9038 explicitly requires re-start at discrete stages; dependency-aware invalidation of downstream stages is implied rather than explicitly stated.| | **Postconditions** | After a targeted re-execution, processing state reflects the new outcome for the re-run stage, affected downstream stages are invalidated or updated, and unaffected stages are preserved. | | **RADPS Requirements** | CSS9038 | From 381da100f427997c746cab711841c4471107ea61 Mon Sep 17 00:00:00 2001 From: kberry Date: Thu, 26 Mar 2026 12:27:25 -0400 Subject: [PATCH 09/22] Update UC-08 based on feedback. Update UC-01 to include updates. Split UC-06 into two use cases and update use case numbering --- docs/context_use_cases_current_pipeline.md | 57 ++++++++++++---------- 1 file changed, 32 insertions(+), 25 deletions(-) diff --git a/docs/context_use_cases_current_pipeline.md b/docs/context_use_cases_current_pipeline.md index 98e1a5c..ab73017 100644 --- a/docs/context_use_cases_current_pipeline.md +++ b/docs/context_use_cases_current_pipeline.md @@ -26,13 +26,13 @@ current implementations. --- -### UC-01 — Load and Provide Access to Observation Metadata +### UC-01 — Load, Update, and Provide Access to Observation Metadata | Field | Content | |-------|---------| | **Actor(s)** | Data import task, any downstream task, heuristics, renderers, QA handlers | -| **Summary** | The context must load observation metadata (datasets, spectral windows, fields, antennas, scans, time ranges) and make it queryable by all subsequent processing steps. It must also provide a unified identifier scheme when multiple datasets use different native numbering. | -| **Invariant** | All registered datasets remain queryable for the lifetime of the session without repeating the import process. 
| +| **Summary** | The context must load observation metadata (datasets, spectral windows, fields, antennas, scans, time ranges), make it queryable by all subsequent processing steps, and allow downstream tasks to update it as processing progresses (e.g., registering new derived datasets, data column and type changes, reference antenna selection). It must also provide a unified identifier scheme when multiple datasets use different native numbering. | +| **Invariant** | All registered datasets remain queryable and updatable for the lifetime of the session without repeating the import process. | --- @@ -41,7 +41,7 @@ current implementations. | Field | Content | |-------|---------| | **Actor(s)** | Initialization, any task, report generators | -| **Summary** | The context must store project-level metadata (proposal code, PI, telescope, desired sensitivities, processing recipe) and make it available to tasks for decision-making and to report generators for labelling outputs. | +| **Summary** | The context must store project-level metadata (proposal code, PI, telescope, desired sensitivities, processing recipe) and make it available to tasks for decision-making and to report generators to inform the output. | | **Invariant** | Project metadata is available for the lifetime of the processing session. | --- @@ -76,30 +76,37 @@ current implementations. --- -### UC-06 — Track Execution Progress and Stage History +### UC-06 — Track Current Execution Progress | Field | Content | |-------|---------| -| **Actor(s)** | Workflow orchestration layer, tasks, report generators, human operators | -| **Summary** | The context must track which processing step is currently executing and maintain a stable, ordered history of completed steps and their outcomes. This history must support reporting, script generation, and resumption after interruption. Per-stage tracebacks and timings must be preserved, and stage identity and ordering must remain coherent across resumes. 
| -| **Invariant** | The full execution history is retrievable in order; each recorded step retains its stage identity, outcome, timing, traceback information, and the arguments or effective parameters used to invoke it. | +| **Actor(s)** | Workflow orchestration layer, tasks, human operators | +| **Summary** | The system must track which processing stage is currently executing and maintain a stable, ordered record of completed stages. Stage identity and ordering must remain coherent across session saves and resumes. | +| **Invariant** | The currently executing stage is identifiable and completed stages are recorded in stable order. | ---- +___ -### UC-07 — Propagate Task Outputs to Downstream Tasks +### UC-07 — Preserve Per-Stage Execution Record | Field | Content | |-------|---------| -| **Actor(s)** | Any task producing output that subsequent tasks depend on | -| **Summary** | When a task produces outputs that change the processing state (e.g., new calibrations, updated flag summaries, image products, revised parameters), the system must provide a mechanism for those outputs to become available to subsequent processing steps. It must also retain those outputs as part of the execution record for later inspection, reporting, and export. These two needs may be satisfied through different access paths. | -| **Postconditions** | Downstream tasks can access the propagated processing state they need, and the task outputs are retained in the execution history for later retrieval. | +| **Actor(s)** | Report generators, human operators, workflow orchestration layer | +| **Summary** | The system must preserve a complete execution record for each completed stage, including timing, traceback information, outcomes, and the arguments used to invoke it. This record must support reporting, post-mortem diagnosis of failures, and resumption after interruption. 
| +| **Invariant** | Each completed stage retains its full execution record (identity, outcome, timing, traceback, and invocation arguments) for the lifetime of the session. |
+
+---

-# TODO: Tom suggestion
-The context must provide explicit, structured mechanisms for accepting task outputs into shared processing state and for retaining those outputs in the execution record.
+### UC-08 — Propagate Task Outputs to Downstream Tasks
+
+| Field | Content |
+|-------|---------|
+| **Actor(s)** | Any task producing output, downstream tasks |
+| **Summary** | When a task produces outputs that change the processing state (e.g., new calibrations, updated flag summaries, image products, revised parameters), the context must provide a mechanism for those outputs to become available to subsequent processing steps before they execute. UC-03, UC-04, UC-05, and UC-14 are domain-specific instances of this pattern. |
+| **Postconditions** | Downstream tasks can access the propagated processing state they need. |

---

-### UC-08 — Support Multiple Orchestration Drivers
+### UC-09 — Support Multiple Orchestration Drivers

| Field | Content |
|-------|---------|
@@ -109,7 +116,7 @@ The context must provide explicit, structured mechanisms for accepting task outp

---

-### UC-09 — Save and Restore a Processing Session
+### UC-10 — Save and Restore a Processing Session

| Field | Content |
|-------|---------|
@@ -119,7 +126,7 @@ The context must provide explicit, structured mechanisms for accepting task outp

---

-### UC-10 — Provide State to Parallel Workers
+### UC-11 — Provide State to Parallel Workers

| Field | Content |
|-------|---------|
@@ -130,7 +137,7 @@ The context must provide explicit, structured mechanisms for accepting task outp

---

-### UC-11 — Aggregate Results from Parallel Workers
+### UC-12 — Aggregate Results from Parallel Workers

| Field | Content |
|-------|---------|
@@ -140,17 +147,17 @@ The context must provide explicit, structured mechanisms for accepting task outp

---
-### UC-12 — Provide Read-Only State for Reporting +### UC-13 — Provide Read-Only State for Reporting | Field | Content | |-------|---------| | **Actor(s)** | Report generators (weblog, quality reports, reproducibility scripts, AQUA reports, pipeline statistics) | -| **Summary** | The context must provide read-only access to the observation metadata, project metadata, execution history, QA outcomes, log references, and path information needed to generate reporting products such as weblogs, quality reports, reproducibility scripts, AQUA reports, and pipeline statistics. | +| **Summary** | The context must provide read-only access to the observation metadata, project metadata, execution history (including per-stage domain-specific outputs such as flag summaries and plot references), QA outcomes, log references, and path information needed to generate reporting products such as weblogs, quality reports, reproducibility scripts, AQUA reports, and pipeline statistics. | | **Postconditions** | Reports accurately reflect the processing state at the time of generation. 
| --- -### UC-13 — Support QA Evaluation and Store Quality Assessments +### UC-14 — Support QA Evaluation and Store Quality Assessments | Field | Content | |-------|---------| @@ -160,7 +167,7 @@ The context must provide explicit, structured mechanisms for accepting task outp --- -### UC-14 — Support Interactive Inspection and Debugging +### UC-15 — Support Interactive Inspection and Debugging | Field | Content | |-------|---------| @@ -170,7 +177,7 @@ The context must provide explicit, structured mechanisms for accepting task outp --- -### UC-15 — Manage Telescope-Specific Context Extensions +### UC-16 — Manage Telescope-Specific Context Extensions | Field | Content | |-------|---------| @@ -180,7 +187,7 @@ The context must provide explicit, structured mechanisms for accepting task outp --- -### UC-16 — Provide State for Product Export +### UC-17 — Provide State for Product Export | Field | Content | |-------|---------| From 4a31b201f8ebcb0c9fcf2efafb180a6afc109a57 Mon Sep 17 00:00:00 2001 From: kberry Date: Thu, 26 Mar 2026 12:47:01 -0400 Subject: [PATCH 10/22] Add mutability to UC-01; add updating to UC-03; update some wording in UC-04. --- docs/context_use_cases_current_pipeline.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/context_use_cases_current_pipeline.md b/docs/context_use_cases_current_pipeline.md index ab73017..c34140c 100644 --- a/docs/context_use_cases_current_pipeline.md +++ b/docs/context_use_cases_current_pipeline.md @@ -41,28 +41,28 @@ current implementations. | Field | Content | |-------|---------| | **Actor(s)** | Initialization, any task, report generators | -| **Summary** | The context must store project-level metadata (proposal code, PI, telescope, desired sensitivities, processing recipe) and make it available to tasks for decision-making and to report generators to inform the output. 
| +| **Summary** | The context must store project-level metadata (proposal code, PI, telescope, desired sensitivities, processing recipe) and make it available to all components, e.g., for decision-making in heuristics and to label outputs in reports. |
| **Invariant** | Project metadata is available for the lifetime of the processing session. |

---

-### UC-03 — Register and Query Calibration State
+### UC-03 — Register, Query, and Update Calibration State

| Field | Content |
|-------|---------|
-| **Actor(s)** | Calibration tasks |
-| **Summary** | The context must allow calibration tasks to register solutions (indexed by data selection: field, spectral window, antenna, intent, time interval), and allow downstream tasks to query for all calibrations applicable to a given data selection. It must distinguish between calibrations pending application and those already applied. Registration must support transactional multi-entry updates — tasks often register multiple calibrations atomically within a single result acceptance. |
-| **Invariant** | Calibration state is queryable and correctly scoped to data selections. |
+| **Actor(s)** | Calibration tasks, imaging tasks, flagging tasks, export tasks, report generators |
+| **Summary** | The context must allow calibration tasks to register and update solutions (indexed by data selection: field, spectral window, antenna, intent, time interval), and allow downstream tasks to query for all calibrations applicable to a given data selection. It must distinguish between calibrations pending application and those already applied. Registration must support registering multiple calibrations atomically as part of a single operation. |
+| **Invariant** | Calibration state is queryable and correctly scoped to data selections, and can be updated as processing progresses. 
|

---

-### UC-04 — Accumulate Imaging State Across Multiple Steps
+### UC-04 — Manage Imaging State

| Field | Content |
|-------|---------|
| **Actor(s)** | Imaging tasks, downstream heuristics, and export tasks |
| **Summary** | The context must allow imaging state — target lists, imaging parameters, masks, thresholds, and sensitivity estimates — to be computed by one step and read or refined by later steps. Multiple steps may contribute to a progressively refined imaging configuration. |
-| **Invariant** | The accumulated imaging state reflects contributions from all completed imaging-related steps and is available to later imaging steps. |
+| **Invariant** | Imaging state reflects contributions from all completed imaging-related stages, and is available for reading or refinement by subsequent stages. |

---

From 32653f2b9da3d842814a37f00177a2c454d5f8b8 Mon Sep 17 00:00:00 2001
From: kberry
Date: Thu, 26 Mar 2026 12:51:01 -0400
Subject: [PATCH 11/22] Remove table header labels.

---
 docs/context_use_cases_current_pipeline.md | 44 +++++++++++-----------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/docs/context_use_cases_current_pipeline.md b/docs/context_use_cases_current_pipeline.md
index c34140c..7e7bf89 100644
--- a/docs/context_use_cases_current_pipeline.md
+++ b/docs/context_use_cases_current_pipeline.md
@@ -28,7 +28,7 @@ current implementations.

### UC-01 — Load, Update, and Provide Access to Observation Metadata

-| Field | Content |
+| | |
|-------|---------|
| **Actor(s)** | Data import task, any downstream task, heuristics, renderers, QA handlers |
| **Summary** | The context must load observation metadata (datasets, spectral windows, fields, antennas, scans, time ranges), make it queryable by all subsequent processing steps, and allow downstream tasks to update it as processing progresses (e.g., registering new derived datasets, data column and type changes, reference antenna selection). 
It must also provide a unified identifier scheme when multiple datasets use different native numbering. | @@ -38,7 +38,7 @@ current implementations. ### UC-02 — Store and Provide Project-Level Metadata -| Field | Content | +| | | |-------|---------| | **Actor(s)** | Initialization, any task, report generators | | **Summary** | The context must store project-level metadata (proposal code, PI, telescope, desired sensitivities, processing recipe) and make it available to all components, e.g,, for decision-making in heuristics and to label outputs in reports. | @@ -48,7 +48,7 @@ current implementations. ### UC-03 — Register, Query, and Update Calibration State -| Field | Content | +| | | |-------|---------| | **Actor(s)** | Calibration tasks, imaging tasks, flagging tasks, export tasks, report generators | | **Summary** | The context must allow calibration tasks to register and update solutions (indexed by data selection: field, spectral window, antenna, intent, time interval), and allow downstream tasks to query for all calibrations applicable to a given data selection. It must distinguish between calibrations pending application and those already applied. Registration must support registering multiple calibrations atomically as part of a single operation. | @@ -58,7 +58,7 @@ current implementations. ### UC-04 — Manage Imaging State -| Field | Content | +| | | |-------|---------| | **Actor(s)** | Imaging tasks, downstream heuristics, and export tasks | | **Summary** | The context must allow imaging state — target lists, imaging parameters, masks, thresholds, and sensitivity estimates — to be computed by one step and read or refined by later steps. Multiple steps may contribute to a progressively refined imaging configuration. | @@ -68,7 +68,7 @@ current implementations. 
### UC-05 — Register and Query Produced Image Products -| Field | Content | +| | | |-------|---------| | **Actor(s)** | Imaging tasks, export tasks, report generators | | **Summary** | The context must maintain typed registries of produced image products with add/query semantics. Later tasks must be able to discover previously produced science, calibrator, RMS, and sub-product images through these registries. | @@ -78,7 +78,7 @@ current implementations. ### UC-06 — Track Current Execution Progress -| Field | Content | +| | | |-------|---------| | **Actor(s)** | Workflow orchestration layer, tasks, human operators | | **Summary** | The system must track which processing stage is currently executing and maintain a stable, ordered record of completed stages. Stage identity and ordering must remain coherent across session saves and resumes. | @@ -88,7 +88,7 @@ ___ ### UC-07 — Preserve Per-Stage Execution Record -| Field | Content | +| | | |-------|---------| | **Actor(s)** | Report generators, human operators, workflow orchestration layer | | **Summary** | The system must preserve a complete execution record for each completed stage, including timing, traceback information, outcomes, and the arguments used to invoke it. This record must support reporting, post-mortem diagnosis of failures, and resumption after interruption. | @@ -98,7 +98,7 @@ ___ ### UC-08 — Propagate Task Outputs to Downstream Tasks -| Field | Content | +| | | |-------|---------| | **Actor(s)** | Any task producing output, downstream tasks | | **Summary** | When a task produces outputs that change the processing state (e.g., new calibrations, updated flag summaries, image products, revised parameters), the context must provide a mechanism for those outputs to become available to subsequent processing steps before they execute. UC-03, UC-04, UC-05, and UC-14 are domain-specific instances of this pattern. 
| @@ -108,7 +108,7 @@ ___ ### UC-09 — Support Multiple Orchestration Drivers -| Field | Content | +| | | |-------|---------| | **Actor(s)** | Operations / automated processing (PPR-driven batch), pipeline developer / power user (interactive), recipe executor | | **Summary** | The context is created and consumed by multiple front-ends: PPR command lists, XML procedures, or interactive task calls. The state stored by the context must remain consistent and usable regardless of which driver created or resumed it. It must be creatable and resumable from non-interactive and interactive drivers, support driver-injected run metadata, tolerate partial execution controls (`startstage`, `exitstage`) and breakpoint-driven stop/resume semantics, provide machine-detectable success/failure signals, and emit notifications at key lifecycle points (session start, session restore, step start, step completion).| @@ -118,7 +118,7 @@ ___ ### UC-10 — Save and Restore a Processing Session -| Field | Content | +| | | |-------|---------| | **Actor(s)** | Pipeline operator, workflow orchestration layer, developers | | **Summary** | The context must be able to serialize the complete processing state to disk (all observation data, calibration state, execution history, imaging state, project metadata, etc) and later restore it so that processing can resume from the saved point. The serialization must preserve enough state to resume; backward compatibility across pipeline releases is not guaranteed. On restore, paths must be adaptable to a new filesystem environment. | @@ -128,7 +128,7 @@ ___ ### UC-11 — Provide State to Parallel Workers -| Field | Content | +| | | |-------|---------| | **Actor(s)** | Workflow orchestration layer, parallel worker processes | | **Summary** | When work is distributed across parallel workers, each worker needs read-only access to the current processing state (observation metadata, calibration state). 
The context must provide a mechanism for workers to obtain a consistent snapshot of that state. Workers must not be able to modify the shared processing state directly. The snapshot mechanism must support efficient distribution to workers. | @@ -139,7 +139,7 @@ ___ ### UC-12 — Aggregate Results from Parallel Workers -| Field | Content | +| | | |-------|---------| | **Actor(s)** | Workflow orchestration layer | | **Summary** | After parallel workers complete, the context must collect their individual results and incorporate them into the shared processing state. The aggregation must be safe (no conflicting concurrent writes) and complete before the next sequential step begins. | @@ -149,7 +149,7 @@ ___ ### UC-13 — Provide Read-Only State for Reporting -| Field | Content | +| | | |-------|---------| | **Actor(s)** | Report generators (weblog, quality reports, reproducibility scripts, AQUA reports, pipeline statistics) | | **Summary** | The context must provide read-only access to the observation metadata, project metadata, execution history (including per-stage domain-specific outputs such as flag summaries and plot references), QA outcomes, log references, and path information needed to generate reporting products such as weblogs, quality reports, reproducibility scripts, AQUA reports, and pipeline statistics. | @@ -159,7 +159,7 @@ ___ ### UC-14 — Support QA Evaluation and Store Quality Assessments -| Field | Content | +| | | |-------|---------| | **Actor(s)** | QA scoring framework, report generators, tasks that consult recorded QA outcomes | | **Summary** | After each processing step completes, the context must make the relevant processing state available to QA handlers so they can evaluate the outcome against quality thresholds, which may depend on telescope, project parameters, or observation properties. The resulting quality scores must be recorded and remain retrievable for reporting and for later pipeline logic that consults recorded QA outcomes. 
| @@ -169,7 +169,7 @@ ___ ### UC-15 — Support Interactive Inspection and Debugging -| Field | Content | +| | | |-------|---------| | **Actor(s)** | Pipeline developer, pipeline operator, CI harnesses | | **Summary** | The context must allow an operator to inspect the current processing state, for example: which datasets are registered, what calibrations exist, how many steps have completed, and what their outcomes were. On failure, a snapshot of the state must be available for post-mortem analysis. | @@ -179,7 +179,7 @@ ___ ### UC-16 — Manage Telescope-Specific Context Extensions -| Field | Content | +| | | |-------|---------| | **Actor(s)** | Telescope-specific tasks and heuristics | | **Summary** | The context must support conditional telescope-specific extensions to the processing state. These extensions must be available to telescope-specific tasks and heuristics while remaining outside the assumed contract of shared pipeline code. Generic pipeline components must not depend on or require knowledge of telescope-specific extensions. | @@ -189,7 +189,7 @@ ___ ### UC-17 — Provide State for Product Export -| Field | Content | +| | | |-------|---------| | **Actor(s)** | Export task | | **Summary** | The context must make the datasets, calibration state, image products, reports, scripts, path information, and project identifiers available through the processing state so export tasks can assemble them into a deliverable product package. | @@ -207,7 +207,7 @@ Reviewer input on missing or incorrectly included items is welcome. ### GAP-01 — Concurrent Execution of Independent Work -| Field | Content | +| | | |-------|---------| | **Actor(s)** | Workflow orchestration layer, parallel task scheduler | | **Summary** | The context must support concurrent execution of independent work at multiple granularities — both at the stage level, where independent stages execute simultaneously, and within a stage, where work is parallelized across processing axes such as MS or SPW. 
In both cases the context must ensure results are correctly incorporated into processing state without inconsistency. This is distinct from the existing parallel worker pattern (UC-11, UC-12), which distributes work within a single stage but requires all work to complete before the next stage can begin.| @@ -219,7 +219,7 @@ Reviewer input on missing or incorrectly included items is welcome. ### GAP-02 — Distributed Execution Without Shared Filesystem -| Field | Content | +| | | |-------|---------| | **Actor(s)** | Workflow orchestration layer, distributed workers | | **Summary** | The context must support execution across nodes that do not share a filesystem. Processing state, artifacts, and datasets must be accessible to all participating nodes without relying on a shared local filesystem.| @@ -230,7 +230,7 @@ Reviewer input on missing or incorrectly included items is welcome. ### GAP-03 — Provenance and Reproducibility -| Field | Content | +| | | |-------|---------| | **Actor(s)** | Pipeline operator, auditor, reproducibility tooling | | **Summary** | The context must record sufficient provenance information — software versions, input data identity, task parameters, and processing state at each stage — to enable a past processing run to be precisely reproduced or audited.| @@ -241,7 +241,7 @@ Reviewer input on missing or incorrectly included items is welcome. ### GAP-04 — Partial Re-Execution / Targeted Stage Re-Run -| Field | Content | +| | | |-------|---------| | **Actor(s)** | Pipeline operator, developer, workflow engine | | **Summary** | The context must support selectively re-running a single mid-pipeline stage with different parameters while keeping earlier and later stages intact. Stages that depend on the re-executed stage's outputs must be invalidated or updated; stages that do not must be preserved. 
Note: CSS9038 explicitly requires re-start at discrete stages; dependency-aware invalidation of downstream stages is implied rather than explicitly stated.| @@ -252,7 +252,7 @@ Reviewer input on missing or incorrectly included items is welcome. ### GAP-05: External System Integration (Archive, Scheduling, QA Dashboards) -| Field | Content | +| | | |-------|---------| | **Actor(s)** | QA dashboards, monitoring tools, archive ingest systems, scheduling systems | | **Summary** | External systems need access to current processing state — including current stage, processing time, QA results, and lifecycle transitions — without relying on offline product files. The system must expose sufficient state for these consumers to track and respond to processing status in a timely way. | From abb603e62a55a2d48e7efd40425cf1a4ae721bb2 Mon Sep 17 00:00:00 2001 From: kberry Date: Thu, 26 Mar 2026 14:05:31 -0400 Subject: [PATCH 12/22] Remove future architectural suggestions from appendix document and update use case titles and numbering. --- docs/context_current_pipeline_appendix.md | 193 +++------------------- 1 file changed, 23 insertions(+), 170 deletions(-) diff --git a/docs/context_current_pipeline_appendix.md b/docs/context_current_pipeline_appendix.md index 6612c33..86b86b1 100644 --- a/docs/context_current_pipeline_appendix.md +++ b/docs/context_current_pipeline_appendix.md @@ -1,50 +1,6 @@ # Pipeline Context: Supplementary Analysis -This document contains architectural observations, design recommendations, and reference material that supplement the use cases in [context_use_cases_current_pipeline.md](context_use_cases_current_pipeline.md). These sections were separated to keep the use-case document focused on requirements. 
- ---- - -## Context Responsibility Overview - -| # | Responsibility | Description | Examples / References | -|---|---|---|---| -| 1 | **Static Observation & Project Data** | Load, store, and provide access to static observation and project data and metadata in memory | `context.observing_run`, `context.project_summary`, `context.project_structure` | -| 2 | **Mutable Observation State** | Load, store, provide in-memory access, and update mutable dynamic observation data or metadata | MS registration, virtual SPW mappings, reference antenna ordering | -| 3 | **Path Management** | Specify and store output paths as part of configuration setup | `output_dir`, `products_dir`, `report_dir`, log paths | -| 4 | **Imaging State Management** | Manage imaging state across pipeline stages | `clean_list_pending`, `imaging_parameters`, masks, thresholds, `synthesized_beams` | -| 5 | **Calibration State Management** | Register, query, and update calibration state | Calibration library (`callibrary`), active/applied cal tables, interval trees | -| 6 | **Image Library Management** | Register and query image products across pipeline stages | `sciimlist`, `calimlist`, `rmsimlist`, `subimlist` | -| 7 | **Session Persistence** | Save and restore the full pipeline session | Pickle serialization, `h_save()`, `h_resume()`, `ResultsProxy` | -| 8 | **MPI / Parallel Distribution** | Pass context to parallel workers and merge results back | Context pickle broadcast to MPI servers; results merged on client | -| 9 | **Inter-Task Data Passing** | Accept task results and merge state back into the context | `merge_with_context()` pattern | -| 10 | **Stage Tracking & Result Accumulation** | Track execution progress, stage numbering, accumulated results | `context.results`, `stage_number`, `task_counter`, result proxies | -| 11 | **Reporting & Export Support** | Provide context data for weblog, QA reports, AQUA XML, and product packaging | `context.observing_run` for weblog, 
`context.project_structure` for archive labels |
-| 12 | **QA Score Storage** | Store and provide access to QA scores | QA score objects appended to `result.qa.pool` |
-| 13 | **Debuggability / Inspectability** | Context state must be human-readable and inspectable for post-mortem analysis | Per-stage tracebacks, timings, timetracker integration |
-| 14 | **Telescope-Specific State** | Sub-context used only by telescope-specific code | `context.evla` (VLA), conditionally created |
-| 15 | **Lifecycle Notifications** | Emit events at key lifecycle points | Event bus: `ResultAcceptingEvent`, `ContextCreated`, `TaskStarted`, `TaskComplete` |
-
----
-
-## Responsibility-to-Use-Case Traceability
-
-| # | Responsibility | Use Cases |
-|---|---|---|
-| 1 | Static Observation & Project Data | UC-01, UC-02 |
-| 2 | Mutable Observation State | UC-01 |
-| 3 | Path Management | UC-03, UC-17 |
-| 4 | Imaging State Management | UC-05 |
-| 5 | Calibration State Management | UC-04 |
-| 6 | Image Library Management | UC-06 |
-| 7 | Session Persistence | UC-10 |
-| 8 | MPI / Parallel Distribution | UC-11, UC-12 |
-| 9 | Inter-Task Data Passing | UC-07, UC-08, UC-09, UC-12 |
-| 10 | Stage Tracking & Result Accumulation | UC-07, UC-08, UC-09 |
-| 11 | Reporting & Export Support | UC-13, UC-17 |
-| 12 | QA Score Storage | UC-13, UC-14 |
-| 13 | Debuggability / Inspectability | UC-15 |
-| 14 | Telescope-Specific State | UC-16 |
-| 15 | Lifecycle Notifications | UC-18 |
+This document contains reference material that supplements the use cases in [context_use_cases_current_pipeline.md](context_use_cases_current_pipeline.md). These sections were separated to keep the use-case document focused on requirements.

---

@@ -52,9 +8,9 @@ This document contains architectural observations, design recommendations, and r
The following implementation notes describe how each use case is realized in the current pipeline codebase. 
They were separated from the use-case definitions to keep the requirements document focused on requirements. -### UC-01 — Load and Provide Access to Observation Metadata +### UC-01 — Load, Update, and Provide Access to Observation Metadata -**Implementation notes** — `context.observing_run` is the single most heavily queried context facet: +**Implementation notes** — `context.observing_run` is the most heavily queried attribute of the context: - `context.observing_run.get_ms(name=vis)` — resolve an MS by filename - `context.observing_run.measurement_sets` — iterate all registered MS objects @@ -62,7 +18,7 @@ The following implementation notes describe how each use case is realized in the - `context.observing_run.virtual2real_spw_id(vspw, ms)` / `real2virtual_spw_id(...)` — translate between abstract pipeline SPW IDs and CASA-native IDs - `context.observing_run.virtual_science_spw_ids` — virtual SPW catalog - `context.observing_run.ms_reduction_group` — per-group reduction metadata (single-dish) -- Provenance fields: `.start_datetime`, `.end_datetime`, `.project_ids`, `.schedblock_ids`, `.execblock_ids`, `.observers` +- Provenance attributes: `start_datetime`, `end_datetime`, `project_ids`, `schedblock_ids`, `execblock_ids`, `observers` MS objects are rich domain objects carrying scans, fields, SPWs, antennas, reference antenna ordering, etc. Tasks read per-MS state like `ms.reference_antenna`, `ms.session`, `ms.start_time`, `ms.origin_ms`. 
@@ -70,7 +26,7 @@ MS objects are rich domain objects carrying scans, fields, SPWs, antennas, refer ### UC-02 — Store and Provide Project-Level Metadata -**Implementation notes** — project metadata is typically set once at session start and read many times: +**Implementation notes** — project metadata is set once at session start, is not modified after initialization, and is read many times: - `context.project_summary = project.ProjectSummary(...)` — set by `executeppr()` / `executevlappr()` - `context.project_structure = project.ProjectStructure(...)` — set by PPR executors @@ -78,13 +34,7 @@ MS objects are rich domain objects carrying scans, fields, SPWs, antennas, refer - `context.set_state('ProjectStructure', 'recipe_name', value)` — used by `recipereducer.reduce()` and SD heuristics - `context.processing_intents` — set by `Pipeline` during initialization -This is a strong candidate for a separate, immutable-after-init sub-record in any future context schema. - ---- - -### UC-03 — Manage Execution Paths and Output Locations - -**Implementation notes:** +Execution paths and output locations are also managed as part of project-level metadata: - Path roots: `output_dir`, `report_dir`, `products_dir` - Context name drives deterministic, named run directories @@ -93,7 +43,7 @@ This is a strong candidate for a separate, immutable-after-init sub-record in an --- -### UC-04 — Register and Query Calibration State +### UC-03 — Register, Query, and Update Calibration State **Implementation notes** — `context.callibrary` is the primary cross-stage communication channel for calibration workflows: @@ -103,9 +53,9 @@ This is a strong candidate for a separate, immutable-after-init sub-record in an --- -### UC-05 — Accumulate Imaging State Across Multiple Steps +### UC-04 — Manage Imaging State -**Implementation notes** — this is the most fragile part of the current context design. 
Attributes are added ad-hoc, there is no schema, and defensive `hasattr()` checks appear in the code: +**Implementation notes** — imaging state is stored as ad-hoc attributes on the context object with no formal schema. Defensive `hasattr()` checks appear throughout the code to guard against attributes that may not yet exist: | Attribute | Written by | Read by | |---|---|---| @@ -117,11 +67,9 @@ This is a strong candidate for a separate, immutable-after-init sub-record in an | `size_mitigation_parameters` | `checkproductsize` | downstream stages | | `selfcal_targets`, `selfcal_resources` | `selfcal` | `exportdata` | -A future design should formalize imaging state as a typed state machine or versioned configuration sub-document, and consider separating image *metadata* (tracked in context) from image *data* (stored in artifact store). - --- -### UC-06 — Register and Query Produced Image Products +### UC-05 — Register and Query Produced Image Products **Implementation notes** — image libraries provide typed registries: @@ -132,12 +80,19 @@ A future design should formalize imaging state as a typed state machine or versi --- -### UC-07 — Track Execution Progress and Stage History +### UC-06 — Track Current Execution Progress **Implementation notes:** -- `context.results` holds an ordered list of `ResultsProxy` objects (proxied to disk to bound memory) - `context.stage_number` and `context.task_counter` track progress + +--- + +### UC-07 — Preserve Per-Stage Execution Record + +**Implementation notes:** + +- `context.results` holds an ordered list of `ResultsProxy` objects (proxied to disk to bound memory) - Timetracker integration provides per-stage timing data - Results proxies store basenames for portability @@ -153,8 +108,6 @@ A future design should formalize imaging state as a typed state machine or versi - `vlassmasking` iterates `context.results[::-1]` to find the latest `MakeImagesResult` - Export/AQUA code reads `context.results[0]` and `context.results[-1]` 
for timestamps -The results-list walking pattern is fragile (indices shift if stages are inserted/skipped), slow (requires unpickling), and implicit (no declared dependency). A future design should provide explicit stage-to-stage data dependencies. - --- ### UC-09 — Support Multiple Orchestration Drivers @@ -190,7 +143,7 @@ They differ in how inputs are marshalled, how session paths are selected, and ho --- -### UC-13 — Provide Read-Only Context for Reporting Consumers +### UC-13 — Provide Read-Only State for Reporting **Implementation notes** — `WebLogGenerator.render(context)` in `pipeline/infrastructure/renderer/htmlrenderer.py`: @@ -201,8 +154,6 @@ They differ in how inputs are marshalled, how session paths are selected, and ho - Reads `context.project_structure.*` — OUS IDs, PPR file, recipe name - Reads `context.logs['casa_commands']` — CASA command history -The renderer iterates `context.results` multiple times (assigning to topics, extracting flags, building timelines). The current approach requires unpickling *every* result into memory, then re-proxying when done. A lazy or streaming model would reduce peak memory. - --- ### UC-14 — Support QA Evaluation and Store Quality Assessments @@ -214,7 +165,7 @@ The renderer iterates `context.results` multiple times (assigning to topics, ext - Some handlers check `context.imaging_mode` to branch on VLASS-specific scoring - Scores are appended to `result.qa.pool` — the context provides inputs to QA evaluation, but the scores are stored on the result rather than as direct context mutations -QA handlers are *read-only* with respect to context and could operate on a frozen snapshot, making them a good candidate for parallelization. +QA handlers are read-only with respect to the context. 
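This read-only contract (handlers consume context state and append scores to `result.qa.pool`, never mutating the context) can be sketched with minimal stand-ins. The handler, classes, and score values below are illustrative assumptions, not the pipeline's QA framework:

```python
from dataclasses import dataclass, field
from types import MappingProxyType

@dataclass
class QAScore:
    value: float  # assumed convention: 0.0 (fail) .. 1.0 (pass)
    reason: str

@dataclass
class Result:
    qa_pool: list = field(default_factory=list)  # stands in for result.qa.pool

def imaging_mode_handler(context, result):
    # Reads context state, writes only to the result, never to the context.
    mode = context.get("imaging_mode", "ALMA")
    score = 0.5 if mode == "VLASS" else 0.9  # illustrative branch, not real scoring
    result.qa_pool.append(QAScore(score, f"scored under {mode} rules"))

# A read-only mapping makes the "no context mutation" contract explicit:
# any attempted write to the snapshot raises TypeError.
snapshot = MappingProxyType({"imaging_mode": "VLASS"})
result = Result()
imaging_mode_handler(snapshot, result)
print(result.qa_pool[0].value)  # prints: 0.5
```

Because handlers never write to the snapshot, several handlers could in principle score the same frozen state concurrently.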
---

@@ -226,18 +177,7 @@ QA handlers are *read-only* with respect to froze
-
- **Read by:** nearly every VLA calibration task and heuristic
- Accessed fields include: `gain_solint1`, `gain_solint2`, `setjy_results`, `ignorerefant`, various `*_field_select_string` / `*_scan_select_string` values, `fluxscale_sources`, `spindex_results`, and many more

-This is a completely untyped, dictionary-of-dictionaries sidecar attached to the top-level context. A future design should define a typed state object, provide accessor methods rather than raw dict lookups, and separate telescope-specific concerns from the generic context via composition (e.g., `context.get_extension('evla')`).
-
----
-
-### UC-18 — Emit Lifecycle Notifications
-
-**Implementation notes** — `pipeline.infrastructure.eventbus.send_message(event)`:
-
-- Event types: `ResultAcceptingEvent`, `ContextCreated`, `TaskStarted`, `TaskComplete`
-- The event bus exists and fires events, but is lightly used — `merge_with_context` remains the primary data flow mechanism
-- A future design could elevate the event bus to the primary state mutation channel (event-sourcing pattern), enabling audit trails, undo, and distributed observation
-
+This is an untyped, dictionary-of-dictionaries sidecar attached to the top-level context.
+
---

## Key Implementation References
@@ -297,91 +237,4 @@ They are recorded here for completeness but should not be treated as requirement

|-------|---------|
| **Actor(s)** | Archive ingest system, scheduling database, QA dashboards, monitoring tools |
| **Summary** | The system could expose a stable, queryable API (REST/gRPC) that external systems can poll or subscribe to, support webhook/event notifications for state transitions, and publish a standard schema for context summaries consumable by external systems.
However, this involves a significant trade-off: the current context has no stable API, which gives development teams full flexibility to evolve internal structures without cross-team coordination. A public API would require a formal stability contract, versioning discipline, and potentially a slow-to-change external interface layered over rapidly evolving internals. Whether this gap is in scope — or whether external integration should remain an offline, product-file-based concern — is an open design question. | -| **Postconditions** | External systems receive timely, structured updates about processing state without relying on offline product files. | - - ---- - -## Architectural Observations - -### The context is a "big ball of state", by design - -The current approach is extremely flexible for a long-running, stateful CASA session, but there is no explicit schema boundary between persisted state, ephemeral caches, runtime-only services, and large artifacts. Tasks can (and do) add new fields in an ad-hoc way over time. - -### Persistence is pickle-based - -Pickle works for short-lived resume/debug use cases, but it is fragile across version changes, risky as a long-term archive format, and not friendly to multi-writer or multi-process updates. The codebase mitigates size by proxying stage results to disk, but the context itself remains a potentially large and unstable object graph. - -### Two orchestration planes converge on the same context - -Task-driven (interactive CLI) and command-list-driven (PPR / XML procedures) execution both produce and consume the same context. They differ in how inputs are marshalled, how paths are selected, and how resume is initiated, but the persisted context is the same object. - ---- - -## Improvement Suggestions for Next-Generation Design - -These are phrased as requirements and design directions, not as a call to rewrite everything immediately. 
- -### 1) Split "context data" from "context services" - -Define a minimal, explicit **ContextData** model that is typed, schema-versioned, and serializable in a stable format (JSON/MsgPack/Arrow). Attach runtime-only services (CASA tool handles, caches, heuristics engines) around it rather than mixing them into the same object. - -### 2) Introduce a ContextStore interface - -Replace "pickle a Python object graph" with a storage abstraction (`get`, `put`, `list_runs`). Backends can start simple (SQLite) and grow (Postgres/object store) without changing task logic. - -### 3) Make state transitions explicit (event-sourced or patch-based) - -The existing event bus (`pipeline.infrastructure.eventbus`) could be elevated to record task lifecycle events and key state changes, yielding reproducibility, easier partial rebuilds, and better distributed orchestration. - -### 4) Treat large artifacts as references, not context fields - -Store large arrays/images/tables in an artifact store and carry only references in context data. This avoids "accidentally pickle a GiB array" and makes distribution/cloud execution more realistic. - -### 5) Remove reliance on global interactive stacks for non-interactive execution - -Make tasks accept an explicit context handle. Keep interactive convenience wrappers but do not make them the core contract. - -### 6) Represent the execution plan as context data - -Record the effective execution plan (linear or DAG) alongside run state to support provenance, partial execution, and targeted re-runs. - -### 7) Adopt a versioned compatibility policy - -Define whether operational contexts must be resumable within a supported release window (with schema versioning + migrations) versus best-effort for development contexts. 
- ---- - -## Context Contract Summary - -The following capabilities appear to be **hard requirements** for any replacement system, derived from current behavior and internal usage patterns: - -**System-level requirements:** - -- Run identity: `context_id`, recipe/procedure name, inputs, operator/mode -- Path layout: working/report/products directories with ability to relocate -- Dataset inventory: execution blocks / measurement sets with per-MS metadata -- Stage results timeline: ordered stages, durations, QA outcomes, tracebacks -- Export products: weblog tar, manifest, AQUA report, scripts -- Resume: restart from last known good stage (or after a breakpoint) - -**Internal usage requirements:** - -- Fast MS lookup: random-access by name, filtering by data type, virtual↔real SPW translation -- Calibration library: append-oriented, ordered, with transactional multi-entry updates and predicate-based queries -- Image library: four typed registries (science, calibrator, RMS, sub-product) with add/query semantics -- Imaging state: typed, versioned configuration for the imaging sub-pipeline -- QA scoring: read-only context snapshot for parallel-safe QA handler execution -- Weblog rendering: read-only traversal of full results timeline + MS metadata + project metadata -- MPI/distributed: efficient context snapshot broadcast + results write-back -- Cross-stage data flow: explicit named outputs rather than results-list walking -- Project metadata: immutable-after-init sub-record -- Telescope-specific state: typed, composable extension rather than untyped dict - ---- - -## Open Questions - -- Are there additional use cases not captured by either review? Reviewer input may surface new cases. -- Should the future use cases (GAP section) be prioritized? If so, which are most impactful for RADPS? -- What compatibility guarantees should the next-generation context provide across pipeline releases? 
+| **Postconditions** | External systems receive timely, structured updates about processing state without relying on offline product files. | \ No newline at end of file From 78f349eac34c5bb57f798053c4bcc58b2f45ede0 Mon Sep 17 00:00:00 2001 From: Tom Booth Date: Thu, 26 Mar 2026 17:45:04 -0400 Subject: [PATCH 13/22] updated wording throughout to be more consistent with scope of document --- docs/context_current_pipeline_appendix.md | 2 +- docs/context_use_cases_current_pipeline.md | 21 ++++++++++----------- 2 files changed, 11 insertions(+), 12 deletions(-) diff --git a/docs/context_current_pipeline_appendix.md b/docs/context_current_pipeline_appendix.md index 86b86b1..5bdbe51 100644 --- a/docs/context_current_pipeline_appendix.md +++ b/docs/context_current_pipeline_appendix.md @@ -237,4 +237,4 @@ They are recorded here for completeness but should not be treated as requirement |-------|---------| | **Actor(s)** | Archive ingest system, scheduling database, QA dashboards, monitoring tools | | **Summary** | The system could expose a stable, queryable API (REST/gRPC) that external systems can poll or subscribe to, support webhook/event notifications for state transitions, and publish a standard schema for context summaries consumable by external systems. However, this involves a significant trade-off: the current context has no stable API, which gives development teams full flexibility to evolve internal structures without cross-team coordination. A public API would require a formal stability contract, versioning discipline, and potentially a slow-to-change external interface layered over rapidly evolving internals. Whether this gap is in scope — or whether external integration should remain an offline, product-file-based concern — is an open design question. | -| **Postconditions** | External systems receive timely, structured updates about processing state without relying on offline product files. 
| \ No newline at end of file +| **Postconditions** | External systems receive timely, structured updates about processing state without relying on offline product files. | diff --git a/docs/context_use_cases_current_pipeline.md b/docs/context_use_cases_current_pipeline.md index 7e7bf89..d145454 100644 --- a/docs/context_use_cases_current_pipeline.md +++ b/docs/context_use_cases_current_pipeline.md @@ -5,7 +5,7 @@ The pipeline `Context` is the central state object used for an entire pipeline e It carries observation data, calibration state, imaging state, execution history, and project metadata, and serves as the primary communication channel between pipeline stages. -This document catalogues the current use cases of the pipeline `Context` as determined by examination of the codebase. The goal is to inform the design of a system serving a similar role to the current pipeline Context for RADPS. It also identifies an initial set of use cases that the current design does not support but that are required or implied by RADPS requirements documentation. +This document catalogues the current use cases of the pipeline `Context` as determined by examination of the codebase. The goal is to inform the design of a system serving a similar role to the current pipeline `Context` for RADPS. It also identifies an initial set of use cases that the current design does not support but that are required or implied by RADPS requirements documentation. For additional details about the current implementation, reference material, and exploratory future use cases, see [Supplementary Analysis](context_current_pipeline_appendix.md). @@ -13,14 +13,14 @@ For additional details about the current implementation, reference material, and ## 1. Use Cases -Each use case describes a need that the pipeline `Context` must satisfy. They are written to be implementation-neutral — the goal is to capture what the system must do, not how the current pipeline implementation achieves it. 
For pipeline-specific implementation details by use case, see [Implementation Notes by Use Case](context_current_pipeline_appendix.md#implementation-notes-by-use-case) in the appendix. +Each use case describes the required capabilities of the context system and the interactions through which those capabilities are exercised. They are written to be implementation-neutral — the goal is to capture what the context must do, not how the current pipeline implementation achieves it. For pipeline-specific implementation details by use case, see [Implementation Notes by Use Case](context_current_pipeline_appendix.md#implementation-notes-by-use-case) in the appendix. The following fields are used in each use case: - **Actor(s):** The human or system role that directly creates, updates, consumes, or inspects the context state described by the use case. Actors are role categories, not specific task names or current implementations. -- **Summary:** What the system must do to satisfy the use case. +- **Summary:** What the `Context` must do to satisfy the use case. - **Invariant:** A condition that must always be true while the system is operating. Present only where a meaningful invariant exists. - **Postcondition:** A condition that must be true after a specific operation completes. Present only where a meaningful postcondition exists. @@ -81,7 +81,7 @@ current implementations. | | | |-------|---------| | **Actor(s)** | Workflow orchestration layer, tasks, human operators | -| **Summary** | The system must track which processing stage is currently executing and maintain a stable, ordered record of completed stages. Stage identity and ordering must remain coherent across session saves and resumes. | +| **Summary** | The `Context` must track which processing stage is currently executing and maintain a stable, ordered record of completed stages. Stage identity and ordering must remain coherent across session saves and resumes. 
| | **Invariant** | The currently executing stage is identifiable and completed stages are recorded in stable order. | ___ @@ -91,7 +91,7 @@ ___ | | | |-------|---------| | **Actor(s)** | Report generators, human operators, workflow orchestration layer | -| **Summary** | The system must preserve a complete execution record for each completed stage, including timing, traceback information, outcomes, and the arguments used to invoke it. This record must support reporting, post-mortem diagnosis of failures, and resumption after interruption. | +| **Summary** | The `Context` must preserve a complete execution record for each completed stage, including timing, traceback information, outcomes, and the arguments used to invoke it. This record must support reporting, post-mortem diagnosis of failures, and resumption after interruption. | | **Invariant** | Each completed stage retains its full execution record identity, outcome, timing, traceback, and invocation arguments for the lifetime of the session. | --- @@ -102,7 +102,7 @@ ___ |-------|---------| | **Actor(s)** | Any task producing output, downstream tasks | | **Summary** | When a task produces outputs that change the processing state (e.g., new calibrations, updated flag summaries, image products, revised parameters), the context must provide a mechanism for those outputs to become available to subsequent processing steps before they execute. UC-03, UC-04, UC-05, and UC-14 are domain-specific instances of this pattern. | -| **Postconditions** | Downstream tasks can access the propagated processing state they need.| +| **Postconditions** | Downstream tasks can access the propagated processing state they need. 
| --- @@ -111,8 +111,8 @@ ___ | | | |-------|---------| | **Actor(s)** | Operations / automated processing (PPR-driven batch), pipeline developer / power user (interactive), recipe executor | -| **Summary** | The context is created and consumed by multiple front-ends: PPR command lists, XML procedures, or interactive task calls. The state stored by the context must remain consistent and usable regardless of which driver created or resumed it. It must be creatable and resumable from non-interactive and interactive drivers, support driver-injected run metadata, tolerate partial execution controls (`startstage`, `exitstage`) and breakpoint-driven stop/resume semantics, provide machine-detectable success/failure signals, and emit notifications at key lifecycle points (session start, session restore, step start, step completion).| -| **Invariant** | Processing state is consistent and usable regardless of which orchestration driver created or resumed it, and success/failure signals and lifecycle notifications are produced when appropriate.| +| **Summary** | The context is created and consumed by multiple front-ends: PPR command lists, XML procedures, or interactive task calls. The state stored by the context must remain consistent and usable regardless of which driver created or resumed it. It must be creatable and resumable from non-interactive and interactive drivers, support driver-injected run metadata, and tolerate partial execution controls (`startstage`, `exitstage`) and breakpoint-driven stop/resume semantics. | +| **Invariant** | Processing state is consistent and usable regardless of which orchestration driver created or resumed it, and success/failure signals are produced when appropriate. | --- @@ -200,8 +200,7 @@ ___ ## 2. Use Cases the Current Design Cannot Handle The following use cases are not supported by the current context design but are required or strongly -implied by RADPS requirement and design documentation. 
These use cases were identified through a first pass -of the RADPS requirements documentation and are not exhaustive. A full gap analysis mapping current context use cases to RADPS requirements is a separate activity which is underway. These are numbered GAP-01 through GAP-04 to indicate gaps in the current design's capabilities. +implied by RADPS requirement and design documentation. These use cases were identified through a first pass of the RADPS requirements documentation and are not exhaustive. A full gap analysis mapping current context use cases to RADPS requirements is a separate activity which is underway. These are numbered GAP-01 through GAP-04 to indicate gaps in the current design's capabilities. Reviewer input on missing or incorrectly included items is welcome. @@ -255,7 +254,7 @@ Reviewer input on missing or incorrectly included items is welcome. | | | |-------|---------| | **Actor(s)** | QA dashboards, monitoring tools, archive ingest systems, scheduling systems | -| **Summary** | External systems need access to current processing state — including current stage, processing time, QA results, and lifecycle transitions — without relying on offline product files. The system must expose sufficient state for these consumers to track and respond to processing status in a timely way. | +| **Summary** | External systems need access to current processing state — including current stage, processing time, QA results, and lifecycle transitions — without relying on offline product files. The `Context` must expose sufficient state for these consumers to track and respond to processing status in a timely way. | | **Invariant** | The processing state needed by external consumers is accessible and current throughout execution. | | **Postconditions** | External systems can access processing state and lifecycle transitions without waiting for offline products to be generated. 
| | **RADPS Requirements** | CSS9046, CSS9047, CSS9048, CSS9049, CSS9050, CSS9056 | \ No newline at end of file From 5f8cd1dc8f8aff7d754823a729ffde8c6710cd76 Mon Sep 17 00:00:00 2001 From: Tom Booth Date: Thu, 26 Mar 2026 21:44:30 -0400 Subject: [PATCH 14/22] removed GAP and future use cases from docs --- docs/context_current_pipeline_appendix.md | 34 ----------- docs/context_use_cases_current_pipeline.md | 66 +--------------------- 2 files changed, 1 insertion(+), 99 deletions(-) diff --git a/docs/context_current_pipeline_appendix.md b/docs/context_current_pipeline_appendix.md index 5bdbe51..3df5e5d 100644 --- a/docs/context_current_pipeline_appendix.md +++ b/docs/context_current_pipeline_appendix.md @@ -204,37 +204,3 @@ The canonical flow through the context is: 3. **Execute tasks** — Tasks execute against the in-memory context and return a `Results` object. After each task, `Results.accept(context)` records the outcome and mutates shared state. 4. **Accept results** — Inside `accept()`, results are merged via `Results.merge_with_context(context)`. A `ResultsProxy` is pickled to disk per-stage to keep the in-memory context bounded. The weblog is typically rendered after each top-level stage. 5. **Save / resume** — `h_save()` pickles the context; `h_resume(filename='last')` restores it. Driver-managed breakpoints and developer debugging workflows rely on this cycle. - ---- -## Exploratory Future Use Cases - -The following are potential future use cases that do not trace to current RADPS architecture or requirements. -They are recorded here for completeness but should not be treated as requirements. 
- -### Multi-Language / Multi-Framework Access to Context - -| Field | Content | -|-------|---------| -| **Actor(s)** | Non-Python clients (C++, Julia, JavaScript dashboards), external tools | -| **Summary** | The system should expose context state through a language-neutral interface, using a portable serialization format (Protocol Buffers, JSON-Schema, Arrow), a query API (REST, gRPC, or GraphQL), and type definitions shared across languages. | -| **Postconditions** | Clients in any supported language can read context state through a stable, typed API. | - ---- - -### Streaming / Incremental Processing - -| Field | Content | -|-------|---------| -| **Actor(s)** | Data ingest system, workflow engine, incremental processing tasks | -| **Summary** | The system should support incremental dataset registration (adding new scans or execution blocks to a live session), tasks that detect new data and re-process incrementally, and a results model that supports versioning so that re-runs produce new versions rather than overwriting previous outputs. | -| **Postconditions** | New data is incorporated into an active session and processed without restarting from scratch. | - ---- - -### External System Integration (Archive, Scheduling, QA Dashboards) - -| Field | Content | -|-------|---------| -| **Actor(s)** | Archive ingest system, scheduling database, QA dashboards, monitoring tools | -| **Summary** | The system could expose a stable, queryable API (REST/gRPC) that external systems can poll or subscribe to, support webhook/event notifications for state transitions, and publish a standard schema for context summaries consumable by external systems. However, this involves a significant trade-off: the current context has no stable API, which gives development teams full flexibility to evolve internal structures without cross-team coordination. 
A public API would require a formal stability contract, versioning discipline, and potentially a slow-to-change external interface layered over rapidly evolving internals. Whether this gap is in scope — or whether external integration should remain an offline, product-file-based concern — is an open design question. | -| **Postconditions** | External systems receive timely, structured updates about processing state without relying on offline product files. | diff --git a/docs/context_use_cases_current_pipeline.md b/docs/context_use_cases_current_pipeline.md index d145454..a75a38b 100644 --- a/docs/context_use_cases_current_pipeline.md +++ b/docs/context_use_cases_current_pipeline.md @@ -11,7 +11,7 @@ For additional details about the current implementation, reference material, and --- -## 1. Use Cases +## Use Cases Each use case describes the required capabilities of the context system and the interactions through which those capabilities are exercised. They are written to be implementation-neutral — the goal is to capture what the context must do, not how the current pipeline implementation achieves it. For pipeline-specific implementation details by use case, see [Implementation Notes by Use Case](context_current_pipeline_appendix.md#implementation-notes-by-use-case) in the appendix. @@ -194,67 +194,3 @@ ___ | **Actor(s)** | Export task | | **Summary** | The context must make the datasets, calibration state, image products, reports, scripts, path information, and project identifiers available through the processing state so export tasks can assemble them into a deliverable product package. | | **Invariant** | The information needed to assemble a deliverable product package is accessible through the processing state. | - ---- - -## 2. Use Cases the Current Design Cannot Handle - -The following use cases are not supported by the current context design but are required or strongly -implied by RADPS requirement and design documentation. 
These use cases were identified through a first pass of the RADPS requirements documentation and are not exhaustive. A full gap analysis mapping current context use cases to RADPS requirements is a separate activity which is underway. These are numbered GAP-01 through GAP-04 to indicate gaps in the current design's capabilities. - -Reviewer input on missing or incorrectly included items is welcome. - -### GAP-01 — Concurrent Execution of Independent Work - -| | | -|-------|---------| -| **Actor(s)** | Workflow orchestration layer, parallel task scheduler | -| **Summary** | The context must support concurrent execution of independent work at multiple granularities — both at the stage level, where independent stages execute simultaneously, and within a stage, where work is parallelized across processing axes such as MS or SPW. In both cases the context must ensure results are correctly incorporated into processing state without inconsistency. This is distinct from the existing parallel worker pattern (UC-11, UC-12), which distributes work within a single stage but requires all work to complete before the next stage can begin.| -| **Invariant** | Independent tasks are executed concurrently without producing inconsistent or incorrect processing state. | -| **Postconditions** | Results from concurrently executed work are fully incorporated into processing state before any dependent work begins. | -| **RADPS Requirements** | CSS9017, CSS9063 | - ---- - -### GAP-02 — Distributed Execution Without Shared Filesystem - -| | | -|-------|---------| -| **Actor(s)** | Workflow orchestration layer, distributed workers | -| **Summary** | The context must support execution across nodes that do not share a filesystem. Processing state, artifacts, and datasets must be accessible to all participating nodes without relying on a shared local filesystem.| -| **Postconditions** | Processing completes correctly across distributed nodes without reliance on a shared filesystem. 
| -| **RADPS Requirements** | CSS9002, CSS9030 | - ---- - -### GAP-03 — Provenance and Reproducibility - -| | | -|-------|---------| -| **Actor(s)** | Pipeline operator, auditor, reproducibility tooling | -| **Summary** | The context must record sufficient provenance information — software versions, input data identity, task parameters, and processing state at each stage — to enable a past processing run to be precisely reproduced or audited.| -| **Postconditions** | Any past processing step can be precisely reproduced or audited from the recorded provenance chain, including the exact software and data versions that produced it. | -| **RADPS Requirements** | ALMA-TR103, ALMA-TR104, ALMA-TR105 | - ---- - -### GAP-04 — Partial Re-Execution / Targeted Stage Re-Run - -| | | -|-------|---------| -| **Actor(s)** | Pipeline operator, developer, workflow engine | -| **Summary** | The context must support selectively re-running a single mid-pipeline stage with different parameters while keeping earlier and later stages intact. Stages that depend on the re-executed stage's outputs must be invalidated or updated; stages that do not must be preserved. Note: CSS9038 explicitly requires re-start at discrete stages; dependency-aware invalidation of downstream stages is implied rather than explicitly stated.| -| **Postconditions** | After a targeted re-execution, processing state reflects the new outcome for the re-run stage, affected downstream stages are invalidated or updated, and unaffected stages are preserved. | -| **RADPS Requirements** | CSS9038 | - ---- - -### GAP-05: External System Integration (Archive, Scheduling, QA Dashboards) - -| | | -|-------|---------| -| **Actor(s)** | QA dashboards, monitoring tools, archive ingest systems, scheduling systems | -| **Summary** | External systems need access to current processing state — including current stage, processing time, QA results, and lifecycle transitions — without relying on offline product files. 
The `Context` must expose sufficient state for these consumers to track and respond to processing status in a timely way. | -| **Invariant** | The processing state needed by external consumers is accessible and current throughout execution. | -| **Postconditions** | External systems can access processing state and lifecycle transitions without waiting for offline products to be generated. | -| **RADPS Requirements** | CSS9046, CSS9047, CSS9048, CSS9049, CSS9050, CSS9056 | \ No newline at end of file From 46be1ddfdb58b5ae6efb42c656d106d1a671173e Mon Sep 17 00:00:00 2001 From: kberry Date: Fri, 27 Mar 2026 15:20:41 -0400 Subject: [PATCH 15/22] Update references to the context to not refer to the specific class in the pipeline (the Context with backticks). --- docs/context_use_cases_current_pipeline.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/context_use_cases_current_pipeline.md b/docs/context_use_cases_current_pipeline.md index a75a38b..f4df55b 100644 --- a/docs/context_use_cases_current_pipeline.md +++ b/docs/context_use_cases_current_pipeline.md @@ -1,11 +1,11 @@ # Pipeline Context Use Cases ## Overview -The pipeline `Context` is the central state object used for an entire pipeline execution. +The pipeline context is the central state object used for an entire pipeline execution. It carries observation data, calibration state, imaging state, execution history, and project metadata, and serves as the primary communication channel between pipeline stages. -This document catalogues the current use cases of the pipeline `Context` as determined by examination of the codebase. The goal is to inform the design of a system serving a similar role to the current pipeline `Context` for RADPS. It also identifies an initial set of use cases that the current design does not support but that are required or implied by RADPS requirements documentation. 
+This document catalogues the current use cases of the pipeline context as determined by examination of the codebase. The goal is to inform the design of a system serving a similar role to the current pipeline context for RADPS. It also identifies an initial set of use cases that the current design does not support but that are required or implied by RADPS requirements documentation. For additional details about the current implementation, reference material, and exploratory future use cases, see [Supplementary Analysis](context_current_pipeline_appendix.md). @@ -20,7 +20,7 @@ The following fields are used in each use case: - **Actor(s):** The human or system role that directly creates, updates, consumes, or inspects the context state described by the use case. Actors are role categories, not specific task names or current implementations. -- **Summary:** What the `Context` must do to satisfy the use case. +- **Summary:** What the context must do to satisfy the use case. - **Invariant:** A condition that must always be true while the system is operating. Present only where a meaningful invariant exists. - **Postcondition:** A condition that must be true after a specific operation completes. Present only where a meaningful postcondition exists. @@ -81,7 +81,7 @@ current implementations. | | | |-------|---------| | **Actor(s)** | Workflow orchestration layer, tasks, human operators | -| **Summary** | The `Context` must track which processing stage is currently executing and maintain a stable, ordered record of completed stages. Stage identity and ordering must remain coherent across session saves and resumes. | +| **Summary** | The context must track which processing stage is currently executing and maintain a stable, ordered record of completed stages. Stage identity and ordering must remain coherent across session saves and resumes. | | **Invariant** | The currently executing stage is identifiable and completed stages are recorded in stable order. 
| ___ @@ -91,7 +91,7 @@ ___ | | | |-------|---------| | **Actor(s)** | Report generators, human operators, workflow orchestration layer | -| **Summary** | The `Context` must preserve a complete execution record for each completed stage, including timing, traceback information, outcomes, and the arguments used to invoke it. This record must support reporting, post-mortem diagnosis of failures, and resumption after interruption. | +| **Summary** | The context must preserve a complete execution record for each completed stage, including timing, traceback information, outcomes, and the arguments used to invoke it. This record must support reporting, post-mortem diagnosis of failures, and resumption after interruption. | | **Invariant** | Each completed stage retains its full execution record identity, outcome, timing, traceback, and invocation arguments for the lifetime of the session. | --- From 506047aad3e74dac689aa97c25a90d1286db1d45 Mon Sep 17 00:00:00 2001 From: kberry Date: Fri, 27 Mar 2026 19:58:15 -0400 Subject: [PATCH 16/22] Update qa score use case to refelct that qa scores are not used for decision making in downstream tasks. Make other assorted wording updates including removing references to removed use cases and standardizing actor names. --- docs/context_use_cases_current_pipeline.md | 38 +++++++++++----------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/docs/context_use_cases_current_pipeline.md b/docs/context_use_cases_current_pipeline.md index f4df55b..f0dcd34 100644 --- a/docs/context_use_cases_current_pipeline.md +++ b/docs/context_use_cases_current_pipeline.md @@ -2,12 +2,12 @@ ## Overview The pipeline context is the central state object used for an entire pipeline execution. -It carries observation data, calibration state, imaging state, execution history, -and project metadata, and serves as the primary communication channel between pipeline stages. 
+It carries observation data, calibration state, imaging state, execution history and state, +project metadata, and serves as the primary communication channel between pipeline stages. -This document catalogues the current use cases of the pipeline context as determined by examination of the codebase. The goal is to inform the design of a system serving a similar role to the current pipeline context for RADPS. It also identifies an initial set of use cases that the current design does not support but that are required or implied by RADPS requirements documentation. +This document catalogues the current use cases of the pipeline context as determined by examination of the codebase. The goal is to inform the design of a system serving a similar role to the current pipeline context for RADPS. -For additional details about the current implementation, reference material, and exploratory future use cases, see [Supplementary Analysis](context_current_pipeline_appendix.md). +For additional details about the current implementation and reference material. see [Supplementary Analysis](context_current_pipeline_appendix.md). --- @@ -62,7 +62,7 @@ current implementations. |-------|---------| | **Actor(s)** | Imaging tasks, downstream heuristics, and export tasks | | **Summary** | The context must allow imaging state — target lists, imaging parameters, masks, thresholds, and sensitivity estimates — to be computed by one step and read or refined by later steps. Multiple steps may contribute to a progressively refined imaging configuration. | -| **Invariant** | Imaging state reflects contributions from all completed imaging-related stages, and available for reading or refinement by subsequent stages. | +| **Invariant** | Imaging state reflects contributions from all completed imaging-related stages, and is available for reading or refinement by subsequent stages. | --- @@ -80,7 +80,7 @@ current implementations. 
| | | |-------|---------| -| **Actor(s)** | Workflow orchestration layer, tasks, human operators | +| **Actor(s)** | Workflow orchestration layer, tasks, pipeline operators | | **Summary** | The context must track which processing stage is currently executing and maintain a stable, ordered record of completed stages. Stage identity and ordering must remain coherent across session saves and resumes. | | **Invariant** | The currently executing stage is identifiable and completed stages are recorded in stable order. | @@ -90,7 +90,7 @@ ___ | | | |-------|---------| -| **Actor(s)** | Report generators, human operators, workflow orchestration layer | +| **Actor(s)** | Report generators, pipeline operators, workflow orchestration layer | | **Summary** | The context must preserve a complete execution record for each completed stage, including timing, traceback information, outcomes, and the arguments used to invoke it. This record must support reporting, post-mortem diagnosis of failures, and resumption after interruption. | | **Invariant** | Each completed stage retains its full execution record identity, outcome, timing, traceback, and invocation arguments for the lifetime of the session. | @@ -101,7 +101,7 @@ ___ | | | |-------|---------| | **Actor(s)** | Any task producing output, downstream tasks | -| **Summary** | When a task produces outputs that change the processing state (e.g., new calibrations, updated flag summaries, image products, revised parameters), the context must provide a mechanism for those outputs to become available to subsequent processing steps before they execute. UC-03, UC-04, UC-05, and UC-14 are domain-specific instances of this pattern. | +| **Summary** | When a task produces outputs that change the processing state (e.g., new calibrations, updated flag summaries, image products, revised parameters), the context must provide a mechanism for those outputs to become available to subsequent processing steps before they execute. 
UC-03, UC-04, UC-05, and UC-14 are specific instances of this pattern. | | **Postconditions** | Downstream tasks can access the propagated processing state they need. | --- @@ -120,7 +120,7 @@ ___ | | | |-------|---------| -| **Actor(s)** | Pipeline operator, workflow orchestration layer, developers | +| **Actor(s)** | Pipeline operator, workflow orchestration layer, pipeline developer | | **Summary** | The context must be able to serialize the complete processing state to disk (all observation data, calibration state, execution history, imaging state, project metadata, etc) and later restore it so that processing can resume from the saved point. The serialization must preserve enough state to resume; backward compatibility across pipeline releases is not guaranteed. On restore, paths must be adaptable to a new filesystem environment. | | **Postconditions** | After restore, the processing state is operationally equivalent to the saved state for supported resume workflows, and processing can continue from the specified point. | @@ -151,8 +151,8 @@ ___ | | | |-------|---------| -| **Actor(s)** | Report generators (weblog, quality reports, reproducibility scripts, AQUA reports, pipeline statistics) | -| **Summary** | The context must provide read-only access to the observation metadata, project metadata, execution history (including per-stage domain-specific outputs such as flag summaries and plot references), QA outcomes, log references, and path information needed to generate reporting products such as weblogs, quality reports, reproducibility scripts, AQUA reports, and pipeline statistics. 
|
+| **Actor(s)** | Report generators (weblog, quality reports, reproducibility scripts, pipeline statistics) |
+| **Summary** | The context must provide read-only access to the observation metadata, project metadata, execution history (including per-stage domain-specific outputs such as flag summaries and plot references), QA outcomes, log references, and path information needed to generate reporting products such as weblogs, quality reports, reproducibility scripts, and pipeline statistics. |
 | **Postconditions** | Reports accurately reflect the processing state at the time of generation. |
 
 ---
@@ -161,13 +161,13 @@
 
 | | |
 |-------|---------|
-| **Actor(s)** | QA scoring framework, report generators, tasks that consult recorded QA outcomes |
-| **Summary** | After each processing step completes, the context must make the relevant processing state available to QA handlers so they can evaluate the outcome against quality thresholds, which may depend on telescope, project parameters, or observation properties. The resulting quality scores must be recorded and remain retrievable for reporting and for later pipeline logic that consults recorded QA outcomes. |
-| **Postconditions** | Quality scores are associated with the relevant processing step and accessible to reports and downstream logic. |
+| **Actor(s)** | QA scoring framework, report generators |
+| **Summary** | After each processing step completes, the context must make the relevant processing state available to QA handlers so they can evaluate the outcome against quality thresholds, which may depend on, e.g., telescope, project parameters, or observation properties. The resulting quality scores must be recorded and remain retrievable for reporting. |
+| **Postconditions** | Quality scores are associated with the relevant processing step and accessible to reports. 
| --- -### UC-15 — Support Interactive Inspection and Debugging +### UC-15 — Support Inspection and Debugging | | | |-------|---------| @@ -177,12 +177,12 @@ ___ --- -### UC-16 — Manage Telescope-Specific Context Extensions +### UC-16 — Manage Telescope-Specific State | | | |-------|---------| | **Actor(s)** | Telescope-specific tasks and heuristics | -| **Summary** | The context must support conditional telescope-specific extensions to the processing state. These extensions must be available to telescope-specific tasks and heuristics while remaining outside the assumed contract of shared pipeline code. Generic pipeline components must not depend on or require knowledge of telescope-specific extensions. | +| **Summary** | The context must support conditional telescope-specific extensions to the processing state. These extensions must be available to telescope-specific tasks and heuristics. Generic pipeline components must not depend on or require knowledge of telescope-specific extensions. | | **Invariant** | Telescope-specific extensions are present only for runs that require them, available to the tasks that need them, and are never assumed by shared pipeline code. | --- @@ -191,6 +191,6 @@ ___ | | | |-------|---------| -| **Actor(s)** | Export task | -| **Summary** | The context must make the datasets, calibration state, image products, reports, scripts, path information, and project identifiers available through the processing state so export tasks can assemble them into a deliverable product package. | +| **Actor(s)** | Export tasks | +| **Summary** | The context must make the datasets, calibration state, image products, reports, scripts, path information, project identifiers, and any other information needed for export available through the processing state so export tasks can assemble them into a deliverable product package. | | **Invariant** | The information needed to assemble a deliverable product package is accessible through the processing state. 
| From 4b79e0e3c670d6a0f51389e93eb5b76a0786d91c Mon Sep 17 00:00:00 2001 From: kberry Date: Mon, 30 Mar 2026 13:03:33 -0400 Subject: [PATCH 17/22] Fixed typo and removed task category without interactions with the callibrary --- docs/context_use_cases_current_pipeline.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/context_use_cases_current_pipeline.md b/docs/context_use_cases_current_pipeline.md index f0dcd34..898202b 100644 --- a/docs/context_use_cases_current_pipeline.md +++ b/docs/context_use_cases_current_pipeline.md @@ -41,7 +41,7 @@ current implementations. | | | |-------|---------| | **Actor(s)** | Initialization, any task, report generators | -| **Summary** | The context must store project-level metadata (proposal code, PI, telescope, desired sensitivities, processing recipe) and make it available to all components, e.g,, for decision-making in heuristics and to label outputs in reports. | +| **Summary** | The context must store project-level metadata (proposal code, PI, telescope, desired sensitivities, processing recipe) and make it available to all components, e.g., for decision-making in heuristics and to label outputs in reports. | | **Invariant** | Project metadata is available for the lifetime of the processing session. | --- @@ -50,7 +50,7 @@ current implementations. | | | |-------|---------| -| **Actor(s)** | Calibration tasks, imaging tasks, flagging tasks, export tasks, report generators | +| **Actor(s)** | Calibration tasks, export tasks, restore tasks, report generators | | **Summary** | The context must allow calibration tasks to register and update solutions (indexed by data selection: field, spectral window, antenna, intent, time interval), and allow downstream tasks to query for all calibrations applicable to a given data selection. It must distinguish between calibrations pending application and those already applied. 
Registration must support registering multiple calibrations atomically as part of a single operation. |
 | **Invariant** | Calibration state is queryable and correctly scoped to data selections, and can be updated as processing progresses. |

From 273743175b66f186ff72eddc75d0032b2c91acef Mon Sep 17 00:00:00 2001
From: Tom Booth
Date: Mon, 30 Mar 2026 13:26:12 -0400
Subject: [PATCH 18/22] updated wording and syntax and removed
 implementation-specific language

---
 docs/context_use_cases_current_pipeline.md | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/docs/context_use_cases_current_pipeline.md b/docs/context_use_cases_current_pipeline.md
index 898202b..8d16635 100644
--- a/docs/context_use_cases_current_pipeline.md
+++ b/docs/context_use_cases_current_pipeline.md
@@ -1,13 +1,11 @@
 # Pipeline Context Use Cases
 
 ## Overview
-The pipeline context is the central state object used for an entire pipeline execution.
-It carries observation data, calibration state, imaging state, execution history and state,
-project metadata, and serves as the primary communication channel between pipeline stages.
+The pipeline context is the central state object used for a pipeline execution. It carries observation data, calibration state, imaging state, execution history and state, and project metadata, and it serves as the primary communication channel between pipeline stages.
 
-This document catalogues the current use cases of the pipeline context as determined by examination of the codebase. The goal is to inform the design of a system serving a similar role to the current pipeline context for RADPS.
+This document catalogues the use cases of the current pipeline context as determined by examination of the codebase. The goal is to inform the design of a system serving a similar role to the current pipeline context for RADPS.
 
-For additional details about the current implementation and reference material. 
see [Supplementary Analysis](context_current_pipeline_appendix.md). +For additional details about the current implementation and reference material, see [Supplementary Analysis](context_current_pipeline_appendix.md). --- @@ -17,9 +15,7 @@ Each use case describes the required capabilities of the context system and the The following fields are used in each use case: -- **Actor(s):** The human or system role that directly creates, updates, consumes, or inspects the -context state described by the use case. Actors are role categories, not specific task names or -current implementations. +- **Actor(s):** The human or system role that directly creates, updates, consumes, or inspects the context state described by the use case. Actors are role categories, not specific task names or current implementations. - **Summary:** What the context must do to satisfy the use case. - **Invariant:** A condition that must always be true while the system is operating. Present only where a meaningful invariant exists. - **Postcondition:** A condition that must be true after a specific operation completes. Present only where a meaningful postcondition exists. @@ -111,7 +107,7 @@ ___ | | | |-------|---------| | **Actor(s)** | Operations / automated processing (PPR-driven batch), pipeline developer / power user (interactive), recipe executor | -| **Summary** | The context is created and consumed by multiple front-ends: PPR command lists, XML procedures, or interactive task calls. The state stored by the context must remain consistent and usable regardless of which driver created or resumed it. It must be creatable and resumable from non-interactive and interactive drivers, support driver-injected run metadata, and tolerate partial execution controls (`startstage`, `exitstage`) and breakpoint-driven stop/resume semantics. | +| **Summary** | The context is created and consumed by multiple front-ends: PPR command lists, XML procedures, or interactive task calls. 
The state stored by the context must remain consistent and usable regardless of which driver created or resumed it. It must be creatable and resumable from non-interactive and interactive drivers, support driver-injected run metadata, and tolerate partial execution controls and breakpoint-driven stop/resume semantics. | | **Invariant** | Processing state is consistent and usable regardless of which orchestration driver created or resumed it, and success/failure signals are produced when appropriate. | --- @@ -121,7 +117,7 @@ ___ | | | |-------|---------| | **Actor(s)** | Pipeline operator, workflow orchestration layer, pipeline developer | -| **Summary** | The context must be able to serialize the complete processing state to disk (all observation data, calibration state, execution history, imaging state, project metadata, etc) and later restore it so that processing can resume from the saved point. The serialization must preserve enough state to resume; backward compatibility across pipeline releases is not guaranteed. On restore, paths must be adaptable to a new filesystem environment. | +| **Summary** | The context must be able to serialize the complete processing state to disk (all observation data, calibration state, execution history, imaging state, project metadata, etc.) and later restore it so that processing can resume from the saved point. The serialization must preserve enough state to resume; backward compatibility across pipeline releases is not guaranteed. On restore, paths must be adaptable to a new filesystem environment. | | **Postconditions** | After restore, the processing state is operationally equivalent to the saved state for supported resume workflows, and processing can continue from the specified point. 
| --- From 7323f20d1313a60ddab6341a9a2569469d8784cb Mon Sep 17 00:00:00 2001 From: kberry Date: Mon, 30 Mar 2026 15:48:43 -0400 Subject: [PATCH 19/22] Update wording in the appendix for clarity and correctness --- docs/context_current_pipeline_appendix.md | 58 +++++++++++------------ 1 file changed, 27 insertions(+), 31 deletions(-) diff --git a/docs/context_current_pipeline_appendix.md b/docs/context_current_pipeline_appendix.md index 3df5e5d..a5eda22 100644 --- a/docs/context_current_pipeline_appendix.md +++ b/docs/context_current_pipeline_appendix.md @@ -1,6 +1,6 @@ # Pipeline Context: Supplementary Analysis -This document contains and reference material that supplement the use cases in [context_use_cases_current_pipeline.md](context_use_cases_current_pipeline.md). These sections were separated to keep the use-case document focused on requirements. +This document contains implementation details and reference material that supplement the use cases in [context_use_cases_current_pipeline.md](context_use_cases_current_pipeline.md). These sections were separated to keep the use-case document focused on requirements. --- @@ -10,23 +10,23 @@ The following implementation notes describe how each use case is realized in the ### UC-01 — Load, Update, and Provide Access to Observation Metadata -**Implementation notes** — `context.observing_run` is the most heavily queried attribute of the context: +**Implementation notes** — `context.observing_run` holds the observation metadata and is the most frequently queried attribute of the context: - `context.observing_run.get_ms(name=vis)` — resolve an MS by filename -- `context.observing_run.measurement_sets` — iterate all registered MS objects +- `context.observing_run.measurement_sets` — all registered MS objects - `context.observing_run.get_measurement_sets_of_type(dtypes)` — filter by data type (RAW, REGCAL_CONTLINE_ALL, BASELINED, etc.) 
- `context.observing_run.virtual2real_spw_id(vspw, ms)` / `real2virtual_spw_id(...)` — translate between abstract pipeline SPW IDs and CASA-native IDs - `context.observing_run.virtual_science_spw_ids` — virtual SPW catalog - `context.observing_run.ms_reduction_group` — per-group reduction metadata (single-dish) - Provenance attributes: `start_datetime`, `end_datetime`, `project_ids`, `schedblock_ids`, `execblock_ids`, `observers` -MS objects are rich domain objects carrying scans, fields, SPWs, antennas, reference antenna ordering, etc. Tasks read per-MS state like `ms.reference_antenna`, `ms.session`, `ms.start_time`, `ms.origin_ms`. +The MS objects stored by `context.observing_run` carry information about scans, fields, SPWs, antennas, reference antenna ordering, etc. Tasks read per-MS state like `ms.reference_antenna`, `ms.session`, `ms.start_time`, `ms.origin_ms`. --- ### UC-02 — Store and Provide Project-Level Metadata -**Implementation notes** — project metadata is set once at session start, is not modified after initialization, and is read many times: +**Implementation notes** — project metadata is set during initialization or import, is not modified after import, and is read many times: - `context.project_summary = project.ProjectSummary(...)` — set by `executeppr()` / `executevlappr()` - `context.project_structure = project.ProjectStructure(...)` — set by PPR executors @@ -34,13 +34,6 @@ MS objects are rich domain objects carrying scans, fields, SPWs, antennas, refer - `context.set_state('ProjectStructure', 'recipe_name', value)` — used by `recipereducer.reduce()` and SD heuristics - `context.processing_intents` — set by `Pipeline` during initialization -Execution paths and output locations are also managed as part of project-level metadata: - -- Path roots: `output_dir`, `report_dir`, `products_dir` -- Context name drives deterministic, named run directories -- Relocation semantics are supported for results proxies (basenames stored) and common output 
layout -- PPR-driven execution may derive paths from environment variables (e.g., `SCIPIPE_ROOTDIR`) - --- ### UC-03 — Register, Query, and Update Calibration State @@ -49,7 +42,7 @@ Execution paths and output locations are also managed as part of project-level m - **Write:** `context.callibrary.add(calto, calfrom)` — register a calibration application (cal table + target selection); `context.callibrary.unregister_calibrations(matcher)` — remove by predicate - **Read:** `context.callibrary.active.get_caltable(caltypes=...)` — list active cal tables; `context.callibrary.get_calstate(calto)` — get full application state for a target selection -- Backed by `CalApplication` → `CalTo` / `CalFrom` objects with interval trees for efficient matching; append-mostly, ordered by registration time +- Backed by `CalApplication` → `CalTo` / `CalFrom` objects with interval trees for efficient matching. --- @@ -73,7 +66,7 @@ Execution paths and output locations are also managed as part of project-level m **Implementation notes** — image libraries provide typed registries: -- `context.sciimlist.add_item(imageitem)` / `.get_imlist()` — science images +- `context.sciimlist` — science images - `context.calimlist` — calibrator images - `context.rmsimlist` — RMS images - `context.subimlist` — sub-product images (cutouts, cubes) @@ -84,7 +77,7 @@ Execution paths and output locations are also managed as part of project-level m **Implementation notes:** -- `context.stage_number` and `context.task_counter` track progress +- `context.stage`, `context.task_counter`, `context.subtask_counter` track progress --- @@ -92,7 +85,7 @@ Execution paths and output locations are also managed as part of project-level m **Implementation notes:** -- `context.results` holds an ordered list of `ResultsProxy` objects (proxied to disk to bound memory) +- `context.results` holds an ordered list of `ResultsProxy` objects which are proxied to disk to bound memory - Timetracker integration provides per-stage 
timing data - Results proxies store basenames for portability @@ -102,8 +95,8 @@ Execution paths and output locations are also managed as part of project-level m **Implementation notes** — the current pipeline satisfies these needs through two different propagation paths: -1. **Immediate state propagation** — `Results.merge_with_context(context)` updates calibration library, image libraries, and other typed state so later tasks can access the current processing state directly. -2. **Retained step-result access** — tasks read `context.results` to find outputs from earlier stages when those outputs are needed from the recorded execution history rather than from merged shared state. For example: +1. **Immediate state propagation** — `Results.merge_with_context(context)` updates calibration library, image libraries, and more so later tasks can access the current processing state directly. +2. **Serialized Results** — tasks read `context.results` to find outputs from earlier stages when those outputs are needed from the recorded results rather than from merged shared state. 
For example: - VLA tasks compute `stage_number` from `context.results[-1].read().stage_number + 1` - `vlassmasking` iterates `context.results[::-1]` to find the latest `MakeImagesResult` - Export/AQUA code reads `context.results[0]` and `context.results[-1]` for timestamps @@ -112,12 +105,12 @@ Execution paths and output locations are also managed as part of project-level m ### UC-09 — Support Multiple Orchestration Drivers -**Implementation notes** — two orchestration planes converge on the same task implementations: +**Implementation notes** — multiple entry points converge on the same task execution path: - **Task-driven**: direct task calls via CLI wrappers in `pipeline/h/cli/` - **Command-list-driven**: PPR and XML procedure commands via `executeppr.py` / `executevlappr.py` and `recipereducer.py` -They differ in how inputs are marshalled, how session paths are selected, and how resume is initiated, but the persisted context is the same. +They differ in how inputs are specified, how session paths are selected, and how resume is initiated, but the persisted context is the same. --- @@ -126,7 +119,7 @@ They differ in how inputs are marshalled, how session paths are selected, and ho **Implementation notes:** - `h_save()` pickles the whole context to `.context` -- `h_resume(filename='last')` loads the most recent `.context` file +- `h_resume(filename)` loads a `.context` file, defaulting to the most recent context file available if `filename` is `None` or `last` is used. - Per-stage results are proxied to disk (`saved_state/result-stageX.pickle`) to keep the in-memory context smaller - Used by driver-managed breakpoint/resume (`executeppr(..., bpaction='resume')`) and developer debugging workflows @@ -139,7 +132,7 @@ They differ in how inputs are marshalled, how session paths are selected, and ho 1. The MPI client saves the context to disk as a pickle: `context.save(path)`. 2. Task arguments are also pickled to disk alongside the context. 3. 
On the server, `get_executable()` loads the context, modifies `context.logs['casa_commands']` to a server-local temp path, creates the task's `Inputs(context, **task_args)`, then executes the task. -4. For `Tier0JobRequest` (lower-level distribution), the executor is shallow-copied *excluding* the context reference to stay within the MPI buffer limit (~150 MiB, per PIPE-1337). +4. For `Tier0JobRequest` (lower-level distribution), the executor is shallow-copied *excluding* the context reference to stay within the MPI buffer limit (~150 MiB, see PIPE-1337). --- @@ -161,23 +154,26 @@ They differ in how inputs are marshalled, how session paths are selected, and ho **Implementation notes** — after `merge_with_context()`, `accept()` triggers `pipelineqa.qa_registry.do_qa(context, result)`: - QA handlers implement `QAPlugin.handle(context, result)` -- They typically call `context.observing_run.get_ms(vis)` to look up metadata for scoring (antenna count, channel count, SPW properties, field intents) -- Some handlers check `context.imaging_mode` to branch on VLASS-specific scoring -- Scores are appended to `result.qa.pool` — the context provides inputs to QA evaluation, but the scores are stored on the result rather than as direct context mutations +- The context provides inputs to QA evaluation: + - Most handlers call `context.observing_run.get_ms(vis)` to look up metadata for scoring (antenna count, channel count, SPW properties, field intents) + - Some handlers check `context.imaging_mode` to branch on VLASS-specific scoring + - Others check things in `context.observing_run`, `context.project_structure`, or the callibrary (`context.callibrary`) +- Scores are appended to `result.qa.pool`, so the scores are stored on the results rather than directly on the context. -QA handlers are read-only with respect to the context. +QA handlers write scores to `result.qa.pool` and do not modify the shared context directly. 
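
As a rough illustration of the handler shape described above, the sketch below uses stand-in types (`QAScore`, `FakeResult`, and the dict-based context are invented for this example); only the `handle(context, result)` signature and the append-scores-to-the-result convention are taken from the pipeline:

```python
# Illustrative sketch only: stand-in types, not the real pipeline classes.
from dataclasses import dataclass, field

@dataclass
class QAScore:
    score: float          # conventionally 0.0 (bad) .. 1.0 (good)
    longmsg: str = ''

@dataclass
class FakeResult:         # stands in for a task Results object
    vis: str
    flagged_fraction: float
    qa_pool: list = field(default_factory=list)   # stands in for result.qa.pool

class FlagFractionQAHandler:
    """Scores a result from how much data was flagged. It reads the
    context only to look up per-MS metadata, never mutates it, and
    appends the score to the result rather than to the context."""
    def handle(self, context, result):
        # stand-in for context.observing_run.get_ms(vis)
        ms = context['observing_run'][result.vis]
        score = max(0.0, 1.0 - result.flagged_fraction)
        msg = f"{ms['name']}: flagged {result.flagged_fraction:.0%}"
        result.qa_pool.append(QAScore(score, msg))

context = {'observing_run': {'uid_a.ms': {'name': 'uid_a.ms', 'nant': 43}}}
result = FakeResult(vis='uid_a.ms', flagged_fraction=0.25)
FlagFractionQAHandler().handle(context, result)
print(result.qa_pool[0].score)   # -> 0.75
```

The point of the shape is the direction of data flow: metadata is read out of the context, while the score lands on the result, which is how the real handlers stay read-only with respect to shared state.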
--- -### UC-16 — Manage Telescope-Specific Context Extensions +### UC-16 — Manage Telescope-Specific State + +This use case is based on a VLA-specific sub-context (`context.evla`) which is created during `hifv_importdata` and is updated by several subsequent tasks. Functionally, it provides a way to store observation metadata and pass state between tasks under `context.evla` rather than using the top-level context directly or other context objects (e.g. the domain objects). `context.evla` is an untyped, dictionary-of-dictionaries sidecar dynamically attached to the top-level context with no schema, no type annotations, and no declaration in `Context.__init__`. **Implementation notes** — `context.evla` is a `collections.defaultdict(dict)`, keyed as `context.evla['msinfo'][ms_name].`: - **Written by:** `hifv_importdata` (creates + initializes), `testBPdcals` (gain intervals, ignorerefant), `fluxscale/solint`, `fluxboot` -- **Read by:** nearly every VLA calibration task and heuristic +- **Read by:** Most VLA calibration tasks and heuristics - Accessed fields include: `gain_solint1`, `gain_solint2`, `setjy_results`, `ignorerefant`, various `*_field_select_string` / `*_scan_select_string` values, `fluxscale_sources`, `spindex_results`, and many more -This is an untyped, dictionary-of-dictionaries sidecar attached to the top-level context --- ## Key Implementation References @@ -188,9 +184,9 @@ This is an untyped, dictionary-of-dictionaries sidecar attached to the top-level - PPR-driven execution loops: - ALMA: `pipeline/infrastructure/executeppr.py` (used by `pipeline/runpipeline.py`) - VLA: `pipeline/infrastructure/executevlappr.py` (used by `pipeline/runvlapipeline.py`) -- XML procedure execution: `pipeline/recipereducer.py` +- Direct XML procedure execution: `pipeline/recipereducer.py` - MPI distribution: `pipeline/infrastructure/mpihelpers.py` -- QA framework: `pipeline/qa/` +- QA framework: `pipeline/infrastructure/pipelineqa.py`, `pipeline/qa/` - Weblog renderer: 
`pipeline/infrastructure/renderer/htmlrenderer.py` --- From 8ec2dadf32fff0ce7344c1c1d350b2e8b2b2952b Mon Sep 17 00:00:00 2001 From: Tom Booth Date: Wed, 1 Apr 2026 09:04:03 -0400 Subject: [PATCH 20/22] created doc for gap scenarios --- docs/context_gap_use_cases.md | 78 +++++++++++++++++++++++++++++++++++ 1 file changed, 78 insertions(+) create mode 100644 docs/context_gap_use_cases.md diff --git a/docs/context_gap_use_cases.md b/docs/context_gap_use_cases.md new file mode 100644 index 0000000..f0ee733 --- /dev/null +++ b/docs/context_gap_use_cases.md @@ -0,0 +1,78 @@ +## Missing capabilities (GAPs) and implications for RADPS context design + +This document records capabilities the current pipeline context design cannot yet support. Not every item below is strictly a "context" feature, but each implies changes to context responsibilities, schema, or interfaces. The gaps are enumerated as GAP-01 through GAP-07. A separate, more exhaustive gap analysis mapping these use cases to RADPS requirements is recommended. + +## High-level gap list + +- GAP-01: Concurrent / overlapping execution — requires snapshot isolation, transactional merges, partition-scoped writes, and conflict detection. +- GAP-02: Distributed execution without a shared filesystem — requires artifact references decoupled from POSIX paths and a context that serves as the system-of-record. +- GAP-03: Provenance and reproducibility — requires immutable per-attempt records, input hashing, and lineage capture. +- GAP-04: Partial re-execution / targeted rerun — requires explicit dependency tracking and invalidation semantics at the context level. +- GAP-05: External system integration — requires stable identifiers, event subscriptions/webhooks, and exportable summaries/manifests. +- GAP-06: Multi-language access — requires a language-neutral schema and API for context state and artifact queries. 
+- GAP-07: Streaming / incremental processing — requires versioned dataset registration and versioned results/artifacts. + +## Detailed use cases + +### GAP-01 — Concurrent execution of independent work + +| | | +|-------|---------| +| **Actor(s)** | Workflow orchestration layer, parallel task scheduler | +| **Summary** | The context must support concurrent execution at multiple granularities (stage-level and within-stage parallelism) while preventing inconsistent processing state. This differs from the current parallel-worker pattern, which waits for all work to finish before proceeding. | +| **Invariant** | Independent tasks may run concurrently but must not produce conflicting state. | +| **Postconditions** | Results from concurrent tasks are fully and consistently incorporated into processing state before any dependent work begins. | +| **RADPS requirements** | CSS9017, CSS9063 | + +### GAP-02 — Distributed execution without a shared filesystem + +| | | +|-------|---------| +| **Actor(s)** | Workflow orchestration layer, distributed workers | +| **Summary** | Execution must be possible across nodes that do not share a filesystem. Artifacts, datasets, and processing state must be addressable and accessible without relying on local POSIX paths. | +| **Postconditions** | Processing completes across distributed nodes with context-hosted references providing the necessary artifact access. | +| **RADPS requirements** | CSS9002, CSS9030 | + +### GAP-03 — Provenance and reproducibility + +| | | +|-------|---------| +| **Actor(s)** | Pipeline operator, auditor, reproducibility tooling | +| **Summary** | The context must record sufficient provenance — software versions, exact input identities/hashes, task parameters, and per-stage state — to enable precise reproduction and audit of past runs. | +| **Postconditions** | Any past processing step can be reproduced or audited using the recorded provenance chain. 
| +| **RADPS requirements** | ALMA-TR103, ALMA-TR104, ALMA-TR105 | + +### GAP-04 — Partial re-execution / targeted stage re-run + +| | | +|-------|---------| +| **Actor(s)** | Pipeline operator, developer, workflow engine | +| **Summary** | The context must support selectively re-running one or more mid-pipeline stages with new parameters while preserving unaffected stages. Downstream stages that depend on changed outputs must be invalidated or recomputed. | +| **Postconditions** | Processing state reflects the re-run outcomes; affected downstream stages are invalidated or updated; unaffected stages remain intact. | +| **RADPS requirements** | CSS9038 | + +### GAP-05 — External system integration (archive, scheduling, QA dashboards) + +| | | +|-------|---------| +| **Actor(s)** | QA dashboards, monitoring tools, archive ingest systems, schedulers | +| **Summary** | External systems need timely access to processing state (current stage, processing time, QA results, lifecycle events) without waiting for offline product files. The context should expose the necessary state via queryable interfaces or event feeds. | +| **Invariant** | External consumers can access the processing state they require while it remains current. | +| **Postconditions** | External systems can track processing progress and lifecycle transitions in near real time. | +| **RADPS requirements** | CSS9046, CSS9047, CSS9048, CSS9049, CSS9050, CSS9056 | + +### GAP-06 — Multi-language / multi-framework access to context + +| | | +|-------|---------| +| **Actor(s)** | Non-Python clients (C++, Julia, JavaScript dashboards), external tools | +| **Summary** | Expose context state through a language-neutral interface (e.g., Protocol Buffers, JSON-Schema, Arrow) and a stable API (REST/gRPC) so clients in any supported language can query context state and artifacts. | +| **Postconditions** | Multi-language clients can reliably query context state through a typed API. 
| + +### GAP-07 — Streaming / incremental processing + +| | | +|-------|---------| +| **Actor(s)** | Data ingest systems, workflow engine, incremental processing tasks | +| **Summary** | Support incremental dataset registration (adding new scans or execution blocks to a live session), incremental detection and processing of new data, and versioned results so re-runs produce new versions rather than overwriting. | +| **Postconditions** | New data may be incorporated into an active session and processed incrementally without restarting the pipeline from scratch. | From c7126e7ba2c16aab403d63a276de98e32565751c Mon Sep 17 00:00:00 2001 From: Tom Booth Date: Tue, 7 Apr 2026 22:05:23 -0400 Subject: [PATCH 21/22] =?UTF-8?q?Add=20UC-09=20(intra-stage=20workspace),?= =?UTF-8?q?=20renumber=20UC-09=E2=80=93UC-18;=20fix=20UC-04/UC-12/UC-14=20?= =?UTF-8?q?impl=20notes;=20add=20GAP-08=20(cross-MS=20matching);=20update?= =?UTF-8?q?=20GAP-06=20title=20and=20summary?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- docs/context_current_pipeline_appendix.md | 46 ++++++++++++++-------- docs/context_gap_use_cases.md | 20 +++++++--- docs/context_use_cases_current_pipeline.md | 30 +++++++++----- 3 files changed, 65 insertions(+), 31 deletions(-) diff --git a/docs/context_current_pipeline_appendix.md b/docs/context_current_pipeline_appendix.md index a5eda22..a5a373b 100644 --- a/docs/context_current_pipeline_appendix.md +++ b/docs/context_current_pipeline_appendix.md @@ -52,11 +52,11 @@ The MS objects stored by `context.observing_run` carry information about scans, | Attribute | Written by | Read by | |---|---|---| -| `clean_list_pending` | `editimlist`, `makeimlist`, `findcont`, `makeimages` | `findcont`, `tclean`, `transformimagedata`, `uvcontsub`, `checkproductsize` | +| `clean_list_pending` | `editimlist`, `makeimlist`, `findcont`, `makeimages` | `findcont`, `transformimagedata`, `makeimages`, `vlassmasking` | | `clean_list_info` | 
`makeimlist`, `makeimages` | display/renderer code | -| `imaging_mode` | `editimlist` | `makermsimages`, `makecutoutimages`, `makeimages` | -| `imaging_parameters` | PPR / `editimlist` | `tclean`, `checkproductsize`, heuristics | -| `synthesized_beams` | `imageprecheck`, `tclean`, `checkproductsize`, `makeimlist`, `makeimages` | `checkproductsize`, heuristics | +| `imaging_mode` | `editimlist` | `makermsimages`, `makecutoutimages`, `makeimages`, VLASS export/display code | +| `imaging_parameters` | `imageprecheck` | `tclean`, `checkproductsize`, `makeimlist`, heuristics | +| `synthesized_beams` | `imageprecheck`, `tclean`, `checkproductsize`, `makeimlist`, `makeimages` | `imageprecheck`, `editimlist`, `tclean`, `uvcontsub`, `checkproductsize`, heuristics | | `size_mitigation_parameters` | `checkproductsize` | downstream stages | | `selfcal_targets`, `selfcal_resources` | `selfcal` | `exportdata` | @@ -93,17 +93,31 @@ The MS objects stored by `context.observing_run` carry information about scans, ### UC-08 — Propagate Task Outputs to Downstream Tasks -**Implementation notes** — the current pipeline satisfies these needs through two different propagation paths: +**Implementation notes** — the intended primary mechanism in the current pipeline is immediate propagation through context state updated during result acceptance. Over time, some workflows also came to inspect recorded results directly. Both patterns exist in the codebase, but the second should be understood as an accreted pattern rather than the original design intent. -1. **Immediate state propagation** — `Results.merge_with_context(context)` updates calibration library, image libraries, and more so later tasks can access the current processing state directly. -2. **Serialized Results** — tasks read `context.results` to find outputs from earlier stages when those outputs are needed from the recorded results rather than from merged shared state. 
For example: +This use case is also a concrete example of context creep caused by weakly enforced contracts: the intended contract was that downstream tasks would consume explicitly merged shared state, but later code sometimes reached into `context.results` directly when that contract was not maintained consistently. + +1. **Immediate state propagation** — `Results.merge_with_context(context)` updates calibration library, image libraries, and dedicated context attributes such as `clean_list_pending`, `clean_list_info`, `synthesized_beams`, `size_mitigation_parameters`, `selfcal_targets`, and `selfcal_resources` so later tasks can access the current processing state directly without parsing another task's results object. +2. **Recorded-result inspection** — some tasks read `context.results` to find outputs from earlier stages when those outputs are needed from the recorded results rather than from merged shared state. This pattern introduces coupling to recipe order or to another task's result class structure. 
For example: - VLA tasks compute `stage_number` from `context.results[-1].read().stage_number + 1` - `vlassmasking` iterates `context.results[::-1]` to find the latest `MakeImagesResult` - Export/AQUA code reads `context.results[0]` and `context.results[-1]` for timestamps --- -### UC-09 — Support Multiple Orchestration Drivers +### UC-09 — Provide a Transient Intra-Stage Workspace + +**Implementation notes** — the current framework implements this behavior in `pipeline/infrastructure/basetask.py`: + +- `StandardTaskTemplate.execute()` replaces `self.inputs` with a pickled copy of the original inputs, including the context, before task logic runs, and restores the original inputs in `finally` +- Child tasks therefore execute against a duplicated context that may be mutated freely during `prepare()` / `analyse()` +- `Executor.execute(job, merge=True)` commits a child result by calling `result.accept(self._context)`; with `merge=False`, the child task may still be run and inspected without committing its state +- This makes it possible for aggregate tasks to try tentative calibration paths or other destructive edits inside a stage and keep only the results they explicitly accept +- The rollback mechanism is in-memory copy/restore of task inputs and context; it is distinct from explicit session save/resume workflows + +--- + +### UC-10 — Support Multiple Orchestration Drivers **Implementation notes** — multiple entry points converge on the same task execution path: @@ -114,7 +128,7 @@ They differ in how inputs are specified, how session paths are selected, and how --- -### UC-10 — Save and Restore a Processing Session +### UC-11 — Save and Restore a Processing Session **Implementation notes:** @@ -125,18 +139,18 @@ They differ in how inputs are specified, how session paths are selected, and how --- -### UC-11 — Provide State to Parallel Workers +### UC-12 — Provide State to Parallel Workers **Implementation notes** — `pipeline/infrastructure/mpihelpers.py`, class 
`Tier0PipelineTask`: 1. The MPI client saves the context to disk as a pickle: `context.save(path)`. 2. Task arguments are also pickled to disk alongside the context. 3. On the server, `get_executable()` loads the context, modifies `context.logs['casa_commands']` to a server-local temp path, creates the task's `Inputs(context, **task_args)`, then executes the task. -4. For `Tier0JobRequest` (lower-level distribution), the executor is shallow-copied *excluding* the context reference to stay within the MPI buffer limit (~150 MiB, see PIPE-1337). +4. For `Tier0JobRequest` (lower-level distribution), the executor is shallow-copied *excluding* the context reference to stay within the pipeline-enforced MPI buffer limit (100 MiB). Comments in the code note CASA's higher native limit (~150 MiB; see PIPE-1337 / CAS-13656). --- -### UC-13 — Provide Read-Only State for Reporting +### UC-14 — Provide Read-Only State for Reporting **Implementation notes** — `WebLogGenerator.render(context)` in `pipeline/infrastructure/renderer/htmlrenderer.py`: @@ -145,11 +159,11 @@ They differ in how inputs are specified, how session paths are selected, and how - Reads `context.observing_run.*` — MS metadata, scheduling blocks, execution blocks, observers, project IDs, start/end times - Reads `context.project_summary.telescope` — to determine telescope-specific page layouts (ALMA vs VLA vs NRO) - Reads `context.project_structure.*` — OUS IDs, PPR file, recipe name -- Reads `context.logs['casa_commands']` — CASA command history +- The larger renderer stack, including the Mako templates under `pipeline/infrastructure/renderer/templates/`, reads `context.logs['casa_commands']` and related log references when generating weblog links --- -### UC-14 — Support QA Evaluation and Store Quality Assessments +### UC-15 — Support QA Evaluation and Store Quality Assessments **Implementation notes** — after `merge_with_context()`, `accept()` triggers `pipelineqa.qa_registry.do_qa(context, result)`: @@ -158,13 
+172,13 @@ They differ in how inputs are specified, how session paths are selected, and how - Most handlers call `context.observing_run.get_ms(vis)` to look up metadata for scoring (antenna count, channel count, SPW properties, field intents) - Some handlers check `context.imaging_mode` to branch on VLASS-specific scoring - Others check things in `context.observing_run`, `context.project_structure`, or the callibrary (`context.callibrary`) -- Scores are appended to `result.qa.pool`, so the scores are stored on the results rather than directly on the context. +- Scores are appended to `result.qa.pool`, so the scores are stored on the results rather than directly on the context. This also keeps detailed QA collections scoped to the stage result that produced them; in current code, a `QAScorePool` can hold many `QAScore` objects, and each score may carry fine-grained `applies_to` selections (e.g. vis, field, SPW, antenna, polarization), so the per-result pool can become fairly large for detailed assessments. QA handlers write scores to `result.qa.pool` and do not modify the shared context directly. --- -### UC-16 — Manage Telescope-Specific State +### UC-17 — Manage Telescope-Specific State This use case is based on a VLA-specific sub-context (`context.evla`) which is created during `hifv_importdata` and is updated by several subsequent tasks. Functionally, it provides a way to store observation metadata and pass state between tasks under `context.evla` rather than using the top-level context directly or other context objects (e.g. the domain objects). `context.evla` is an untyped, dictionary-of-dictionaries sidecar dynamically attached to the top-level context with no schema, no type annotations, and no declaration in `Context.__init__`. 
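The sidecar pattern boils down to a few lines. `FakeContext` below stands in for `launcher.Context`, and the field names are taken from the list above; the MS name is invented:

```python
import collections

class FakeContext:
    """Stand-in for launcher.Context; note that evla is *not* declared here."""

context = FakeContext()

# hifv_importdata-style initialization: the sidecar is attached dynamically,
# with no schema and no type annotations.
context.evla = collections.defaultdict(dict)
context.evla['msinfo']['day1.ms'] = {}

# Later tasks write untyped fields under context.evla['msinfo'][ms_name]...
context.evla['msinfo']['day1.ms']['gain_solint1'] = 'int'
context.evla['msinfo']['day1.ms']['ignorerefant'] = []

# ...and downstream calibration tasks read them back with plain dict access.
print(context.evla['msinfo']['day1.ms']['gain_solint1'])  # int
```

Because the structure is untyped, a mistyped key fails only when a later task reads it back, which is one of the hazards a schema-backed RADPS context would address.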
diff --git a/docs/context_gap_use_cases.md b/docs/context_gap_use_cases.md index f0ee733..6ae6726 100644 --- a/docs/context_gap_use_cases.md +++ b/docs/context_gap_use_cases.md @@ -1,6 +1,6 @@ ## Missing capabilities (GAPs) and implications for RADPS context design -This document records capabilities the current pipeline context design cannot yet support. Not every item below is strictly a "context" feature, but each implies changes to context responsibilities, schema, or interfaces. The gaps are enumerated as GAP-01 through GAP-07. A separate, more exhaustive gap analysis mapping these use cases to RADPS requirements is recommended. +This document records capabilities the current pipeline context design cannot yet support. Not every item below is strictly a "context" feature, but each implies changes to context responsibilities, schema, or interfaces. The gaps are enumerated as GAP-01 through GAP-08. A separate, more exhaustive gap analysis mapping these use cases to RADPS requirements is recommended. ## High-level gap list @@ -9,8 +9,9 @@ This document records capabilities the current pipeline context design cannot ye - GAP-03: Provenance and reproducibility — requires immutable per-attempt records, input hashing, and lineage capture. - GAP-04: Partial re-execution / targeted rerun — requires explicit dependency tracking and invalidation semantics at the context level. - GAP-05: External system integration — requires stable identifiers, event subscriptions/webhooks, and exportable summaries/manifests. -- GAP-06: Multi-language access — requires a language-neutral schema and API for context state and artifact queries. +- GAP-06: Programming-language / client-framework access — requires a language-neutral contract and stable middleware/API layer with extensible value types so clients in any language can access context state without coupling to the storage representation. 
- GAP-07: Streaming / incremental processing — requires versioned dataset registration and versioned results/artifacts. +- GAP-08: Cross-MS matching and heterogeneous dataset coordination — requires flexible SPW matching semantics (exact and partial/overlap) and data-type tracking across measurement sets, rather than the current single-master-MS assumption. ## Detailed use cases @@ -61,13 +62,13 @@ This document records capabilities the current pipeline context design cannot ye | **Postconditions** | External systems can track processing progress and lifecycle transitions in near real time. | | **RADPS requirements** | CSS9046, CSS9047, CSS9048, CSS9049, CSS9050, CSS9056 | -### GAP-06 — Multi-language / multi-framework access to context +### GAP-06 — Programming-Language / Client-Framework Access to Context | | | |-------|---------| | **Actor(s)** | Non-Python clients (C++, Julia, JavaScript dashboards), external tools | -| **Summary** | Expose context state through a language-neutral interface (e.g., Protocol Buffers, JSON-Schema, Arrow) and a stable API (REST/gRPC) so clients in any supported language can query context state and artifacts. | -| **Postconditions** | Multi-language clients can reliably query context state through a typed API. | +| **Summary** | Expose context state through a programming-language-neutral contract and a stable middleware/API layer (e.g., REST/gRPC over a typed schema such as Protocol Buffers, JSON-Schema, or Arrow) so local language bindings do not need to know how the context is physically stored. The contract should provide a small set of standard extensible value types (scalars, lists, maps/dictionaries, records/structs) alongside typed core records so developers can add new items quickly without redefining the storage model for each client. 
| +| **Postcondition** | Clients in any supported programming language can query context state and artifacts through a stable typed API without coupling themselves to the underlying storage representation. | ### GAP-07 — Streaming / incremental processing @@ -76,3 +77,12 @@ This document records capabilities the current pipeline context design cannot ye | **Actor(s)** | Data ingest systems, workflow engine, incremental processing tasks | | **Summary** | Support incremental dataset registration (adding new scans or execution blocks to a live session), incremental detection and processing of new data, and versioned results so re-runs produce new versions rather than overwriting. | | **Postconditions** | New data may be incorporated into an active session and processed incrementally without restarting the pipeline from scratch. | + +### GAP-08 — Cross-MS matching and heterogeneous dataset coordination + +| | | +|-------|---------| +| **Actor(s)** | Data import tasks, calibration tasks, imaging tasks, heuristics | +| **Summary** | The current design provides limited support for heterogeneous multi-MS datasets through virtual SPW translation and per-MS data-column tracking, but many workflows still rely on a single reference-MS or master-MS model and do not expose general cross-MS matching semantics. The context must instead support heterogeneous multi-MS scenarios by providing: (1) cross-MS SPW matching with distinct semantics for exact matching (required by calibration tasks) and partial/overlap matching (required for imaging tasks that can combine overlapping spectral windows); and (2) data-type and column tracking across multiple MSes without assuming a shared layout. Because the current virtual-SPW translation mechanism is tightly coupled to the single-master-MS assumption, a fresh design is preferable to extending it. 
| +| **Invariant** | SPW identity and data-column state are queryable across all registered datasets, regardless of whether those datasets share native numbering or column layout. | +| **Postconditions** | Calibration and imaging tasks can look up applicable SPWs and data columns across an arbitrary collection of heterogeneous MSes using the appropriate matching semantics for their use. | diff --git a/docs/context_use_cases_current_pipeline.md b/docs/context_use_cases_current_pipeline.md index 8d16635..8699b3b 100644 --- a/docs/context_use_cases_current_pipeline.md +++ b/docs/context_use_cases_current_pipeline.md @@ -97,12 +97,22 @@ ___ | | | |-------|---------| | **Actor(s)** | Any task producing output, downstream tasks | -| **Summary** | When a task produces outputs that change the processing state (e.g., new calibrations, updated flag summaries, image products, revised parameters), the context must provide a mechanism for those outputs to become available to subsequent processing steps before they execute. UC-03, UC-04, UC-05, and UC-14 are specific instances of this pattern. | +| **Summary** | When a task produces outputs that change the processing state (e.g., new calibrations, updated flag summaries, image products, revised parameters), the context must provide a mechanism for those outputs to become available to subsequent processing steps before they execute. UC-03, UC-04, UC-05, and UC-15 are specific instances of this pattern. | | **Postconditions** | Downstream tasks can access the propagated processing state they need. | --- -### UC-09 — Support Multiple Orchestration Drivers +### UC-09 — Provide a Transient Intra-Stage Workspace + +| | | +|-------|---------| +| **Actor(s)** | Aggregate tasks, child tasks, task execution framework | +| **Summary** | Within a stage, the context must be usable as a temporary working space for child-task execution. 
Child tasks must be able to modify context state destructively while they run, including adding, removing, or replacing tentative calibration and processing state, without requiring explicit cleanup logic. Only outputs that are explicitly accepted into the enclosing task's context should survive stage execution. | +| **Invariant** | State changes made while executing against a temporary child-task context do not escape that workspace unless they are explicitly accepted and merged. | +| **Postcondition** | When a child task finishes, the enclosing task retains only the accepted state changes; unaccepted mutations to the temporary workspace are discarded. | + +--- +### UC-10 — Support Multiple Orchestration Drivers | | | |-------|---------| @@ -112,7 +122,7 @@ ___ --- -### UC-10 — Save and Restore a Processing Session +### UC-11 — Save and Restore a Processing Session | | | |-------|---------| @@ -122,7 +132,7 @@ ___ --- -### UC-11 — Provide State to Parallel Workers +### UC-12 — Provide State to Parallel Workers | | | |-------|---------| @@ -133,7 +143,7 @@ ___ --- -### UC-12 — Aggregate Results from Parallel Workers +### UC-13 — Aggregate Results from Parallel Workers | | | |-------|---------| @@ -143,7 +153,7 @@ ___ --- -### UC-13 — Provide Read-Only State for Reporting +### UC-14 — Provide Read-Only State for Reporting | | | |-------|---------| @@ -153,7 +163,7 @@ ___ --- -### UC-14 — Support QA Evaluation and Store Quality Assessments +### UC-15 — Support QA Evaluation and Store Quality Assessments | | | |-------|---------| @@ -163,7 +173,7 @@ ___ --- -### UC-15 — Support Inspection and Debugging +### UC-16 — Support Inspection and Debugging | | | |-------|---------| @@ -173,7 +183,7 @@ ___ --- -### UC-16 — Manage Telescope-Specific State +### UC-17 — Manage Telescope-Specific State | | | |-------|---------| @@ -183,7 +193,7 @@ ___ --- -### UC-17 — Provide State for Product Export +### UC-18 — Provide State for Product Export | | | |-------|---------| From 
3b6e9f45cdc8a7812480a242d0895d39f00c164c Mon Sep 17 00:00:00 2001 From: Tom Booth Date: Wed, 8 Apr 2026 09:17:02 -0400 Subject: [PATCH 22/22] add UC-09, renumber use-cases; update appendix and GAPs (fix UC-04, add GAP-08, refine GAP-06) --- docs/context_current_pipeline_appendix.md | 19 +++++++++++++------ docs/context_use_cases_current_pipeline.md | 10 +++++----- 2 files changed, 18 insertions(+), 11 deletions(-) diff --git a/docs/context_current_pipeline_appendix.md b/docs/context_current_pipeline_appendix.md index a5a373b..e3c97da 100644 --- a/docs/context_current_pipeline_appendix.md +++ b/docs/context_current_pipeline_appendix.md @@ -22,6 +22,8 @@ The following implementation notes describe how each use case is realized in the The MS objects stored by `context.observing_run` carry information about scans, fields, SPWs, antennas, reference antenna ordering, etc. Tasks read per-MS state like `ms.reference_antenna`, `ms.session`, `ms.start_time`, `ms.origin_ms`. +For the single-dish pipeline, this use case also includes per-MS `DataTable` products referenced through `context.observing_run.ms_datatable_name`. These are not just raw imported metadata tables: they persist row-level metadata and derived quantities used by downstream SD tasks. During SD import, the reader populates `DataTable` columns such as `RA`, `DEC`, `AZ`, `EL`, `SHIFT_RA`, `SHIFT_DEC`, `OFS_RA`, and `OFS_DEC`, including coordinate conversions into the pipeline's chosen celestial frame (for example ICRS) so later imaging, gridding, plotting, and QA code can reuse those values efficiently. 
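The populate-once, read-many pattern behind those columns can be sketched as follows. This is not the real single-dish `DataTable` (which is a table on disk); `to_icrs()` is a placeholder for the real frame conversion, and the offset arithmetic is only indicative:

```python
import math

def to_icrs(ra_deg, dec_deg):
    # Placeholder: the real import stage performs a proper celestial
    # frame conversion here.
    return ra_deg, dec_deg

class MetadataCache:
    """Illustrative per-MS cache of derived row metadata, loosely modelled
    on the DataTable columns named above (RA, DEC, OFS_RA, OFS_DEC)."""

    def __init__(self, ref_ra, ref_dec):
        self.ref_ra, self.ref_dec = ref_ra, ref_dec
        self.columns = {'RA': [], 'DEC': [], 'OFS_RA': [], 'OFS_DEC': []}

    def add_row(self, raw_ra, raw_dec):
        ra, dec = to_icrs(raw_ra, raw_dec)  # convert once, at import
        self.columns['RA'].append(ra)
        self.columns['DEC'].append(dec)
        # Offsets from the reference direction, cached so that later
        # gridding, plotting, and QA code can reuse them directly.
        self.columns['OFS_RA'].append((ra - self.ref_ra) * math.cos(math.radians(dec)))
        self.columns['OFS_DEC'].append(dec - self.ref_dec)

cache = MetadataCache(ref_ra=150.0, ref_dec=-30.0)
cache.add_row(150.1, -30.0)
print(round(cache.columns['OFS_RA'][0], 4))  # 0.0866
```

The point is the lifecycle, not the arithmetic: derived quantities are computed once at import and then served from the cached table for the rest of the session.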
+
 ---
 
 ### UC-02 — Store and Provide Project-Level Metadata
@@ -53,12 +55,13 @@ The MS objects stored by `context.observing_run` carry information about scans,
 | Attribute | Written by | Read by |
 |---|---|---|
 | `clean_list_pending` | `editimlist`, `makeimlist`, `findcont`, `makeimages` | `findcont`, `transformimagedata`, `makeimages`, `vlassmasking` |
-| `clean_list_info` | `makeimlist`, `makeimages` | display/renderer code |
+| `clean_list_info` | `makeimlist`, `makeimages` | `makeimages` |
 | `imaging_mode` | `editimlist` | `makermsimages`, `makecutoutimages`, `makeimages`, VLASS export/display code |
 | `imaging_parameters` | `imageprecheck` | `tclean`, `checkproductsize`, `makeimlist`, heuristics |
 | `synthesized_beams` | `imageprecheck`, `tclean`, `checkproductsize`, `makeimlist`, `makeimages` | `imageprecheck`, `editimlist`, `tclean`, `uvcontsub`, `checkproductsize`, heuristics |
 | `size_mitigation_parameters` | `checkproductsize` | downstream stages |
-| `selfcal_targets`, `selfcal_resources` | `selfcal` | `exportdata` |
+| `selfcal_targets` | `selfcal` | `makeimlist` |
+| `selfcal_resources` | `selfcal` | `exportdata` |
 
 ---
 
@@ -154,7 +157,7 @@ They differ in how inputs are specified, how session paths are selected, and how
 
 **Implementation notes** — `WebLogGenerator.render(context)` in `pipeline/infrastructure/renderer/htmlrenderer.py`:
 
-- Reads `context.results` — unpickled from `ResultsProxy` objects, iterated for every renderer
+- `WebLogGenerator.render(context)` explicitly does `context.results = [proxy.read() for proxy in context.results]` once before the renderer loop, so individual renderers iterate fully unpickled result objects rather than calling `read()` themselves
 - Reads `context.report_dir`, `context.output_dir` — filesystem layout
 - Reads `context.observing_run.*` — MS metadata, scheduling blocks, execution blocks, observers, project IDs, start/end times
 - Reads `context.project_summary.telescope` — to determine telescope-specific page layouts (ALMA vs VLA vs NRO)
@@ -178,16 +181,20 @@ QA handlers write scores to `result.qa.pool` and do not modify the shared context
 
 ---
 
-### UC-17 — Manage Telescope-Specific State
+### UC-17 — Manage Telescope- and Array-Specific State
+
+**Implementation notes** — the current codebase shows at least two different forms of telescope- and array-specific state.
 
-This use case is based on a VLA-specific sub-context (`context.evla`) which is created during `hifv_importdata` and is updated by several subsequent tasks. Functionally, it provides a way to store observation metadata and pass state between tasks under `context.evla` rather than using the top-level context directly or other context objects (e.g. the domain objects). `context.evla` is an untyped, dictionary-of-dictionaries sidecar dynamically attached to the top-level context with no schema, no type annotations, and no declaration in `Context.__init__`.
+One is a VLA-specific sub-context (`context.evla`), created during `hifv_importdata` and updated by several subsequent tasks. Functionally, it provides a way to store observation metadata and pass state between tasks under `context.evla` rather than using the top-level context directly or other context objects (e.g. the domain objects). `context.evla` is an untyped, dictionary-of-dictionaries sidecar dynamically attached to the top-level context with no schema, no type annotations, and no declaration in `Context.__init__`.
 
-**Implementation notes** — `context.evla` is a `collections.defaultdict(dict)`, keyed as `context.evla['msinfo'][ms_name].`:
+`context.evla` is a `collections.defaultdict(dict)`, keyed as `context.evla['msinfo'][ms_name]`:
 
 - **Written by:** `hifv_importdata` (creates + initializes), `testBPdcals` (gain intervals, ignorerefant), `fluxscale/solint`, `fluxboot`
 - **Read by:** Most VLA calibration tasks and heuristics
 - Accessed fields include: `gain_solint1`, `gain_solint2`, `setjy_results`, `ignorerefant`, various `*_field_select_string` / `*_scan_select_string` values, `fluxscale_sources`, `spindex_results`, and many more
 
+Another is ALMA TP / single-dish state, which is array-specific rather than telescope-wide and is carried mainly through SD-specific structures under `context.observing_run`, such as `ms_datatable_name`, `ms_reduction_group`, and `org_directions`, plus the per-MS `DataTable` products referenced from that state. This is a useful reminder that array-specific extensions do not always appear as a single sidecar object like `context.evla`; they may instead live in domain-model extensions and array-specific cached metadata products.
+
 ---
 
 ## Key Implementation References
diff --git a/docs/context_use_cases_current_pipeline.md b/docs/context_use_cases_current_pipeline.md
index 8699b3b..b5a5a02 100644
--- a/docs/context_use_cases_current_pipeline.md
+++ b/docs/context_use_cases_current_pipeline.md
@@ -27,7 +27,7 @@ The following fields are used in each use case:
 | | |
 |-------|---------|
 | **Actor(s)** | Data import task, any downstream task, heuristics, renderers, QA handlers |
-| **Summary** | The context must load observation metadata (datasets, spectral windows, fields, antennas, scans, time ranges), make it queryable by all subsequent processing steps, and allow downstream tasks to update it as processing progresses (e.g., registering new derived datasets, data column and type changes, reference antenna selection). It must also provide a unified identifier scheme when multiple datasets use different native numbering. |
+| **Summary** | The context must load observation metadata (datasets, spectral windows, fields, antennas, scans, time ranges), make it queryable by all subsequent processing steps, and allow downstream tasks to update it as processing progresses (e.g., registering new derived datasets, data column and type changes, reference antenna selection). It must also be able to hold derived or cached metadata products created during import when later stages rely on them for efficiency rather than recomputing them from the raw measurement set, and it must provide a unified identifier scheme when multiple datasets use different native numbering. |
 | **Invariant** | All registered datasets remain queryable and updatable for the lifetime of the session without repeating the import process. |
 
 ---
@@ -183,13 +183,13 @@ ___
 
 ---
 
-### UC-17 — Manage Telescope-Specific State
+### UC-17 — Manage Telescope- and Array-Specific State
 
 | | |
 |-------|---------|
-| **Actor(s)** | Telescope-specific tasks and heuristics |
-| **Summary** | The context must support conditional telescope-specific extensions to the processing state. These extensions must be available to telescope-specific tasks and heuristics. Generic pipeline components must not depend on or require knowledge of telescope-specific extensions. |
-| **Invariant** | Telescope-specific extensions are present only for runs that require them, available to the tasks that need them, and are never assumed by shared pipeline code. |
+| **Actor(s)** | Telescope-specific tasks and heuristics, array-specific tasks and heuristics |
+| **Summary** | The context must support conditional telescope-specific and array-specific extensions to the processing state. These extensions must be available to the tasks and heuristics that need them, including cases where one array mode within a telescope family has materially different state requirements from another. Generic pipeline components must not depend on or require knowledge of those telescope- or array-specific extensions. |
+| **Invariant** | Telescope- and array-specific extensions are present only for runs that require them, available to the tasks that need them, and are never assumed by shared pipeline code. |
 
 ---
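
To make the `context.evla` sidecar pattern described under UC-17 concrete, here is a minimal, self-contained Python sketch. The `Context` class, task bodies, and field values below are illustrative stand-ins, not the pipeline's real implementations; only the `defaultdict(dict)` shape and the `['msinfo'][ms_name]` keying follow the description above.

```python
import collections


class Context:
    """Stand-in for pipeline.infrastructure.launcher.Context; note that
    nothing here declares an ``evla`` attribute."""


def hifv_importdata(context, ms_name):
    # Creates and initializes the sidecar at runtime; only VLA tasks
    # know it exists, matching the "no declaration in Context.__init__"
    # behaviour described above.
    if not hasattr(context, 'evla'):
        context.evla = collections.defaultdict(dict)
    context.evla['msinfo'][ms_name] = {'ignorerefant': []}


def testBPdcals(context, ms_name):
    # A later VLA task writes gain-interval state into the same
    # per-MS dictionary (value shown is illustrative).
    context.evla['msinfo'][ms_name]['gain_solint1'] = 'int'


context = Context()
hifv_importdata(context, 'vla_sb1.ms')
testBPdcals(context, 'vla_sb1.ms')
print(context.evla['msinfo']['vla_sb1.ms']['gain_solint1'])  # prints: int
```

The sketch shows why shared pipeline code cannot safely assume the attribute exists: any generic component touching `context.evla` before `hifv_importdata` has run would raise `AttributeError`.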
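
The unpickle-once behaviour noted for `WebLogGenerator.render(context)` can also be sketched in isolation. `ResultsProxy` below is a simplified stand-in for the pipeline's per-stage proxy (the file naming and result payloads are hypothetical); the key line is the single list comprehension that swaps proxies for real results before any renderer runs.

```python
import os
import pickle
import tempfile


class ResultsProxy:
    """Simplified stand-in: pickle the full result to disk so only a
    lightweight proxy stays in context.results between stages."""

    def __init__(self, result, dirname, stage):
        self.path = os.path.join(dirname, f'stage{stage}-results.pickle')
        with open(self.path, 'wb') as f:
            pickle.dump(result, f)

    def read(self):
        with open(self.path, 'rb') as f:
            return pickle.load(f)


class Context:
    def __init__(self):
        self.results = []


with tempfile.TemporaryDirectory() as d:
    context = Context()
    for stage, result in enumerate(({'stage': 1}, {'stage': 2}), start=1):
        context.results.append(ResultsProxy(result, d, stage))
    # WebLogGenerator.render-style step: unpickle every proxy exactly
    # once before the renderer loop, so individual renderers iterate
    # real result objects and never call read() themselves.
    context.results = [proxy.read() for proxy in context.results]

print([r['stage'] for r in context.results])  # prints: [1, 2]
```

This keeps the in-memory context bounded during execution while giving the weblog step full result objects at render time.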