diff --git a/partner-built/magellan/.claude-plugin/plugin.json b/partner-built/magellan/.claude-plugin/plugin.json new file mode 100644 index 00000000..09673db5 --- /dev/null +++ b/partner-built/magellan/.claude-plugin/plugin.json @@ -0,0 +1,17 @@ +{ + "name": "magellan", + "version": "2.0.0", + "description": "Enterprise knowledge discovery — extracts structured knowledge from code, documents, and transcripts, then builds a shared, queryable knowledge graph with contradictions and open questions as primary output", + "author": { + "name": "Slalom" + }, + "license": "Apache-2.0", + "keywords": [ + "knowledge-discovery", + "knowledge-graph", + "enterprise", + "architecture", + "due-diligence", + "legacy-systems" + ] +} diff --git a/partner-built/magellan/CLAUDE.md b/partner-built/magellan/CLAUDE.md new file mode 100644 index 00000000..7ce9a222 --- /dev/null +++ b/partner-built/magellan/CLAUDE.md @@ -0,0 +1,30 @@ +# CLAUDE.md + +Magellan is an enterprise knowledge discovery plugin. It extracts structured +knowledge from collected materials and builds a queryable knowledge graph. + +## Commands + +- `/magellan` — Run the discovery pipeline or show status +- `/magellan:add <path>` — Add a file or directory +- `/magellan:add --correction "..."` — Record a verbal correction +- `/magellan:ask <question>` — Query the knowledge graph + +## Four Principles + +1. Every fact traces to a source document. Nothing is invented. +2. Contradictions and open questions are the primary output, not a side effect. +3. Nothing is silently skipped. Every file gets a recorded disposition. +4. The model does the heavy lifting. Humans steer and correct. + +## Key Skills + +- `skills/file-conventions/` — JSON schemas for all KG file types. Read this + before writing any file to `.magellan/`. +- `skills/ingestion/` — Fact extraction rules and language guides for legacy code. +- `skills/pipeline-review/` — Quality gate criteria. Run after every pipeline step.
+ +## Output Location + +All outputs go in `<workspace>/.magellan/`. See the file-conventions skill for +the complete directory layout and JSON schemas. diff --git a/partner-built/magellan/CONNECTORS.md b/partner-built/magellan/CONNECTORS.md new file mode 100644 index 00000000..2eebdc67 --- /dev/null +++ b/partner-built/magellan/CONNECTORS.md @@ -0,0 +1,33 @@ +# Connectors + +Magellan is a **self-contained** knowledge discovery plugin. It does not require +connections to external SaaS tools to function — all knowledge extraction, graph +building, and querying happens locally using the files in your workspace. + +## Optional Tool Integrations + +While the core pipeline is self-contained, you can optionally connect external tools +to enhance Magellan's workflows: + +| Category | Use Case | Example Tools | +|----------|----------|---------------| +| `project-tracker` | Route open questions to your team as tickets | Jira, Linear, Asana, GitHub Issues | +| `chat` | Send contradiction summaries to team channels | Slack, Microsoft Teams | +| `knowledge-base` | Fetch referenced documents from team wikis | Confluence, Notion, Guru | + +These integrations are **tool-agnostic** — any MCP server in the category works. +Add the relevant MCP servers to your `.mcp.json` to enable them. + +## No Required Connectors + +Unlike plugins that depend on external data sources, Magellan works entirely from +local files. Point it at a folder of collected materials (code, documents, transcripts, +diagrams) and it builds the knowledge graph from those files directly. + +## System Requirements + +Magellan has no system requirements beyond Claude Code. It reads files directly +using Claude's built-in capabilities — no external tools needed. + +> **Note:** Claude does not yet natively read DOCX, PPTX, or XLSX files. Until it +> does, convert these to PDF before adding them to your workspace.
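To enable one of the optional integrations above, an entry in your project's `.mcp.json` along these lines is typically enough (a minimal sketch: the server name and URL are placeholders, and the exact transport and endpoint depend on the MCP server you choose, so consult its documentation):

```json
{
  "mcpServers": {
    "project-tracker": {
      "type": "http",
      "url": "https://mcp.example.com/tracker"
    }
  }
}
```

Once configured, Magellan can route open questions or contradiction summaries through that server; the core pipeline behaves identically with or without it.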
diff --git a/partner-built/magellan/LICENSE b/partner-built/magellan/LICENSE new file mode 100644 index 00000000..057af020 --- /dev/null +++ b/partner-built/magellan/LICENSE @@ -0,0 +1,191 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). 
+ + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to the Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by the Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. 
Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding any notices that do not + pertain to any part of the Derivative Works, 
in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. 
Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+ + END OF TERMS AND CONDITIONS + + Copyright 2025 Slalom LLC + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/partner-built/magellan/README.md b/partner-built/magellan/README.md new file mode 100644 index 00000000..9a941040 --- /dev/null +++ b/partner-built/magellan/README.md @@ -0,0 +1,83 @@ +# Magellan + +Enterprise knowledge discovery. Point Magellan at a folder of collected materials — +code, documents, meeting transcripts, API specs, diagrams — and it builds a structured, +queryable knowledge graph organized by business domain. Contradictions and open questions +are surfaced as primary outputs, not side effects. + +## Quick Start + +```bash +claude plugin install magellan@knowledge-work-plugins +``` + +``` +/magellan Run the discovery pipeline +/magellan:add /path/to/document.pdf Add a file +/magellan:add --correction "..." Record a verbal correction +/magellan:add --resolve c_001 "..." Resolve a contradiction +/magellan:add --resolve oq_003 "..." Answer an open question +/magellan:ask How does billing work? 
Query the knowledge graph +``` + +## What You Get + +``` +.magellan/ + contradictions_dashboard.md ← Contradictions & open questions (the priority) + contradictions_dashboard.html ← Print-friendly version + onboarding_guide.md ← Briefing for new team members + diagrams/ ← C4 architecture diagrams (Mermaid + PlantUML) + + domains/<domain>/ + facts/ ← Atomic facts from source documents + entities/ ← One file per knowledge graph entity + relationships.json ← How entities connect within this domain + summary.json ← Plain-language domain narrative + contradictions.json ← What disagrees + open_questions.json ← What's unknown + deliverables/ ← Business rules, DDD specs, API specs, contracts +``` + +## How It Works + +The pipeline runs in two phases with quality gates after every step: + +**Phase 1 — Discovery**: Read files → extract atomic facts → build entities and +relationships → detect contradictions → link across domains → summarize each domain +→ generate onboarding guide, dashboard, and C4 diagrams. + +**Phase 2 — Design**: Formalize business rules (HARD / SOFT / QUESTIONABLE) → generate +DDD specs → implementation contracts → export rules as DMN, JSON, CSV, Gherkin → +generate OpenAPI and AsyncAPI specs. + +Every fact traces to a source document with an exact quote. Nothing is invented. +On subsequent runs, only new and modified files are processed. + +## Input Files + +Magellan reads anything Claude can read — text, code, markdown, CSV, JSON, YAML, XML, +PDF, images, and meeting transcripts. + +> **Note:** Claude does not yet natively read DOCX, PPTX, or XLSX files. Until it +> does, convert these to PDF before adding them to your workspace. + +Magellan includes 12 language guides for legacy systems (RPG, COBOL, CL, DDS, JCL, CICS, +Assembler/370, NATURAL/ADABAS, IDMS, Easytrieve, PL/I, REXX) that improve extraction +precision. Add your own for proprietary languages.
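Under the hood, every extracted fact is a small JSON record that ties a claim back to its source. A hypothetical example (illustrative values only; the authoritative schema lives in the file-conventions skill):

```json
{
  "statement": "Invoices over $10,000 are routed to manual review",
  "subject": "invoice_processing",
  "subject_domain": "billing",
  "predicate": "routes_to",
  "object": "manual_review",
  "source": {
    "document": "docs/billing_runbook.pdf",
    "location": "page 12",
    "quote": "Any invoice exceeding $10,000 must be reviewed manually."
  },
  "confidence": 0.9
}
```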
+ +## Installation + +```bash +claude plugin install magellan@knowledge-work-plugins +``` + +Requires only Claude Code. No Python, no build steps, no system dependencies. + +## Contributing + +- **Skills**: Add or improve domain expertise in `skills/` +- **Language guides**: Add guides for new languages in `skills/ingestion/language_guides/` +- **Feature requests**: Open an issue + +Apache 2.0 license. diff --git a/partner-built/magellan/commands/add.md b/partner-built/magellan/commands/add.md new file mode 100644 index 00000000..c251135b --- /dev/null +++ b/partner-built/magellan/commands/add.md @@ -0,0 +1,117 @@ +--- +description: Add materials or corrections to the Magellan knowledge graph +--- + +# Add Materials + +Add a file, directory, correction, or resolution to the knowledge graph. + +## Usage + +``` +/magellan:add <path> Add a file or directory +/magellan:add --correction "..." Record a verbal correction +/magellan:add --resolve <id> "..." Resolve a contradiction or answer a question +``` + +## Pre-Flight Check + +Verify `.magellan/` exists. If not, initialize the workspace: +1. `mkdir -p .magellan/domains .magellan/diagrams .magellan/language_guides` +2. Write `.magellan/state.json` with `{"initialized_at": "<timestamp>"}`. +3. Copy starter language guides from skills directory if available. + +## Adding Files + +When a file path is provided: + +1. Read the file with the Read tool. +2. If it's a code file, check `.magellan/language_guides/` for a matching language + guide. Read the guide for context before extracting facts. +3. Apply the ingestion skill to extract atomic facts. +4. Write facts to `.magellan/domains/<domain>/facts/<source_file>.json` + following the fact schema in file-conventions. +5. Update `.magellan/processed_files.json` with the file's disposition. +6. Report: facts extracted, domain, any issues. + +When a directory path is provided: + +1. List all files using Glob. +2. Read `.magellan/processed_files.json` to find already-processed files. +3. Skip unchanged files.
Display: "Processing N new files (M skipped)." +4. Process each file using the single-file workflow above. +5. Update processed_files.json after each file. +6. Report: total processed, facts per file, skipped count. + +## Adding Corrections + +When `--correction` is provided with a quoted string: + +1. Create a correction fact: + - Parse the text to identify the subject and claim + - Set `source.document` to `_corrections/<timestamp>.json` + - Set `source.location` to "verbal correction" + - Set `source.quote` to the exact text provided + - Set `confidence` to 0.95 + - Set tags to `["correction"]` + +2. Write to `.magellan/domains/<domain>/facts/_corrections/<timestamp>.json`. + +3. Report what was recorded. The graph builder will detect contradictions + on the next pipeline run. + +## Resolving Contradictions and Answering Questions + +When `--resolve <id>` is provided with a resolution note: + +The `<id>` can be a contradiction ID (e.g., `c_001`) or a question ID (e.g., `oq_001`). + +**For contradictions (c_xxx):** + +1. Search for the contradiction across all domains: + - Use Glob to find `domains/*/contradictions.json`. + - Read each file and find the entry matching the ID in the `active` array. +2. Move the contradiction from `active` to `resolved`: + - Remove it from the `active` array. + - Add `resolution_note` (the quoted text), `resolved_at` (current ISO timestamp), + and set `status` to `"resolved"`. + - Append it to the `resolved` array in the same file. +3. Write the updated file back. +4. If the contradiction had `related_entities`, read each entity and remove the + `contested: true` flag if no other active contradictions reference it. +5. Display: + ``` + Resolved: c_001 (billing) + Resolution: "Confirmed with John Smith: threshold changed to $15k in Q4" + Affected entities updated: billing:manual_review_bypass + ``` + +**For open questions (oq_xxx):** + +1. Search across all domains: + - Use Glob to find `domains/*/open_questions.json`.
+ - Read each file and find the matching entry in the `active` array. +2. Move from `active` to `resolved`: + - Remove from `active`. + - Add `answer_source` (the quoted text), `answered_at` (current ISO timestamp), + and set `status` to `"answered"`. + - Append to the `resolved` array. +3. Write the updated file back. +4. Display: + ``` + Answered: oq_003 (billing) + Answer: "The $10k threshold is still active per Jane Doe (Finance)" + ``` + +**If the ID is not found** in any domain, display: +``` +Not found: <id>. Use /magellan:ask to list active contradictions and questions. +``` + +## Notes + +- Every fact traces to a source document. Corrections create a record document. +- Follow the fact schema in file-conventions exactly. +- For large files, extract facts in batches of 10-15 to stay within output limits. +- Resolving a contradiction creates an audit trail — the dashboard shows both + active and resolved items. diff --git a/partner-built/magellan/commands/ask.md b/partner-built/magellan/commands/ask.md new file mode 100644 index 00000000..3b85dda2 --- /dev/null +++ b/partner-built/magellan/commands/ask.md @@ -0,0 +1,47 @@ +--- +description: Query the Magellan knowledge graph using natural language +--- + +# Ask the Knowledge Graph + +Answer questions about the target systems using the knowledge graph. + +## Usage + +``` +/magellan:ask <question> +``` + +## Behavior + +1. Locate the `.magellan/` directory in the workspace. If it doesn't exist, inform + the user that no knowledge graph has been built yet and suggest running + `/magellan:add` to ingest materials first. + +2. Read `.magellan/index.json` to understand the current KG scope (domains, entity + counts). + +3. Apply the querying skill to answer the question.
The skill determines the right + approach based on question type: + - Overview questions → read domain summaries via Read tool + - Factual lookups → read specific entity files via Read tool + - Structural/dependency questions → read relationships.json and cross_domain.json, + traverse edges manually by following entity references + - Cross-domain questions → read cross_domain.json + follow edges + - Open questions/contradictions → read the per-domain JSON files + +4. Present the answer with: + - Direct response to the question + - Source citations for every factual claim (entity ID, document, location, confidence) + - Any relevant contradictions or open questions + - Low-confidence facts flagged explicitly + +## Examples + +``` +/magellan:ask How does the billing system process invoices? +/magellan:ask What systems depend on the AS/400 batch job? +/magellan:ask What are the known contradictions in the title domain? +/magellan:ask List all components that handle PII data +/magellan:ask What open questions do we have for the client? +``` diff --git a/partner-built/magellan/commands/magellan.md b/partner-built/magellan/commands/magellan.md new file mode 100644 index 00000000..4d7e532b --- /dev/null +++ b/partner-built/magellan/commands/magellan.md @@ -0,0 +1,412 @@ +--- +description: Magellan knowledge management system — show status or run the full discovery pipeline +--- + +# Magellan + +The main entry point for Magellan. Shows workspace status or runs the full +pipeline (Phase 1 Discovery + Phase 2 Design). + +## Usage + +``` +/magellan Run incremental pipeline (or full if first run) +/magellan <path> Run pipeline on a specific workspace +/magellan --status Show workspace status only (no processing) +/magellan --full Force full pipeline re-run (ignore change detection) +/magellan --from-step N Re-run pipeline starting from Step N (skip earlier steps) +``` + +## Critical Rules + +1.
ALL file writes to `.magellan/` MUST follow the schemas defined in the + file-conventions skill. Read the skill before writing any JSON file. +2. Facts MUST follow the atomic fact schema (required fields: statement, subject, + subject_domain, predicate, object, source with quote, confidence). +3. Entities are one file per entity in `domains/<domain>/entities/`. +4. Do NOT create a monolithic `knowledge_graph.json`. The KG is stored as individual + entity files. +5. Facts MUST be organized by domain: `domains/<domain>/facts/<source_file>.json`. +6. When appending to contradictions or open questions, always read the existing file + first, add to the array, then write back. + +## Execution Rules + +1. **No background agents.** Every step runs in the foreground. Process files + sequentially. Complete each step fully before starting the next. + +2. **No step skipping.** Every numbered step is MANDATORY. Do not combine steps. + If a step fails, record the failure and continue — never skip silently. + +3. **Quality gate after every step.** Apply the pipeline-review skill after each + step. Fix blockers before proceeding. Accumulate findings in + `.magellan/pipeline_feedback.json`. + +4. **No subagent delegation.** Every step executes in the main conversation context. + +5. **Context hygiene.** Use Glob to count files rather than reading them all. + Use Read with offset/limit for large files. Read only the fields you need. + +## Behavior + +When run, determine the target workspace: +- If a path argument is provided, use that path. +- If no argument is provided, use the current working directory. + +Then determine the run mode: + +- **`--status`** → show status only (Status Mode). +- **`--full`** → force full pipeline re-run. +- **`--from-step N`** → skip to Step N using existing data from earlier steps. +- `.magellan/` does not exist → full pipeline. +- `.magellan/` exists with `last_run` in state.json → incremental mode. +- `.magellan/` exists without `last_run` → show status. + +## Status Mode + +1.
Read `.magellan/state.json` and `.magellan/index.json`. +2. Use Glob to find `domains/*/open_questions.json` and `domains/*/contradictions.json`. + Read each and count entries. +3. Read `.magellan/processed_files.json` for file tracking data. +4. If `state.json` has `last_run.git_ref`, run `git diff --name-only <git_ref> HEAD` + via Bash to detect changes since the last run. +5. Display status dashboard: + +``` +Magellan Knowledge Graph Status +================================ +Files tracked: 200 (197 ingested, 3 no_facts) +Domains: 5 (billing, title, transportation, dealer_management, infrastructure) +Total entities: 312 +Total edges: 489 + +Open questions: 12 +Contradictions: 4 + +Top priority items: + [critical] c_003: Settlement threshold mismatch between code and config + [high] oq_003: Is the $10,000 MANUAL_REVIEW threshold still active? +``` + +If no `.magellan/` directory is found: + +``` +No Magellan workspace found. + /magellan /path/to/workspace Run the full discovery pipeline + /magellan:add <path> Add a single document +``` + +--- + +## Pipeline + +### Step 1: Initialize and Discover Files + +**Initialize** (if `.magellan/` doesn't exist): + +1. Create directory structure via Bash: + ``` + mkdir -p .magellan/domains .magellan/diagrams .magellan/language_guides + ``` +2. Write `.magellan/state.json`: `{"initialized_at": "<timestamp>"}` +3. Copy starter language guides from `skills/ingestion/language_guides/` to + `.magellan/language_guides/` (skip existing — user may have customized). +4. Initialize `.magellan/pipeline_feedback.json` with empty structure. + +**Resume check**: If `.magellan/` exists and `state.json` has `pipeline_step`, +offer to resume from the last completed step. + +**Discover files**: + +- **Full mode**: Use Glob to list all files, excluding `.magellan/` and `.git/`. +- **Incremental mode**: Read `state.json` for `last_run.git_ref`. Use + `git diff --name-only <git_ref> HEAD` and `git ls-files --others --exclude-standard` + via Bash to find new/modified files. Skip unchanged files.
+ +Display: "Found N files to process." + +**Quality Gate.** Update state.json. + +### Step 2: Extract Facts (Stage 1) + +For each file in the processing list: + +1. **Check file size** via Bash (`wc -l` for text, `wc -c` for binary). +2. **Read** the file following the ingestion skill's reading strategy: + - Small files (under ~5,000 lines): read entire file in one pass. + - Large files (over ~5,000 lines): read in sections using `offset` and `limit`. + See the "Reading Large Documents" section in the ingestion skill. + - If it's a code file, check `.magellan/language_guides/` for a matching guide. + Read the guide once per language (cache in context for subsequent files). +3. **Extract facts** by applying the ingestion skill. +4. **Write facts** to `.magellan/domains/<domain>/facts/<source_file>.json` + following the fact schema in file-conventions. +5. **Record disposition** in `.magellan/processed_files.json` (read, update, write back). +6. Display: "Ingested [N/total]: filename (M facts → domain)" + +**Track affected domains** as you process files. + +If a file cannot be read or produces no facts, record the disposition and continue. +**Nothing is silently skipped.** + +After all files, display: + +``` +File Processing Summary +======================= +Total files: 52 + ingested: 47 + no_facts: 3 + unreadable: 2 + --- + Accounted: 52/52 +``` + +**Verify — File Ledger Reconciliation:** +1. Count all files in the workspace via Glob (excluding `.magellan/`, `.git/`). +2. Count all entries in `.magellan/processed_files.json`. +3. If workspace files > ledger entries, list the missing files by name. + These were silently skipped — this is a **blocker**. Process them before continuing. +4. Display: "Ledger: N/N files accounted for ✓" or "MISSING: file1, file2, ..." + +**Verify — Fact Count Cross-Check:** +1. For each domain, sum `fact_count` across all files in `domains/<domain>/facts/`. +2. Compare this to the total facts reported during ingestion. +3.
If they differ, some facts were lost during write. Flag as blocker. + +**Quality Gate.** Update state.json. + +### Step 3: Build Graph (Stage 2a) + +Build entities and intra-domain relationships from atomic facts. + +For each fact file in affected domains: +1. Read the facts. +2. Apply the graph-building skill: process 5-10 facts at a time, write each + entity immediately to `.magellan/domains/<domain>/entities/<entity_id>.json`. +3. Apply contradiction-detection: compare new facts against existing entities. + Append contradictions and open questions to the domain's JSON files. +4. Write relationships to `.magellan/domains/<domain>/relationships.json`. +5. Display: "Built: domain (N entities, M relationships)" + +**Verify — Entity-to-Source Traceability:** +For each domain, read 3 entities and verify each evidence entry references a +source document that has a corresponding fact file in `domains/<domain>/facts/`. +If an entity cites a source with no matching fact file, flag as warning — the +evidence chain is broken. + +**Quality Gate.** Update state.json. + +### Step 4: Cross-Domain Linking (Stage 2b) + +Separate, mandatory pass. Do NOT fold into Step 3. + +1. Use Glob to list all domains. +2. For each domain, list entities and read names + summaries. +3. Compare across domains for SAME_AS candidates. +4. Write `.magellan/cross_domain.json`. +5. Detect cross-domain contradictions. +6. Display: "Cross-domain: N SAME_AS, M relationships" + +Skip if fewer than 2 domains. + +**Verify — Relationship Integrity:** +Read `cross_domain.json` and each domain's `relationships.json`. For every edge, +verify both the `from` and `to` entity IDs exist as files in their respective +`entities/` directories (use Glob). List any dangling references — these point +to entities that were never created or were lost. Flag as warning. + +**Quality Gate.** Update state.json. + +### Step 5: Entity Deduplication + +Scan each domain for near-duplicate entities (>80% name similarity or +near-identical summaries).
Merge duplicates: keep the entity with more evidence,
+mark the other as superseded.
+
+**Verify — Evidence Preservation:**
+For each merge performed, read the kept entity and verify its `evidence` array
+contains entries from both original entities. Count evidence entries before and
+after — the kept entity must have ≥ the sum of both originals. If evidence was
+lost during merge, flag as blocker and restore from the superseded entity file
+(which still exists, marked as superseded).
+
+**Quality Gate.** Update state.json.
+
+### Step 6: Domain Summarization (Stage 2c)
+
+For each domain:
+1. Count entities, read relationships, calculate hub scores.
+2. Read top 10-15 hub entities.
+3. Count contradictions and open questions.
+4. Synthesize a 3-8 paragraph narrative.
+5. Write `.magellan/domains/<domain>/summary.json`.
+
+**Quality Gate.** Update state.json.
+
+### Step 7: Onboarding Guide
+
+Apply the onboarding-guide skill to generate `.magellan/onboarding_guide.md`.
+
+**Quality Gate.** Update state.json.
+
+### Step 8: Contradictions Dashboard
+
+Apply the dashboard-generation skill to generate the markdown and HTML dashboard.
+
+**Quality Gate.** Update state.json.
+
+### Step 9: C4 Architecture Diagrams
+
+Apply the diagram-generation skill. Generate both Mermaid and PlantUML for
+each level (context, containers, per-domain components).
+
+**Quality Gate.** Update state.json.
+
+### Step 10: Update State and Index
+
+1. Update `state.json` with `last_run` block (timestamp, git_ref, mode, file count).
+2. Update `index.json` with domain stats.
+3. Display status dashboard. 
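The hub scores in Step 6 are not pinned to a specific metric; a minimal sketch using plain degree centrality over a domain's relationships file (the `edges`/`from`/`to` field names are assumed from the relationship schema in file-conventions):

```python
import json
from collections import Counter

def hub_scores(relationships_path, top_n=15):
    """Rank entities by degree — the number of edges touching them."""
    with open(relationships_path) as f:
        edges = json.load(f).get("edges", [])
    degree = Counter()
    for edge in edges:
        degree[edge["from"]] += 1
        degree[edge["to"]] += 1
    return degree.most_common(top_n)  # [(entity_id, degree), ...]
```

Any centrality measure works here; degree is cheap and usually surfaces the same hubs.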
+
+### Step 11: Phase 1 Verification
+
+Verify that Phase 1 outputs exist and contain meaningful content:
+- At least 1 domain with entities
+- Entities have summaries (50+ chars), evidence with quotes, weight > 0
+- Relationships exist for domains with 3+ entities
+- Domain summaries have narratives (200+ chars) with hubs
+- Onboarding guide, dashboard, and diagrams exist
+
+Failure conditions STOP the pipeline. Warning conditions are logged.
+
+---
+
+## Phase 2: Design Generation
+
+Runs automatically after Phase 1 verification.
+
+### Step 12: Business Rules Per Domain
+
+Classify rules as HARD / SOFT / QUESTIONABLE. Cite source entities.
+
+### Step 13: DDD Specs Per Domain
+
+Bounded context: entities, aggregates, events, commands, integration points.
+
+### Step 14: Implementation Contracts Per Domain
+
+API contracts, event schemas, data models, integration contracts.
+
+### Step 15: Per-Domain Review Documents
+
+Decisions, proposed system, differences, risks, open items.
+
+**Quality Gate** for Steps 12-15.
+
+### Step 16: Business Rules Export (MANDATORY)
+
+DMN XML, JSON, CSV, Gherkin — four formats per domain.
+
+**Quality Gate.**
+
+### Step 17: OpenAPI + AsyncAPI Specs (MANDATORY)
+
+Per-domain specs + cross-domain integration specs in `_integration/`.
+
+**Quality Gate.**
+
+### Step 18: Phase 2 Verification
+
+Verify all deliverables exist with meaningful content.
+
+### Steps 19-20: Regenerate Dashboard and Diagrams
+
+Capture any new contradictions or relationships from Phase 2.
+
+### Step 21: Update State and Index
+
+Final stats.
+
+### Step 22: Final Summary Report
+
+Display the summary, then run the coverage matrix. 
+
+```
+Pipeline Complete (Phase 1 + Phase 2)
+======================================
+Phase 1: Discovery
+  Files processed: 47
+  Facts extracted: 312
+  Entities: 89
+  Relationships: 134
+  Contradictions: 4
+  Open questions: 12
+
+Phase 2: Design
+  Business rules: 142 (52 HARD, 63 SOFT, 27 QUESTIONABLE)
+  DDD specs: 5 domains
+  Rules exports: DMN, JSON, CSV, Gherkin
+  API specs: OpenAPI + AsyncAPI
+```
+
+**Verify — Coverage Matrix:**
+For each source document in processed_files.json with disposition `ingested`:
+1. Verify it has a corresponding fact file in `domains/<domain>/facts/`.
+2. Read the fact file and get the fact_count.
+3. Check if any entities reference this source in their evidence arrays.
+
+Display a coverage table:
+
+```
+Source Coverage
+===============
+  Source Document      Facts   Entities   Domain
+  Q3_ops_runbook.pdf   12      8          billing
+  CBBLKBOOK.cblle      15      6          billing
+  dealer_manual.pdf    3       2          dealer_management
+  README.md            0       0          — (no_facts)
+  config.bin           —       —          — (unreadable)
+  ---
+  47/52 files contributed to the knowledge graph.
+  5 files produced no knowledge (3 no_facts, 2 unreadable).
+```
+
+Flag any file with disposition `ingested` but 0 entities referencing it — those
+facts were extracted but never built into the graph.
+
+```
+Next steps:
+  /magellan:ask    Query the knowledge graph
+  /magellan:add    Add more materials
+  /magellan        Check status
+```
+
+---
+
+## Error Handling
+
+Every file must reach a recorded disposition:
+
+| Status | Meaning |
+|--------|---------|
+| `ingested` | Facts extracted successfully |
+| `no_facts` | File read but no extractable facts |
+| `unreadable` | File could not be read |
+| `extraction_error` | Error during fact extraction |
+| `skipped_by_rule` | Excluded by project rule |
+
+Rules:
+- Never let a file failure stop the pipeline.
+- Every failure is logged with the error and filename.
+- The final summary includes counts for every disposition.
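The disposition accounting in the final summary can be sketched as follows (the ledger is assumed here to be a list of `{file, disposition}` records — the authoritative schema lives in the file-conventions skill):

```python
import json
from collections import Counter

DISPOSITIONS = ["ingested", "no_facts", "unreadable",
                "extraction_error", "skipped_by_rule"]

def disposition_summary(ledger_path):
    """Count every recorded disposition so no file goes unaccounted."""
    with open(ledger_path) as f:
        entries = json.load(f)
    counts = Counter(e["disposition"] for e in entries)
    unknown = set(counts) - set(DISPOSITIONS)
    assert not unknown, f"unrecognized dispositions: {unknown}"
    # Every known disposition appears in the summary, even at zero.
    return {d: counts.get(d, 0) for d in DISPOSITIONS}, len(entries)
```

The returned total should equal the workspace file count from the ledger reconciliation check.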
+ +## Context Window Management + +The pipeline is **resumable**: + +- `state.json` tracks the last completed pipeline step. +- `processed_files.json` tracks every file's disposition. +- On resume, processed files are skipped automatically. + +When context runs low, save progress and tell the user to run `/magellan` again. diff --git a/partner-built/magellan/skills/api-spec-generation/SKILL.md b/partner-built/magellan/skills/api-spec-generation/SKILL.md new file mode 100644 index 00000000..a61d82b7 --- /dev/null +++ b/partner-built/magellan/skills/api-spec-generation/SKILL.md @@ -0,0 +1,305 @@ +--- +name: api-spec-generation +description: Generate OpenAPI 3.1 and AsyncAPI 3.0 specification files from DDD specs, implementation contracts, and knowledge graph entities. Produces machine-readable API documentation that developers use with Swagger UI, Postman, Redoc, and code generators. +--- + +# API Specification Generation + +You produce OpenAPI 3.1 and AsyncAPI 3.0 YAML files from the Phase 2 +deliverables and knowledge graph. Developers use existing tools — Swagger UI, +Redoc, Postman, AsyncAPI Studio — to browse, validate, and mock the proposed +APIs. No custom viewer needed. 
+
+## Output Files
+
+### Per-Domain Specs
+
+In `.magellan/domains/<domain>/deliverables/`:
+
+| File | Format | Use Case |
+|------|--------|----------|
+| `openapi.yaml` | OpenAPI 3.1 | REST API documentation, Swagger UI, Postman import, SDK generation |
+| `asyncapi.yaml` | AsyncAPI 3.0 | Event documentation, message broker configuration |
+
+### Cross-Domain Integration Spec
+
+In `.magellan/domains/_integration/`:
+
+| File | Format | Use Case |
+|------|--------|----------|
+| `openapi.yaml` | OpenAPI 3.1 | All inter-service REST endpoints aggregated |
+| `asyncapi.yaml` | AsyncAPI 3.0 | All published/subscribed events across domains |
+
+## When to Generate
+
+- After Phase 2 contract generation (runs as a pipeline step after rules export)
+- On demand when an architect requests API spec regeneration
+
+## Process
+
+### Per-Domain Specs
+
+For each domain discovered via Glob on `.magellan/domains/*/`:
+
+1. Read the DDD spec (`ddd_spec.md`) from the deliverables directory using the
+   Read tool — it contains bounded context, aggregates, events, commands.
+2. Read the contracts (`contracts.md`) from the deliverables directory — it
+   contains API endpoints, event schemas, data models.
+3. Use Glob on `.magellan/domains/<domain>/entities/*.json` to discover entities,
+   then Read key entities for data model schemas.
+4. Use the Read tool to read `.magellan/domains/<domain>/relationships.json` for
+   entity relationships (informs data model foreign keys and nested schemas).
+5. Use the Read tool to read `.magellan/cross_domain.json` to identify
+   integration points with other domains.
+6. Generate `openapi.yaml` following the OpenAPI format below. Write immediately.
+7. Generate `asyncapi.yaml` following the AsyncAPI format below. Write immediately.
+8. Display: "API specs: domain_name (N endpoints, M events)"
+
+### Cross-Domain Integration Spec
+
+After all per-domain specs are generated:
+
+1. Read `.magellan/cross_domain.json` using the Read tool.
+2. 
For each inter-domain relationship, collect the relevant endpoints and events + from the per-domain specs. +3. Generate `.magellan/domains/_integration/openapi.yaml` aggregating all + inter-service REST endpoints. +4. Generate `.magellan/domains/_integration/asyncapi.yaml` aggregating all + cross-domain events with their channels. +5. Display: "Integration specs: N cross-domain endpoints, M cross-domain events" + +## OpenAPI 3.1 Format (`openapi.yaml`) + +```yaml +openapi: "3.1.0" +info: + title: Billing Service + version: "1.0.0" + description: | + Manages invoicing, fee calculation, payment processing, + and settlement for vehicle auction transactions. + Generated from Magellan KG — billing domain. + contact: + name: Magellan Knowledge Graph + x-generated: "2026-02-23T10:00:00Z" + x-domain: billing + x-entity-count: 23 + +paths: + /invoices: + post: + summary: Create invoice for completed sale + operationId: createInvoice + tags: [invoicing] + requestBody: + required: true + content: + application/json: + schema: + $ref: '#/components/schemas/CreateInvoiceRequest' + example: + saleId: "a1b2c3d4-5678-90ab-cdef-1234567890ab" + vehicleVin: "1HGCM82633A004352" + buyerDealerId: "DLR-4521" + salePrice: 18500.00 + responses: + '201': + description: Invoice created + content: + application/json: + schema: + $ref: '#/components/schemas/Invoice' + '400': + $ref: '#/components/responses/ValidationError' + '409': + description: Duplicate invoice for this sale + + /invoices/{invoiceId}/approve: + post: + summary: Manager approval for invoices above threshold + description: | + Required when invoice amount exceeds $15,000 (BR-FIN-001). 
+ Source: CBBLKBOOK.cblle:247 + operationId: approveInvoice + tags: [invoicing, approval] + parameters: + - name: invoiceId + in: path + required: true + schema: + type: string + +components: + schemas: + Invoice: + type: object + required: [invoiceId, saleId, amount, status] + properties: + invoiceId: + type: string + format: uuid + saleId: + type: string + format: uuid + amount: + type: number + format: decimal + status: + type: string + enum: [draft, pending_approval, approved, paid, reversed] + + responses: + ValidationError: + description: Request validation failed + content: + application/json: + schema: + type: object + properties: + error: + type: string + details: + type: array + items: + type: object + properties: + field: + type: string + message: + type: string +``` + +### OpenAPI Generation Rules + +- **Version**: Always OpenAPI 3.1.0 +- **Info block**: Include `x-generated` timestamp, `x-domain` name, `x-entity-count` +- **Paths**: Derive from contracts.md API endpoints. Include all HTTP methods, + parameters, request/response schemas. +- **Tags**: Group endpoints by aggregate or business process from the DDD spec. +- **Components/Schemas**: Derive from KG entities. Map entity properties to JSON + Schema types. Use `$ref` for shared schemas. +- **Examples**: Synthesize realistic example payloads from KG entity properties. + Use domain-appropriate values (real-looking VINs, dealer IDs, invoice numbers), + not placeholder "string" values. +- **Error responses**: Use a consistent error schema across all endpoints: + 400 (validation), 401 (auth), 403 (forbidden), 404 (not found), 409 (conflict). +- **Business rule references**: When an endpoint enforces a business rule, cite + it in the description (e.g., "Required per BR-FIN-001"). +- **Security**: Note auth requirements from contracts.md. Use security schemes + if specified (JWT Bearer, API key, etc.). 
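A quick syntax-level sanity check on a generated spec might look like this (assuming PyYAML is available; this only catches malformed YAML and missing top-level keys, and does not replace a real validator such as Spectral or openapi-spec-validator):

```python
import yaml

def check_openapi_spec(path):
    """Parse the YAML and check the few top-level keys OpenAPI 3.1 requires."""
    with open(path) as f:
        doc = yaml.safe_load(f)
    assert str(doc.get("openapi", "")).startswith("3.1"), "expected OpenAPI 3.1.x"
    assert "info" in doc and "title" in doc["info"], "info.title is required"
    assert "paths" in doc or "components" in doc, "spec defines nothing"
    return doc
```

Running this right after the Write step catches indentation and quoting mistakes before a developer ever imports the file into Swagger UI or Postman.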
+
+## AsyncAPI 3.0 Format (`asyncapi.yaml`)
+
+```yaml
+asyncapi: "3.0.0"
+info:
+  title: Billing Domain Events
+  version: "1.0.0"
+  description: |
+    Events published and consumed by the billing domain.
+    Generated from Magellan KG.
+  x-generated: "2026-02-23T10:00:00Z"
+  x-domain: billing
+
+channels:
+  invoiceCreated:
+    address: "billing.invoices.created"
+    messages:
+      invoiceCreated:
+        $ref: '#/components/messages/InvoiceCreated'
+  settlementCompleted:
+    address: "billing.settlements.completed"
+    messages:
+      settlementCompleted:
+        $ref: '#/components/messages/SettlementCompleted'
+  paymentReceived:
+    address: "billing.payments.received"
+    messages:
+      paymentReceived:
+        $ref: '#/components/messages/PaymentReceived'
+
+operations:
+  publishInvoiceCreated:
+    action: send
+    channel:
+      $ref: '#/channels/invoiceCreated'
+    summary: Published when a new invoice is generated from a sale
+  consumePaymentReceived:
+    action: receive
+    channel:
+      $ref: '#/channels/paymentReceived'
+    summary: Consumed when a payment is processed by the payment gateway
+
+components:
+  messages:
+    InvoiceCreated:
+      payload:
+        type: object
+        required: [invoiceId, saleId, amount, timestamp]
+        properties:
+          invoiceId:
+            type: string
+            format: uuid
+          saleId:
+            type: string
+            format: uuid
+          amount:
+            type: number
+          timestamp:
+            type: string
+            format: date-time
+      example:
+        invoiceId: "INV-2024-00847"
+        saleId: "a1b2c3d4-5678-90ab-cdef-1234567890ab"
+        amount: 19275.00
+        timestamp: "2024-01-15T14:31:02Z"
+```
+
+### AsyncAPI Generation Rules
+
+- **Version**: Always AsyncAPI 3.0.0
+- **Channels**: Derive from DDD spec domain events. Use dot-notation addressing
+  (`domain.aggregate.event`). Every channel an operation references must be
+  defined in `channels` — dangling `$ref`s fail validation.
+- **Operations**: Separate `send` (publish) and `receive` (subscribe) operations.
+- **Messages**: Full payload schemas with typed fields and realistic examples.
+- **Cross-domain events**: Note which other domains consume/produce each event
+  in the description.
+- **Bindings**: Include broker-specific hints if the KG contains infrastructure
+  details (Kafka topic config, partition keys, etc.).
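The dot-notation addressing rule can be sketched as follows (the slugging details are an assumption — the skill only fixes the `domain.aggregate.event` shape):

```python
import re

def channel_address(domain, aggregate, event):
    """Build a `domain.aggregate.event` channel address from DDD names."""
    def slug(name):
        # Lowercase and collapse any non-alphanumeric runs to underscores.
        return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    return f"{slug(domain)}.{slug(aggregate)}.{slug(event)}"
```

For example, `channel_address("Billing", "Invoices", "Created")` yields `billing.invoices.created`, matching the channel in the spec above.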
+
+## Cross-Domain Integration Spec
+
+The integration specs aggregate inter-service communication:
+
+### `_integration/openapi.yaml`
+- Collects all REST endpoints that are called cross-domain
+- Groups by calling domain → target domain
+- Shows the complete synchronous API surface between services
+
+### `_integration/asyncapi.yaml`
+- Collects all events that cross domain boundaries
+- Shows publisher → channel → subscriber relationships
+- Acts as an event catalog for the entire system
+
+## Critical: Use Built-in Tools for Reading
+
+- ALL KG data reads MUST use Claude's built-in tools:
+  - **Discover domains**: Glob on `.magellan/domains/*/`
+  - **Read entity details**: Read tool on `.magellan/domains/<domain>/entities/<entity_id>.json`
+  - **Read relationships**: Read tool on `.magellan/domains/<domain>/relationships.json`
+  - **Read cross-domain edges**: Read tool on `.magellan/cross_domain.json`
+  - **Discover entities**: Glob on `.magellan/domains/<domain>/entities/*.json`
+  - **Read domain summaries**: Read tool on `.magellan/domains/<domain>/summary.json`
+- Read Phase 2 deliverables (`ddd_spec.md`, `contracts.md`) using the Read tool
+  (they are generated artifacts in the deliverables directory).
+- Write spec files using the Write tool (same pattern as other generated artifacts).
+- Create the `_integration/` directory if it doesn't exist.
+
+## What You Do NOT Do
+
+- Do not invent API endpoints. Only generate specs for endpoints described in
+  contracts.md or derivable from DDD spec aggregates and commands.
+- Do not use placeholder values in examples. Synthesize realistic, domain-appropriate
+  values from KG entity properties.
+- Do not generate invalid YAML. Ensure proper indentation, quoting of special
+  characters, and valid OpenAPI/AsyncAPI structure.
+- Do not skip error responses. Every endpoint needs at least 400 and 500 responses.
+- Do not merge domains into one spec. Each domain gets its own pair of files.
+  The integration specs are separate aggregations. 
+
+- Do not omit source traceability. Reference business rules and KG entities in
+  endpoint descriptions where relevant.
diff --git a/partner-built/magellan/skills/contradiction-detection/SKILL.md b/partner-built/magellan/skills/contradiction-detection/SKILL.md
new file mode 100644
index 00000000..27e0f97f
--- /dev/null
+++ b/partner-built/magellan/skills/contradiction-detection/SKILL.md
@@ -0,0 +1,138 @@
+---
+name: contradiction-detection
+description: Detect contradictions between facts and existing KG entities, and raise open questions for ambiguous or incomplete information. Use during graph building (Stage 2a) and cross-domain linking (Stage 2b).
+---
+
+# Contradiction Detection
+
+You detect two types of issues during graph building:
+
+1. Contradictions — when new facts conflict with existing entities
+2. Open questions — when facts are ambiguous, incomplete, or reference undocumented behavior
+
+These are the most valuable outputs of the system. The faster they are surfaced,
+the faster the team builds a complete and trustworthy picture.
+
+## Detecting Contradictions
+
+A contradiction exists when:
+
+- A new fact states a different value for the same property of an existing entity
+  (e.g., threshold is $10k in one source, $5k in another)
+- A new fact describes behavior that conflicts with documented behavior
+  (e.g., "batch runs nightly" vs. "batch runs weekly")
+- A new fact says something was removed or deprecated that another source says is active
+
+When you detect a contradiction, record it by reading the existing contradictions file
+for the domain, appending the new entry, and writing it back:
+
+1. Read `.magellan/domains/<domain>/contradictions.json` using the Read tool.
+   If the file does not exist yet, start with `{"active": [], "resolved": []}`.
+2. Append the new contradiction to the `active` array.
+3. Write the updated file back using the Write tool. 
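The read-append-write steps above amount to the following (a sketch — in the pipeline itself the Read and Write tools play the role of `open` here):

```python
import json
import os

def append_contradiction(domain_dir, contradiction):
    """Read-append-write: never clobber previously recorded contradictions."""
    path = os.path.join(domain_dir, "contradictions.json")
    if os.path.exists(path):
        with open(path) as f:
            data = json.load(f)
    else:
        # First contradiction for this domain: start an empty ledger.
        data = {"active": [], "resolved": []}
    data["active"].append(contradiction)
    with open(path, "w") as f:
        json.dump(data, f, indent=2)
    return data
```

The same pattern applies to `open_questions.json`.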
+
+The contradiction object format:
+
+```json
+{
+  "contradiction_id": "c_<seq>",
+  "description": "Clear, human-readable description of the conflict",
+  "domain": "billing",
+  "sources": [
+    {
+      "document": "Q3_ops_runbook.pdf",
+      "claim": "Invoices exceeding $10,000 are routed to MANUAL_REVIEW",
+      "location": "page 12",
+      "confidence": 0.75
+    },
+    {
+      "document": "billing_db_config.sql",
+      "claim": "MANUAL_REVIEW_THRESHOLD = 5000",
+      "location": "line 47",
+      "confidence": 0.90
+    }
+  ],
+  "related_entities": ["billing:manual_review_bypass"],
+  "severity": "high",
+  "status": "open",
+  "detected_at": "<ISO-8601 timestamp>"
+}
+```
+
+## Severity Levels
+
+- `critical` — contradicts a HARD business rule or compliance requirement
+- `high` — contradicts a core system behavior or significant threshold
+- `medium` — contradicts operational detail or non-critical configuration
+- `low` — minor discrepancy in descriptive or contextual information
+
+## Raising Open Questions
+
+An open question should be raised when:
+
+- A fact references undocumented behavior ("the system does X but no documentation explains why")
+- A fact mentions a system, process, or rule that no other source corroborates
+- A fact is ambiguous and could be interpreted multiple ways
+- A code path exists but its purpose is unclear
+- A business rule is implemented but no policy document defines it
+
+Record the open question by reading the existing open questions file for the domain,
+appending the new entry, and writing it back:
+
+1. Read `.magellan/domains/<domain>/open_questions.json` using the Read tool.
+   If the file does not exist yet, start with `{"active": [], "resolved": []}`.
+2. Append the new question to the `active` array.
+3. Write the updated file back using the Write tool. 
+
+The open question object format:
+
+```json
+{
+  "question_id": "oq_<seq>",
+  "question": "Clear question that a client SME could answer",
+  "context": "Why this question matters and what evidence prompted it",
+  "domain": "billing",
+  "related_entities": ["billing:invoice_generation", "billing:manual_review_bypass"],
+  "directed_to": "senior_developer",
+  "priority": "high",
+  "status": "open",
+  "raised_at": "<ISO-8601 timestamp>",
+  "raised_by": "ingestion of <source_file>"
+}
+```
+
+## Who Should Answer (directed_to)
+
+- `senior_developer` — questions about code behavior, undocumented logic
+- `dba` — questions about database schemas, data relationships, configs
+- `business_analyst` — questions about business rules, domain logic
+- `operations` — questions about batch jobs, monitoring, runbooks
+- `finance_ops` — questions about financial rules, thresholds, compliance
+- `security` — questions about access control, encryption, audit
+- `management` — questions about organizational structure, ownership
+
+## Priority
+
+- `critical` — blocks understanding of a core business process
+- `high` — significant gap that affects design decisions
+- `medium` — useful context but not blocking
+- `low` — nice to know, can be deferred
+
+## When Contradictions Affect Existing Entities
+
+When you create a contradiction, also update the affected entity:
+1. Read the entity file at `.magellan/domains/<domain>/entities/<entity_id>.json` using the Read tool.
+2. Add a `contested: true` property to flag it.
+3. Update the summary to mention the contradiction.
+4. Write the entity back to the same path using the Write tool.
+
+This ensures that any model reading the entity sees the dispute immediately
+rather than having to check contradictions.json separately.
+
+## What You Do NOT Do
+
+- Do not resolve contradictions yourself. Surface them for the team.
+- Do not lower an entity's weight because of a contradiction. Set `contested: true` instead. 
+
+- Do not create contradictions for minor wording differences that don't change meaning.
+- Do not create open questions for things that are clearly explained in other documents
+  you haven't processed yet — those will be resolved when those documents are ingested.
diff --git a/partner-built/magellan/skills/cross-domain-linking/SKILL.md b/partner-built/magellan/skills/cross-domain-linking/SKILL.md
new file mode 100644
index 00000000..705fa205
--- /dev/null
+++ b/partner-built/magellan/skills/cross-domain-linking/SKILL.md
@@ -0,0 +1,176 @@
+---
+name: cross-domain-linking
+description: Detect SAME_AS entities and cross-domain relationships across the knowledge graph. Use after graph building (Stage 2a) to link entities that represent the same concept in different domains.
+---
+
+# Cross-Domain Linking (Stage 2b)
+
+## Critical: Use Built-in Tools for All Operations
+
+You MUST use Claude's built-in tools for reading and writing:
+- Glob on `.magellan/domains/*/` to discover domains
+- Glob on `.magellan/domains/<domain>/entities/*.json` to discover entities
+- Read tool on entity files to read entity details
+- Read/Write tools on `.magellan/cross_domain.json` for cross-domain edges
+- Read/Write tools on `.magellan/domains/<domain>/contradictions.json` for contradictions
+- Read/Write tools on `.magellan/domains/<domain>/open_questions.json` for open questions
+
+Do NOT skip this step.
+
+You scan all domains in the knowledge graph to detect:
+
+1. SAME_AS entities — the same concept appearing in different domains
+2. Cross-domain relationships — edges connecting entities across domains
+3. Cross-document contradictions — facts in one domain that conflict with another
+
+## Process
+
+1. Get the inventory first — never load the entire graph into context.
+   a. Use Glob on `.magellan/domains/*/` to discover all domains (each subdirectory name is a domain).
+   b. For each domain, use Glob on `.magellan/domains/<domain>/entities/*.json` to get entity file paths.
+   c. 
For each entity, use the Read tool on the entity file to get its name, type, domain, and summary.
+   d. Build a lightweight inventory: `[{entity_id, name, type, domain, summary_snippet}]`
+
+2. Compare entities across domains for SAME_AS candidates.
+   Two entities are SAME_AS candidates when:
+   - They have similar names (e.g., "Vehicle Title" in billing and "Title Record" in title)
+   - They describe the same real-world concept from different perspectives
+   - They reference the same external system, database, or data entity
+
+   Do NOT create SAME_AS edges for:
+   - Generic entities that happen to share a name (e.g., "Config" in two domains)
+   - Entities that are clearly different things with similar names
+   - Entities within the same domain (intra-domain linking is handled in Stage 2a)
+
+3. For each SAME_AS candidate pair, read the full entities to confirm.
+   Only create the edge if the entities genuinely represent the same concept.
+
+4. Detect cross-domain relationships.
+   When one domain's entity references another domain's entity (e.g., billing's
+   settlement process triggers title's transfer event), create a cross-domain edge.
+
+5. Detect cross-document contradictions.
+   When entities in different domains make conflicting claims about the same system
+   behavior, create a contradiction. Read `.magellan/domains/<domain>/contradictions.json`
+   using the Read tool, append the new contradiction to the `active` array, and write
+   it back using the Write tool. If the file does not exist yet, start with
+   `{"active": [], "resolved": []}`.
+
+6. Write results.
+   - Read existing `.magellan/cross_domain.json` using the Read tool.
+     If the file does not exist yet, start with `{"edges": []}`.
+   - Append new edges to the `edges` array (don't overwrite existing ones).
+   - Write the updated file back using the Write tool. 
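Step 1's lightweight inventory can be sketched as follows (entity field names are assumed from the file-conventions schema):

```python
import glob
import json
import os

def build_inventory(root=".magellan/domains"):
    """Collect just enough per entity to shortlist SAME_AS candidates."""
    inventory = []
    for path in sorted(glob.glob(os.path.join(root, "*", "entities", "*.json"))):
        domain = path.split(os.sep)[-3]  # .../domains/<domain>/entities/x.json
        with open(path) as f:
            entity = json.load(f)
        inventory.append({
            "entity_id": entity.get("entity_id"),
            "name": entity.get("name"),
            "type": entity.get("type"),
            "domain": domain,
            "summary_snippet": (entity.get("summary") or "")[:120],
        })
    return inventory
```

The snippet truncation keeps the inventory small enough to compare across hundreds of entities without loading full entity bodies.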
+
+## Weight Calculation
+
+When assigning weights to cross-domain edges, calculate directly using this formula:
+
+```
+effective_weight = clamp(base_weight + modifiers, 0.0, 1.0)
+```
+
+Base weights by source type:
+- production_source_code: 0.95
+- database_schema: 0.90
+- api_spec: 0.85
+- config_file: 0.80
+- official_documentation: 0.70
+- meeting_transcript: 0.50
+- email_thread: 0.40
+- informal_notes: 0.30
+
+Modifiers:
+- Corroboration: +0.05 per additional independent source (max +0.15)
+- Recency: +0.05 if document is less than 6 months old
+- Reference count: +0.02 per entity referencing this one (max +0.10)
+
+Clamp the final value to the range [0.0, 1.0].
+
+## SAME_AS Edge Format
+
+Every cross-domain edge MUST include evidence tracing it back to source facts.
+Cross-domain edges without evidence are untraceable and untrustworthy.
+
+```json
+{
+  "edge_id": "cx_<seq>",
+  "from": "billing:vehicle_record",
+  "to": "title:vehicle_title",
+  "type": "SAME_AS",
+  "properties": {
+    "description": "Same vehicle entity referenced in both billing and title domains"
+  },
+  "evidence": {
+    "from_entity_source": "billing:vehicle_record evidence from Dealer Master Manual p.12",
+    "to_entity_source": "title:vehicle_title evidence from Title Inventory Report p.3",
+    "linking_rationale": "Both entities reference VIN-keyed vehicle records with overlapping fields (VIN, year, make, model)"
+  },
+  "confidence": 0.92,
+  "weight": 0.85
+}
+```
+
+## Cross-Domain Relationship Format
+
+Same as intra-domain relationships but the `from` and `to` span different domains:
+
+```json
+{
+  "edge_id": "cx_<seq>",
+  "from": "billing:settlement_service",
+  "to": "title:title_transfer_event",
+  "type": "TRIGGERS",
+  "properties": {
+    "description": "Settlement completion triggers title transfer to buyer",
+    "trigger": "settlement finalized"
+  },
+  "evidence": {
+    "source": "Architecture overview.pdf",
+    "location": "page 8",
+    "quote": "Title transfer is initiated upon settlement 
confirmation." + }, + "confidence": 0.75, + "weight": 0.80 +} +``` + +## Cross-Domain Relationship Types + +In addition to the standard relationship types (DEPENDS_ON, CALLS, etc.), these +are common across domains: + +| Type | Meaning | +|------|---------| +| `SAME_AS` | Same real-world concept in different domains | +| `TRIGGERS` | Action in domain A causes action in domain B | +| `SHARES_DATA_WITH` | Two domains exchange data | +| `DEPENDS_ON` | Domain A requires domain B to function | +| `INTEGRATES_WITH` | System-level integration across domains | + +## Confidence for SAME_AS + +- 0.95+: Entities have the same name, same type, and overlapping evidence +- 0.85-0.94: Strong name similarity and matching descriptions +- 0.70-0.84: Related concepts that likely represent the same thing +- Below 0.70: Do not create a SAME_AS edge. If unsure, raise an open question instead. + +## Scale Awareness + +For large graphs with hundreds of entities across many domains, be strategic: +- Start with entity names — look for obvious matches first +- Group by entity type — compare Components with Components, not Components with Rules +- Use summary snippets for quick comparison before loading full entities +- Skip domains with no plausible overlap (e.g., infrastructure vs. security) + +Do NOT try to compare every entity with every other entity. Use the inventory +to identify candidates efficiently. + +## What You Do NOT Do + +- Do not create SAME_AS edges within the same domain. +- Do not merge entities. SAME_AS preserves both entities independently. +- Do not create ANY edge without an `evidence` field. Every cross-domain link + must cite the source entities and explain why the link exists. Edges without + evidence are the #1 quality issue found in production runs. +- Do not overwrite existing cross-domain edges. Append to them. 
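The formula from the Weight Calculation section above, written out (a direct transcription of the base weights and modifier caps):

```python
BASE_WEIGHTS = {
    "production_source_code": 0.95,
    "database_schema": 0.90,
    "api_spec": 0.85,
    "config_file": 0.80,
    "official_documentation": 0.70,
    "meeting_transcript": 0.50,
    "email_thread": 0.40,
    "informal_notes": 0.30,
}

def effective_weight(source_type, extra_sources=0, recent=False, references=0):
    """clamp(base_weight + modifiers, 0.0, 1.0) per the Weight Calculation rules."""
    w = BASE_WEIGHTS[source_type]
    w += min(0.05 * extra_sources, 0.15)  # corroboration, capped at +0.15
    w += 0.05 if recent else 0.0          # recency: document under 6 months old
    w += min(0.02 * references, 0.10)     # reference count, capped at +0.10
    return max(0.0, min(w, 1.0))          # clamp to [0.0, 1.0]
```

Note the clamp: a heavily corroborated, recent, widely referenced source still tops out at 1.0.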
diff --git a/partner-built/magellan/skills/dashboard-generation/SKILL.md b/partner-built/magellan/skills/dashboard-generation/SKILL.md
new file mode 100644
index 00000000..9c4989ea
--- /dev/null
+++ b/partner-built/magellan/skills/dashboard-generation/SKILL.md
@@ -0,0 +1,307 @@
+---
+name: dashboard-generation
+description: Generate a contradictions and open questions dashboard as structured markdown plus a print-friendly HTML version. The dashboard consolidates per-domain data into a single meeting-ready document.
+---
+
+# Contradictions & Open Questions Dashboard
+
+You produce `contradictions_dashboard.md` — a structured markdown document that
+consolidates all contradictions and open questions across every domain into one
+meeting-ready view. After writing the markdown, you generate a print-friendly HTML
+version using the Write tool with the inline HTML template defined below.
+
+This is the document architects bring to client meetings. It answers: "here's what
+we found, here's what we need from you." Other AI tools can consume the markdown
+to help find answers.
+
+## When to Generate
+
+- After the onboarding guide in a full pipeline run (Phase 1)
+- Again after Phase 2 verification (to capture Phase 2 contradictions/questions)
+- On demand when an architect requests a dashboard refresh
+
+## Process
+
+Read data per-domain to avoid loading everything into context at once.
+
+1. Discover all domains using Glob on `.magellan/domains/*/` (each subdirectory
+   name is a domain).
+2. Read `.magellan/index.json` using the Read tool for overall stats (total entities, edges).
+3. For each domain:
+   a. Read `.magellan/domains/<domain>/contradictions.json` using the Read tool —
+      the `active` array contains active contradictions for this domain.
+   b. Read `.magellan/domains/<domain>/open_questions.json` using the Read tool —
+      the `active` array contains active questions for this domain.
+   c. For resolved items, check if a `resolved` directory exists under the domain. 
+      Read `.magellan/domains/<domain>/resolved/contradictions.json` if it exists —
+      these are resolved contradictions for this domain (audit trail). Also check
+      for a `resolved` array within the contradictions.json file itself.
+   d. Similarly read `.magellan/domains/<domain>/resolved/questions.json` if it
+      exists — these are answered questions for this domain. Also check for a
+      `resolved` array within the open_questions.json file itself.
+   e. Read `.magellan/domains/<domain>/summary.json` using the Read tool — entity
+      count, hub count for context.
+4. Synthesize the markdown following the Dashboard Structure below.
+5. Write `contradictions_dashboard.md` to `.magellan/contradictions_dashboard.md`
+   using the Write tool.
+6. Generate the HTML version by converting the markdown to HTML using the template
+   and CSS defined in the "HTML Generation" section below. Write the result to
+   `.magellan/contradictions_dashboard.html` using the Write tool.
+
+## Dashboard Structure
+
+Write the dashboard in Markdown with these exact sections:
+
+### 1. Executive Summary
+
+A table showing at-a-glance metrics:
+
+```markdown
+| Metric | Count |
+|--------|-------|
+| Open Contradictions | N |
+| Resolved Contradictions | N |
+| Open Questions | N |
+| Answered Questions | N |
+| Domains Covered | N |
+| Total Entities | N |
+```
+
+### 2. Severity Distribution
+
+A table showing the severity breakdown of all open items:
+
+```markdown
+| Severity | Contradictions | Open Questions | Total |
+|----------|---------------|----------------|-------|
+| [critical] | N | N | N |
+| [high] | N | N | N |
+| [medium] | N | N | N |
+| [low] | N | N | N |
+```
+
+### 3. Domain Breakdown
+
+A table showing per-domain counts:
+
+```markdown
+| Domain | Open Contradictions | Open Questions | Entities | Hubs |
+|--------|--------------------:|---------------:|---------:|-----:|
+| billing | N | N | N | N |
+| title | N | N | N | N |
+```
+
+Order domains by total open items (most first).
+
+### 4. 
Open Contradictions + +Group by domain, then by severity within each domain. + +For each contradiction: + +```markdown +### domain_name + +#### [severity] contradiction_id: Short description + +**Description**: Full description of the contradiction. + +- **Source A**: document_name — "exact claim from source A" +- **Source B**: document_name — "exact claim from source B" +- **Related entities**: `entity_id_1`, `entity_id_2` +- **Route to**: directed_to_role +``` + +If there are no open contradictions, write: +"No open contradictions. All contradictions have been resolved." + +### 5. Open Questions + +Group by `directed_to` role first (so a client can forward the right section to +the right person), then by domain within each role group. + +For each question: + +```markdown +### For role_name + +#### [priority] question_id: The question + +**Context**: Why this question matters and what evidence prompted it. + +- **Domain**: domain_name +- **Related entities**: `entity_id_1`, `entity_id_2` +``` + +If there are no open questions, write: +"No open questions. All questions have been answered." + +### 6. Audit Trail + +Two sub-sections showing resolved/answered items as tables: + +```markdown +### Resolved Contradictions + +| ID | Domain | Resolution | Resolved At | +|----|--------|-----------|-------------| +| c_001 | billing | Resolution note text | 2026-02-20T10:00:00Z | + +### Answered Questions + +| ID | Domain | Answer Source | Answered At | +|----|--------|--------------|-------------| +| oq_001 | billing | interviews/john_smith.md | 2026-02-21T14:30:00Z | +``` + +If no resolved/answered items exist, write "No items resolved yet." + +## Writing Style + +- Be direct and factual. This is a meeting document, not a narrative. +- Use the exact contradiction_id and question_id — clients track by ID. +- Include the actual source claims in quotes for contradictions. Don't paraphrase. 
+- Keep descriptions concise — one sentence for the description, full quotes for sources.
+- Use severity badges: `[critical]`, `[high]`, `[medium]`, `[low]` — the HTML
+  renderer converts these to color-coded badges.
+- The `directed_to` grouping is critical — it tells the client "forward this section
+  to your DBA" or "this is for the business analyst."
+
+## Critical: Use Built-in Tools
+
+- ALL reads MUST go through the Read tool on the appropriate file paths.
+- Always read per-domain files individually. Do NOT try to load all data at once —
+  that defeats the per-domain storage principle.
+- The HTML generation MUST use the inline template below with the Write tool.
+- Write the markdown using the Write tool (same pattern as onboarding_guide.md).
+
+## HTML Generation
+
+After writing the markdown dashboard, generate the HTML version. Convert the
+markdown content to HTML by applying these transformations:
+
+1. **Headers**: `# text` becomes `<h1>text</h1>`, `## text` becomes `<h2>text</h2>`, etc.
+2. **Tables**: Pipe-delimited rows become `<tr>` rows of `<td>` cells inside a
+   `<table>`, with `<th>` cells for the first row. Skip separator rows (lines like `|---|---|`).
+3. **Bold**: `**text**` becomes `<strong>text</strong>`
+4. **Italic**: `*text*` becomes `<em>text</em>`
+5. **Code spans**: `` `text` `` becomes `<code>text</code>`
+6. **Unordered lists**: `- item` becomes `<li>item</li>` wrapped in `<ul>...</ul>`
+7. **Severity badges**: `[critical]` becomes `<span class="badge critical">critical</span>`,
+   similarly for `[high]`, `[medium]`, `[low]`
+8. **Paragraphs**: Consecutive non-special lines wrapped in `<p>...</p>` tags
+
+Wrap the converted HTML body in this template, replacing `{body}` with the
+converted content and `{timestamp}` with the current UTC datetime formatted
+as `YYYY-MM-DD HH:MM UTC`:
+
+```html
+<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="utf-8">
+<title>Contradictions & Open Questions Dashboard</title>
+<style>
+  body { font-family: Georgia, 'Times New Roman', serif; max-width: 60rem;
+         margin: 2rem auto; padding: 0 1rem; color: #222; }
+  table { border-collapse: collapse; margin: 1rem 0; }
+  th, td { border: 1px solid #ccc; padding: 0.4rem 0.8rem; text-align: left; }
+  code { background: #f4f4f4; padding: 0.1rem 0.3rem; }
+  .badge { padding: 0.1rem 0.5rem; border-radius: 0.3rem; color: #fff; font-size: 0.85em; }
+  .badge.critical { background: #b71c1c; }
+  .badge.high { background: #e65100; }
+  .badge.medium { background: #f9a825; color: #000; }
+  .badge.low { background: #2e7d32; }
+  @media print { body { margin: 0; max-width: none; } }
+</style>
+</head>
+<body>
+<h1>Contradictions & Open Questions Dashboard</h1>
+<p>Generated {timestamp}</p>
+{body}
+</body>
+</html>
+```
+
+Write the resulting HTML to `.magellan/contradictions_dashboard.html` using the Write tool.
+
+## What You Do NOT Do
+
+- Do not invent contradictions or questions. Only report what's in the KG.
+- Do not editorialize or suggest resolutions. Present the data objectively.
+- Do not skip the audit trail. Resolved items prove the system is working.
+- Do not load all contradictions at once. Read per-domain.
diff --git a/partner-built/magellan/skills/design-generation/SKILL.md b/partner-built/magellan/skills/design-generation/SKILL.md
new file mode 100644
index 00000000..da6b151c
--- /dev/null
+++ b/partner-built/magellan/skills/design-generation/SKILL.md
@@ -0,0 +1,258 @@
+---
+name: design-generation
+description: Generate business rules, DDD specs, and implementation contracts from the knowledge graph. Phase 2 capability — use when the team is ready to move from discovery to design.
+---
+
+# Design Generation (Phase 2)
+
+You generate the deliverables that implementation teams need to build the new system.
+You work from the knowledge graph — domain summaries, entities, relationships,
+contradictions, and open questions.
+
+The goal is a greenfield design based on requirements extracted from the current
+system analysis — not a strangler fig migration. AI-accelerated development means
+building new is fast enough that the old architecture doesn't need to be preserved.
+
+## Critical: Use Built-in Tools for Reading
+
+You MUST use Claude's built-in tools to read the KG:
+- **Discover domains**: Glob on `.magellan/domains/*/`
+- **Discover entities**: Glob on `.magellan/domains/<domain>/entities/*.json`
+- **Read entity details**: Read tool on `.magellan/domains/<domain>/entities/<entity_id>.json`
+- **Read domain summaries**: Read tool on `.magellan/domains/<domain>/summary.json`
+- **Read relationships**: Read tool on `.magellan/domains/<domain>/relationships.json`
+- **Read cross-domain edges**: Read tool on `.magellan/cross_domain.json`
+- **Read contradictions**: Read tool on `.magellan/domains/<domain>/contradictions.json`
+- **Read open questions**: Read tool on `.magellan/domains/<domain>/open_questions.json`
+
+Do NOT invent or assume system details. Every claim in a deliverable must trace
+to a KG entity with evidence.
+
+## Process — One Domain at a Time
+
+Process each domain independently to avoid output limits. For each domain:
+
+1. Read the domain summary using the Read tool on `.magellan/domains/<domain>/summary.json`.
+2. Use Glob on `.magellan/domains/<domain>/entities/*.json` to discover entities,
+   then Read the key entities (hubs first, then others).
+3. Read `.magellan/domains/<domain>/relationships.json` using the Read tool.
+4. Read `.magellan/cross_domain.json` using the Read tool for inter-domain connections.
+5. Read `.magellan/domains/<domain>/contradictions.json` and
+   `.magellan/domains/<domain>/open_questions.json` using the Read tool to get
+   this domain's entries only.
+6. Generate the four deliverables (described below), writing each file
+   immediately after generating it — do NOT accumulate all four in one response.
+
+## Deliverables (Per Domain)
+
+All outputs go to `.magellan/domains/<domain>/deliverables/`.
+
+### 1. business_rules.md
+
+Extract and formalize all business rules from the KG entities. The output has
+two sections: a cross-domain summary table, then per-classification rule tables.
+ +#### Cross-Domain Summary + +Start with a summary showing the distribution across all domains (when processing +the first domain) or this domain's distribution: + +```markdown +# Business Rules: Billing Domain + +## Summary + +| Classification | Count | Description | +|----------------|------:|-------------| +| HARD | 8 | Legal, regulatory, compliance — must preserve | +| SOFT | 12 | Business policy — can be revisited | +| QUESTIONABLE | 5 | Likely tech debt — challenge actively | +| **Total** |**25** | | + +Rules without source citations: 2 (flagged below) +``` + +#### Per-Classification Rule Tables + +For each classification (HARD, SOFT, QUESTIONABLE), generate a structured table. +Model each rule as a condition/action pair where possible: + +```markdown +## HARD Rules (Legal/Regulatory — must preserve) + +| ID | Rule | Condition | Action | Source Entity | Source Document | Confidence | +|----|------|-----------|--------|---------------|-----------------|------------| +| BR-001 | Invoice manual review threshold | `invoice_amount > $10,000` | Route to MANUAL_REVIEW queue | `billing:manual_review_bypass` | CBBLKBOOK.cblle:142 | 0.95 | +| BR-002 | Title lien check required | `title_transfer = true` | Verify no outstanding liens with DMV | `title:lien_verification` | Title_Process_Manual.pdf p.8 | 0.90 | + +### Evidence + +**BR-001** — "Invoices exceeding $10,000 are routed to MANUAL_REVIEW, skipping +standard approval." (CBBLKBOOK.cblle, lines 142-198) + +**BR-002** — "All title transfers must include a lien check with the state DMV +before release." (Title_Process_Manual.pdf, page 8, section 3.2) +``` + +Repeat for SOFT and QUESTIONABLE classifications. 
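The mechanics behind the summary table are simple tallies over the classified rules. A minimal sketch in Python (illustrative only — the skill synthesizes these counts in-model; the `classification` and `source_quote` field names are hypothetical, not part of any Magellan schema):

```python
from collections import Counter

# Hypothetical in-memory shape for extracted rules; only the fields needed
# for the summary table and the uncited-rules list are shown.
rules = [
    {"id": "BR-001", "classification": "HARD", "source_quote": "Invoices exceeding $10,000..."},
    {"id": "BR-002", "classification": "HARD", "source_quote": "All title transfers must..."},
    {"id": "BR-015", "classification": "SOFT", "source_quote": None},
]

# Counts per classification feed the Summary table; missing keys count as 0.
counts = Counter(r["classification"] for r in rules)

# Rules without a source quote are flagged, never silently dropped.
uncited = [r["id"] for r in rules if not r["source_quote"]]
```

Here `counts["HARD"]` is 2 and `uncited` is `["BR-015"]`, which map directly onto the Summary table and the "Rules Without Source Citations" section.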
+
+#### Rules Without Source Citations
+
+At the end, list any rules that lack direct source citations:
+
+```markdown
+## Rules Without Source Citations
+
+| ID | Rule | Why No Citation |
+|----|------|-----------------|
+| BR-015 | Late payment penalty rate | Inferred from multiple entities, no single source quote |
+```
+
+#### Classification Criteria
+
+- **HARD**: legal, regulatory, compliance, contractual obligation
+- **SOFT**: business policy that could be changed if the business decides to
+- **QUESTIONABLE**: likely a workaround, technical limitation, or outdated constraint
+
+Every rule MUST cite its source entity ID and original document. Rules that
+cannot be traced to a specific source quote are flagged in the "Rules Without
+Source Citations" section rather than silently omitted.
+
+Group rules by subdomain or business process within each classification when
+a domain has more than 10 rules of the same type.
+
+### 2. ddd_spec.md
+
+Bounded context specification:
+
+- Context name and responsibility
+- Entities and value objects
+- Aggregates and aggregate roots
+- Domain events (published and consumed)
+- Commands and queries
+- Integration points (APIs, events, data flows)
+- Invariants (HARD business rules that must always be true)
+- Cross-domain workflows (saga specifications)
+
+#### Cross-Domain Workflows Section
+
+When a domain participates in multi-domain workflows, include a
+"Cross-Domain Workflows" section in the DDD spec. For each workflow that
+touches this domain:
+
+1. Trace the cross-domain path manually: read `.magellan/cross_domain.json`
+   to find inter-domain edges, then read the referenced entities from
+   `.magellan/domains/<domain>/entities/<entity_id>.json` and follow the
+   edges in `.magellan/domains/<domain>/relationships.json` to build the
+   path across domains. Repeat for each domain the workflow touches.
+2. 
Document each workflow with: + - **Step sequence**: ordered list of steps across domains + - **Domain events**: the event at each boundary crossing + - **Compensation actions**: what to undo if a step fails + - **Timeout considerations**: SLAs or time constraints from the KG + - **Failure modes**: what can go wrong and the impact + +3. Include a **Mermaid sequence diagram** embedded in the markdown showing + the temporal flow across domain swimlanes: + +```markdown +## Cross-Domain Workflows + +### Sale-to-Settlement Saga + +Steps: +1. **Sales** → SaleCompleted event +2. **Financial** → Calculate fees, generate invoice → InvoiceCreated event +3. **Title** → Title check with DMV → TitleTransferApproved or TitleHold +4. **Financial** → Settlement → SettlementCompleted event +5. **Transportation** → Schedule transport → VehicleDispatched event + +Compensation: +- Step 3 fails (TitleHold): Financial reverses fees (Step 2 compensation) +- Step 4 fails (SettlementFailed): Title transfer is rolled back + +\```mermaid +sequenceDiagram + participant SAL as Sales + participant FIN as Financial + participant TIT as Title + participant TRN as Transportation + + SAL->>FIN: SaleCompleted + FIN->>FIN: Calculate fees + FIN-->>SAL: InvoiceCreated + SAL->>TIT: InitiateTitleTransfer + alt Title clear + TIT-->>SAL: TitleTransferApproved + else Title issue + TIT-->>SAL: TitleHold + SAL->>FIN: ReverseFees (compensation) + end + FIN->>TRN: SettlementCompleted + TRN->>TRN: Schedule transport +\``` +``` + +Only include workflows where the KG has evidence of cross-domain interactions. +Do not invent workflows — they must be traceable to cross-domain edges and +domain events in the KG. + +### 3. contracts.md + +Implementation contracts. These are what developers build from — they must be +complete enough to code against without guessing. 
+ +API contracts (for each endpoint): +- HTTP method, path, description +- Request schema (body, path params, query params) +- Success response schema (200/201) +- Error response schemas (400, 401, 403, 404, 409, 500) with error code and message format +- Authentication requirements (JWT, API key, service-to-service) +- Pagination pattern (cursor-based or offset-based, with standard envelope) +- Idempotency requirements (which operations need idempotency keys) +- Rate limiting (if applicable) + +Event schemas: +- Event name, topic/queue +- Payload schema with all fields typed +- Publishing trigger (what causes this event) +- Expected consumers + +Data model: +- Entities with field names, types, constraints +- Relationships and foreign keys +- Indexes for common query patterns + +Integration contracts: +- How this context communicates with others (sync API calls, async events) +- Which direction data flows +- Error handling for cross-context failures + +### 4. review.md + +Review document for the architect team: + +- What was decided and why +- What the proposed new system looks like for this domain +- Key differences from the current system +- Risks and assumptions +- Contested facts (from contradictions) that affect the design +- Open items that need team discussion + +## Weight-Based Prioritization + +Prioritize entities with weight > 0.7. Include lower-weight entities only when +they provide context for a high-weight entity. Never base a design decision +solely on an entity with weight < 0.5. + +## Output Limits + +Write each deliverable file immediately after generating it. Do NOT try to +generate all four files for a domain in one response — this will hit output +limits. The pattern is: + +1. Generate business_rules.md → write it +2. Generate ddd_spec.md → write it +3. Generate contracts.md → write it +4. Generate review.md → write it +5. 
Move to next domain
diff --git a/partner-built/magellan/skills/diagram-generation/SKILL.md b/partner-built/magellan/skills/diagram-generation/SKILL.md
new file mode 100644
index 00000000..ffb923ab
--- /dev/null
+++ b/partner-built/magellan/skills/diagram-generation/SKILL.md
@@ -0,0 +1,341 @@
+---
+name: diagram-generation
+description: Generate C4 architecture diagrams (Levels 1-3) as Mermaid and PlantUML from knowledge graph data. Produces system context, container, and per-domain component diagrams with interactive links and legends.
+---
+
+# C4 Architecture Diagram Generation
+
+You produce C4 model diagrams at three levels from the knowledge graph. Each level
+is generated as both Mermaid (`.mmd`) and PlantUML (`.puml`) files.
+
+These diagrams give architects a visual overview of the system landscape, domain
+boundaries, and key components — directly from the KG data, not hand-drawn.
+
+## Output Directory
+
+All diagrams go in `.magellan/diagrams/`:
+
+```
+.magellan/diagrams/
+  context.mmd              Level 1 — System Context (Mermaid)
+  context.puml             Level 1 — System Context (PlantUML)
+  containers.mmd           Level 2 — Container (Mermaid)
+  containers.puml          Level 2 — Container (PlantUML)
+  components_billing.mmd   Level 3 — Component per domain (Mermaid)
+  components_billing.puml  Level 3 — Component per domain (PlantUML)
+  components_title.mmd     ...one pair per domain
+  components_title.puml
+```
+
+## When to Generate
+
+- After the contradictions dashboard in a full pipeline run (Phase 1)
+- Again after Phase 2 regeneration of the dashboard (to capture new relationships)
+- On demand when an architect requests a diagram refresh
+
+## Process
+
+### Level 1 — System Context
+
+Produces `context.mmd` and `context.puml`.
+
+1. Use Glob on `.magellan/domains/*/` to discover all domain names.
+2. For each domain, use the Read tool to read `.magellan/domains/<domain>/summary.json` —
+   collect domain names and entity counts. Sum entity counts for the total.
+3. Use the Read tool to read `.magellan/cross_domain.json` — get all cross-domain edges.
+4. From cross-domain edges and domain summaries, identify external systems:
+   entities of type `Integration` or `Infrastructure` that represent systems
+   outside the target platform (e.g., "State DMV", "AutoIMS", "SAP").
+   Also check `hub_entities` across domains for integration-type entities.
+5. Generate the diagram:
+   - The target system as one box containing all domains (show domain count
+     and total entity count).
+   - External systems/actors as separate nodes around the target system.
+   - Arrows between the target system and external systems, labeled with
+     the integration purpose from edge descriptions.
+   - A legend explaining node shapes and styles.
+   - A generation timestamp as a comment.
+
+### Level 2 — Container
+
+Produces `containers.mmd` and `containers.puml`.
+
+1. Use Glob on `.magellan/domains/*/` to discover domains, then Read each
+   domain's `.magellan/domains/<domain>/summary.json` — get domain names,
+   entity counts, narratives.
+2. Use the Read tool to read `.magellan/cross_domain.json` — get inter-domain
+   edges and external integrations.
+3. Generate the diagram:
+   - Each domain as a container node (show entity count).
+   - Cross-domain edges as labeled arrows between domains. Use the edge
+     `description` from properties as the arrow label.
+   - External integrations at the boundary (same external systems from Level 1).
+   - Mermaid `click` events on domain nodes linking to their summary.json:
+     `click DOMAIN_ID "domains/<domain>/summary.json"`
+   - A legend explaining node types and edge meanings.
+   - A generation timestamp as a comment.
+
+### Level 3 — Component (per domain)
+
+Produces `components_<domain>.mmd` and `components_<domain>.puml` for each domain.
+
+1. Use the Read tool to read `.magellan/domains/<domain>/summary.json` for the
+   domain — get the `hub_entities` array. These are the hub entities (most connected,
+   highest weighted) for the domain.
+2. Use the Read tool to read `.magellan/domains/<domain>/relationships.json` for
+   the domain — get intra-domain edges.
+3. Use the Read tool to read `.magellan/cross_domain.json` — find edges that touch
+   this domain (entry points from other domains, exit points to other domains).
+4. Generate the diagram:
+   - **Hub entities only** as nodes within the domain subgraph. Use the entity
+     name and type from `hub_entities`. Do NOT include every entity — only hubs.
+   - Intra-domain edges between hub entities as labeled arrows. Filter
+     relationships.json to only include edges where both `from` and `to` are
+     hub entities.
+   - Cross-domain entry/exit points as separate nodes outside the subgraph,
+     labeled with the other domain name and entity name.
+   - Mermaid `click` events on entity nodes linking to their entity JSON:
+     `click ENTITY_ID "domains/<domain>/entities/<entity_id>.json"`
+   - A note: "Showing hub entities only. Full entity list in KG."
+   - A legend and generation timestamp.
+
+## Mermaid Output Format
+
+### Level 1 — System Context (`context.mmd`)
+
+```mermaid
+%% C4 Level 1: System Context
+%% Generated: 2026-02-23T10:00:00Z
+
+graph TB
+    subgraph legend [Legend]
+        direction LR
+        l_sys[Target System]:::systemStyle --- l_ext[External System]:::externalStyle
+    end
+
+    SYSTEM["System Name<br/>N domains, M entities"]:::systemStyle
+
+    EXT_DMV["State DMV<br/>Title Registration"]:::externalStyle
+    EXT_AUTOIMS["AutoIMS<br/>Vehicle Imaging"]:::externalStyle
+
+    SYSTEM -->|"title check"| EXT_DMV
+    SYSTEM -->|"condition report"| EXT_AUTOIMS
+
+    classDef systemStyle fill:#438DD5,stroke:#2E6295,color:#fff
+    classDef externalStyle fill:#999999,stroke:#6B6B6B,color:#fff
+```
+
+### Level 2 — Container (`containers.mmd`)
+
+```mermaid
+%% C4 Level 2: Container
+%% Generated: 2026-02-23T10:00:00Z
+
+graph TB
+    subgraph legend [Legend]
+        direction LR
+        l_cont[Domain / Container]:::containerStyle --- l_ext[External System]:::externalStyle
+    end
+
+    subgraph SYSTEM ["System Name"]
+        BILLING["billing<br/>23 entities"]:::containerStyle
+        TITLE["title<br/>18 entities"]:::containerStyle
+        TRANSPORT["transportation<br/>15 entities"]:::containerStyle
+    end
+
+    EXT_DMV["State DMV"]:::externalStyle
+
+    BILLING -->|"invoice feeds title transfer"| TITLE
+    TITLE -->|"title check"| EXT_DMV
+
+    click BILLING "domains/billing/summary.json"
+    click TITLE "domains/title/summary.json"
+    click TRANSPORT "domains/transportation/summary.json"
+
+    classDef containerStyle fill:#438DD5,stroke:#2E6295,color:#fff
+    classDef externalStyle fill:#999999,stroke:#6B6B6B,color:#fff
+```
+
+### Level 3 — Component (`components_<domain>.mmd`)
+
+```mermaid
+%% C4 Level 3: Component — billing
+%% Generated: 2026-02-23T10:00:00Z
+%% Showing hub entities only. Full entity list in KG.
+
+graph TB
+    subgraph legend [Legend]
+        direction LR
+        l_hub[Hub Entity]:::componentStyle --- l_cross[Cross-Domain Link]:::crossStyle
+    end
+
+    subgraph BILLING ["billing"]
+        INV_GEN["Invoice Generation<br/>BusinessProcess"]:::componentStyle
+        MAN_REV["Manual Review Bypass<br/>BusinessRule"]:::componentStyle
+        SETTLE["Settlement<br/>BusinessProcess"]:::componentStyle
+    end
+
+    TITLE_TRANSFER["title: Title Transfer"]:::crossStyle
+
+    INV_GEN -->|"enforces"| MAN_REV
+    INV_GEN -->|"triggers"| SETTLE
+    SETTLE -->|"feeds"| TITLE_TRANSFER
+
+    click INV_GEN "domains/billing/entities/invoice_generation.json"
+    click MAN_REV "domains/billing/entities/manual_review_bypass.json"
+    click SETTLE "domains/billing/entities/settlement.json"
+
+    classDef componentStyle fill:#85BBF0,stroke:#5A9BD5,color:#000
+    classDef crossStyle fill:#999999,stroke:#6B6B6B,color:#fff
+```
+
+## PlantUML Output Format
+
+Use C4-PlantUML syntax with the standard library includes.
+
+### Level 1 — System Context (`context.puml`)
+
+```plantuml
+@startuml
+!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Context.puml
+
+title System Context — System Name
+footer Generated: 2026-02-23T10:00:00Z
+
+System(SYSTEM, "System Name", "N domains, M entities")
+
+System_Ext(EXT_DMV, "State DMV", "Title Registration")
+System_Ext(EXT_AUTOIMS, "AutoIMS", "Vehicle Imaging")
+
+Rel(SYSTEM, EXT_DMV, "title check")
+Rel(SYSTEM, EXT_AUTOIMS, "condition report")
+
+SHOW_LEGEND()
+@enduml
+```
+
+### Level 2 — Container (`containers.puml`)
+
+```plantuml
+@startuml
+!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
+
+title Container — System Name
+footer Generated: 2026-02-23T10:00:00Z
+
+System_Boundary(SYSTEM, "System Name") {
+  Container(BILLING, "billing", "Domain", "23 entities")
+  Container(TITLE, "title", "Domain", "18 entities")
+  Container(TRANSPORT, "transportation", "Domain", "15 entities")
+}
+
+System_Ext(EXT_DMV, "State DMV")
+
+Rel(BILLING, TITLE, "invoice feeds title transfer")
+Rel(TITLE, EXT_DMV, "title check")
+
+SHOW_LEGEND()
+@enduml
+```
+
+### Level 3 — Component (`components_<domain>.puml`)
+
+```plantuml
+@startuml
+!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Component.puml
+
+title Component — billing
+footer Generated: 2026-02-23T10:00:00Z\nShowing hub entities only. Full entity list in KG.
+
+Container_Boundary(BILLING, "billing") {
+  Component(INV_GEN, "Invoice Generation", "BusinessProcess")
+  Component(MAN_REV, "Manual Review Bypass", "BusinessRule")
+  Component(SETTLE, "Settlement", "BusinessProcess")
+}
+
+System_Ext(TITLE_TRANSFER, "title: Title Transfer")
+
+Rel(INV_GEN, MAN_REV, "enforces")
+Rel(INV_GEN, SETTLE, "triggers")
+Rel(SETTLE, TITLE_TRANSFER, "feeds")
+
+SHOW_LEGEND()
+@enduml
+```
+
+## Node ID Generation
+
+Generate valid Mermaid/PlantUML node IDs from entity IDs:
+
+1. Take the entity_id (e.g., `billing:invoice_generation`).
+2. Remove the domain prefix and colon.
+3. Convert to UPPER_SNAKE_CASE (e.g., `INVOICE_GENERATION`).
+4. For cross-domain references, prefix with the domain in uppercase
+   (e.g., `TITLE__TITLE_TRANSFER` for `title:title_transfer`).
+
+Ensure all node IDs are unique within each diagram.
+
+## Identifying External Systems
+
+External systems are entities that represent integrations with systems outside
+the target platform. Identify them by:
+
+1. Entity type: `Integration`, `ExternalSystem`, or `Infrastructure`
+2. Tags: `external_integration`, `third_party`, `external_system`
+3. Cross-domain edges where one side references a system not in any domain
+4. Hub summaries that mention external dependencies
+
+If no clear external systems are found, omit them from Level 1 rather than
+guessing. The diagram should only contain what the KG proves.
+
+## Handling Edge Cases
+
+- **Single domain**: Level 1 shows the system with no internal arrows.
+  Level 2 has one container. Skip cross-domain arrows.
+- **No external systems found**: Level 1 shows just the target system box.
+  Add a note: "No external integrations identified in KG."
+- **Domain with no hub entities**: Skip Level 3 for that domain. Log:
+  "Skipping component diagram for <domain>: no hub entities in summary."
+- **Empty cross_domain.json**: Level 2 shows domains without arrows between
+  them. Add a note: "No cross-domain relationships identified."
+
+## Writing the Files
+
+Write diagram files using the Write tool (same pattern as onboarding_guide.md
+and contradictions_dashboard.md — these are generated text artifacts, not KG data).
+
+Create the `.magellan/diagrams/` directory if it doesn't exist.
+
+Write each file immediately after generating it. Do not accumulate all diagrams
+in memory before writing.
+
+## Critical Rules
+
+- ALL data reads MUST use Claude's built-in tools:
+  - **Discover domains**: Glob on `.magellan/domains/*/`
+  - **Read domain summaries**: Read tool on `.magellan/domains/<domain>/summary.json`
+  - **Read cross-domain edges**: Read tool on `.magellan/cross_domain.json`
+  - **Read relationships**: Read tool on `.magellan/domains/<domain>/relationships.json`
+  - **Discover entities**: Glob on `.magellan/domains/<domain>/entities/*.json`
+  - **Read entity details**: Read tool on `.magellan/domains/<domain>/entities/<entity_id>.json`
+- Only include hub entities in Level 3 diagrams. The `hub_entities` array from
+  the domain summary is the source of truth for which entities to show.
+- Every arrow label must come from KG data (edge descriptions, relationship types).
+  Never invent relationship labels.
+- Every click link must point to an actual file path in the `.magellan/` structure.
+- Include a legend in every diagram.
+- Include a generation timestamp as a comment in every diagram.
+- Write files using the Write tool — diagrams are generated artifacts like the
+  onboarding guide, not KG data.
+
+## What You Do NOT Do
+
+- Do not invent entities, relationships, or external systems. Only diagram what
+  the KG contains.
+- Do not include all entities in Level 3. Hub entities only.
+- Do not generate Level 4 (diff diagrams). That is deferred.
+- Do not generate image files. Output is text (Mermaid/PlantUML) that renders
+  in GitHub, VS Code, Confluence, or dedicated viewers.
+- Do not skip the legend or timestamp. Every diagram must be self-documenting.
+- Do not use Bash or shell commands to create directories. Use the Write tool
+  which creates parent directories automatically.
diff --git a/partner-built/magellan/skills/file-conventions/SKILL.md b/partner-built/magellan/skills/file-conventions/SKILL.md
new file mode 100644
index 00000000..7653f695
--- /dev/null
+++ b/partner-built/magellan/skills/file-conventions/SKILL.md
@@ -0,0 +1,462 @@
+# File Conventions
+
+All Magellan outputs go in `.magellan/` within the workspace root. This skill defines
+every file type, its exact JSON schema, path pattern, and validation rules. Follow these
+schemas exactly when reading or writing Magellan files.
+
+## Directory Layout
+
+```
+.magellan/
+├── state.json
+├── index.json
+├── cross_domain.json
+├── processed_files.json
+├── pipeline_feedback.json
+├── onboarding_guide.md
+├── contradictions_dashboard.md
+├── contradictions_dashboard.html
+├── language_guides/          ← reference guides for legacy languages
+├── diagrams/                 ← C4 architecture diagrams (Mermaid + PlantUML)
+│   ├── context.mmd / .puml
+│   ├── containers.mmd / .puml
+│   └── components_<domain>.mmd / .puml
+└── domains/
+    └── <domain>/
+        ├── facts/            ← one file per source document
+        │   └── <source_doc_slug>.json
+        ├── entities/         ← one file per entity
+        │   └── <entity_id>.json
+        ├── relationships.json
+        ├── summary.json
+        ├── contradictions.json
+        ├── open_questions.json
+        ├── discovered_links.json
+        ├── resolved/
+        │   ├── contradictions_resolved.json
+        │   └── questions_answered.json
+        └── deliverables/     ← Phase 2 outputs
+            ├── business_rules.md
+            ├── ddd_spec.md
+            ├── contracts.md
+            ├── review.md
+            ├── rules_<classification>.dmn
+            ├── rules_<classification>.json
+            ├── rules_<classification>.csv
+            ├── rules_<classification>.feature
+            ├── openapi.yaml
+            └── asyncapi.yaml
+```
+
+## ID Generation
+
+Generate IDs using this pattern:
+- **fact_id**: `f_` + 8 random hex chars (e.g., `f_a1b2c3d4`)
+- **contradiction_id**: `c_` + 3-digit sequence (e.g., `c_001`)
+- **question_id**: `oq_` + 3-digit sequence (e.g., `oq_001`)
+- **edge_id**: `e_` + 3-digit sequence (e.g., `e_001`), or `cx_` prefix for cross-domain
+- **entity_id**: `<domain>:<entity_name>` (e.g., `billing:invoice_generation`)
+
+When generating sequence IDs (c_001, oq_001, e_001), read the existing file first to
+find the next available number.
+
+## File Path Safety
+
+Entity IDs containing `/` or spaces are converted to underscores for filenames.
+The entity_id `billing:invoice_generation` is stored at
+`.magellan/domains/billing/entities/invoice_generation.json`.
+
+---
+
+## Schemas
+
+### Atomic Fact
+
+**Path**: `.magellan/domains/<domain>/facts/<source_doc_slug>.json`
+
+The source document slug is the filename with the extension removed (e.g., `Q3_ops_runbook`
+for `Q3_ops_runbook.pdf`).
+
+```json
+{
+  "source_document": "path/to/source.pdf",
+  "domain": "billing",
+  "extracted_at": "2026-03-15T10:30:45Z",
+  "fact_count": 1,
+  "facts": [
+    {
+      "fact_id": "f_a1b2c3d4",
+      "statement": "Invoices exceeding $10,000 are routed to MANUAL_REVIEW",
+      "subject": "Invoice Generation",
+      "subject_domain": "billing",
+      "predicate": "has exception rule",
+      "object": "Manual review bypass for high-value invoices",
+      "source": {
+        "document": "Q3_ops_runbook.pdf",
+        "location": "page 12, section 'Exception Handling'",
+        "quote": "Invoices exceeding $10,000 are routed to MANUAL_REVIEW."
+      },
+      "confidence": 0.75,
+      "tags": ["business_rule", "exception_handling"]
+    }
+  ]
+}
+```
+
+**Required fields per fact**: `statement` (min 10 chars), `subject`, `subject_domain`
+(lowercase, letters/digits/underscores only), `predicate`, `object`,
+`source.document`, `source.location`, `source.quote` (max 500 chars), `confidence`
+(0.0–1.0).
+
+**Optional**: `tags` (default empty array). `fact_id` should always be generated.
+
+**Rules**:
+- Every fact MUST have a source quote. No exceptions.
+- `subject_domain` must be lowercase: `^[a-z][a-z0-9_]*$`
+- Update `fact_count` to match the actual length of the `facts` array.
+- Write facts incrementally (every 10–15 facts), not all at the end.
+
+### Entity
+
+**Path**: `.magellan/domains/<domain>/entities/<entity_id>.json`
+
+```json
+{
+  "entity_id": "billing:invoice_generation",
+  "name": "Invoice Generation",
+  "type": "BusinessProcess",
+  "domain": "billing",
+  "summary": "Four-state invoice lifecycle (DRAFT → ISSUED → PAID) with a MANUAL_REVIEW bypass for invoices exceeding $10k...",
+  "properties": {
+    "states": ["DRAFT", "ISSUED", "PAID", "MANUAL_REVIEW"]
+  },
+  "evidence": [
+    {
+      "source": "Q3_ops_runbook.pdf",
+      "location": "page 12",
+      "quote": "Invoices exceeding $10,000 are routed to MANUAL_REVIEW...",
+      "confidence": 0.75
+    }
+  ],
+  "tags": ["business_rule"],
+  "confidence": 0.85,
+  "weight": 0.9,
+  "version": {
+    "current": "v1",
+    "status": "active"
+  },
+  "related_entities": [
+    {
+      "entity_id": "billing:manual_review_bypass",
+      "relationship": "ENFORCES",
+      "direction": "outgoing"
+    }
+  ],
+  "open_questions": ["oq_003"]
+}
+```
+
+**Required fields**: `entity_id`, `name`, `type`, `domain`, `summary` (min 50 chars),
+`evidence` (at least one entry with non-empty quote), `confidence`, `weight`.
+
+**Entity types**: `BusinessProcess`, `BusinessRule`, `Component`, `Service`, `Database`,
+`DataEntity`, `Integration`, `Infrastructure`, `Person`, `Team`, `Operational`,
+`Constraint`.
+
+**Version status**: `active`, `superseded`, `deprecated`. Never delete entities — mark
+as superseded.
+
+**Rules**:
+- Each entity is self-contained. A reader with just this one file has everything needed.
+- Write entities one at a time, immediately after building. Do not accumulate.
+- The `summary` field is the most important — models read it first.
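A writer can sanity-check an entity against these required-field rules before writing it. A minimal validation sketch (Python for illustration only — the checks mirror the schema above; Magellan itself enforces this in-model, not via code):

```python
def validate_entity(entity: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the entity passes."""
    required = ("entity_id", "name", "type", "domain", "summary",
                "evidence", "confidence", "weight")
    problems = [f"missing required field: {f}" for f in required if f not in entity]
    # summary must be at least 50 chars
    if len(entity.get("summary", "")) < 50:
        problems.append("summary shorter than 50 chars")
    # at least one evidence entry must carry a non-empty quote
    if not any(e.get("quote") for e in entity.get("evidence", [])):
        problems.append("no evidence entry with a non-empty quote")
    return problems

entity = {
    "entity_id": "billing:invoice_generation", "name": "Invoice Generation",
    "type": "BusinessProcess", "domain": "billing",
    "summary": "Four-state invoice lifecycle with a MANUAL_REVIEW bypass "
               "for invoices exceeding $10k, sourced from the Q3 ops runbook.",
    "evidence": [{"quote": "Invoices exceeding $10,000 are routed to MANUAL_REVIEW..."}],
    "confidence": 0.85, "weight": 0.9,
}
assert validate_entity(entity) == []
```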
+
+### Relationships (Intra-Domain)
+
+**Path**: `.magellan/domains/<domain>/relationships.json`
+
+```json
+{
+  "domain": "billing",
+  "edges": [
+    {
+      "edge_id": "e_001",
+      "from": "billing:invoice_generation",
+      "to": "billing:manual_review_bypass",
+      "type": "ENFORCES",
+      "properties": {
+        "description": "Invoice generation enforces the manual review bypass rule"
+      },
+      "evidence": {
+        "source": "CBBLKBOOK.cblle",
+        "location": "lines 142-198"
+      },
+      "confidence": 0.95,
+      "weight": 0.9
+    }
+  ]
+}
+```
+
+**Required per edge**: `edge_id`, `from`, `to`, `type`, `properties.description`,
+`evidence.source`, `evidence.location`, `confidence`, `weight`.
+
+**Rules**: Write once per domain after all facts in that domain are processed.
+
+### Cross-Domain Relationships
+
+**Path**: `.magellan/cross_domain.json`
+
+```json
+{
+  "domain": "_cross_domain",
+  "edges": [
+    {
+      "edge_id": "cx_001",
+      "from": "billing:vehicle",
+      "to": "title:vehicle_title",
+      "type": "SAME_AS",
+      "confidence": 0.92,
+      "properties": {
+        "description": "Same vehicle concept across billing and title domains"
+      },
+      "evidence": {
+        "source": "billing/CBBLKBOOK.cblle",
+        "location": "line 45"
+      }
+    }
+  ]
+}
+```
+
+**SAME_AS rules**: Confidence ≥ 0.70 required. Never merge entities — link them.
+SAME_AS only between different domains (intra-domain handled by the entity itself).
+
+### Contradiction
+
+**Path (active)**: `.magellan/domains/<domain>/contradictions.json`
+**Path (resolved)**: `.magellan/domains/<domain>/resolved/contradictions_resolved.json`
+
+```json
+{
+  "contradictions": [
+    {
+      "contradiction_id": "c_001",
+      "description": "Threshold mismatch: one source says $10k, another says $15k",
+      "domain": "billing",
+      "severity": "high",
+      "status": "open",
+      "related_entities": ["billing:invoice_generation"],
+      "sources": [
+        { "source": "Q3_ops_runbook.pdf", "quote": "...exceeding $10,000..." },
+        { "source": "Policy_v2.docx", "quote": "...exceeding $15,000..." 
}
+      ]
+    }
+  ]
+}
+```
+
+**Required**: `contradiction_id`, `description`, `domain`, `status` (`open` or `resolved`).
+
+**Resolved** adds: `resolution_note`, `resolved_at` (ISO 8601 timestamp).
+
+**To add**: Read the existing file, append to the array, write back. If the file
+doesn't exist, create it with an empty `contradictions` array first.
+
+### Open Question
+
+**Path (active)**: `.magellan/domains/<domain>/open_questions.json`
+**Path (answered)**: `.magellan/domains/<domain>/resolved/questions_answered.json`
+
+```json
+{
+  "questions": [
+    {
+      "question_id": "oq_001",
+      "question": "Is the $10k threshold still active in the current system?",
+      "domain": "billing",
+      "priority": "high",
+      "status": "open",
+      "related_entities": ["billing:invoice_generation"],
+      "raised_by": "Ingestion Pass 2",
+      "context": "Found conflicting documentation about threshold"
+    }
+  ]
+}
+```
+
+**Required**: `question_id`, `question`, `domain`, `status` (`open` or `answered`).
+
+**Answered** adds: `answer_source` (path to material), `answered_at` (ISO 8601).
+
+**Same append pattern** as contradictions: read, append, write back.
+
+### Domain Summary
+
+**Path**: `.magellan/domains/<domain>/summary.json`
+
+```json
+{
+  "domain": "billing",
+  "entity_count": 42,
+  "narrative": "The billing domain manages the complete lifecycle of invoice generation...",
+  "hub_entities": [
+    {
+      "entity_id": "billing:invoice_generation",
+      "hub_score": 3.85,
+      "relationships": 5,
+      "summary": "Generates invoices..."
+    }
+  ],
+  "hub_count": 2,
+  "contradiction_count": 1,
+  "question_count": 2
+}
+```
+
+**Required**: `domain`, `entity_count`, `narrative` (min 200 chars), `hub_entities`,
+`hub_count`.
+
+**Hub detection**: `hub_score = relationship_count × entity_weight`. Exclude entities
+with weight < 0.5. Select top 10–15 hubs per domain.
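The hub detection rule above can be sketched as follows. This is a hypothetical helper for illustration; the entity dicts follow the entity schema, and `relationship_counts` is assumed to map entity IDs to the number of edges touching them.

```python
def detect_hubs(entities, relationship_counts, top_n=15):
    """Rank entities by hub_score = relationship_count x entity_weight."""
    scored = []
    for entity in entities:
        if entity["weight"] < 0.5:
            continue  # low-weight entities are excluded from hub detection
        count = relationship_counts.get(entity["entity_id"], 0)
        scored.append({
            "entity_id": entity["entity_id"],
            "hub_score": count * entity["weight"],
            "relationships": count,
        })
    # Highest hub_score first; keep the top 10-15 per the rule above
    scored.sort(key=lambda hub: hub["hub_score"], reverse=True)
    return scored[:top_n]
```

The result maps directly onto the `hub_entities` array in `summary.json` (minus the per-hub `summary` text, which is written by the model).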
+ +### State + +**Path**: `.magellan/state.json` + +```json +{ + "initialized_at": "2026-03-15T09:00:00Z", + "last_ingest": "2026-03-15T10:30:00Z", + "last_summary_entity_counts": { + "billing": 42, + "title": 28 + }, + "pipeline_step": 6 +} +``` + +Tracks pipeline progress. `last_summary_entity_counts` triggers re-summarization +when entity count changes > 10%. + +### Index + +**Path**: `.magellan/index.json` + +```json +{ + "domains": { + "billing": { + "entity_count": 42, + "edge_count": 15, + "contradiction_count": 2, + "question_count": 3 + } + }, + "total_entities": 70, + "total_edges": 25 +} +``` + +Updated at pipeline end. Provides quick stats without reading all domain files. + +### Processed Files Ledger + +**Path**: `.magellan/processed_files.json` + +```json +{ + "files": { + "src/billing/CBBLKBOOK.cblle": { + "disposition": "ingested", + "domain": "billing", + "fact_count": 12, + "processed_at": "2026-03-15T10:30:00Z" + }, + "docs/corrupted.bin": { + "disposition": "unreadable", + "domain": null, + "fact_count": 0, + "error": "Binary file, could not read content", + "processed_at": "2026-03-15T10:31:00Z" + } + } +} +``` + +**Dispositions**: `ingested`, `no_facts`, `unreadable`, `extraction_error`, +`skipped_unchanged`, `skipped_by_rule`. + +Every file MUST reach a terminal disposition. Nothing is silently dropped. + +### Pipeline Feedback + +**Path**: `.magellan/pipeline_feedback.json` + +```json +{ + "entries": [ + { + "step": 3, + "step_name": "Fact Extraction", + "reviewed_at": "2026-03-15T10:45:00Z", + "findings": [ + { + "severity": "warning", + "message": "Low fact density for Q3_ops_runbook.pdf (2 facts from 45 pages)" + } + ], + "blocker_count": 0, + "warning_count": 1, + "suggestion_count": 0 + } + ] +} +``` + +Accumulated during pipeline. Each quality gate appends an entry. + +--- + +## Entity Weight Formula + +``` +effective_weight = base_weight + corroboration + recency + references +``` + +Clamp result to [0.0, 1.0]. 
+ +**Base weights**: + +| Source Type | Weight | +|-------------|--------| +| correction | 0.95 | +| production_source_code | 0.90 | +| database_schema | 0.85 | +| official_policy | 0.85 | +| formal_design_document | 0.80 | +| api_specification | 0.80 | +| qa_operational_manual | 0.75 | +| interview_transcript | 0.70 | +| meeting_transcript | 0.50 | +| email_chain | 0.40 | +| informal_notes | 0.30 | + +**Modifiers**: +- Corroboration: +0.05 per additional source (cap +0.15) +- Recency: −0.05 for 1–3 year old docs, −0.10 for 3+ years +- References: +0.05 if referenced by 5+ other entities + +Weight is metadata for prioritization. It never filters entities out of the graph. + +--- + +## Key Rules + +1. **Append-only**: Never delete entities. Mark superseded with `version.status: "superseded"`. +2. **One file per entity**: Prevents merge conflicts and git bloat. +3. **Self-contained entities**: Each entity file has everything needed to understand it. +4. **Source tracing**: Every fact, entity, and edge traces to a source document with a quote. +5. **Nothing silently skipped**: Every file reaches a disposition in the processed files ledger. +6. **Read before write**: When appending to contradictions/questions, read the existing file first. +7. **Domain naming**: Lowercase, letters/digits/underscores: `^[a-z][a-z0-9_]*$` diff --git a/partner-built/magellan/skills/graph-building/SKILL.md b/partner-built/magellan/skills/graph-building/SKILL.md new file mode 100644 index 00000000..12913923 --- /dev/null +++ b/partner-built/magellan/skills/graph-building/SKILL.md @@ -0,0 +1,274 @@ +--- +name: graph-building +description: Transform atomic facts into knowledge graph entities and relationships. Use after fact extraction to build the structured KG from raw facts. +--- + +# Graph Building (Stage 2a) + +You transform atomic facts into knowledge graph entities and relationships. 
+
+## Critical: Use Built-In Tools for All File Operations
+
+You MUST use Claude's built-in tools for every read and write operation:
+- **Read** tool to read facts from `.magellan/domains/<domain>/facts/<source_slug>.json`
+- **Glob** tool on `.magellan/domains/<domain>/entities/*.json` to list existing entities
+- **Read** tool on `.magellan/domains/<domain>/entities/<entity_name>.json` to read an entity
+- **Write** tool to `.magellan/domains/<domain>/entities/<entity_name>.json` for each entity
+- **Write** tool to `.magellan/domains/<domain>/relationships.json` for relationships
+- **Read + Write** pattern for contradictions and open questions (read existing file, append, write back)
+
+Do NOT create a monolithic `knowledge_graph.json`. The KG is stored as individual entity files.
+
+You receive a set of facts from one source document and produce:
+
+1. Entity files — one self-contained JSON file per entity
+2. Relationships — edges connecting entities within the same domain
+3. Contradictions — when new facts conflict with existing entities
+4. Open questions — when facts are ambiguous or incomplete
+
+## Process
+
+### Critical: Write Incrementally
+
+Do NOT accumulate all entities in your response and write them at the end.
+This will hit output token limits on large fact files. Instead, write each
+entity immediately after building it.
+
+The pattern is: read a few facts → build one entity → write it → move on.
+
+### Steps
+
+1. Read the facts file using the **Read** tool on `.magellan/domains/<domain>/facts/<source_slug>.json`.
+2. List existing entities in the domain using the **Glob** tool on `.magellan/domains/<domain>/entities/*.json`.
+3. Process facts in small batches (5-10 facts at a time):
+   a. For each fact in the batch, determine:
+      - Does this fact describe an existing entity? → Read it with the **Read** tool, update, write it back with the **Write** tool.
+      - Does this fact describe a new entity? → Build it, write it immediately with the **Write** tool.
+      - Does this fact establish a relationship? → Add to a running list.
+      - Does this fact contradict an existing entity? → Write contradiction immediately (see Contradiction Append Pattern below).
+      - Is this fact ambiguous or incomplete? → Write open question immediately (see Open Question Append Pattern below).
+   b. Write each entity with the **Write** tool as soon as it's built — do not wait.
+   c. Write contradictions and open questions as soon as detected — do not wait.
+4. After all facts are processed, write relationships once using the **Write** tool to `.magellan/domains/<domain>/relationships.json`.
+5. Briefly report what was created: "N entities, M relationships, K contradictions, J open questions."
+
+### Contradiction Append Pattern
+
+To add a contradiction:
+1. **Read** the file `.magellan/domains/<domain>/contradictions.json`.
+   - If it does not exist, start with `{"contradictions": []}`.
+2. Append the new contradiction object to the `contradictions` array.
+3. **Write** the updated JSON back to `.magellan/domains/<domain>/contradictions.json`.
+
+### Open Question Append Pattern
+
+To add an open question:
+1. **Read** the file `.magellan/domains/<domain>/open_questions.json`.
+   - If it does not exist, start with `{"questions": []}`.
+2. Append the new question object to the `questions` array.
+3. **Write** the updated JSON back to `.magellan/domains/<domain>/open_questions.json`.
+
+### Why This Matters
+
+A document with 50 facts might produce 20 entities. If you try to build all 20
+entities as JSON in your response before writing any of them, you will exceed the
+output token limit and produce nothing. By writing each entity immediately via
+the Write tool, your response stays small and the work is saved incrementally.
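The read-append-write pattern can be sketched in Python as follows. This is a hypothetical illustration only — in the actual skill, Claude performs these same steps with the Read and Write tools, not with code.

```python
import json
from pathlib import Path

def append_contradiction(domain_dir: Path, contradiction: dict) -> None:
    """Read the file, append the new object, write the whole file back."""
    path = domain_dir / "contradictions.json"
    if path.exists():
        data = json.loads(path.read_text())
    else:
        # Missing file: start with an empty contradictions array
        data = {"contradictions": []}
    data["contradictions"].append(contradiction)
    path.write_text(json.dumps(data, indent=2))
```

The open-question pattern is identical, with `open_questions.json` and a `questions` array instead.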
+
+## Entity Types
+
+Assign one of these types to each entity based on the facts:
+
+- `BusinessProcess` — a workflow, procedure, or business operation
+- `BusinessRule` — a rule governing decisions or behavior
+- `Component` — a software module, program, or library
+- `Service` — an API or network-accessible service
+- `Database` — a data store (relational, file-based, etc.)
+- `DataEntity` — a business data concept (Customer, Invoice, Vehicle)
+- `Integration` — a connection between systems
+- `Infrastructure` — a hosting environment or platform
+- `Person` — a team member or stakeholder mentioned by name
+- `Team` — an organizational unit
+- `Operational` — a runbook, batch job, or operational procedure
+- `Constraint` — a limitation or regulatory requirement
+
+If a fact doesn't fit any type, use `Insight` as a catch-all.
+
+## Entity ID Convention
+
+Entity IDs follow the pattern `<domain>:<entity_name>`:
+
+- `billing:invoice_generation`
+- `billing:manual_review_bypass`
+- `title:vehicle_title_transfer`
+- `dealer_management:floor_plan_bank`
+
+When updating an existing entity, keep its ID unchanged. When creating a new entity,
+derive the ID from the domain and a clear, descriptive snake_case name.
+
+## Entity Format
+
+Each entity must match this structure exactly:
+
+```json
+{
+  "entity_id": "billing:invoice_generation",
+  "name": "Invoice Generation",
+  "type": "BusinessProcess",
+  "domain": "billing",
+  "summary": "Clear, complete natural language summary of what this entity represents and why it matters. 
This is the most important field — it's what models read first.", + "properties": { + "key": "value pairs specific to this entity type" + }, + "evidence": [ + { + "source": "path/to/source/document", + "location": "page/line/section reference", + "quote": "Exact quote from the source", + "confidence": 0.85, + "extracted_from_fact": "facts/domain/source.json#f_abc123" + } + ], + "tags": ["business_rule", "exception_handling"], + "confidence": 0.85, + "weight": 0.9, + "version": { + "current": "v1", + "git_commit": "", + "ingested_at": "2026-03-15T09:00:00Z", + "status": "active" + }, + "related_entities": [ + {"entity_id": "billing:manual_review_bypass", "relationship": "ENFORCES", "direction": "outgoing"} + ], + "open_questions": [] +} +``` + +## Writing the Summary + +The `summary` field is the most critical. It must: +- Be a complete, standalone description (a model reading only this field understands the entity) +- Include key facts, not just a label +- Mention known constraints, thresholds, or conditions +- Note if the entity is contested (involved in a contradiction) +- Be 2-5 sentences + +Bad: "Invoice generation process" +Good: "Four-state invoice lifecycle (DRAFT → ISSUED → PAID) with a MANUAL_REVIEW bypass for invoices exceeding $10,000. The bypass was added in response to a tax audit finding and skips standard approval flow. The $10k threshold is contested — the ops runbook says $10k but a DB config sets it to $5k." + +## Updating Existing Entities + +When a new fact provides additional evidence for an existing entity: + +1. Read the existing entity using the **Read** tool on its file path. +2. Add the new evidence to the `evidence` array. +3. Update the `summary` if the new fact adds significant information. +4. Recalculate weight using the weight formula below (evidence_count = len(evidence)). +5. Add any new `related_entities` references. +6. Write the updated entity using the **Write** tool to the same file path. 
+
+Do not overwrite existing evidence — append to it.
+
+## Relationship Types
+
+Use these relationship types for edges:
+
+| Type | Meaning |
+|------|---------|
+| `DEPENDS_ON` | A requires B to function |
+| `CALLS` | A invokes B (API call, function call, program call) |
+| `READS_FROM` | A reads data from B |
+| `WRITES_TO` | A writes data to B |
+| `INTEGRATES_WITH` | System-level integration |
+| `ENFORCES` | A enforces business rule B |
+| `CONTAINS` | A contains B (database contains table, system contains component) |
+| `TRIGGERS` | A causes B to execute |
+| `PRODUCES` | A creates/outputs B |
+| `CONSUMES` | A uses/inputs B |
+| `PART_OF` | A is a component of B |
+| `SUCCEEDED_BY` | A is replaced by B |
+
+## Relationship Format
+
+Each relationship in `relationships.json`:
+
+```json
+{
+  "edge_id": "e_<nnn>",
+  "from": "billing:invoice_generation",
+  "to": "billing:manual_review_bypass",
+  "type": "ENFORCES",
+  "properties": {
+    "description": "Invoice generation enforces the manual review bypass rule for amounts over $10k",
+    "criticality": "high"
+  },
+  "evidence": {
+    "source": "CBBLKBOOK.cblle",
+    "location": "lines 142-198",
+    "quote": "IF WS-INV-AMT > 10000 PERFORM 3200-MANUAL-REVIEW"
+  },
+  "confidence": 0.95,
+  "weight": 0.9
+}
+```
+
+Every relationship must have a `description` explaining WHY the relationship exists.
+
+## Weight Calculation
+
+Calculate the weight for each entity directly using this formula:
+
+```
+effective_weight = base_weight + corroboration + recency + references
+```
+
+Clamp the result to **[0.0, 1.0]**.
+ +### Base Weight Table + +Look up the base weight from the source type (passed through from ingestion): + +| Source Type | Base Weight | +|-------------|-------------| +| `correction` | 0.95 | +| `production_source_code` | 0.90 | +| `database_schema` | 0.85 | +| `official_policy` | 0.85 | +| `formal_design_document` | 0.80 | +| `api_specification` | 0.80 | +| `qa_operational_manual` | 0.75 | +| `interview_transcript` | 0.70 | +| `meeting_transcript` | 0.50 | +| `email_chain` | 0.40 | +| `informal_notes` | 0.30 | + +If the source type is not listed, use **0.50** as the default base weight. + +### Modifiers + +- **Corroboration**: +0.05 per additional source beyond the first (cap at +0.15). + - 1 source: +0.00, 2 sources: +0.05, 3 sources: +0.10, 4+ sources: +0.15 +- **Recency**: based on the age of the source document (if known). + - Less than 1 year old: +0.00 + - 1–3 years old: −0.05 + - More than 3 years old: −0.10 +- **References**: +0.05 if referenced by 5 or more other entities (0 for new entities). + +### Example Calculation + +An entity from `production_source_code` with 3 evidence entries, document less than 1 year old, and 0 references from other entities: +- base_weight = 0.90 +- corroboration = +0.10 (3 sources → 2 additional sources × 0.05) +- recency = 0.00 (less than 1 year) +- references = 0.00 (fewer than 5 references) +- effective_weight = clamp(0.90 + 0.10 + 0.00 + 0.00) = **1.00** + +Weight is metadata for prioritization. It never filters entities out of the graph. + +## What You Do NOT Do + +- Do not invent facts. Every claim must come from the atomic facts you received. +- Do not assign relationships between entities that aren't evidenced in the facts. +- Do not skip facts. Every fact must contribute to at least one entity or relationship. +- Do not merge entities across domains. Cross-domain linking (SAME_AS) is handled in Stage 2b. 
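The weight calculation described above can be sketched as follows. This is an illustrative helper, not part of the plugin; it assumes the document age in years is known (pass `None` when it is not), and that `source_count` is the number of evidence entries.

```python
# Base weights by source type, per the table above
BASE_WEIGHTS = {
    "correction": 0.95,
    "production_source_code": 0.90,
    "database_schema": 0.85,
    "official_policy": 0.85,
    "formal_design_document": 0.80,
    "api_specification": 0.80,
    "qa_operational_manual": 0.75,
    "interview_transcript": 0.70,
    "meeting_transcript": 0.50,
    "email_chain": 0.40,
    "informal_notes": 0.30,
}

def effective_weight(source_type, source_count, age_years=None, reference_count=0):
    """base_weight + corroboration + recency + references, clamped to [0.0, 1.0]."""
    base = BASE_WEIGHTS.get(source_type, 0.50)  # unlisted types default to 0.50
    # +0.05 per additional source beyond the first, capped at +0.15
    corroboration = min(0.05 * max(source_count - 1, 0), 0.15)
    if age_years is None or age_years < 1:
        recency = 0.0
    elif age_years <= 3:
        recency = -0.05
    else:
        recency = -0.10
    references = 0.05 if reference_count >= 5 else 0.0
    return max(0.0, min(1.0, base + corroboration + recency + references))
```

For the worked example above: `effective_weight("production_source_code", 3, age_years=0.5)` gives 0.90 + 0.10, clamped to 1.00.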
diff --git a/partner-built/magellan/skills/ingestion/SKILL.md b/partner-built/magellan/skills/ingestion/SKILL.md
new file mode 100644
index 00000000..ad5a2de9
--- /dev/null
+++ b/partner-built/magellan/skills/ingestion/SKILL.md
@@ -0,0 +1,383 @@
+---
+name: ingestion
+description: Extract atomic facts from documents following the Fact Protocol. Use when processing source materials (code, manuals, transcripts, configs) into structured knowledge.
+---
+
+# Fact Extraction
+
+You extract atomic facts from documents. Each fact is a single, self-contained factual
+statement with full source provenance.
+
+Your only job: "What factual statements does this document make?"
+
+You do not decide entity types, relationships, or graph structure. That is the graph
+builder's job. You extract raw facts.
+
+## Critical: Writing Facts
+
+You MUST write facts using the Write tool to the path
+`.magellan/domains/<domain>/facts/<source_slug>.json`.
+
+The `<source_slug>` is derived from the source document filename: take the filename stem
+(without extension), replace path separators and spaces with underscores, and remove any
+characters that are not alphanumeric, underscores, hyphens, or dots.
+
+Each fact file follows this JSON structure:
+
+```json
+{
+  "source_document": "path/to/original/source/document",
+  "domain": "lowercase_domain_name",
+  "extracted_at": "2024-01-15T10:30:00+00:00",
+  "fact_count": 3,
+  "facts": [ ... array of atomic facts ... ]
+}
+```
+
+Generate a unique `fact_id` for each fact using the format `f_` followed by 8 random
+hex characters (e.g., `f_3a8b2c1d`). Every fact must have a unique `fact_id`.
+
+Facts MUST be organized by domain: one file per source document at
+`domains/<domain>/facts/<source_slug>.json`. Do NOT create batch files like
+`facts/batch1.json`.
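The slug derivation above can be sketched as follows (a hypothetical helper for illustration; the skill applies the same rules directly):

```python
import re
from pathlib import Path

def source_slug(source_path: str) -> str:
    """Filename stem, separators and spaces to underscores, restricted charset."""
    stem = Path(source_path).stem                 # drop the directory and extension
    stem = re.sub(r"[/\\ ]", "_", stem)           # path separators and spaces -> _
    return re.sub(r"[^A-Za-z0-9_.-]", "", stem)   # keep alnum, underscore, hyphen, dot
```

For example, `"QA Manuals/Dealer Master Manual 4.3.19.docx"` becomes `Dealer_Master_Manual_4.3.19`.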
+ +## The Fact Protocol + +Every fact you extract must follow this exact structure: + +```json +{ + "fact_id": "f_3a8b2c1d", + "statement": "Natural language summary of the fact", + "subject": "The entity or concept this fact is about", + "subject_domain": "lowercase_domain_name", + "predicate": "The relationship or property being stated", + "object": "The value, target, or detail", + "source": { + "document": "path/to/source/document", + "location": "page 12, section 'Exception Handling'", + "quote": "Exact quote from the source document (max 500 chars)" + }, + "confidence": 0.85, + "tags": ["business_rule", "exception_handling"] +} +``` + +## Rules + +1. One fact per statement. If a paragraph contains three claims, extract three facts. +2. Every fact must have a direct quote from the source. No invented content. +3. The quote must be verbatim from the document. Do not paraphrase in the quote field. +4. The statement field IS your summary — make it clear and complete. +5. subject_domain must be lowercase with underscores only (e.g., `billing`, `title_processing`). +6. 
Confidence reflects how clearly the source states this fact: + - 0.9-1.0: Explicitly stated, unambiguous + - 0.7-0.89: Clearly implied or stated with minor ambiguity + - 0.5-0.69: Inferred from context, needs validation + - 0.3-0.49: Weak evidence, speculative + - 0.0-0.29: Contradicted by other evidence in the same document + +## Tags + +Apply one or more of these tags to each fact: + +- `business_rule` — a rule governing business logic or decisions +- `data_flow` — how data moves between systems or components +- `integration` — connection between systems, APIs, protocols +- `system_behavior` — how a system operates, processes, or responds +- `data_model` — entities, fields, relationships in data structures +- `operational` — how systems are operated, maintained, monitored +- `security` — authentication, authorization, encryption, access control +- `performance` — SLAs, throughput, latency, batch timing +- `exception_handling` — error paths, edge cases, workarounds +- `organizational` — teams, ownership, responsibilities +- `constraint` — limitations, restrictions, compliance requirements +- `tribal_knowledge` — undocumented knowledge from interviews or transcripts + +## Examples + +### Example 1: From a QA Manual (business document) + +Source: "Dealer Master Manual 4.3.19.docx", page 5 + +> "When setting up a new dealership, the Floor Plan Bank must be assigned before +> any vehicles can be entered into inventory." 
+ +```json +{ + "fact_id": "f_3a8b2c1d", + "statement": "A Floor Plan Bank must be assigned to a dealership before vehicles can be entered into inventory", + "subject": "Dealership Setup", + "subject_domain": "dealer_management", + "predicate": "has prerequisite", + "object": "Floor Plan Bank assignment required before vehicle inventory entry", + "source": { + "document": "QA Manuals/Dealer Master Manual 4.3.19.docx", + "location": "page 5, 'Setting Up a New Dealership'", + "quote": "When setting up a new dealership, the Floor Plan Bank must be assigned before any vehicles can be entered into inventory." + }, + "confidence": 0.95, + "tags": ["business_rule", "constraint"] +} +``` + +### Example 2: From COBOL source code + +Source: "CBBLKBOOK.cblle", lines 142-198 + +> `IF WS-INV-AMT > 10000 PERFORM 3200-MANUAL-REVIEW` + +```json +{ + "fact_id": "f_d4e5f6a7", + "statement": "Invoices exceeding $10,000 trigger a manual review process via paragraph 3200-MANUAL-REVIEW", + "subject": "Invoice Processing", + "subject_domain": "billing", + "predicate": "has threshold trigger", + "object": "Manual review triggered for invoice amounts over $10,000", + "source": { + "document": "Code/AS400 Artifacts/BLKBOOKV/QBLKBOOK/CBBLKBOOK.cblle", + "location": "lines 142-198", + "quote": "IF WS-INV-AMT > 10000 PERFORM 3200-MANUAL-REVIEW" + }, + "confidence": 0.95, + "tags": ["business_rule", "exception_handling"] +} +``` + +### Example 3: From a meeting transcript + +Source: "ASI Demo for Slalom.vtt", timestamp 00:12:34 + +> "we actually moved the entire auction access module to AWS about two years ago +> but the core billing still runs on the AS/400" + +```json +{ + "fact_id": "f_b8c9d0e1", + "statement": "The Auction Access module was migrated to AWS approximately two years ago, while core billing remains on the AS/400", + "subject": "Auction Access", + "subject_domain": "infrastructure", + "predicate": "runs on platform", + "object": "AWS (migrated ~2 years ago), while core billing 
remains on AS/400",
+  "source": {
+    "document": "Meetings/ASI Demo for Slalom.vtt",
+    "location": "timestamp 00:12:34",
+    "quote": "we actually moved the entire auction access module to AWS about two years ago but the core billing still runs on the AS/400"
+  },
+  "confidence": 0.80,
+  "tags": ["system_behavior", "integration"]
+}
+```
+
+### Example 4: From a database config or data file
+
+Source: "Current Domain Mapping.csv", row 3
+
+> `billing, invoicing, INVLIB, CBINV001`
+
+```json
+{
+  "fact_id": "f_f2a3b4c5",
+  "statement": "The invoicing sub-domain in billing uses library INVLIB with source program CBINV001",
+  "subject": "Invoicing",
+  "subject_domain": "billing",
+  "predicate": "has source program",
+  "object": "CBINV001 in library INVLIB",
+  "source": {
+    "document": "Domain Information/Current Domain Mapping.csv",
+    "location": "row 3",
+    "quote": "billing, invoicing, INVLIB, CBINV001"
+  },
+  "confidence": 0.90,
+  "tags": ["data_model", "system_behavior"]
+}
+```
+
+## Domain Assignment
+
+Assign the `subject_domain` based on the primary business area the fact relates to.
+
+Common domains: `billing`, `title_processing`, `transportation`, `dealer_management`,
+`vehicle_inventory`, `auction_operations`, `infrastructure`, `security`, `integration`.
+
+If unsure which domain a fact belongs to, use `general`. The cross-domain linking pass
+will reclassify later if needed.
+
+## Language Reference Guides
+
+After classifying a file, check whether a language reference guide exists for the
+file's language. The guide provides context about the programming language — syntax,
+patterns, naming conventions, and common misinterpretations — that significantly improves
+fact extraction quality for niche or legacy languages.
+
+1. Determine the `language_guide_key` from your classification (e.g., `rpg` for RPG ILE,
+   `cobol` for COBOL, `cl` for CL programs, `dds` for DDS files).
+2. Check if `.magellan/language_guides/<language_guide_key>.md` exists (use the Read tool).
+3. 
If it exists, read the guide and use it as context when extracting facts from this file. + The guide tells you: + - How to read the code (syntax, structure, control flow) + - What patterns carry business logic vs. boilerplate + - Client-specific naming conventions + - Common misinterpretations to avoid +4. If no guide exists for this language, proceed normally (no change to behavior). + +**Caching**: Read each guide once per language per pipeline run, not once per file. If you +have already read the RPG guide for a previous file, do not re-read it for subsequent RPG +files in the same run — it is already in your context. + +**Example**: When processing an RPG ILE file, your classification identifies +`language_guide_key: "rpg"`. You read `.magellan/language_guides/rpg.md` which explains +that `CHAIN` is a keyed read operation, indicators 01-99 are conditional flags, and +`PFDEALRMST` follows the client's PF-prefix naming convention for physical files. With +this context, you extract "the program reads the dealer master file (PFDEALRMST) using +a keyed CHAIN operation with key list KYDLR" instead of "the program reads a file." + +## What NOT to Extract + +These patterns inflate fact counts with noise. Skip them aggressively: + +- Table of contents entries, section headers repeated as content, index pages +- Lines that are just "Chapter N", "Section N.N", or page numbers +- Dotted leader lines (e.g., "Settings ........................... 
3") +- Formatting artifacts: page headers, footers, watermarks, copyright notices +- Repeated boilerplate that appears in every document (e.g., "Auction Edge confidential") +- Opinions or editorial commentary (unless quoting a specific named person) +- Generic instructions like "Click OK to continue" or "See screenshot below" +- Empty or stub sections with no substantive content +- Metadata lines like "Last updated:", "Version:", "Author:" (unless the date/version + is itself a useful fact about the system) + +If you're unsure whether something is a real fact or document noise, apply this test: +would an architect preparing for a client meeting need to know this? If not, skip it. + +## Fact Density Expectations + +Use these benchmarks to gauge whether you're extracting thoroughly. If your yield +falls well below these targets, re-read the document more carefully before moving on. + +| Document Type | Expected Facts (per unit) | Notes | +|---------------|---------------------------|-------| +| QA / Ops Manual (per 10 pages) | 15-30 | Business rules, procedures, thresholds | +| COBOL / RPG program (per 500 lines) | 8-15 | File dependencies, business rules, call chains | +| CL program | 5-10 | Job scheduling, file overrides, call chains | +| Meeting transcript (per 30 min) | 10-20 | Decisions, contradictions, tribal knowledge | +| Architecture document (per 10 pages) | 15-25 | System descriptions, integrations, constraints | +| CSV / data file | 3-8 | Schema, field meanings, relationships | +| DDS file | 3-8 | Record format, key fields, field descriptions | + +If a file yields fewer than 3 facts, flag it in the progress display: +"Low yield: filename (N facts — expected M+ for this file type)" + +These are guidelines, not hard minimums. A boilerplate README genuinely has 0 +extractable facts. But a 200-page QA manual with 5 facts means you are skimming +and need to go deeper. + +## Reading Large Documents + +Not all documents can be processed in a single read. 
Long documents suffer from +attention degradation — facts from the middle and end of a long file get thinner +coverage than facts from the beginning. + +**Determine the reading strategy before you start extracting:** + +**Small documents (under ~5,000 lines or ~50 pages):** +Read the entire file in one pass. Extract facts normally. + +**Large documents (over ~5,000 lines or ~50 pages):** +Read and extract in sections. Do NOT read the entire file at once. + +1. **First pass — structure scan.** Read the first 200 lines with the Read tool + (use `offset: 0, limit: 200`) to understand the document structure: table of + contents, section headers, chapter boundaries, or natural break points. + +2. **Plan sections.** Divide the document into sections of ~2,000-3,000 lines + (~30-40 pages) based on the structure you found. Use natural boundaries + (chapters, sections, modules) when possible. If none exist (e.g., a flat + CSV or continuous log), use fixed-size chunks with 100-line overlap. + +3. **Process each section independently.** For each section: + a. Read ONLY that section using `offset` and `limit` on the Read tool. + b. Extract facts from that section. + c. Write facts immediately (batch write to the fact file). + d. Display: "Section N/M: extracted K facts (lines X-Y)" + +4. **Track sections.** After all sections, verify total lines processed matches + the document's total line count. If any range was skipped, go back and read it. + Display: "Document complete: N facts from M sections (lines 1-total)" + +**Why this matters:** A 200-page manual read in one pass might yield 30 facts, +heavily weighted toward the first 50 pages. The same manual read in 5 sections +of 40 pages each will yield 60-80 facts with even coverage. The section boundary +forces Claude to give full attention to every part of the document. + +**Code files:** Most code files are under 5,000 lines and can be read in one pass. 
+For very large programs (e.g., 10,000+ line COBOL), split at paragraph/section +boundaries (COBOL) or subroutine boundaries (RPG) rather than arbitrary line counts. + +## After Extraction + +### Critical: Write in Batches to Avoid Output Limits + +Do NOT accumulate all facts in your response and write them in one call at the end. +For large documents, this will exceed output token limits and lose all your work. + +Instead, write facts in batches of 10-15 as you extract them: + +1. Extract 10-15 facts from one section of the document. +2. **Pre-write checklist** — before writing, verify EVERY fact in the batch: + - [ ] `fact_id` present (format: `f_` + 8 hex chars) + - [ ] `statement` present and ≥ 10 characters + - [ ] `subject` present and non-empty + - [ ] `subject_domain` present, lowercase, matches `^[a-z][a-z0-9_]*$` + - [ ] `predicate` present and non-empty + - [ ] `object` present and non-empty + - [ ] `source.document` present and non-empty + - [ ] `source.location` present and non-empty + - [ ] `source.quote` present, non-empty, ≤ 500 characters + - [ ] `confidence` is a number between 0.0 and 1.0 + - [ ] `fact_count` in the wrapper matches the actual array length +3. Write the batch using the Write tool. +4. **Post-write verification** — immediately Read the file back and verify: + - The file is valid JSON + - `fact_count` matches the length of the `facts` array + - No fact is missing `source.quote` (the most commonly dropped field) + If verification fails, fix and rewrite before continuing. +5. **Quote verification** — for each fact in the batch, verify the quote + actually exists in the source document: + - Use the Grep tool to search for a distinctive substring of + `source.quote` (at least 20 characters) in the original file. + - If Grep finds a match: the quote is verified. + - If Grep finds no match: the quote may be hallucinated. Re-read the + relevant section of the source file, find the actual text, and correct + the quote. 
If no matching content exists, delete the fact entirely. + - You do NOT need to Grep every quote for short documents (under 50 lines) + that you read in full — you can verify by memory. For large documents + read in sections, always Grep-verify quotes from previous sections + that are no longer in your immediate context. + - Display any corrections: "Quote corrected: fact_id (original → fixed)" + - Display any deletions: "Fact removed: fact_id (quote not found in source)" +6. Move to the next section and repeat. + +When appending to an existing fact file, first Read the current file, merge the new +facts into the existing `facts` array, update `fact_count`, update `extracted_at`, +and Write the complete file back. This preserves earlier batches. + +For small documents (under ~20 facts), a single Write call is fine — but still +run the pre-write checklist and post-write verification. + +### Critical: Nothing Silently Skipped (Principle 3) + +Every document you process must end with a recorded disposition. You MUST NOT +move to the next document without accounting for the current one. + +- If you cannot read the file: record `unreadable` with the error. +- If you extract zero facts: record `no_facts_extracted` — this is not an error, + but it must be recorded so the team knows the file was processed. +- If fact writing fails: fix the facts and retry. + If still failing, record the error and the partial facts that did succeed. +- If any step throws an unexpected error: record `extraction_error` with the + error message. + +**Never silently skip a file, a section of a file, or a link in a file.** +If you encounter something you can't process, say so explicitly with a reason. 
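The pre-write checklist and quote verification above can be sketched as executable checks. This is a minimal, hypothetical sketch — the field rules mirror the checklist, but the function names and the in-memory substring check are illustrative only; the pipeline itself uses the Read, Write, and Grep tools directly:

```python
import re

# Fields the checklist requires at the top level of every fact.
REQUIRED = ["fact_id", "statement", "subject", "subject_domain",
            "predicate", "object", "confidence"]

def check_fact(fact: dict) -> list[str]:
    """Pre-write checklist for one fact; returns a list of problems."""
    problems = []
    for field in REQUIRED:
        if field not in fact or fact[field] in ("", None):
            problems.append(f"missing {field}")
    if not re.fullmatch(r"f_[0-9a-f]{8}", str(fact.get("fact_id", ""))):
        problems.append("bad fact_id format")
    if len(str(fact.get("statement", ""))) < 10:
        problems.append("statement under 10 characters")
    if not re.fullmatch(r"[a-z][a-z0-9_]*", str(fact.get("subject_domain", ""))):
        problems.append("bad subject_domain")
    source = fact.get("source", {})
    for field in ("document", "location", "quote"):
        if not source.get(field):
            problems.append(f"missing source.{field}")
    if len(source.get("quote", "")) > 500:
        problems.append("quote over 500 characters")
    conf = fact.get("confidence")
    if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        problems.append("confidence not in [0.0, 1.0]")
    return problems

def quote_verified(fact: dict, source_text: str) -> bool:
    """Grep-style check: a distinctive 20+ character substring of the
    quote must appear verbatim in the source document."""
    probe = fact.get("source", {}).get("quote", "")[:40].strip()
    return len(probe) >= 20 and probe in source_text

def checked_wrapper(facts: list[dict]) -> dict:
    """Build the fact-file wrapper with fact_count kept in sync."""
    return {"fact_count": len(facts), "facts": facts}
```

The substring probe mirrors what the Grep step does: if a distinctive slice of the quote cannot be found verbatim in the source, treat the quote as suspect and re-read the source before keeping the fact.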
diff --git a/partner-built/magellan/skills/ingestion/language_guides/GENERATE_GUIDE_PROMPT.md b/partner-built/magellan/skills/ingestion/language_guides/GENERATE_GUIDE_PROMPT.md new file mode 100644 index 00000000..701eface --- /dev/null +++ b/partner-built/magellan/skills/ingestion/language_guides/GENERATE_GUIDE_PROMPT.md @@ -0,0 +1,173 @@ +# Language Guide Generation Prompt + +Use this prompt with any LLM to generate a Magellan-compatible language guide. +Replace `{LANGUAGE}` with the target language (e.g., "NATURAL/ADABAS", "PL/I", +"REXX", "JCL", "CICS COBOL", "Assembler/370", "Easytrieve", "IDMS", "Fortran", +"PowerBuilder", "Progress 4GL", "MUMPS/M", "Pick BASIC", "ABAP"). + +Run the same prompt through multiple models and merge the best outputs. + +--- + +## The Prompt + +``` +I need you to demonstrate deep knowledge of {LANGUAGE} by completing two tasks. + +## Task 1: Verification (prove you know the language) + +Answer these 10 questions about {LANGUAGE}. Be specific — cite exact syntax, +keywords, or conventions. If you don't know an answer with confidence, say +"uncertain" rather than guessing. + +1. What platform(s) does {LANGUAGE} primarily run on? +2. Show the minimal "hello world" or equivalent program structure. +3. What is the primary mechanism for database/file access? Show the exact syntax. +4. How does the language handle control flow (conditionals, loops)? Show syntax. +5. How does one program call another? Show the exact call mechanism. +6. What is the variable/data declaration syntax? +7. How are errors or exceptions handled? +8. What is the compilation/execution model? (compiled, interpreted, both?) +9. Name 3 constructs that a developer unfamiliar with this language would + likely misinterpret, and explain what they actually mean. +10. What is the most common anti-pattern or "code smell" in legacy {LANGUAGE} + codebases, and what does it usually indicate about the business logic? 
+ +## Task 2: Generate the Guide + +Using your verified knowledge, produce a reference guide in exactly this format. +This guide will be read by an AI system that is extracting business rules and +facts from legacy source code. The guide must help the AI understand what it's +reading — not teach a developer how to write new code. + +Write the guide using this exact structure: + +--- + +# {LANGUAGE} Reference Guide + +## Overview + +[2-3 paragraphs: What is this language? What platform does it run on? What era +is it from? What are its primary file formats / extensions? Are there multiple +dialects or versions the AI might encounter?] + +## Key Constructs + +### Program Structure +[How is a program organized? What are the major sections/divisions? What do the +first few lines tell you about what the program does?] + +### Data Access (Database / File I/O) +[This is the MOST IMPORTANT section. How does the program read, write, update, +and delete data? What are the exact operation names/keywords? What do they +translate to in SQL terms?] + +### Control Flow +[Conditionals, loops, branching. Focus on patterns that encode business rules +(e.g., "IF ACCOUNT-STATUS = 'D'" means a business decision is happening).] + +### Program-to-Program Communication +[How does one program call another? What are the call mechanisms (static, dynamic, +message-based)? How are parameters passed?] + +### Error Handling +[How does the program detect and handle errors? What are the standard patterns?] + +## Common Patterns + +[Show 3-5 code examples of the most frequent patterns found in production +codebases. Each example should be real-world (not textbook), 5-15 lines, +with a one-line explanation of what the pattern does. Focus on patterns that +carry business logic.] + +## What Carries Business Logic + +**Extract facts from these:** +[Bulleted list of constructs, operations, and patterns that encode business +rules, thresholds, calculations, and decisions. 
These are what the AI should +focus on during fact extraction.] + +**Skip these (boilerplate):** +[Bulleted list of constructs that are infrastructure, plumbing, or standard +setup. These rarely contain business logic and the AI should not spend time +on them.] + +## Common Misinterpretations + +[Numbered list of 5-10 things that an AI (or a developer unfamiliar with this +language) would likely get wrong. Each entry should explain: +- What the construct LOOKS like to an outsider +- What it ACTUALLY means +- Why this matters for understanding business logic + +These are the most valuable part of the guide. Be specific.] + +## File Naming Conventions + +[What file extensions and naming patterns does this language use? How can you +identify the type of program (batch, online, subroutine, copybook) from the +filename or member name?] + +--- + +IMPORTANT RULES: +- Do NOT pad with generic information. Every sentence should help an AI + understand legacy source code it's reading for the first time. +- Do NOT include installation, IDE setup, or "getting started" content. +- DO include misinterpretations even if they seem obvious to an expert. + The reader is an AI that has broad but shallow knowledge. +- DO use exact syntax in examples, not pseudocode. +- DO mention platform-specific behavior (e.g., EBCDIC vs ASCII, packed + decimal, fixed-length records) that affects how data is interpreted. +- Keep the guide under 200 lines. Conciseness is critical — this will be + loaded into a context window alongside source code. 
+``` + +--- + +## Languages to Generate Guides For + +Priority 1 — Common legacy languages in enterprise knowledge discovery: +- [ ] NATURAL / ADABAS +- [ ] PL/I +- [ ] JCL (Job Control Language) +- [ ] REXX +- [ ] CICS (COBOL with CICS commands) +- [ ] Assembler/370 (BAL) +- [ ] Easytrieve +- [ ] IDMS + +Priority 2 — Other legacy platforms: +- [ ] MUMPS / M (healthcare systems) +- [ ] Pick BASIC / UniVerse BASIC +- [ ] PowerBuilder (PowerScript) +- [ ] Progress 4GL / OpenEdge ABL +- [ ] ABAP (SAP) +- [ ] Fortran (scientific/engineering legacy) +- [ ] Informix 4GL +- [ ] Clipper / dBASE / FoxPro +- [ ] Uniface +- [ ] CA Gen / Cool:Gen (generated COBOL) +- [ ] Synon / 2E (AS400 code generator) +- [ ] RM COBOL / Micro Focus COBOL (PC COBOL variants) + +Priority 3 — Niche but encountered: +- [ ] Mapper (Unisys) +- [ ] LINC / Unisys +- [ ] Datapoint DATABUS +- [ ] Tandem TAL / pTAL +- [ ] ADS/Online (IDMS) +- [ ] Telon (generated COBOL) +- [ ] CSP (IBM Cross System Product) + +## Merging Outputs from Multiple Models + +After running the prompt through N models: +1. Compare Task 1 answers — discard outputs from models that answered + "uncertain" on more than 3 questions (they don't know the language well). +2. For each section, pick the most specific and accurate version. +3. Merge "Common Misinterpretations" from all models — different models + catch different blind spots. +4. Have a domain expert (or a high-capability model) do a final review pass. +5. Save the merged guide as `{language}.md` in this directory. 
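Step 1 of this merge can be mechanized with a simple filter. A minimal sketch, assuming each model's Task 1 answers are available as plain text and that models flag unknown answers with the literal word "uncertain", as the prompt instructs. Counting the word may overcount if a model uses it conversationally, so treat the result as a screen, not a verdict:

```python
import re

def passes_verification(task1_answers: str, max_uncertain: int = 3) -> bool:
    """Discard a model's output if it answered 'uncertain' on more
    than max_uncertain of the 10 verification questions."""
    hits = len(re.findall(r"\buncertain\b", task1_answers, flags=re.IGNORECASE))
    return hits <= max_uncertain

# Hypothetical Task 1 outputs from two models:
outputs = {
    "model_a": "1. z/OS batch. 2. uncertain. 3. EXEC SQL ...",
    "model_b": "1. uncertain. 2. uncertain. 3. uncertain. 4. uncertain.",
}
kept = [name for name, text in outputs.items()
        if passes_verification(text)]  # ["model_a"]
```

Surviving outputs then go through the section-by-section comparison in steps 2-4.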
diff --git a/partner-built/magellan/skills/ingestion/language_guides/assembler370.md b/partner-built/magellan/skills/ingestion/language_guides/assembler370.md new file mode 100644 index 00000000..d9fb7c9b --- /dev/null +++ b/partner-built/magellan/skills/ingestion/language_guides/assembler370.md @@ -0,0 +1,114 @@ +# IBM Assembler/370 (BAL) Reference Guide + +## Overview + +IBM Assembler/370 (BAL or HLASM) is the native assembly language for IBM System/370, System/390, and z/OS mainframes. It has been in continuous use since the 1960s. Assembler programs are found in system exits, performance-critical routines, I/O handlers, and legacy business logic that predates COBOL adoption. + +Assembler programs are column-sensitive: columns 1-8 hold a label, column 10+ holds the opcode, columns 16+ hold operands. Column 72 marks continuation. `*` in column 1 marks a comment. Macro instructions (`GET`, `PUT`, `OPEN`) expand into multiple machine instructions at assembly time. File extensions are typically `.asm`, `.s`, `.bal`, or PDS members with no extension. + +For business rule extraction, the key challenge is that assembler mixes machine-level register manipulation with business logic. The AI must learn to see through the register operations to the data transformations underneath. + +## Key Constructs + +### Program Structure + +- `CSECT`: Control Section — marks the beginning of a separately relocatable block of code. +- `DSECT`: Dummy Section — defines a data layout without allocating storage, used to map record structures over a buffer (like a COBOL COPY). +- `USING`: Resolves symbolic field names to register+offset pairs. +- `LTORG`: Literal pool — places literal constants (`=F'100'`, `=C'ACTIVE'`) in memory. + +### Data Access (Database / File I/O) + +Assembler accesses files through system macros: + +- **Sequential**: `OPEN (INPUTDCB,(INPUT))`, `GET INPUTDCB,BUFFER`, `PUT OUTPUTDCB,BUFFER`, `CLOSE (INPUTDCB)`. 
+- **File Definition**: `DCB` defines attributes (DSORG, RECFM, LRECL, BLKSIZE, DDNAME). +- **Indexed/VSAM**: Uses `ACB` (Access Method Control Block) and `RPL` (Request Parameter List) macros: `GET RPL=...`, `PUT RPL=...`, `POINT RPL=...` for keyed access. + +### Control Flow + +- `B label` / `BR R14`: Unconditional branch. +- Conditional branches (`BE`, `BNE`, `BH`, `BL`) rely on the condition code set by the *immediately preceding* instruction. +- `CLC FIELD1,FIELD2`: Compare Logical Character (EBCDIC). +- `CP PKFLD1,PKFLD2`: Compare Packed decimal fields (used for business numbers). +- `TM FLAGBYTE,X'80'`: Test under Mask — tests bits in a byte for status flags. +- `EX R1,INSTRUCTION`: Execute — dynamically modifies and runs an instruction, often used for variable-length moves or compares. + +### Program-to-Program Communication + +- `BALR R14,R15` or `BASR`: Branch And Link — calls subroutine at R15, saves return in R14. +- `LA R1,PARMLIST`: Load parameter list address into R1 before a call. Parameters are a list of addresses; the last has the high-order bit set (VL convention). +- Register conventions: R1 = args, R13 = save area, R14 = return addr, R15 = entry point / return code. + +### Error Handling + +- Return codes in R15: `0` = success, `4` = warning, `8` = error, `12`+ = severe. +- `LTR R15,R15` followed by `BNZ ERROR-RTN`: Standard return code test. +- `ABEND code,DUMP`: Abnormal termination. + +## Common Patterns + +### Packed Decimal Arithmetic (Business Calculation) + +``` + ZAP WKTOTAL,=P'0' ZERO ACCUMULATOR + AP WKTOTAL,INVAMT ADD INVOICE AMOUNT + CP WKTOTAL,THRESHOLD COMPARE TO THRESHOLD + BH OVER-LIMIT BRANCH IF OVER LIMIT +``` + +Business rules: total = invoice; if total > threshold, review. + +### String Translation & Validation + +``` + TRT INPUTFLD,TRTAB FIND FIRST NON-NUMERIC + BZ VALID-NUM 0 = ONLY NUMERICS FOUND + B INVALID-NUM NON-ZERO = INVALID +``` + +`TRT` scans a string until it finds a byte with a non-zero entry in `TRTAB`. 
Heavily used for data cleansing. + +### Dynamic Execution + +``` + BCTR R5,0 DECREMENT LENGTH BY 1 FOR EX + EX R5,MOVEMAC EXECUTE MVC WITH DYNAMIC LENGTH +... +MOVEMAC MVC TARGET(0),SOURCE 0 LENGTH MEANS 'SUPPLIED BY EX' +``` + +## What Carries Business Logic + +**Extract facts from these:** + +- `CP` (Compare Packed), `AP/SP/MP/DP` (Packed Math), `ZAP` — directly implement business math, thresholds, and monetary rules. +- `TRT` (Translate and Test) — implements data validation rules. +- `CVB`/`CVD` (Convert to Binary/Decimal) and `ED` (Edit) — handle date math and report formatting. +- `TM` (Test under Mask) + `BO/BZ/BM` branches — handle multiple boolean business states packed into bytes. +- `DSECT` definitions — the definitive data dictionary for the program. +- `EX` (Execute) statements — often contain complex, dynamic business logic. +- `GET/PUT` targeting VSAM `RPL`s — defines data extraction and persistence. + +**Skip these (boilerplate):** + +- Standard Entry/Exit Linkage (`STM R14,R12`, `USING`, save area chaining). +- `GETMAIN`/`FREEMAIN` memory management. +- Standard `OPEN`/`CLOSE` macros. +- `EQU` statements defining registers (e.g., `R1 EQU 1`). +- `DS 0D` or `DS 0H` alignment directives. + +## Common Misinterpretations + +1. **Registers are not business variables.** `R3` has no inherent meaning. Trace what was loaded into it (`L R3,CUSTBAL`) to understand a comparison (`C R3,=F'1000'`). The business meaning is in the memory field. +2. **Branch conditions refer to the PREVIOUS instruction.** `BH OVER-LIMIT` means "branch if CC indicates high". AI must pair each branch with the specific instruction that set the CC. +3. **TRT is not a typical translation.** It's a search mechanism. It does not alter the string; it sets condition codes and registers based on a lookup table. +4. **Packed decimal is not readable as text.** `DS PL5` stores 1234567.89 as hex `01234567 8C`. Do not mistake `A/S/C` (binary math) for `AP/SP/CP` (packed math). +5. 
**DSECT is not executable code.** It maps memory. `USING CUSTREC,R6` applies names to offsets from R6. +6. **MVC is a byte copy, not a value assignment.** `MVC TARGET(5),=C'YES'` copies exactly 5 bytes: 'YES' plus 2 bytes of whatever follows in memory. + +## File Naming Conventions + +- `.asm`, `.s`, `.bal`, `.mlc`: Assembler source files. +- PDS members: 1-8 chars uppercase, program name. +- `.mac`, `.copy`: Macro libraries and DSECT definitions. diff --git a/partner-built/magellan/skills/ingestion/language_guides/cics.md b/partner-built/magellan/skills/ingestion/language_guides/cics.md new file mode 100644 index 00000000..4b28e097 --- /dev/null +++ b/partner-built/magellan/skills/ingestion/language_guides/cics.md @@ -0,0 +1,115 @@ +# CICS COBOL Reference Guide + +## Overview + +CICS (Customer Information Control System) is IBM's online transaction processing (OLTP) monitor for z/OS mainframes. CICS COBOL programs are standard COBOL programs augmented with `EXEC CICS ... END-EXEC` commands providing terminal I/O, file access, and transactional integrity. + +CICS programs are **pseudo-conversational**: the program sends a screen to the user, terminates, and a new instance restarts when the user replies. State is preserved across invocations within the COMMAREA or modern Channels/Containers. Screen layouts are defined in BMS (Basic Mapping Support) mapsets. + +## Key Constructs + +### Program Structure + +- **DFHCOMMAREA**: The communication area passed between invocations, preserving state (current screen, customer record, flags). +- **DFHEIBLK** (EIB): Execute Interface Block — system-provided struct containing metadata: `EIBCALEN` (COMMAREA length), `EIBAID` (last key pressed), `EIBTRNID`. +- **Channels and Containers**: Modern alternative to COMMAREA that removes the 32KB limit. + +### Data Access (Database / File I/O) + +CICS programs do NOT use standard COBOL `READ`/`WRITE`. 
+
+- `EXEC CICS READ FILE('name') INTO(ws-rec) RIDFLD(ws-key) END-EXEC`: Keyed VSAM read.
+- `EXEC CICS READ ... UPDATE`: Locks the record for update. Followed by `REWRITE`.
+- `EXEC CICS STARTBR` / `READNEXT` / `ENDBR`: Browse through a dataset sequentially.
+- **Queues**: `EXEC CICS WRITEQ TS` (Temporary Storage Queue for scratchpad data) and `WRITEQ TD` (Transient Data Queue for triggering batch jobs or logging).
+
+### Control Flow
+
+- `EXEC CICS SEND MAP('MAP1') MAPSET('MAPSET1') ERASE`: Send a screen.
+- `EXEC CICS RECEIVE MAP('MAP1')`: Receive user input into the symbolic map.
+- `EXEC CICS RETURN TRANSID('TRN1') COMMAREA(WS-COMM)`: End invocation; restart `TRN1` when user responds.
+- The `EIBAID` field (e.g., `DFHENTER`, `DFHPF3`) dictates workflow branching.
+
+### Program-to-Program Communication
+
+- `EXEC CICS LINK PROGRAM('SUBPGM') COMMAREA(WS-DATA)`: Synchronous call (returns control).
+- `EXEC CICS XCTL PROGRAM('NEXTPGM') COMMAREA(WS-DATA)`: Transfer control (does NOT return). Used to chain screens.
+- `EXEC CICS START TRANSID('TRN2')`: Asynchronously start another transaction.
+
+### Error Handling
+
+- `RESP` / `RESP2`: Inline response codes. `IF WS-RESP = DFHRESP(NOTFND)`.
+- `EXEC CICS HANDLE CONDITION NOTFND(para-name)`: Legacy implicit branch (a hidden GOTO). Traps errors globally.
+
+## Common Patterns
+
+### Pseudo-Conversational Main Loop
+
+```cobol
+    IF EIBCALEN = 0
+        PERFORM FIRST-TIME-SETUP
+        EXEC CICS SEND MAP('MAIN') ERASE END-EXEC
+        EXEC CICS RETURN TRANSID('MTRN') COMMAREA(WS-COMM) END-EXEC
+    ELSE
+        MOVE DFHCOMMAREA TO WS-COMM
+        EXEC CICS RECEIVE MAP('MAIN') END-EXEC
+        PERFORM PROCESS-INPUT
+    END-IF
+```
+
+`EIBCALEN = 0` implies no prior state. Non-zero means a continuation.
+
+### Key-Press Dispatch
+
+```cobol
+    EVALUATE EIBAID
+        WHEN DFHENTER PERFORM PROCESS-ENTER
+        WHEN DFHPF3 EXEC CICS RETURN END-EXEC
+        WHEN DFHPF12 EXEC CICS XCTL PROGRAM('MENU') END-EXEC
+    END-EVALUATE
+```
+
+### Browse Pattern (READNEXT)
+
+```cobol
+    EXEC CICS STARTBR FILE('ACCT') RIDFLD(WS-KEY) END-EXEC.
+    MOVE DFHRESP(NORMAL) TO WS-RESP
+    PERFORM UNTIL WS-RESP = DFHRESP(ENDFILE)
+        EXEC CICS READNEXT FILE('ACCT') INTO(WS-REC) RESP(WS-RESP) END-EXEC
+        IF WS-RESP = DFHRESP(NORMAL) PERFORM PROCESS-REC
+    END-PERFORM.
+    EXEC CICS ENDBR FILE('ACCT') END-EXEC.
+```
+
+## What Carries Business Logic
+
+**Extract facts from these:**
+
+- `EVALUATE EIBAID` blocks — map function keys to business actions/workflow.
+- `EXEC CICS READ/REWRITE/DELETE` with `FILE(...)` — the core data lifecycle.
+- `EXEC CICS LINK` and `XCTL` — dependencies on shared business services.
+- `EXEC CICS READNEXT` loops — handle subsetting and listing of data.
+- COMMAREA mapping before LINK/XCTL — reveals the API contract.
+- RESP handlers (`DFHRESP(NOTFND)`, `DFHRESP(DUPREC)`) — encode rules for missing/duplicate data.
+
+**Skip these (boilerplate):**
+
+- `HANDLE CONDITION` / `HANDLE ABEND` global traps.
+- DFHEIBLK / DFHCOMMAREA memory layout definitions.
+- BMS cursor positioning (`MOVE -1 TO fieldL`).
+
+## Common Misinterpretations
+
+1. **CICS programs are NOT long-running.** They survive for milliseconds. The conversation spans dozens of separate, stateless program executions chained together by the `COMMAREA`.
+2. **EIBCALEN = 0 is not an error.** It signals the start of a brand new workflow session.
+3. **LINK is a function call; XCTL is a GOTO.** `LINK` returns. `XCTL` does not.
+4. **FILE('xyz') is NOT a dataset name.** It's an FCT (File Control Table) mapping to a VSAM file. The same program can target different files via FCT.
+5. **TSQ/TDQ are not standard DB tables.** TSQs are temporary scratchpads (often used to hold paginated search results). TDQs are sequential triggers for asynchronous work.
+6.
**HANDLE CONDITION alters flow invisibly.** A `HANDLE CONDITION NOTFND(X)` at the top of a program means ANY subsequent `READ` that fails implicitly GOTOs paragraph X. +7. **BMS maps have three layers.** Every symbolic field has `L` (length), `F`/`A` (attribute), and `I`/`O` (input/output data). Only the `I`/`O` suffix handles real business data. + +## File Naming Conventions + +- `.cbl`, `.cob`: Source code. +- `.bms`, `.map`: Screen definitions. +- `.cpy`: Copybooks (COMMAREA layouts). +- PDS Members: Max 8 characters. Transaction IDs max 4 characters (e.g., `TRN1`). diff --git a/partner-built/magellan/skills/ingestion/language_guides/cl.md b/partner-built/magellan/skills/ingestion/language_guides/cl.md new file mode 100644 index 00000000..0342e9d0 --- /dev/null +++ b/partner-built/magellan/skills/ingestion/language_guides/cl.md @@ -0,0 +1,110 @@ +# IBM CL (Control Language) Reference Guide + +## Overview + +CL (Control Language) is IBM's command language for the AS/400 (IBM i). CL programs orchestrate job-level operations: submitting batch jobs, overriding DB files, handling parameters, and executing RPG/COBOL programs. + +CL programs compile into executable objects (`.PGM`, `.clle` for ILE, `.clp` for legacy). **Key insight**: CL programs rarely contain complex business logic. They control WHEN things run (job scheduling), HOW they run (environment routing), and AGAINST WHAT (file overrides). Business rules live in the called RPG/COBOL programs. + +## Key Constructs + +### Program Structure + +``` +PGM PARM(&PARAM1 &PARAM2) + DCL VAR(&PARAM1) TYPE(*CHAR) LEN(10) + /* body */ +ENDPGM +``` + +- Variables prefix with `&`. Types include `*CHAR`, `*DEC`, `*LGL` (boolean). + +### Commands (The Action Layer) + +- **Execution**: `CALL PGM(lib/pgm)` (dynamic), `CALLPRC` (bound procedure). +- **Batch**: `SBMJOB CMD(...) JOB(name)` submits work asynchronously. 
+- **File Overrides**: `OVRDBF FILE(logical) TOFILE(physical)` dynamically points a program's file reference to a different database file. `OVRPRTF` overrides print formats. +- **Library List**: `ADDLIBLE`, `RMVLIBLE`. Alters the search path for finding databases and programs (environment routing). +- **Data Areas / Queues**: `RTVDTAARA` (Retrieve Data Area for global settings), `CHGDTAARA`. `SNDDTAQ` / `RCVDTAQ` (Send/Receive Data Queue) for async IPC. + +### Control Flow + +- `IF COND(...) THEN(...) ELSE(...)` +- Operators: `*EQ`, `*NE`, `*GT`, `*AND`, `*OR`. +- `SELECT / WHEN / OTHERWISE` for multi-branching. + +### Error Handling — MONMSG + +``` +CALL PGM(MYPGM) +MONMSG MSGID(CPF0000) EXEC(DO) + /* Handle failure */ +ENDDO +``` + +- `MONMSG` acts as a structured try/catch. Placed directly after a command, it catches errors just for that command. Placed at the top level, it catches them globally. + +## Common Patterns + +### Environment File Routing + +``` +IF COND(&ENV *EQ 'PROD') THEN(DO) + OVRDBF FILE(MAST) TOFILE(PRODDTA/MAST) +ENDDO +ELSE CMD(DO) + OVRDBF FILE(MAST) TOFILE(TESTDTA/MAST) +ENDDO +CALL PGM(PROCESS) +``` + +The CL determines which dataset the RPG program mutates. + +### Batch Orchestration via Job Queues + +``` +SBMJOB CMD(CALL PGM(RPTGEN) PARM(&DATE)) JOB(NIGHTRPT) JOBQ(QBATCH) +``` + +Passes runtime parameters to an async background job. + +### Async Coordination via Data Queues + +``` +RCVDTAQ DTAQ(ORDQ) WAIT(-1) DATA(&ORDDATA) +CALL PGM(PROCESSORD) PARM(&ORDDATA) +``` + +Wait infinitely for a message on a queue, then process it. Common pattern for decoupling online input from background processing. + +## What Carries Business Logic + +**Extract facts from these:** + +- `CALL` / `CALLPRC` — maps program dependencies. +- `OVRDBF` / `OVRPRTF` — maps dynamic data relationships. If `OVRDBF` occurs conditionally, it dictates distinct business scenarios. +- `SBMJOB` — reveals batch integration points and chronologies. 
+- `IF / SELECT` statements on parameters (`&PARAM`) — represents business branching. +- `SNDDTAQ` / `RCVDTAQ` — reveals asynchronous, event-driven business architectures. +- `RTVDTAARA` / `*LDA` (Local Data Area) — extracts system configuration limits or cross-program communication variables. + +**Skip these (boilerplate):** + +- Global `MONMSG MSGID(CPF0000)` without an `EXEC` block. +- `DLTOVR` file override cleanup. +- String slicing and padding variables for command execution. + +## Common Misinterpretations + +1. **Parameter Length mismatch is fatal.** CL handles parameters by reference. If a CL passes a 10-byte variable `&PARAM` but the RPG program expects 50 bytes, memory corruption occurs. Implicit sizes matter heavily. +2. **OVRDBF modifies the architecture, not the filesystem.** The RPG program compiled against `MYFILE`. CL intercepts references to `MYFILE` at runtime and redirects to `HISFILE`. It is dependency injection for files. +3. **MONMSG is not a console log.** It prevents the job from crashing and throwing a hard halt. It is strictly exception trapping. +4. **SBMJOB does not halt the calling program.** The command dispatches to a queue. The CL immediately proceeds to the next line. +5. **LDA (Local Data Area) is implicitly passed.** `*LDA` is a special 1024-byte memory space bound to the job. RPG and CL can communicate via it without explicitly declaring parameters. +6. **Library Lists dictate data access.** `ADDLIBLE` isn't just a PATH variable; it completely changes the target of an unqualified file reference. + +## File Naming Conventions + +- Extns: `.clle` (ILE CL, modern), `.clp` (Original Program Model CL). +- Often prefixed with `CL` or `CLP` (e.g., `CLORDPROC`). +- Programs are stored in source physical files, commonly `QCLSRC`. 
diff --git a/partner-built/magellan/skills/ingestion/language_guides/cobol.md b/partner-built/magellan/skills/ingestion/language_guides/cobol.md new file mode 100644 index 00000000..aced1073 --- /dev/null +++ b/partner-built/magellan/skills/ingestion/language_guides/cobol.md @@ -0,0 +1,150 @@ +# COBOL ILE Reference Guide + +## Overview + +COBOL (Common Business-Oriented Language) on the AS/400 (IBM i) runs in the ILE (Integrated Language Environment). COBOL ILE programs (`.cblle`) support bound calls, service programs, and activation groups — features not available in standard COBOL. + +COBOL is verbose by design. Programs are structured into four divisions: + +1. **IDENTIFICATION DIVISION** — program name and metadata +2. **ENVIRONMENT DIVISION** — file assignments and special names +3. **DATA DIVISION** — all variable and data structure declarations +4. **PROCEDURE DIVISION** — the executable business logic + +## Key Constructs + +### Data Division — Variable Declarations + +```cobol +WORKING-STORAGE SECTION. +01 WS-INV-AMT PIC 9(7)V99 COMP-3. +01 WS-STATUS PIC XX. + 88 WS-ACTIVE VALUE 'AC'. + 88 WS-INACTIVE VALUE 'IN'. +01 WS-TABLE. + 05 WS-ENTRY OCCURS 10 TIMES INDEXED BY IDX. + 10 WS-ID PIC X(5). +``` + +- `PIC 9(7)V99 COMP-3`: Packed Decimal. `V` is implied decimal. `COMP-3` means packed. +- `88` level: Condition names (boolean tests). `IF WS-ACTIVE` tests if `WS-STATUS` = `'AC'`. +- `OCCURS`: Defines an array/table. +- `INDEXED BY`: Defines an index used for `SEARCH` (linear) or `SEARCH ALL` (binary search). + +### File Control + +```cobol +FILE-CONTROL. + SELECT CUSTOMER-FILE ASSIGN TO PFDEALRMST + ORGANIZATION IS INDEXED + ACCESS MODE IS DYNAMIC + RECORD KEY IS CUST-ID. +``` + +- `SELECT ... ASSIGN TO`: Maps a logical file name to a physical database table. + +### Procedure Division — Business Logic + +```cobol +PROCEDURE DIVISION. + PERFORM 1000-INITIALIZE + PERFORM 2000-PROCESS UNTIL WS-EOF = 'Y' + PERFORM 9000-CLEANUP + STOP RUN. +``` + +- `PERFORM ... 
UNTIL`: Standard loop structure. +- Paragraph names (1000, 2000) indicate a call hierarchy convention. + +### File I/O Operations + +- `OPEN INPUT/OUTPUT/I-O/EXTEND`: Specific modes enforce constraints. +- `READ file NEXT`: Sequential read. +- `READ file KEY IS key-var`: Random/keyed read. +- `WRITE rec FROM ws-rec`: Insert a new record. +- `REWRITE`: Update a locked record (must immediately follow a successful READ). +- `DELETE`: Remove a locked record. + +### String and Table Processing + +- `SEARCH ALL`: Binary lookup on an ordered table (must use `INDEXED BY`). +- `STRING ... DELIMITED BY`: Concatenation. +- `UNSTRING txt DELIMITED BY ',' INTO A B`: Heavily used for parsing CSV or delimited files. +- `INSPECT ... TALLYING / REPLACING`: Find, count, and replace characters. + +### ILE-Specific Features + +- **Bound calls** (`CALL "program"`): Bound calls use the ILE linker (direct). Dynamic calls resolve late. +- **Service programs**: Export shared subprocedures. +- **Activation groups**: Isolate shared opens, COMMIT controls, and overrides. +- **Embedded SQL** (`EXEC SQL ... END-EXEC`): Bypasses native ISAM access for relational processing. + +## Common Patterns + +### Read-Process Loop + +```cobol +PERFORM UNTIL WS-EOF = 'Y' + READ INPUT-FILE INTO WS-REC + AT END + MOVE 'Y' TO WS-EOF + NOT AT END + PERFORM 2100-PROCESS-RECORD + END-READ +END-PERFORM. +``` + +### Table Lookup (Business Reference Data) + +```cobol +SEARCH ALL WS-ENTRY + AT END + MOVE 'NOT FOUND' TO WS-RESULT + WHEN WS-ID (IDX) = SEARCH-ID + MOVE 'FOUND' TO WS-RESULT. +``` + +Table lookups in Working-Storage often house hardcoded business matrices or tiers. + +### EVALUATE (Switch) Statement + +```cobol +EVALUATE TRUE + WHEN WS-AMT > 500 PERFORM 3000-HIGH-VALUE + WHEN WS-STATUS = 'X' PERFORM 4000-EXCEPTION + WHEN OTHER PERFORM 5000-STANDARD +END-EVALUATE. 
+``` + +## What Carries Business Logic + +**Extract facts from these**: + +- `IF / EVALUATE` — encodes explicit business branching, thresholds, and conditions. +- `88` levels — definitions of business states (e.g., `88 APPROVED VALUE 'Y'`). +- `SEARCH / SEARCH ALL` — lookups against application-specific code definitions. +- `SELECT ASSIGN TO` — the physical file dependencies. +- `READ / WRITE / REWRITE` — the persistence layers of a business entity. +- Hardcoded literals in `COMPUTE` or `IF` statements representing rates or limits. + +**Skip these (boilerplate)**: + +- `IDENTIFICATION DIVISION` (metadata). +- `ENVIRONMENT DIVISION` `CONFIGURATION SECTION`. +- Standard `OPEN / CLOSE`. +- `MOVE SPACES` initialization blocks. + +## Common Misinterpretations + +1. **Paragraph numbers are convention, not syntax.** `3200-MANUAL-REVIEW` acts as a subroutine label. Lower numbers = main line; higher = deeper nested operations. +2. **PIC 9(7)V99 is not a string.** The `V` implies a decimal point. It's stored numerically but displayed based on the PIC. `COMP-3` means packed decimal; `COMP` means binary integer. +3. **88-levels are boolean conditions, not variables.** `88 WS-ACTIVE VALUE 'A'` means "true if WS-STATUS = 'A'". You cannot move data to an 88-level; you evaluate it. +4. **REWRITE requires a prior READ.** A sequential `REWRITE` or update locks the record upon `READ`. It cannot be updated globally without a lock. +5. **MOVE CORRESPONDING hides assignments.** `MOVE CORR WS-REC1 TO WS-REC2` silently matches fields by identical name. It obscures data lineage. +6. **PERFORM vs CALL.** `PERFORM` jumps inside the SAME program. `CALL` accesses an EXTERNAL module. + +## File Naming Conventions + +- Extns: `.cblle`, `.cbl`, `.cob`. +- Prefixes: Typically `CB` or `C` for COBOL (e.g., `CBLPROC`). +- Copybooks: Include `-CPY` or exist in `QCBLLESRC` as partial members. 
diff --git a/partner-built/magellan/skills/ingestion/language_guides/dds.md b/partner-built/magellan/skills/ingestion/language_guides/dds.md new file mode 100644 index 00000000..899b6246 --- /dev/null +++ b/partner-built/magellan/skills/ingestion/language_guides/dds.md @@ -0,0 +1,116 @@ +# DDS (Data Description Specifications) Reference Guide + +## Overview + +DDS is IBM's language for defining database files, display screens, printer layouts, and menus on the AS/400 (IBM i). DDS compiles into system objects — it is NOT executable code. It defines the STRUCTURE and ACCESS PATHS. + +DDS is fixed-format (column-positional). A line describes a record format, a field, a key, or a keyword. + +- **Physical files** (`.pf`): Database tables. 1 record format. +- **Logical files** (`.lf`): Views/indexes over physical files. +- **Display files** (`.dspf`): Green screen (5250) definitions. +- **Printer files** (`.prtf`): Report layouts. + +## Key Constructs + +### Physical Files — Database Tables + +``` +A R CUSTREC TEXT('Customer Master') +A CUSTID 10A +A CUSTNAME 50A COLHDG('Cust' 'Name') +A CUSTBAL 9P 2 +A STATUS 2A +A K CUSTID +``` + +- `A`: DDS line indicator. +- `R`: Record format name. +- Data Types: `A` (Alphanumeric), `P` (Packed decimal), `S` (Signed/zoned decimal), `L/T/Z` (Date/Time/Timestamp). +- `K`: Key field (primary index). + +### Logical Files — Views and Indexes + +``` +A R CUSTBYNM PFILE(CUSTMAST) +A CUSTID +A CUSTNAME +A K CUSTNAME +A S STATUS COMP(EQ 'AC') +``` + +- `PFILE`: Base physical file. +- `K`: Selects an alternative access path for DB read operations. +- `S` / `O`: Select / Omit. A static filter applied across the file ("WHERE STATUS = 'AC'"). Programs reading this LF never see omitted rows. + +### Display Files — Screen Definitions + +``` +A R INQUIRY +A CA03(03 'Exit') +A CUSTID 10A B 5 2 TEXT('Input ID') +A CUSTBAL 9P 2O 9 2 EDTCDE(J) +``` + +- `B` = Both (I/O), `I` = Input, `O` = Output. +- `CA03(03...)` maps F3 key to Indicator 03. 
+- `EDTCDE(J)` applies a formatting mask (commas, decimals) to a raw number. + +### Field References + +``` +A CUSTID R REFFLD(CUSTID *LIBL/CUSTREF) +``` + +- `R` (Reference) and `REFFLD`: The field relies entirely on an external Data Dictionary file for its datatype, length, and description. + +## Common Patterns + +### LF Subsetting (Logical Views) + +A Master PF might have 10 LFs over it. + +- `LF1`: Key = CUSTID, `S STATUS = 'ACTIVE'` +- `LF2`: Key = ZIPCODE, `S STATE = 'NY'` +RPG programs issue a `CHAIN` to `LF2` implicitly passing a zipcode, completely shielding the RPG prog from the "WHERE" clause logic. + +### Join Logical Files + +``` +A R ORDREC JFILE(ORDHDR ORDLINE) +A J JOIN(1 2) +A JFLD(ORDID ORDID) +A ORDID JREF(1) +``` + +Defines a permanent DB-level join matching Header lines to Detail lines. + +## What Carries Business Logic + +**Extract facts from these**: + +- `K` (Key) fields — reveal the primary query axes of the business data. +- `S` / `O` (Select/Omit) — define core business domains ("Active Status", "Expired Policy"). +- `JFILE` / `JOIN` / `JFLD` — define exact foreign keys and relationships between datasets. +- `EDTCDE` / `EDTWRD` — formatting hints reveal if a 9P0 is a Phone Number, SSN, or Dollar value. +- `COMP`, `RANGE`, `VALUES` (in DSPF) — dictate hardcoded UI input validation rules. +- Field names and `TEXT` keywords document the Data Dictionary. + +**Skip these (boilerplate)**: + +- Row/Column screen positioning coordinates (`5 2`). +- Screen colors (`COLOR(BLU)`), attributes (`DSPATR`), and high-lighting. +- Generic function key descriptors without business routing context. + +## Common Misinterpretations + +1. **Logical Files are NOT clones or duplicates.** They are inverted lists/indexes over physical data. Updating an LF updates the PF. +2. **Select/Omit is not a runtime parameter.** An LF hardcodes the 'WHERE' clause. An RPG program cannot dynamically change an LF's Select condition. +3. 
**The 10-char limit generates heavy acronyms.** `PFDEALRMST` = Dealer Master. Rely entirely on the `TEXT` or `COLHDG` attributes for meaning. +4. **Keyed files map 1:1 with RPG operations.** An RPG program executing `CHAIN (Name)` means it *must* be pointing at an LF keyed by Name. +5. **Reference fields hide schema.** A DDS line with just an `R` and no type (`10A`) means the AI must lookup the referenced dictionary file to know what data exists here. + +## File Naming Conventions + +- Extns: `.pf`, `.lf`, `.dspf`, `.prtf`. +- Formats: Prefix `PF` = Physical, `LF` = Logical, `DS` = Display, `PR` = Printer. diff --git a/partner-built/magellan/skills/ingestion/language_guides/easytrieve.md b/partner-built/magellan/skills/ingestion/language_guides/easytrieve.md new file mode 100644 index 00000000..db3f67bc --- /dev/null +++ b/partner-built/magellan/skills/ingestion/language_guides/easytrieve.md @@ -0,0 +1,123 @@ +# Easytrieve (CA Easytrieve Plus) Reference Guide + +## Overview + +Easytrieve is a report generator and data manipulation 4GL originally developed by Pansophic Systems. It is heavily optimized for batch extraction, file matching, and data validation on mainframes (z/OS). + +Source files use `.EZT`, `.EZTV`, or `.EZP` (off-mainframe). Programs are interpreted directly at execution within a JCL step. + +**The core architectural concept of Easytrieve is the implicit loop.** The `JOB` statement automatically opens the listed files, reads every record sequentially, executes all logic beneath it once per record, and automatically closes the files at the end. You rarely write a `READ` loop; the platform *is* the read loop. + +## Key Constructs + +### Program Structure + +- **Library Section** (Header): All `FILE` and `W` (Working Storage) declarations must appear before the first `JOB`. +- `FILE CUSTFILE FB(80 0)`: Declares a file. 80-byte Fixed Block format. +- `CUST-NAME 11 25 A`: A field definition maps directly to bytes on the disk record. 
Start at position 11, length 25, Alphanumeric. +- `W-TOTAL W 8 P 2`: Variable in memory (Working Storage). 8 digits packed, 2 decimals. +- `JOB INPUT CUSTFILE`: The entry point for the process loop. +- `REPORT`: Declarative trailing section detailing report breaks and formatting. + +### Data Access & File Operations + +- **Implicit Sequential**: The primary file listed on `JOB INPUT primary` is read automatically. +- **Explicit Secondary**: `GET secondary-file` reads a single record from a secondary file manually. +- **Output**: `PUT outfile FROM infile` writes a record. +- **In-Memory Tables**: `FILE REFTBL TABLE`. Loaded entirely into memory before processing. Searched via `SEARCH REFTBL WITH key EQ value`. +- **Match/Merge**: `JOB INPUT FILEA FILEB` automatically reads both files synchronously, aligning them by matching sorting keys. + +### Control Flow + +- `IF / ELSE / END-IF`: Standard conditionals. +- `DO WHILE ... END-DO` +- `PERFORM paragraph`: Subroutine execution. +- `STOP`: Halts the entire program execution instantly (often used on fatal errors). +- `GOTO label`: Legacy branching. + +## Common Patterns + +### The Implicit Validation Loop + +```easytrieve +FILE TRANSFB FB(100 0) + TRN-ID 1 10 A + TRN-AMT 11 8 P 2 + +JOB INPUT TRANSFB + IF TRN-AMT > 50000.00 + PRINT AUDIT-REPORT + ELSE + PUT PRODFILE FROM TRANSFB + END-IF +``` + +This script acts as a filter. Every record > 50K goes to paper, the rest pass through to the next phase of the batch. + +### Synchronized File Match / Merge + +```easytrieve +FILE MASTER FB(80 0) + M-KEY 1 10 A +FILE UPDATE FB(80 0) + U-KEY 1 10 A + +JOB INPUT MASTER UPDATE + IF MATCHED + PERFORM UPDATE-LOGIC + ELSE + IF MASTER + PERFORM MASTER-ONLY-LOGIC + END-IF + END-IF +``` + +The builtin `MATCHED` boolean implies the `M-KEY` and `U-KEY` aligned perfectly during the automatic dual-read. `IF MASTER` means an orphan master record exists with no update. 
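+
+### Accumulating Totals Across the Implicit Loop
+
+A minimal sketch of W-field accumulation, with hypothetical field names. Because `JOB` is the read loop, the assignment runs once per record:
+
+```easytrieve
+FILE SALESFB FB(80 0)
+  SLS-AMT 3 8 P 2
+W-GRAND W 10 P 2
+
+JOB INPUT SALESFB
+  W-GRAND = W-GRAND + SLS-AMT
+```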
+ +### In-Memory Table Lookup + +```easytrieve +FILE STATUSTBL TABLE FB(20 0) + TBL-KEY 1 10 A + TBL-DESC 11 10 A + +JOB INPUT TRANSACTIONS + SEARCH STATUSTBL WITH TBL-KEY EQ TRN-STATUS + IF STATUSTBL + W-DESC = TBL-DESC + ELSE + W-DESC = 'UNKNOWN' + END-IF +``` + +The `IF STATUSTBL` is true if the in-memory search succeeded. + +## What Carries Business Logic + +**Extract facts from these:** + +- `IF` conditionals — dictate thresholds, limits, and status evaluation. +- `SEARCH ... WITH` operations — explicitly link transactional codes to business reference definitions. +- `JOB INPUT` parameters — show the primary data lineage and match/merge topologies. +- `MATCHED` routing blocks — define the business reconciliation rules between datasets. +- Assignments (`=`) to `W-` fields — calculations accumulating totals or tax. + +**Skip these (boilerplate):** + +- `REPORT` formatting blocks, `TITLE`, `HEADING`, `LINE` (unless a native calculation occurs inline). +- Absolute byte positions `11 25 A` in File definitions (the layout is structural, the names hold the meaning). +- End of file housekeeping labels. + +## Common Misinterpretations + +1. **JOB is an implicit loop.** It is not a single execution block. Everything under it executes *N* times for *N* records in the file. +2. **There is no explicit READ for the primary flow.** Do not look for `READ INPUT` to figure out where data comes from. The `JOB` parameter handles it invisibly. +3. **Byte lengths do not equal string lengths for Packed fields.** An 8-byte Packed (`P`) field stores 15 numeric digits (plus a sign). +4. **W-fields vs Record-fields.** A field declared with `W` exists in RAM globally. A field declared without a `W` (e.g. `CUST-ID 1 10 A`) is an overlay on the current disk record buffer. +5. **TABLE files execute entirely in RAM.** A `FILE ... TABLE` is ingested totally at startup. `SEARCH` does no I/O. +6. 
**MATCHED is a reserved state, not a variable.** It toggles dynamically as the `JOB` iterator balances the two input files. + +## File Naming Conventions + +- Extns: `.EZT`, `.EZTV`, `.EZP`. +- On Mainframe: Usually stored in a PDS named `EASYTRV` or `EZTSRC`. Max 8 character member names. diff --git a/partner-built/magellan/skills/ingestion/language_guides/idms.md b/partner-built/magellan/skills/ingestion/language_guides/idms.md new file mode 100644 index 00000000..ba2406d2 --- /dev/null +++ b/partner-built/magellan/skills/ingestion/language_guides/idms.md @@ -0,0 +1,115 @@ +# IDMS (Integrated Database Management System) Reference Guide + +## Overview + +IDMS is a network-model (CODASYL) database management system running on IBM mainframes since the 1970s. It is distinctly **NOT relational**. There are no tables, no foreign keys, and no SQL JOINs. + +IDMS structures data via **Records** connected by **Sets** (physical linked lists representing one-to-many relationships). Programs interact with IDMS navigationally, moving a "Currency Pointer" precisely through the network graph. + +Logic is typically written in **COBOL-DML** (COBOL with embedded native DB traversal commands) or **ADS/Online** (a native 4GL dialog system). + +## Key Constructs + +### Program Structure + +- **SCHEMA**: The global database map. +- **SUBSCHEMA**: The subset map bound to this specific program. `MOVE 'SS-NAME' TO SUBSCHEMA-ID` authorizes access. +- **IDMS COMMUNICATIONS BLOCK**: Global error and status array. `ERROR-STATUS` dictates the result of every step. + +### Data Access (Navigating the Network) + +Because there are no JOINs, you must "walk" the database graph physically using DML (Data Manipulation Language). + +- `OBTAIN CALC record`: The fastest entry point. Looks up a root record via a hashing algorithm on its primary key. +- `OBTAIN NEXT record WITHIN set`: Walks forward through the linked list of children owned by the current parent. 
+- `OBTAIN OWNER WITHIN set`: Navigates backwards from a child to its parent record. +- `FIND`: Moves the currency pointer through the network without actually fetching the data into COBOL memory (highly optimized check). +- `STORE`: Inserts a new record *and automatically connects it* to all mandatory Sets. +- `MODIFY`: Updates the record under the current currency pointer. +- `ERASE`: Deletes the current record (`ERASE PERMANENT` physically cascades down to delete all child members in owned sets). + +### Control Flow and Error Handling + +- `ERROR-STATUS`: Evaluated after *every* DML verb. + - `0000` = Success + - `0326` = End of Set (No more children to loop through) + - `0306` = CALC Not Found (Primary key doesn't exist) + - `0069` = Deadlock +- Standard `PERFORM IDMS-STATUS` paragraphs wrap global abort logic. + +### ADS/Online Dialogs + +A unique 4GL environment specifically for IDMS terminal screens. + +- **Premap Process**: Code executed before drawing the UI. +- **Response Process**: Code executed after the user hits Enter. +- Commands: `LINK TO DIALOG 'Menu'`, `DISPLAY AND WAIT`, `IF ... LEAVE`. + +## Common Patterns + +### The Root Entry Lookup (CALC) + +```cobol +MOVE '12345' TO CUST-ID. +OBTAIN CALC CUSTOMER. +IF DB-STAT-OK + PERFORM EXISTING-CUST-LOGIC +ELSE + IF DB-REC-NOT-FOUND + PERFORM NEW-CUST-LOGIC + END-IF +END-IF. +``` + +### Walking the Graph (Parent to Children Iteration) + +```cobol +OBTAIN CALC CUSTOMER. +PERFORM UNTIL DB-END-OF-SET + OBTAIN NEXT ORDER WITHIN CUST-ORDER-SET + IF DB-STAT-OK + ADD ORDER-TOTAL TO WS-CUST-GRAND-TOTAL + END-IF +END-PERFORM. +``` + +This replaces a `SELECT * FROM ORDERS WHERE CUSTID = 12345`. Instead of a query, the program physically traverses the `CUST-ORDER-SET` linked list until it hits the end (`0326`). + +### Walking Upwards (Child to Parent) + +```cobol +OBTAIN CALC ORDER. +OBTAIN OWNER WITHIN CUST-ORDER-SET. +DISPLAY CUST-NAME. 
+```
+
+There is no Foreign Key on the `ORDER` record holding the Customer ID. The program navigates the internal `OWNER` pointer backwards up the hierarchy to find out who owns the order.
+
+## What Carries Business Logic
+
+**Extract facts from these:**
+
+- `OBTAIN CALC` targets — what are the primary entry entities into the business workflow?
+- `OBTAIN NEXT WITHIN` loops — how does the logic aggregate or evaluate child datasets?
+- `IF ERROR-STATUS =` checks — specifically handling `0306` (Missing) vs `0326` (End of Set) maps to explicit business pathways.
+- `STORE / ERASE` execution blocks — where does data mutate, and does it use `ERASE PERMANENT` (cascading business deletion)?
+- ADS `Response` processes — these house the strict validation rules applied against human input.
+
+**Skip these (boilerplate):**
+
+- Generic `PERFORM IDMS-STATUS` abort evaluations.
+- `BIND RUN-UNIT` / `READY` / `FINISH` transaction lifecycle setup.
+
+## Common Misinterpretations
+
+1. **IDMS is not relational SQL.** Do not frame analysis around "Tables" or "Foreign Keys". `CUST-ORDER-SET` is a physical linked list traversal.
+2. **Currency is implicit global state.** `MODIFY` updates whatever the database currently points at. If an intervening paragraph does a `FIND` on a different record, currency shifts invisibly.
+3. **OBTAIN is not a bulk SELECT.** `OBTAIN CALC CUSTOMER` fetches exactly 1 record. Loops are mandatory to fetch sets.
+4. **0326 and 0306 are standard flow, not fatal errors.** Trapping `0326` (End of Set) is the correct way to terminate a `WHILE` loop in IDMS. It is not an exception.
+5. **STORE auto-wires relationships.** A `STORE ORDER` command implicitly wires the Order into the `CUST-ORDER-SET` if currency was already established on the Customer.
+6. **FIND vs OBTAIN.** `FIND` locates the record in the DB engine but leaves COBOL's memory empty. `OBTAIN` locates it and moves the bytes into Working Storage.
+ +## File Naming Conventions + +- Extns: `.cbl`, `.cob`. +- Dialogs: Often stored internally in the Integrated Data Dictionary (IDD) rather than raw text files. Mapped as `dialog.PREMAP` or `dialog.RESPONSE`. diff --git a/partner-built/magellan/skills/ingestion/language_guides/jcl.md b/partner-built/magellan/skills/ingestion/language_guides/jcl.md new file mode 100644 index 00000000..1db1499c --- /dev/null +++ b/partner-built/magellan/skills/ingestion/language_guides/jcl.md @@ -0,0 +1,96 @@ +# JCL (Job Control Language) Reference Guide + +## Overview + +JCL (Job Control Language) is the batch scripting format for IBM z/OS. JCL orchestrates execution: it defines **what programs run**, **in what order**, **with what data files**, and **under what conditions**. A single JCL stream typically maps to a holistic business process (e.g., Nightly Settlement, End of Month Billing). + +It uses rigid column-positional syntax (`//` in cols 1-2). Max 8 character names. JCL rarely holds procedural loop logic itself; instead, its power lies in tying disparate utilities, database modules, and custom COBOL logic into a cohesive pipeline. + +## Key Constructs + +### Program Structure + +- `//jobname JOB`: The outer wrapper defining the execution queue (`CLASS`), priority, and reporting (`MSGCLASS`). +- `//stepname EXEC PGM=pgmname,PARM='args'`: The heart of JCL. Executes a COBOL, ASM, or Utility program, passing optional `PARM` args to it. +- `//stepname EXEC PROC=procname`: Calls a parameterized JCL macro/template. +- `// INCLUDE MEMBER=name`: Inserts another PDS member statically like a macro. + +### Data Access + +- `//ddname DD DSN=dataset.name,DISP=(status,normal,abnormal)`: Data Definition. Maps a logical target (`ddname`) expected by the program to the physical `dataset.name`. +- `DISP=(SHR)` (Shared read), `DISP=(OLD,KEEP)` (Exclusive lock), `DISP=(NEW,CATLG,DELETE)` (Create, register, or delete on fail). 
+- `//SYSIN DD *`: Inline configuration or control cards, terminated by `/*`. Commonly feeds parameters to Sorts, DB2 utilities, or COBOL.
+
+### Control Flow
+
+- `COND=(rc,operator)` on `EXEC`: Skips this step if the test is true. `COND=(4,LT)` reads "skip this step if 4 is LESS THAN any preceding return code" — i.e., skip when a prior step returned more than 4 (with conventional return codes, 8 or higher).
+- `//IF1 IF (STEP1.RC EQ 0) THEN`: Modern explicit branching.
+- `RESTART=stepname` on `JOB`: Force resume from an aborted step.
+
+### Essential Utilities
+
+JCL leans heavily on IBM utilities, and these carry substantial implicit business processing:
+
+- **IDCAMS**: Manages VSAM files (Delete, Define, Backup/Restore).
+- **SORT (DFSORT / SyncSort)**: Extremely powerful. Can pre-filter data (`INCLUDE COND`), pre-aggregate records (`SUM FIELDS`), and reformat bytes. **Skipping SORT control cards during extraction destroys business-rule lineage.**
+- **IKJEFT01**: The TSO terminal monitor. Used heavily to execute DB2/SQL batch programs (via `DSN RUN PROGRAM(...)`).
+- **IEBGENER / ICEGENER**: Copies data from one dataset to another or routes sequential files to print.
+
+## Common Patterns
+
+### Utility Pre-Processing, Custom COBOL Main-Processing
+
+```
+//SORT1 EXEC PGM=SORT
+//SORTIN DD DSN=PROD.RAW.DATA,DISP=SHR
+//SORTOUT DD DSN=&&TMPDATA,DISP=(NEW,PASS)
+//SYSIN DD *
+  SORT FIELDS=(1,10,CH,A)
+  INCLUDE COND=(15,2,CH,EQ,C'AC')
+/*
+//PROCSS EXEC PGM=BILLCOBOL
+//INPUT DD DSN=&&TMPDATA,DISP=(OLD,PASS)
+```
+
+The Sort filters strictly for `AC` (Active) accounts. The downstream COBOL program assumes the data is pre-validated.
+
+### DB2 Batch Execution
+
+```
+//DB2RUN EXEC PGM=IKJEFT01
+//SYSTSIN DD *
+  DSN SYSTEM(DB2P)
+  RUN PROGRAM(COBOLPGM) PLAN(COBPLAN) PARMS('01/01/2024')
+  END
+/*
+```
+
+## What Carries Business Logic
+
+**Extract facts from these**:
+
+- `EXEC PGM=` and `EXEC PROC=` chronologies — define the macro-level order of operations.
+- `DSN=` names in `DD` cards — track the lineage of master files to temporary work files (`&&TMP`) to output final report files. +- `SYSIN DD *` blocks passed to `SORT` — `INCLUDE COND` and `SUM FIELDS` are pure data filtration business logic occurring outside of COBOL. +- `COND/IF` branching — dictates the error handling routing of the batch night. +- `PARM=` arguments and `SYSTSIN` lines — reveal dates, environment overrides, or toggle flags mapping to COBOL's `LINKAGE SECTION`. + +**Skip these (boilerplate)**: + +- `SYSOUT=*`, `SYSMDUMP`, `SYSPRINT` — log routing. +- `STEPLIB/JOBLIB` — OS library search paths. +- Storage mapping allocations (`SPACE=(CYL,(...))`, `DCB=...`). + +## Common Misinterpretations + +1. **JCL is not just a runner.** A `SORT` step pre-aggregating monetary records is *just as critical* to the business process as the COBOL program. Don't skip `SORT` parameters. +2. **COND tests evaluate to SKIP, not to KEEP.** `COND=(12,EQ)` translates to: "If the prior RC equals 12, SKIP this step." +3. **DD names are the program interface.** If COBOL says `SELECT MASTER-IN ASSIGN TO 'MSTR01'`, then JCL MUST have `//MSTR01 DD DSN=...`. This allows dynamic data switching. +4. **GDGs are relative date trackers.** Dataset `PROD.FILE(+1)` creates today's version. `PROD.FILE(0)` reads the latest. `PROD.FILE(-1)` reads yesterday's. +5. **Double Ampersand (`&&`) means Temporary.** `DSN=&&TEMP1` exists only for this specific batch execution pipeline and cannot be retrieved globally. + +## File Naming Conventions + +- PDS members: 1-8 chars, e.g., `DLYBILL`, `ACCTEXT`. +- Extns: `.jcl`, `.job`. +- Dataset patterns: `ENV.SYSTEM.PROCESS.TYPE` (`PROD.GL.MONTHLY.REPORTS`). 
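+
+GDG relative generations (misinterpretation 4 above) plug into these dataset patterns. A minimal sketch, with hypothetical dataset and program names:
+
+```
+//EXTRACT EXEC PGM=BILLCOBOL
+//PRIORDY DD DSN=PROD.GL.DAILY.EXTRACT(0),DISP=SHR
+//TODAY   DD DSN=PROD.GL.DAILY.EXTRACT(+1),
+//           DISP=(NEW,CATLG,DELETE)
+```
+
+`(0)` resolves to the latest cataloged generation at job start; `(+1)` creates a new generation that becomes `(0)` for tomorrow's run.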
diff --git a/partner-built/magellan/skills/ingestion/language_guides/natural_adabas.md b/partner-built/magellan/skills/ingestion/language_guides/natural_adabas.md new file mode 100644 index 00000000..b7ad457d --- /dev/null +++ b/partner-built/magellan/skills/ingestion/language_guides/natural_adabas.md @@ -0,0 +1,113 @@ +# Natural / ADABAS Reference Guide + +## Overview + +Natural is Software AG's 4GL (compiled or interpreted) designed for business platforms, natively coupled with ADABAS (an inverted-list, non-relational database). + +Files have extensions `.NSP` (program), `.NSN` (subprogram), `.NSL`/`.NSG`/`.NSA` (Data Areas for Local/Global/Parameters), `.NSM` (screen Maps). +Code operates in **Structured Mode** (modern, explicit block bounds) or **Reporting Mode** (legacy, implicit looping). + +## Key Constructs + +### Program Structure + +- variables reside in Data Areas. `DEFINE DATA LOCAL USING LDA-NAME`. +- **Views**: `1 EMPLOYEES-V VIEW OF EMPLOYEES` maps program logic to ADABAS file descriptors. +- `LOCAL`, `GLOBAL`, `PARAMETER` define variable scopes. + +### Data Access (Non-Relational) + +- `READ view BY descriptor`: Loops through a table sequentially using an index route. +- `FIND view WITH field = 'x'`: Extracts records using inverted list lookups. Triggers an implicit `FOR EACH` loop. +- `GET view ISN-VALUE`: Direct address grab by Internal Sequence Number (Physical ID). +- `HISTOGRAM`: Read index counts efficiently without opening full records. +- `UPDATE / DELETE`: Mutates locked instances. +- **Reporting Mode clauses**: `ACCEPT IF...`, `REJECT IF...` (Inline row-level DB filtration) and `AT BREAK OF field` (Aggregation triggers). + +### Control Flow + +- `DECIDE ON FIRST VALUE OF field` (Switch/Case statement). +- `FOR`, `REPEAT UNTIL`. +- `ESCAPE TOP`: Jump to next loop iteration. `ESCAPE BOTTOM`: Break out of loop. `ESCAPE ROUTINE`: Return from method. 
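+
+A `DECIDE ON` sketch in Structured Mode, using hypothetical field and routine names:
+
+```natural
+DECIDE ON FIRST VALUE OF #TRAN-TYPE
+  VALUE 'DEP'
+    PERFORM POST-DEPOSIT
+  VALUE 'WDL'
+    PERFORM POST-WITHDRAWAL
+  NONE
+    ESCAPE ROUTINE
+END-DECIDE
+```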
+ +### Program-to-Program Communication + +- `CALLNAT 'NPGM' parm1`: Subprogram call (Push/Pop context). +- `FETCH 'NPGM'`: Permanent transfer of control. `FETCH RETURN` preserves it. +- `STACK COMMAND 'pgm'`: Places commands in a global queue to dictate subsequent UI workflows natively. + +### Error Handling + +- `ON ERROR ... END-ERROR`. Wraps block level exceptions. +- `BACKOUT TRANSACTION` rolls back uncommitted `UPDATE/STORE` operations; `END TRANSACTION` hard commits them. + +## Common Patterns + +### Implicit Loop via READ / FIND + +```natural +FIND EMPLOYEES-V WITH DEPT = 'SALES' RETAIN AS 'HOLD' + IF SALARY > 50000 + PERFORM APPLY-BONUS + UPDATE + END-IF +END-FIND +END TRANSACTION +``` + +Every line inside the `FIND` / `END-FIND` executes *per record*. `RETAIN AS` locks the records natively in ADABAS during the sweep. + +### Periodic Groups (Multi-Value Arrays) + +```natural +FIND EMP-V WITH ID = '123' + FOR #I = 1 TO C*INCOME + IF INCOME(#I) > 0 WRITE INCOME(#I) + END-FOR +END-FIND +``` + +ADABAS natively stores arrays inside rows. `C*INCOME` is an automatically generated integer representing the total count of occurrences of the `INCOME` field in that specific row. + +### External Parameter Binding + +```natural +DEFINE DATA PARAMETER USING PDA-PROC +LOCAL USING LDA-VARS +GLOBAL USING GDA-SYST +END-DEFINE +``` + +Links external `.NSA`, `.NSL`, `.NSG` copybook layouts into local memory mapping. + +## What Carries Business Logic + +**Extract facts from these:** + +- `FIND ... WITH` / `READ ... BY` — defines the business queries and indexed filter patterns. +- `ACCEPT` / `REJECT` — Reporting mode data filtration constraints. +- `AT BREAK OF` / `AT END OF DATA` — control breaks where monetary aggregation/reports execute. +- `END TRANSACTION` — denotes exactly where atomic business boundaries lie. +- `DECIDE ON` trees — explicit business process routing. +- `CALLNAT` parms — tracks API interactions between distinct microservice boundaries. 
+- Uses of `C*` variables indicate business logic iterating over nested relationships (Line-items inside Headers). + +**Skip these (boilerplate):** + +- Screen positioning `POSITION x y` / `DISPLAY` aesthetics. +- Standard variables initializations (`RESET #VAR`). +- `SET KEY` UI mapping without complex conditional handlers. + +## Common Misinterpretations + +1. **READ and FIND are loops.** A `FIND` statement is not a single database fetch. It inherently declares a `WHILE` loop containing the lines beneath it up to `END-FIND`. +2. **ADABAS is NOT Relational SQL.** It operates on ISNs (Internal addresses) and inverted lists. Do not document ADABAS paths as SQL `JOIN`s. They correlate through explicit nested `FIND` queries. +3. **Periodic groups (`#VAR(1:10)`) are native repeating datasets.** It's not memory instantiation; a DB row functionally contains 10 occurrences of that field natively. +4. **ESCAPE BOTTOM is NOT an error trap.** It performs a loop `break`. +5. **GDA (Global Data Areas) persist over the entire session.** State leakage between decoupled programs using heavily populated `.NSG` elements is standard architectural practice. +6. **Error 113 means Normal EOF/No Records.** This isn't a hard system failure; it handles generic "not found" conditional branching. + +## File Naming Conventions + +- Extns: `.NSP` (Program), `.NSN` (Subprogram), `.NSL` (Local DTA), `.NSG` (Global DTA), `.NSA` (Parm DTA), `.NSM` (Map). +- Code is typically 8-char abbreviated stored in FUSER physical volumes. diff --git a/partner-built/magellan/skills/ingestion/language_guides/pli.md b/partner-built/magellan/skills/ingestion/language_guides/pli.md new file mode 100644 index 00000000..8b8ce52f --- /dev/null +++ b/partner-built/magellan/skills/ingestion/language_guides/pli.md @@ -0,0 +1,116 @@ +# PL/I Reference Guide + +## Overview + +PL/I (Programming Language One) is IBM's general-purpose language spanning both business data processing and system programming, primarily on z/OS. 
It combines the structured nature of COBOL (packed decimals, precise file structures) with the flexibility of C (pointers, recursion, bitwise flags). + +PL/I operates in MVS Batch regimes, CICS Online screens, and IMS DB/DC transaction hierarchies. It uses a rigid block-structured scope `BEGIN/END` and Procedure routines `PROC / END`. Data defaults to hardware-efficient forms: it uses standard EBCDIC collating, exact Fixed Decimal math, and pointer overlays. + +## Key Constructs + +### Program Structure and Data Types + +- `procname: PROCEDURE OPTIONS(MAIN)`: Start of execution. +- `DCL` or `DECLARE`: Initializes memory fields. + - `FIXED DECIMAL(p,q)`: Packed Decimal. The cornerstone of banking math. + - `FIXED BINARY(15)`: Native standard integer. + - `BIT(1)`: Boolean switches `1'B` or `0'B`. + - `CHARACTER(n)`: Fixed length text. +- `BASED`: Pointer-driven memory overlay. `DCL X BASED(P)` maps variable X directly to address P without native allocation. + +### Data Access + +- **Record I/O**: `READ FILE(InF) INTO(Rec)` or `WRITE FILE(OutF) FROM(Rec)`. +- **Keyed/VSAM**: `READ FILE(idx) INTO(var) KEY(target)`. +- **Relational/DB2**: `EXEC SQL SELECT ... INTO :hostvar`. Standard cursor fetch loops. +- **IMS DL/I**: `CALL PLITDLI(...)`. + +### Control Flow + +- `IF / THEN / ELSE`: The standard split. The `THEN` executes the ensuing line or DO-group. +- `SELECT; WHEN(A=1)...; WHEN(A=3)...; END;`: Case statement. +- `DO WHILE(x)`, `DO UNTIL(y)`, `DO I=1 TO 10`. +- `DO; ... END;`: Groups statements without iteration. +- `ITERATE` (Next), `LEAVE` (Break), `GOTO` (Jump). + +### Program-to-Program Communication + +- `CALL prog(a, b)`: Synchronous module connection passing arguments strictly by reference. +- `FETCH prog`: Dynamic/late-binding routine load. +- `EXEC CICS LINK`: Dispatches across the CICS CWA context to external routines. + +### Error Handling & Exceptions (ON Blocks) + +- `ON ENDFILE(infile) EOF='1'B;`: Sets an async event handler for EOF. 
+- `ON KEY(idx) GOTO err_rtn;`: Submits a trap for VSAM Record Not Found failures. +- `ON ZERODIVIDE`, `ON CONVERSION`: Math/Casting exception handlers. + +## Common Patterns + +### Async Event Priming Loop + +```pli +OPEN FILE(INFILE) INPUT; +ON ENDFILE(INFILE) EOF_FLAG = '1'B; +EOF_FLAG = '0'B; +READ FILE(INFILE) INTO(REC); + +DO WHILE(^EOF_FLAG); + IF REC.AMT > 0 THEN CALL PROCESS(); + READ FILE(INFILE) INTO(REC); +END; +``` + +Standard PL/I idiom: declare the trap, prime the read, loop until trap triggers. + +### The C-Style Assign into Substrings + +```pli +DCL 1 CUST, + 2 ID CHAR(10), + 2 FLG BIT(1); +SUBSTR(CUST.ID, 1, 3) = '100'; +``` + +PL/I permits pseudo-variables on the Left-Hand side of an assignment, dynamically mutating partial strings inline. + +### Pointer-Based Dynamic Parsing + +```pli +DCL PTR POINTER; +DCL 1 LAYOUT BASED(PTR), + 2 NAME CHAR(20); +PTR = ADDR(BUFFER) + 5; +``` + +`NAME` now evaluates to the 20 bytes starting at `BUFFER + 5`. + +## What Carries Business Logic + +**Extract facts from these:** + +- `IF` and `SELECT...WHEN`: Encodes strict monetary, date, and status code criteria. +- `READ DIR/KEY` operations map the domain dependencies. +- `ON KEY` or `ON CONVERSION` traps outline edge-cases the business accounts for explicitly. +- `FIXED DECIMAL` size definitions imply business limits (e.g., maximum allowable loan amount). +- `CALL` hierarchies and `ENTRY` declarations outline module responsibility lines. + +**Skip these (boilerplate):** + +- System-level `%PROCESS` or `OPTIONS` commands. +- Routine `OPEN`/`CLOSE` File definitions without context. +- Buffer math or `ADDR()` / `ALLOCATE` pointer manipulation routines meant solely to handle protocol messages. + +## Common Misinterpretations + +1. **The I-N Implicit Typing Bug.** If a variable is undeclared, PL/I defaults it to `FIXED BINARY(15)` if it starts with I, J, K, L, M, N. Otherwise it falls back to `FLOAT DECIMAL(6)`. 
Many silent bugs occur from spelling `DCL AMOUNT` vs `AMUNT` without declaring it. +2. **ON handlers are Event Listeners, not inline conditions.** `ON ENDFILE(F) EOF='1'B` does NOT evaluate immediately. It sets a global hook for that specific block scope. +3. **DO; ... END; is NOT a loop.** It operates like `{ }` in C or Java. +4. **Substrings mutate the source directly.** `SUBSTR(A,1,2) = 'YZ'` rewrites `A`. +5. **BIT math vs Boolean Logic.** `&` is a bitwise AND. `^` is NOT. `|` is OR. Do not mistake them for arithmetic `+` or `-`. + +## File Naming Conventions + +- Extns: `.pli`, `.pl1`. +- Source Members: 1-8 char PDS. +- Copybooks: Loaded via `%INCLUDE MEMBER;`. diff --git a/partner-built/magellan/skills/ingestion/language_guides/rexx.md b/partner-built/magellan/skills/ingestion/language_guides/rexx.md new file mode 100644 index 00000000..2f62d5e4 --- /dev/null +++ b/partner-built/magellan/skills/ingestion/language_guides/rexx.md @@ -0,0 +1,107 @@ +# REXX (Restructured Extended Executor) Reference Guide + +## Overview + +REXX is an interpreted, dynamically-typed scripting language created by IBM, deeply embedded on z/OS (TSO/ISPF). It glues together system services, operates utilities, scripts batch processes, and occasionally implements lightweight calculation routines. + +On z/OS, all REXX `.exec` or PDS members must begin with a comment `/* REXX */` to distinguish them from older CLIST formats. REXX evaluates everything as a string. Any standard line not identifiable as a native REXX instruction is immediately relegated to the native OS environment as an executable command. + +## Key Constructs + +### Program Structure + +- `ARG var1 var2`: Implicitly uppercased parameter extraction. +- `PARSE ARG var1 var2`: Case-preserving parameter extraction. +- `EXIT number`: Kills script and passes RC integer back to caller. +- `RETURN var`: Terminates a subroutine/function. + +### Data Access & Queues + +- **EXECIO**: Primary mechanism for reading disk datasets. 
`"EXECIO * DISKR infile (STEM LIST. FINIS)"` loads an entire DB table or flat file into `LIST.1` to `LIST.n` with `LIST.0` holding the count. +- **ISPF Storage**: `"ISPEXEC VGET (VAR1) SHARED"` grabs shared variables mapped from online UI screens. +- **Queues**: `QUEUE data` vs `PUSH data` (FIFO vs LIFO). Creates an IPC stack. `PULL val` grabs it. `MAKEBUF` and `DROPBUF` group queue sets natively. + +### Control Flow + +- `IF / THEN / ELSE`: Supports standard branching. +- `SELECT / WHEN / OTHERWISE`: Switch statement. +- `DO i = 1 TO N`, `DO WHILE x`, `DO FOREVER`. +- `ITERATE` (continue), `LEAVE` (break). + +### Parsing (The Core Feature) + +```rexx +string = "ACCT190 9050.25 ACTIVE" +PARSE VAR string id 11 amt 19 stat +``` + +This is a positional parse. `id` = characters 1 to 10. `amt` = characters 11 to 18. `stat` = characters 19+. Absolute parsing is how fixed-width mainframe records are handled instantly. + +### Error Trapping + +- `SIGNAL ON ERROR`: Acts as a permanent jump table if any OS command returns `RC > 0`. Transfers control definitively to `ERROR:` label. +- `SIGNAL ON NOVALUE`: Catches uninitialized variables across the script. Crucial for tracing literal strings that typo'd into variable evaluations. + +## Common Patterns + +### Subroutines returning evaluations + +```rexx +CALL CheckAcct acctNo +IF RESULT = 'VALID' THEN ... + +CheckAcct: PROCEDURE + PARSE ARG check + IF length(check) = 10 THEN RETURN 'VALID' + RETURN 'INVALID' +``` + +### DB2 Interaction via DSNREXX + +```rexx +ADDRESS DSNREXX "EXECSQL PREPARE SQL1 FROM :q" +ADDRESS DSNREXX "EXECSQL OPEN C1" +DO WHILE SQLCODE = 0 + ADDRESS DSNREXX "EXECSQL FETCH C1 INTO :A, :B" + IF SQLCODE=0 THEN SAY 'Found:' A B +END +``` + +### Multi-Dimensional Stems + +```rexx +grid.1.1 = "TopLeft" +row = 1; col = 1 +SAY grid.row.col +``` + +Stem variables masquerade as multi-dimensional arrays, using concatenated indices. 
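+
+### Reading a Dataset with EXECIO
+
+A minimal sketch of the `EXECIO` stem-read idiom described earlier. The dataset name and byte positions are hypothetical:
+
+```rexx
+/* REXX */
+total = 0
+"ALLOC FI(INFILE) DA('PROD.TRANS.DAILY') SHR REUSE"
+"EXECIO * DISKR INFILE (STEM LIST. FINIS"
+DO i = 1 TO LIST.0            /* LIST.0 holds the record count */
+  PARSE VAR LIST.i id 11 amt 19 stat
+  IF stat = 'ACTIVE' THEN total = total + amt
+END
+SAY 'Active total:' total
+"FREE FI(INFILE)"
+```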
+ +## What Carries Business Logic + +**Extract facts from these:** + +- `PARSE VAR` templates — they explicitly decode the byte-length fields of rigid business datasets. +- `IF / SELECT` conditions containing hard-coded limits or status codes. +- `EXECIO` strings paired with `DISKR/DISKW` indicating input/output data sources. +- `ADDRESS DSNREXX` or `ADDRESS CICS` invocations showing dependencies on native systems. +- Statements that validate input and `EXIT 8` or `SAY "REJECTED"`. + +**Skip these (boilerplate):** + +- Standard OS allocations `ALLOC FI(X) DA(Y)`. +- Generic ISPF display messages `ISPEXEC SETMSG`. +- `SIGNAL ON` initializations and logging statements `TRACE R`. + +## Common Misinterpretations + +1. **Unrecognized Text evaluates to OS Execution.** If you type `DELETE DATASET A`, REXX does not throw a syntax error. It assumes `DELETE` is a valid TSO command and attempts to execute it externally on the mainframe. +2. **Variables are all strings.** With `x = 1` and `y = '001'`, the two comparison operators disagree: `x == y` is strict character comparison (false here), while `x = y` compares numerically when both sides are numbers (true here). +3. **Parse Variable removes spaces; Parse Positional preserves them.** `PARSE VAR str a b` trims blanks. `PARSE VAR str a 5 b` retains exactly the bytes inside those positions. +4. **Stems are NOT formal arrays.** `Line.1` works, but so does `Line.Name`. `Line.0` storing the count is merely a standard convention adhered to by EXECIO. +5. **SIGNAL destroys DO loops.** `SIGNAL` is a GOTO. Once you signal to `ERROR:`, the script cannot `RETURN` to the inner loop. + +## File Naming Conventions + +- Extensions: PDS members have no extension on z/OS; `.rex`, `.rexx`, and `.cmd` are used locally. +- 1-8 chars, e.g., `FTPPUSH`, `EXTRACTR`.
diff --git a/partner-built/magellan/skills/ingestion/language_guides/rpg.md b/partner-built/magellan/skills/ingestion/language_guides/rpg.md new file mode 100644 index 00000000..c861e65a --- /dev/null +++ b/partner-built/magellan/skills/ingestion/language_guides/rpg.md @@ -0,0 +1,104 @@ +# RPG ILE / RPG IV Reference Guide + +## Overview + +RPG (Report Program Generator) is the foundational business application language on IBM i (AS/400). RPG programs compile to native ILE (Integrated Language Environment) program objects with tightly integrated DB2 access. + +Code exists in two primary states: + +- **Free-format** (modern): Begins with `**FREE`. Utilizes clean keywords `DCL-S`, `DCL-PROC`, `IF / ENDIF`, `READ / CHAIN`. +- **Fixed-format** (legacy): Fully rigid columns driven by Spec indicators (`H`, `F`, `D`, `C`, `O`). The letter in column 6 identifies the spec type, which dictates how the compiler interprets the rest of the line. + +Most legacy codebases are a hybrid: fixed-format declarations flowing into `/FREE` calculation blocks. + +## Key Constructs + +### File and Program Structure + +- `CTL-OPT` (Free) / `H-spec` (Fixed): Header. Declares compilation formats, Date bounds, and subsystem Activation Groups. +- `DCL-F` (Free) / `F-spec` (Fixed): File Declaration. Directly lists DB Physical/Logical definitions. + - Subspecs include **Primary** `P` (drives the implicit logic cycle over every record) vs **Full Procedural** `F` (user issues `READ/CHAIN` manually). +- `DCL-DS` (Data Structures): Creates nested variables mirroring exactly the file's layout or generic API schemas. +- **Service Programs (`*SRVPGM`)**: Export independent `DCL-PROC` routines to act as shared libraries across independent applications. + +### File Operations (Database Access) + +- **CHAIN**: Keyed database access equivalent to `SELECT ... WHERE idx=key`. `CHAIN (CustomID) LF_FILES`. Sets `%FOUND`. +- `READ` / `READP`: Sequential cursor scroll forward or backward.
+- `SETLL` / `SETGT`: Sets Lower Limit / Greater Than. Repositions the database cursor without retrieving a record. +- `READE`: Read Equal. Scrolls sequentially only so long as the index matches the applied Key parameter. +- `WRITE / UPDATE / DELETE`. + +### Built-in Functions (BIFs) + +All start with `%`. + +- State validation: `%FOUND`, `%EOF`, `%ERROR`. +- Slicing and Padding: `%SUBST`, `%TRIM`, `%SCAN`, `%REPLACE`. +- Math/Dates: `%DEC`, `%CHAR`, `%DATE`, `%DIFF`. +- List structures: `%LOOKUP` (array search). + +### Indicators and Control Flow + +Indicators are single-bit booleans designated `*IN01` to `*IN99`. + +- Fixed-format relies on them as exception mappings (e.g. `CHAIN MASTFILE 45` means "Set `*IN45=*ON` if record is not found"). + +- **LR (Last Record)**: The most important indicator. Setting `*INLR = *ON` before returning instructs RPG to close open files, free static storage, and shut down completely. Leaving it `*OFF` on return keeps the module resident, making subsequent calls lightning-fast. +- `IF / DOW / DOU / SELECT`: Standard conditional looping and switches. + +## Common Patterns + +### Sequential Key Mapping (Set/ReadE) + +```rpgle +SETLL (CustID) ORDERLF; +READE (CustID) ORDERLF; +DOW NOT %EOF(ORDERLF); + // Iterates fully over all Orders belonging strictly to CustID. + READE (CustID) ORDERLF; +ENDDO; +``` + +### Table Fetch & Insert + +```rpgle +CHAIN (ActNo) PF_ACCTS; +IF NOT %FOUND(PF_ACCTS); + ACTNO = ActNo; + STATUS = 'INITIALIZED'; + WRITE RECFMT; +ENDIF; +``` + +## What Carries Business Logic + +**Extract facts from these:** + +- `CHAIN`, `READE`, `SETLL` sequences. These reveal the core data-entity dependencies and the table access paths behind the business hierarchy. +- `IF / SELECT` conditions mapping `*INxx` tags or `%FOUND` criteria to internal monetary checks, states, or dates. +- `DCL-F / F-Spec` blocks, which list exactly which DB tables are in scope for this program.
+- `UPDATE` statements represent specific business mutations affecting domain state. +- `CALLP`, `EXSR` map modular dependency graphs. +- Hardcoded string constraints (`DCL-C`) act as system constants. + +**Skip these (boilerplate):** + +- Routine `USROPN` explicit file opens, unless they conditionally load external sources. +- Green-screen display-file I/O (`EXFMT`) that only renders layouts. +- Empty `*INLR = *ON / RETURN` footers. +- Static padding loops filling empty strings with 0s. + +## Common Misinterpretations + +1. **LR is not an error code.** It designates lifecycle closure. Service Programs deliberately omit setting LR so memory states stay primed. +2. **PF / LF isn't a code class.** Physical Files are literal SQL Tables. Logical Files are complex SQL Views. RPG accesses both directly out of the `F-Spec`. +3. **The Implicit Logic Cycle (P).** If an `F-spec` designates a file as Primary `P`, the RPG program **has no loop structure**. The compiler implicitly injects a `WHILE NOT EOF` wrapper around the entire program, reading one record per cycle. +4. **CHAIN does not invoke procedures.** It performs a single highly optimized keyed DB fetch replacing SQL natively. +5. **KLIST is a composite key, not parameter args.** `KLIST` groups multiple variable IDs together strictly for use in `CHAIN`/`SETLL` targeting multi-keyed indexes. +6. **Indicators are context-dependent tags.** `*IN35` has no inherent meaning. It means exactly whatever the operation line directly above it assigned it to mean (e.g., "Error Reading", "Key Missing", or "F3 Pressed On Keyboard"). + +## File Naming Conventions + +- Extensions: `.rpgle`, `.sqlrpgle` (Embedded SQL). +- Submodules: `.rpgleinc`. +- Compiled to `*PGM` objects via `CRTBNDRPG`; service programs become `*SRVPGM` objects via `CRTSRVPGM`.
diff --git a/partner-built/magellan/skills/onboarding-guide/SKILL.md b/partner-built/magellan/skills/onboarding-guide/SKILL.md new file mode 100644 index 00000000..2f886b7a --- /dev/null +++ b/partner-built/magellan/skills/onboarding-guide/SKILL.md @@ -0,0 +1,228 @@ +--- +name: onboarding-guide +description: Generate a beginner-friendly narrative document that explains everything discovered about the client's systems. Use after domain summarization to produce onboarding_guide.md. +--- + +# Onboarding Guide Generation + +You produce `onboarding_guide.md` — a document that a new architect reads on their +first day to understand the client's business and systems. It is written for someone +who knows nothing about the client. + +This document auto-regenerates every time the pipeline runs. It is derived from the +domain summaries, open questions, and contradictions — not from the raw entity files. + +## When to Generate + +- After Stage 2c (domain summarization) in a full pipeline run +- On demand when an architect requests it +- After significant new material is ingested (determined by the orchestrator) + +## Process + +1. Discover all domains using Glob on `.magellan/domains/*/` (each subdirectory name + is a domain). +2. Read each domain's summary using the Read tool on + `.magellan/domains//summary.json`. +3. Read open questions and contradictions for each domain using the Read tool on + `.magellan/domains//open_questions.json` and + `.magellan/domains//contradictions.json`. To get consolidated data across + all domains, read each domain's files and aggregate. +4. Read `.magellan/index.json` using the Read tool for overall stats. +5. Read `.magellan/cross_domain.json` using the Read tool for inter-domain connections. +6. Synthesize the guide following the structure below. +7. Write the guide to `.magellan/onboarding_guide.md` using the Write tool. + +## Guide Structure + +Write the guide in Markdown with these sections: + +### 1. 
The Business + +What does this company do? Who are their customers? What industry are they in? +What are their core operations? + +Derive this from the domain summaries and entity types. If the KG contains a +Statement of Work, SOW, or project description, reference it. Otherwise, infer +from the systems and business rules discovered. + +Write 2-3 paragraphs. Use plain language, no jargon. + +### 2. The Systems + +What software runs the business? Give a high-level map of the technology landscape: +- What are the main systems (AS/400, web apps, APIs, databases)? +- How do they connect? +- What technology stack is each system built on? + +Explain this as a narrative, not a list. "When a vehicle arrives at the auction, +here's what happens in the system..." Walk through a key business flow to make +the technology concrete. + +Derive from domain summaries and cross-domain connections. + +### 3. The Domains + +For each domain in the KG, write a section with: +- What this domain is responsible for +- The most important entities (from hub summaries) +- Key business rules +- How this domain connects to other domains + +When mentioning a hub entity, include its entity ID as a cross-reference so +readers can look it up in the KG (e.g., "the Invoice Generation process +(`billing:invoice_generation`) handles..."). + +Start each domain section with a metrics line: + +```markdown +#### Billing (23 entities, 5 hubs, 3 open questions, 1 contradiction) +``` + +Order domains by importance (most hub entities, most cross-domain connections first). + +### 4. 
The Gotchas + +Things that would take weeks to discover by reading source materials directly: +- Undocumented behaviors found in code that aren't in any manual +- Workarounds or hacks that are still in production +- Contested facts (from contradictions) — where sources disagree +- Systems or processes that behave differently than documented + +These are the facts that an architect needs to know but wouldn't find in an +architecture diagram. + +Link each gotcha to the specific contradiction or open question by ID so it's +traceable: + +```markdown +- **Invoice threshold mismatch**: The MANUAL_REVIEW threshold is $10,000 in the + ops runbook but $5,000 in the database config. Both are in production. + (See contradiction `c_001`, directed to senior_developer) +``` + +### 5. Open Questions + +What we still don't know, organized by domain and priority: +- Critical questions that block understanding of core processes +- High-priority questions needed for design decisions +- Medium/low questions for completeness + +For each question, note who should be asked (the `directed_to` field). + +### 6. Coverage Summary + +A brief summary of what has been ingested and what hasn't: +- Total documents ingested +- Total entities, relationships, contradictions, open questions +- Domains covered and their entity counts +- Which source documents contributed to each domain's knowledge +- Any known gaps — domains with thin coverage (fewer than 5 entities), + directories that haven't been ingested, file types that failed + +Highlight domains with thin coverage so the team knows where to focus +next: + +```markdown +| Domain | Entities | Sources | Coverage | +|--------|----------|---------|----------| +| billing | 23 | 8 documents | Strong | +| title | 18 | 6 documents | Strong | +| transportation | 4 | 2 documents | **Thin — needs more source materials** | +``` + +### 7. Suggested First Touches + +Auto-generate 2-3 safe, low-risk discovery tasks for a new engineer joining +the engagement. 
These tasks should be: + +- **Low-risk**: read/investigate only — no changes +- **Domain-specific**: builds expertise in one area +- **Directly useful**: resolves an open question or adds coverage + +Derive suggestions from: +- Open questions tagged for developers that could be resolved by reading + one specific document or code module +- Domains with thin coverage that need more source material reviewed +- Simple contradictions that could be resolved by reading one more document + +Example: + +```markdown +## Suggested First Touches + +1. **Review CBBLKBOOK module** — The billing domain has an open question about + the invoice threshold (see `oq_003`). Reading the CBBLKBOOK COBOL source + will clarify whether the $10k or $5k threshold is active. This is a + well-scoped task that familiarizes you with the billing domain and the + AS/400 codebase. + +2. **Add transportation documents** — The transportation domain has only 4 + entities from 2 source documents. Adding the dispatch manual or route + planning docs would significantly improve coverage. + +3. **Verify title transfer timing** — Contradiction `c_004` notes a conflict + between immediate and batch title transfers. The Title_Process_Manual.pdf + section 4.1 likely resolves this. +``` + +### 8. Discovered Materials + +If `discovered_links.json` files exist in domain directories, include a section +showing what the pipeline found and what still needs to be collected. + +Read `discovered_links.json` for each domain and aggregate by terminal status: + +```markdown +## Discovered Materials + +During ingestion, the pipeline discovered 47 links across all documents.
+ +| Status | Count | Action Needed | +|--------|------:|---------------| +| Ingested automatically | 12 | None — already in the KG | +| Skipped by rule | 18 | None — matched project skip rules | +| Auth required | 5 | Download manually and add via `/magellan:add` | +| Tool unavailable | 3 | Install required tools (see below) | +| Manual collection needed | 4 | Ask client team for these documents | +| Fetch failed | 1 | Retry or download manually | +| Dead links | 2 | No action — links are broken | +| Already ingested | 2 | None — already processed | + +### Materials to Collect + +These references were found in source documents but couldn't be fetched +automatically. Prioritized by how many source documents reference them: + +1. **Dealer Master Manual, section 3.2** — referenced by 3 documents + (Q3_ops_runbook.pdf, Architecture_Overview.pdf, billing_procedures.docx). + Status: manual_collection. Ask the client team for a copy. + +2. **company.sharepoint.com/sites/arch/designs.pdf** — referenced by 2 documents. + Status: auth_required. Requires VPN + SSO. Download and add via `/magellan:add`. + +### Configure These Tools + +The following link types couldn't be resolved because the tool isn't configured: + +- **GitHub**: 3 links found. Install and configure `gh` CLI. +``` + +If no `discovered_links.json` files exist, omit this section entirely. + +## Writing Style + +- Write for a reader who is new to this client's systems +- Use plain language — explain acronyms on first use +- Be specific — "the AS/400 runs a nightly batch job that reconciles settlements" + not "there is a batch process" +- Be honest about uncertainty — if confidence is low, say so +- Put the most important information first in each section +- Keep it under 3000 words total — this is a briefing, not a book + +## What You Do NOT Do + +- Do not copy entity JSON into the guide. Translate everything to natural language. +- Do not list every entity. Focus on hubs and key facts. 
+- Do not hide contradictions or open questions. They are features, not bugs. +- Do not invent information beyond what the KG contains. diff --git a/partner-built/magellan/skills/pipeline-review/SKILL.md b/partner-built/magellan/skills/pipeline-review/SKILL.md new file mode 100644 index 00000000..9c69fcc7 --- /dev/null +++ b/partner-built/magellan/skills/pipeline-review/SKILL.md @@ -0,0 +1,417 @@ +--- +name: pipeline-review +description: Quality gate and feedback collector. Invoked after every pipeline step to verify outputs, block on errors and shortcuts, and collect feedback for post-run analysis. +--- + +# Pipeline Review + +You are the quality gate for the Magellan pipeline. After every step, you review +what was produced, flag problems, and decide whether the orchestrator can proceed. + +You serve two purposes: + +1. **Active quality gate** — blockers must be fixed before the next step starts. +2. **Feedback collector** — all findings (including non-blocking) are persisted + to `.magellan/pipeline_feedback.json` for post-run analysis and Magellan improvement. + +## When You Run + +The orchestrator invokes you after every pipeline step by providing: +- The step number that just completed +- A summary of what the step produced (file counts, entity counts, etc.) + +You then check the step's outputs against the criteria below. + +## Mandatory Verification Protocol + +You MUST show your work for every check. Do not declare "PASS" without evidence. + +For every verification check, you must: +1. **Execute the check** — actually run the Glob or Read operation. +2. **Report the raw result** — the actual file count, character count, or field value. +3. **Compare against the criterion** — state what was expected and what was found. + +**Anti-shortcut rule**: If you report 0 blockers AND 0 warnings for any step, +that is itself a yellow flag. Re-read the criteria and verify you actually ran +every check. 
Most steps produce at least one warning (thin content, low density, +etc.). A perfect score is rare and should be double-checked. + +**Evidence format** for each check: +``` +CHECK: [description] + RESULT: [what Glob/Read returned — actual numbers] + VERDICT: [PASS/FAIL/WARN — with reason] +``` + +Example: +``` +CHECK: Domain "billing" has entities + RESULT: Glob found 23 files in .magellan/domains/billing/entities/ + VERDICT: PASS (23 entities, minimum is 1) + +CHECK: Entity billing:invoice_generation has summary ≥ 50 chars + RESULT: Read entity, summary is 187 chars: "Four-state invoice lifecycle..." + VERDICT: PASS + +CHECK: Fact density for Q3_ops_runbook.json + RESULT: Read file, fact_count: 2 (source is 45-page manual) + VERDICT: WARN — expected 15-30 facts per 10 pages for QA manuals +``` + +This format makes shortcuts visible. If you skip a check, the missing evidence +block is obvious. + +## Finding Severity Levels + +Every finding you report must be classified as one of: + +### `blocker` + +**The orchestrator MUST fix this before proceeding to the next step.** + +A blocker means the step's outputs are incomplete, incorrect, or will corrupt +downstream steps. Examples: + +- A file was silently skipped (no disposition recorded) +- A domain has facts but 0 entities after graph building +- A mandatory deliverable is missing (e.g., Step 16 rule exports) +- Facts were written without proper structure (missing required fields, no source traceability) +- Entity has no summary or no evidence entries +- Accounted file/link counts don't match totals + +### `warning` + +**Logged and displayed, but does not block progression.** + +A warning means the output exists but is below quality expectations. 
Examples: + +- Low fact density (well below the expected range for the document type) +- Domain has fewer than 3 entities (thin domain) +- Entity names are inconsistent across the domain +- Summary narrative is under 200 characters +- Dashboard or onboarding guide is thin but present + +### `suggestion` + +**Logged for post-run analysis. Not displayed during the run.** + +A suggestion is an improvement idea for future Magellan development. Examples: + +- A new language guide would help (e.g., SQLRPGLE-specific patterns) +- A document type would benefit from a different chunking strategy +- A domain's entity naming convention should be documented +- A common pattern was detected that could become a new skill + +## Blocker Resolution Flow + +When you find blockers: + +1. Report each blocker with a specific description and recommended fix. +2. The orchestrator fixes the issues (re-runs the skill, re-processes the file, etc.). +3. The orchestrator re-invokes you for the same step. +4. You verify the fix. If the blocker is resolved, mark it as `resolved: true` in the + feedback file and proceed. +5. If blockers remain, repeat the cycle. + +**Maximum 3 review cycles per step.** If blockers persist after 3 attempts, escalate +to a warning (log it, note the failure, and proceed) to prevent infinite loops. Record +this escalation in the feedback file. 
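The resolution cycle above can be sketched as follows. `run_review` and `fix` are hypothetical callables standing in for the reviewer and the orchestrator — they are not part of Magellan:

```python
# Illustrative sketch of the blocker-resolution loop with 3-cycle escalation.
def review_step(step, run_review, fix, max_cycles=3):
    for cycle in range(max_cycles):
        findings = run_review(step)
        blockers = [f for f in findings if f["severity"] == "blocker"]
        if not blockers:
            return findings          # clean review: proceed to the next step
        if cycle < max_cycles - 1:
            fix(blockers)            # orchestrator repairs, then we re-review
    # Blockers persisted after the final cycle: escalate to warnings to
    # prevent infinite loops, and mark the escalation for the feedback file.
    for f in blockers:
        f["severity"] = "warning"
        f["escalated"] = True
    return findings
```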
+ +## Per-Step Review Criteria + +### After Step 1: Initialize and Discover + +Check: +- `.magellan/` directory exists (Glob on `.magellan/`) +- `state.json` exists and is readable +- `.magellan/language_guides/` contains at least one guide (Glob on `*.md`) +- File count > 0 reported + +Blockers: +- `.magellan/` does not exist +- 0 files discovered + +### After Step 2: Extract Facts + +Check: +- At least 1 domain exists (Glob on `.magellan/domains/*/`) +- Each domain has at least 1 fact file (Glob on `facts/*.json`) +- No fact files are empty (Read each, verify `fact_count` > 0) +- **File Ledger Reconciliation**: Count workspace files (Glob, excluding `.magellan/`, + `.git/`). Count entries in `processed_files.json`. If workspace > ledger, list missing + files by name. +- **Fact Count Cross-Check**: Sum `fact_count` from all fact files. Compare to total + reported during ingestion. If they differ, facts were lost. +- **Quote Verification (exhaustive)**: For EVERY fact across ALL domains: + 1. Read the fact file. + 2. For each fact, take a distinctive substring (20+ chars) from `source.quote`. + 3. Grep for that substring in the original source file (`source.document`). + 4. Track results: verified count, failed count, and the specific fact_ids that failed. + 5. Display: "Quotes verified: N/M passed (K failed: f_xxx, f_yyy)" + If ANY quote is not found in its source document, flag as blocker with category + `hallucinated_quote`. List every failed fact_id so the orchestrator can correct + or remove them before proceeding. 
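The exhaustive quote check can be sketched as below. This is a minimal sketch, not the plugin's implementation: the top-level `facts` key is an assumption, while `fact_id`, `source.quote`, and `source.document` follow the fact fields named in the criteria above.

```python
import json
from pathlib import Path

def verify_quotes(fact_file):
    """Return (passed_count, failed_fact_ids) for one domain fact file."""
    data = json.loads(Path(fact_file).read_text())
    passed, failed = 0, []
    for fact in data["facts"]:
        quote = fact["source"]["quote"]
        # Search for a distinctive leading substring rather than the full
        # quote, so edge-whitespace differences don't cause false misses.
        needle = quote[:40].strip()
        source_text = Path(fact["source"]["document"]).read_text(errors="ignore")
        if needle in source_text:
            passed += 1
        else:
            failed.append(fact["fact_id"])
    return passed, failed
```

The "Quotes verified: N/M passed" display line can be built directly from the returned tuple.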
+ +Blockers: +- 0 domains after ingestion +- Files silently skipped (disposition count < file count, or workspace count > ledger count) +- Any domain with fact files that contain 0 facts +- Fact count mismatch between files and reported total +- `processed_files.json` not updated (Read `.magellan/processed_files.json` and verify + it contains entries for all files that were in the processing list) + +Warnings: +- Fact density well below expected range for the file type + +### After Step 3: Build Graph + +Check: +- Each domain with fact files has entities (Glob on `entities/*.json`) +- Read 2-3 entity files per domain and verify: `summary` (50+ chars), + `evidence` (at least 1 entry with non-empty `quote`), `weight` > 0 +- Relationships exist for domains with 3+ entities +- **Entity-to-Source Traceability**: For 3 sampled entities per domain, verify each + evidence entry references a source document that has a fact file. Broken + chains mean facts were extracted but lost before graph building. + +Blockers: +- Domain has facts but 0 entities +- Entities missing summaries or evidence + +Warnings: +- Domain with fewer than 3 entities +- Entities with weight 0 +- Broken source traceability (evidence cites nonexistent fact file) + +### After Step 4: Cross-Domain Linking + +Check: +- `cross_domain.json` has edges if 2+ domains exist +- **Relationship Integrity**: For every edge in `cross_domain.json` and each + domain's `relationships.json`, verify both `from` and `to` entity IDs exist + as files. List dangling references. + +Blockers: +- 2+ domains but linking skipped entirely + +Warnings: +- Dangling entity references in edges +- Very few cross-domain edges relative to entity count + +### After Step 5: Entity Deduplication + +Check: +- Deduplication pass was executed (not skipped) +- **Evidence Preservation**: For each merge, verify kept entity's evidence count + ≥ sum of both originals.
If evidence was lost, flag as blocker. + +Blockers: +- Step skipped entirely +- Evidence lost during merge + +Warnings: +- Potential duplicates detected but not merged + +### After Step 6: Domain Summarization + +Check: +- For each domain, Read `.magellan/domains//summary.json` and verify: + - `narrative` is at least 200 characters + - `hub_summaries` array is non-empty + - `entity_count` matches the count from Glob on `.magellan/domains//entities/*.json` + +Blockers: +- Any domain missing a summary entirely +- Summary with empty narrative + +Warnings: +- Narrative under 200 characters (stub, not a real summary) +- Hub summaries empty (hub detection may have failed) + +### After Step 7: Onboarding Guide + +Check: +- `.magellan/onboarding_guide.md` exists (Read it) +- File is at least 500 characters +- Contains section headers (# lines) + +Blockers: +- File missing or empty + +Warnings: +- File is under 500 characters (stub) + +### After Step 8: Contradictions Dashboard + +Check: +- `.magellan/contradictions_dashboard.md` exists and is 200+ characters (Read it) +- `.magellan/contradictions_dashboard.html` exists (Read it) + +Blockers: +- Markdown file missing + +Warnings: +- HTML file missing (render may have failed) +- Dashboard is very thin relative to contradiction count + +### After Step 9: C4 Diagrams + +Check: +- `.magellan/diagrams/` directory exists (Glob on `.magellan/diagrams/*`) +- Contains at least `context.mmd` and `containers.mmd` + +Blockers: +- Diagrams directory missing entirely + +Warnings: +- Missing component-level diagrams for some domains + +### After Steps 12-15: Phase 2 Deliverables + +Check per domain: +- `business_rules.md` exists and is 200+ characters with at least 1 classified rule +- `ddd_spec.md` exists and is 500+ characters with section headers +- `contracts.md` exists and is 300+ characters +- `review.md` exists and is 300+ characters + +Blockers: +- Any deliverable file missing entirely for a domain +- business_rules.md with 0 
rules classified + +Warnings: +- Files under minimum size (stubs) +- DDD spec referencing entity names not in the KG + +### After Step 16: Business Rules Export + +**This step is MANDATORY. Check that it was not skipped.** + +Check per domain: +- `rules_.dmn` exists and contains `` XML tag +- `rules_.json` exists and is parseable JSON with a `rules` array +- `rules_.csv` exists and has a header row with `rule_id` +- `rules_.feature` exists and contains at least one `Scenario` + +Blockers: +- Step was skipped (no export files exist for any domain) +- Any of the four export formats missing for a domain + +Warnings: +- Export has 0 rules (empty export) + +### After Step 17: API Specs + +**This step is MANDATORY. Check that it was not skipped.** + +Check per domain: +- `openapi.yaml` exists and contains `openapi:` header +- `asyncapi.yaml` exists and contains `asyncapi:` header + +Check integration: +- `_integration/openapi.yaml` and `_integration/asyncapi.yaml` exist (if 2+ domains) + +Blockers: +- Step was skipped (no spec files exist for any domain) + +Warnings: +- Integration specs missing (only needed with 2+ domains) +- Spec files are very small (stub) + +## Writing Feedback + +After reviewing a step, write ALL findings to `.magellan/pipeline_feedback.json`. + +Use the Write tool. If the file already exists, read it first, append the new +step's entry to the `entries` array, and write it back. Do not overwrite +previous entries. 
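The read-append-write sequence can be sketched as a minimal helper (field names follow this skill's feedback structure; the path default is the one used throughout):

```python
import json
from pathlib import Path

def append_feedback(entry, path=".magellan/pipeline_feedback.json"):
    """Append one step's review entry, preserving all earlier entries."""
    p = Path(path)
    if p.exists():
        doc = json.loads(p.read_text())
    else:
        # First entry of the run: seed the file with its start timestamp.
        doc = {"run_started_at": entry.get("reviewed_at", ""), "entries": []}
    doc["entries"].append(entry)
    p.write_text(json.dumps(doc, indent=2))
```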
+ +### Feedback File Structure + +```json +{ + "run_started_at": "2026-02-24T10:00:00Z", + "entries": [ + { + "step": 3, + "step_name": "Build Graph", + "reviewed_at": "2026-02-24T10:15:00Z", + "findings": [ + { + "severity": "blocker", + "category": "missing_output", + "description": "Specific description of the problem", + "recommendation": "Specific action to fix it", + "resolved": true, + "resolved_at": "2026-02-24T10:20:00Z" + } + ], + "summary": { + "blockers": 1, + "blockers_resolved": 1, + "warnings": 2, + "suggestions": 1 + } + } + ] +} +``` + +### Finding Categories + +Use these categories to enable pattern analysis across runs: + +| Category | Meaning | +|----------|---------| +| `missing_output` | Expected file or data not produced | +| `skipped_step` | A mandatory step was not executed | +| `skipped_file` | A file was silently skipped during processing | +| `low_density` | Fact/entity count well below expectations | +| `invalid_output` | Output exists but fails validation (empty summary, missing fields) | +| `wrong_tool` | Write tool or Bash used instead of proper fact-writing mechanism | +| `count_mismatch` | Accounted totals don't match expected totals | +| `hallucinated_quote` | A source.quote in a fact does not appear in the source document | +| `quality_gap` | Output exists and is valid but is notably thin or low quality | +| `enhancement` | Suggestion for Magellan improvement (not a current-run issue) | + +## What You Do NOT Do + +- Do not re-run pipeline steps yourself. Report the issue and let the orchestrator fix it. +- Do not modify entities, facts, or KG data. You are read-only. +- Do not invent findings. Only report what you can verify by reading actual outputs. +- Do not block on warnings. Warnings are logged, not gates. +- Do not spend more than 3 review cycles on a single step. Escalate persistent blockers + to warnings after 3 attempts.
+ +## Display Format + +When reporting to the orchestrator, use this format: + +``` +Step N Review: [PASS | N BLOCKERS] +────────────────────────────────── + +[If blockers exist:] +[BLOCKER-1] category: description + → Fix: recommendation + +[BLOCKER-2] category: description + → Fix: recommendation + +[Warnings:] +[WARNING] category: description + +N findings logged to pipeline_feedback.json +(M blockers, N warnings, K suggestions) +``` + +If no blockers: + +``` +Step N Review: PASS +────────────────────────────────── +No blockers. N warnings and K suggestions logged to pipeline_feedback.json. +``` diff --git a/partner-built/magellan/skills/querying/SKILL.md b/partner-built/magellan/skills/querying/SKILL.md new file mode 100644 index 00000000..e9fc8427 --- /dev/null +++ b/partner-built/magellan/skills/querying/SKILL.md @@ -0,0 +1,275 @@ +--- +name: querying +description: Answer questions about the knowledge graph using a combination of direct entity reads, domain summaries, and graph traversal. Use when responding to /magellan:ask queries. +--- + +# Querying the Knowledge Graph + +You answer questions about the client's systems using the knowledge graph. Every +answer must include source citations. You never invent information — if the KG +doesn't have the answer, say so and identify it as an open question. + +## Query Strategy + +Choose your approach based on the question type: + +### Semantic / Overview Questions +"How does billing work?" "What do we know about the title transfer process?" + +1. Read the domain summary file at `.magellan/domains//summary.json` using + the Read tool. +2. If the summary answers the question, respond with it + citations. +3. If more detail is needed, read specific entity files for the relevant hubs + (paths are in the summary's hub list). + +### Factual / Lookup Questions +"What is the MANUAL_REVIEW threshold?" "What language is CBBLKBOOK written in?" + +1. Identify the likely domain and entity from the question. +2. 
Use Glob on `.magellan/domains/<domain>/entities/*.json` to find matching
+   entity IDs. If you don't know the domain, Glob across all domains:
+   `.magellan/domains/*/entities/*.json`.
+3. Read the matching entity file using the Read tool.
+4. Answer with the specific fact + source citation.
+
+### Structural / Dependency Questions
+"What systems does billing depend on?"
+"What would break if we decommission the AS/400 batch job?"
+"List all components downstream of payment that touch PII data."
+
+1. These require graph traversal — do NOT try to answer from entity files alone.
+2. Perform manual graph traversal by reading relationship and cross-domain files,
+   then following edges hop-by-hop. See the "Manual Graph Traversal" section below
+   for the detailed procedure for each operation type:
+   - "depends on" / "connects to" → **walk** (follow outgoing edges)
+   - "what depends on" / "affected by" → **impact** (follow incoming edges)
+   - "how are X and Y connected" → **between** (BFS from start to end)
+   - "which entities have property X" → **filter** (walk + check entity properties)
+3. Present the results with the full traversal path.
+
+### Cross-Domain Questions
+"How does billing interact with the title system?"
+"What data flows between transportation and auction operations?"
+
+1. Read `.magellan/cross_domain.json` using the Read tool to get cross-domain
+   edges (SAME_AS links and inter-domain relationships).
+2. Perform a manual graph walk starting from the relevant entity to find
+   cross-domain paths (see "Manual Graph Traversal" below).
+3. Read summaries for both domains for context.
+
+### Cross-Domain Workflow / Saga Questions
+"Trace the sale-to-settlement workflow across all domains."
+"What happens end-to-end when a vehicle is sold?"
+"Show the complete flow from check-in to title transfer with compensation actions."
+
+1. Identify the start and end entities from the question. 
If the user specifies
+   both endpoints, use the **between** traversal to find all paths. If only a
+   start is given, use **walk** (outgoing) to trace the full chain.
+2. Read `.magellan/cross_domain.json` using the Read tool to understand domain
+   boundaries.
+3. For each step in the traversal path, read the entity to get its domain,
+   type, and summary.
+4. Present the results as an **ordered step sequence** with:
+   - Step number and domain swimlane
+   - Entity name and what happens at this step
+   - Domain event that triggers the next step (from relationship edge descriptions)
+   - Which domain owns each step
+5. Include a **Mermaid sequence diagram** showing the temporal flow across
+   domain swimlanes:
+
+```mermaid
+sequenceDiagram
+    participant SAL as Sales & Auction
+    participant FIN as Financial Mgmt
+    participant TIT as Title Docs
+
+    SAL->>FIN: SaleCompleted event
+    activate FIN
+    FIN->>FIN: Calculate fees, generate invoice
+    FIN-->>SAL: InvoiceCreated event
+    deactivate FIN
+
+    SAL->>TIT: InitiateTitleTransfer
+    activate TIT
+    alt Title clear
+        TIT-->>SAL: TitleTransferApproved
+    else Title issue
+        TIT-->>SAL: TitleHold raised
+        SAL->>FIN: Initiate fee reversal (compensation)
+    end
+    deactivate TIT
+```
+
+6. If the question asks about compensation actions or failure modes, add `alt`
+   blocks for each step that can fail, showing the compensation path.
+7. If the question asks about SLAs or timeouts, add `Note over` annotations
+   where timing constraints are known from the KG.
+
+### Open Questions and Contradictions
+"What don't we know about billing?"
+"What contradictions have been found?"
+
+1. For domain-specific queries, read the file directly:
+   - Open questions: Read `.magellan/domains/<domain>/open_questions.json`
+   - Contradictions: Read `.magellan/domains/<domain>/contradictions.json`
+2. 
For cross-domain queries (e.g., "What contradictions have been found?"),
+   use Glob to find all files across domains, then read each:
+   - Open questions: Glob `.magellan/domains/*/open_questions.json`, then Read each
+   - Contradictions: Glob `.magellan/domains/*/contradictions.json`, then Read each
+3. Present organized by priority/severity.
+
+---
+
+## Manual Graph Traversal
+
+Since there is no dedicated graph walk tool, you traverse the knowledge graph
+manually by reading relationship files and following edges hop-by-hop. This
+section describes the procedure for each traversal operation.
+
+### Data Files
+
+- **Intra-domain edges**: `.magellan/domains/<domain>/relationships.json`
+  Each file contains `{ "domain": "<domain>", "edges": [...] }` where each edge has:
+  `edge_id`, `from`, `to`, `type`, `properties.description`, `evidence`, `confidence`.
+  Entity IDs are prefixed with their domain (e.g., `billing:invoice_generation`).
+
+- **Cross-domain edges**: `.magellan/cross_domain.json`
+  Contains SAME_AS links and inter-domain relationship edges. Same edge structure
+  but `from` and `to` span different domain prefixes.
+
+### Walk (Outgoing Traversal)
+
+Use for: "What does X depend on?", "What does X connect to?"
+
+1. Identify the start entity (e.g., `billing:invoice_generation`).
+2. Extract the domain from the entity ID prefix (e.g., `billing`).
+3. Read `.magellan/domains/billing/relationships.json`.
+4. Also read `.magellan/cross_domain.json`.
+5. Find all edges where `from` equals the start entity. Collect the `to` entities.
+6. For each `to` entity found, read its entity file for context if needed.
+7. To continue deeper (multi-hop), repeat steps 2-6 for each discovered entity,
+   up to a maximum depth of 8 hops.
+8. Track visited entities to avoid infinite cycles — never revisit an entity
+   you have already expanded.
+9. Present the full traversal path showing each hop and the edge type/description.
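The walk procedure above is a plain depth-limited, cycle-safe graph expansion. A minimal sketch, assuming the edges from `relationships.json` and `cross_domain.json` have already been merged into one Python list (only the `from`, `to`, and `type` fields described above are used):

```python
from collections import deque

def walk(edges: list[dict], start: str, max_depth: int = 8) -> list[tuple[str, str, str]]:
    """Follow outgoing edges from `start`, breadth-first, up to max_depth hops.

    Each edge dict carries "from", "to", and "type" as in the Data Files
    section. Returns (from, type, to) hops in discovery order.
    """
    visited = {start}              # cycle detection: never re-expand an entity
    queue = deque([(start, 0)])
    hops = []
    while queue:
        entity, depth = queue.popleft()
        if depth == max_depth:     # hard depth limit
            continue
        for edge in edges:
            if edge["from"] == entity and edge["to"] not in visited:
                visited.add(edge["to"])
                hops.append((edge["from"], edge["type"], edge["to"]))
                queue.append((edge["to"], depth + 1))
    return hops

# Hypothetical edges, including a cycle back to the start entity:
edges = [
    {"from": "billing:invoice_generation", "to": "billing:settlement_service", "type": "TRIGGERS"},
    {"from": "billing:settlement_service", "to": "title:title_transfer", "type": "INITIATES"},
    {"from": "title:title_transfer", "to": "billing:invoice_generation", "type": "NOTIFIES"},
]
print(walk(edges, "billing:invoice_generation"))
```

The cycle edge back to `billing:invoice_generation` is skipped because the start entity is already in `visited`. The impact operation is the same loop with the `from`/`to` comparison reversed.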
+ +### Impact (Incoming / Reverse Traversal) + +Use for: "What depends on X?", "What would break if we changed X?" + +1. Identify the start entity. +2. Read relationships.json for all domains (Glob `.magellan/domains/*/relationships.json`, + then Read each). +3. Also read `.magellan/cross_domain.json`. +4. Find all edges where `to` equals the start entity. Collect the `from` entities — + these are the entities that depend on the start entity. +5. To continue deeper, repeat: for each discovered `from` entity, find edges where + `to` equals that entity. +6. Track visited entities to avoid cycles. Maximum depth: 8 hops. +7. Present results showing the reverse dependency chain. + +### Between (Path Finding) + +Use for: "How are X and Y connected?", "Trace the path from X to Y." + +1. Identify the start entity and end entity. +2. Read relationships.json for all relevant domains + cross_domain.json. +3. Build an adjacency picture by scanning all edges. +4. Perform a breadth-first search (BFS) from the start entity: + - Maintain a queue of `(current_entity, path_so_far)` tuples. + - At each step, find all edges where `from` equals the current entity. + Add each `to` entity to the queue (if not already visited) with the + extended path. + - If the current entity equals the end entity, record the path as a result. +5. Stop after finding up to 20 paths, or after reaching depth 8, whichever + comes first. Prefer shortest paths. +6. For each path found, read the entities along the way for context. +7. Present each path as a chain: `A -[EDGE_TYPE]-> B -[EDGE_TYPE]-> C`. + +### Filter (Walk + Property Filter) + +Use for: "Which entities have property X?", "Find all BusinessRule entities downstream of Y." + +1. Perform a **walk** (outgoing) from the start entity as described above. +2. At each hop, read the discovered entity file. +3. Check if the entity's properties match the filter criteria (e.g., + `entity_type == "BusinessRule"`, or `properties.handles_pii == true`). +4. 
Collect only the entities that pass the filter.
+5. Present the filtered results with their traversal paths.
+
+### Traversal Guardrails
+
+- **Start small**: Read the start entity first to confirm it exists and get its domain.
+- **Hard depth limit**: Stop at 8 hops. Do not increase this — deeper traversals
+  consume too much context and produce unreliable results.
+- **Cycle detection**: Maintain a `visited` set of entity IDs. Before expanding any
+  entity, check if it's already in `visited`. If so, skip it. This is mandatory —
+  without it, cycles cause infinite loops.
+- **Cross-domain awareness**: Always read cross_domain.json. An entity in billing
+  may have edges to the title domain only visible there.
+- **Edge direction matters**: `from` is the source, `to` is the target.
+- **Graph size check**: Before traversing, count total entities via Glob. If the
+  graph has more than 200 entities, tell the user:
+  "This is a large knowledge graph (N entities). Traversal results may be
+  incomplete. For comprehensive structural queries on large graphs, consider
+  using the production version with the graph walk tool."
+  Then proceed with best effort — this is a warning, not a stop.
+- **Honest incompleteness**: If you hit the depth limit or visited-set limit before
+  finding an answer, say so: "Traversal reached depth limit without finding a
+  connection. The relationship may exist at a deeper level or through an indirect
+  path not explored." Never fabricate a path you didn't actually traverse.
+
+---
+
+## Answer Format
+
+Every answer must include:
+
+1. The direct answer to the question
+2. Source citations for every factual claim:
+   - Entity ID that contains the fact
+   - Original source document and location
+   - Confidence level
+3. Any caveats:
+   - Low-confidence facts (weight < 0.5): flag explicitly
+   - Contested facts: mention the contradiction
+   - Open questions: mention what we don't know
+
+## Example Answer
+
+Question: "How does invoice processing work?"
+
+> Invoice processing in the billing domain follows a four-state lifecycle:
+> DRAFT → ISSUED → PAID, plus a fourth MANUAL_REVIEW state for invoices
+> exceeding $10,000.
+>
+> The MANUAL_REVIEW threshold is contested — the Q3 ops runbook states $10,000
+> (source: Q3_ops_runbook.pdf, page 12, confidence: 0.75) but a database
+> config sets it to $5,000 (source: billing_db_config.sql, line 47,
+> confidence: 0.90). See contradiction c_001.
+>
+> The settlement process triggers after an invoice reaches PAID status,
+> which in turn initiates title transfer in the title domain.
+> (source: billing:settlement_service, evidence from Architecture overview.pdf)
+>
+> Open question: Is the $10k threshold still active? (oq_003, directed to
+> senior_developer, priority: high)
+
+## What You Do NOT Do
+
+- Do not invent facts. If the KG doesn't have the information, say "The knowledge
+  graph does not contain information about [topic]. This should be raised as an
+  open question."
+- Do not guess at relationships. Use the manual graph traversal procedure for
+  structural questions.
+- Do not omit source citations. Every factual claim needs provenance.
+- Do not present low-weight entities (< 0.5) as established facts. Qualify them
+  as "low-confidence" or "from informal sources."
+- Do not hide contradictions. Always surface them when relevant to the question.
+
+## When the KG is Empty or Sparse
+
+If the workspace has few or no entities:
+- Say so clearly: "The knowledge graph has N entities across M domains."
+- Suggest what materials should be ingested to answer the question.
+- Offer to help add materials using `/magellan:add`.
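The sparse-KG check above needs entity and domain counts. A minimal sketch of that count, using a throwaway temporary workspace purely for illustration (the `kg_size` helper is an assumption, not part of the plugin):

```python
import tempfile
from pathlib import Path

def kg_size(workspace: Path) -> tuple[int, int]:
    """Return (entity_count, domain_count) under <workspace>/.magellan/domains/."""
    root = workspace / ".magellan" / "domains"
    if not root.exists():
        return 0, 0
    domains = [d for d in root.iterdir() if d.is_dir()]
    entities = list(root.glob("*/entities/*.json"))  # same pattern as the Glob tool
    return len(entities), len(domains)

# Hypothetical workspace with one domain holding two entities:
ws = Path(tempfile.mkdtemp())
entities_dir = ws / ".magellan" / "domains" / "billing" / "entities"
entities_dir.mkdir(parents=True)
(entities_dir / "invoice_generation.json").write_text("{}")
(entities_dir / "settlement_service.json").write_text("{}")

n, m = kg_size(ws)
print(f"The knowledge graph has {n} entities across {m} domains.")
```

The same counts feed the 200-entity graph-size guardrail in the traversal section.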
diff --git a/partner-built/magellan/skills/rules-export/SKILL.md b/partner-built/magellan/skills/rules-export/SKILL.md
new file mode 100644
index 00000000..84e44d27
--- /dev/null
+++ b/partner-built/magellan/skills/rules-export/SKILL.md
@@ -0,0 +1,279 @@
+---
+name: rules-export
+description: Export business rules from the knowledge graph in standard machine-readable formats — DMN XML, JSON, CSV, and Gherkin BDD scenarios. Runs after Phase 2 business rules generation to produce files that feed directly into BRMS tools, rule engines, and test frameworks.
+---
+
+# Business Rules Export
+
+You produce machine-readable exports of the business rules discovered during
+Phase 2. The `business_rules.md` deliverable is the human-readable layer; these
+exports are the machine-readable layer that feeds into BRMS tools (Camunda,
+Drools, IBM ODM), lightweight JSON rule engines, spreadsheet review, and BDD
+test frameworks.
+
+## Output Files
+
+Per domain, generate these files in `.magellan/domains/<domain>/deliverables/`:
+
+| Format | File | Use Case |
+|--------|------|----------|
+| DMN XML | `rules_<domain>.dmn` | BRMS import (Camunda, Drools, IBM ODM) |
+| JSON | `rules_<domain>.json` | Lightweight engines (json-rules-engine, similar) |
+| CSV | `rules_<domain>.csv` | Spreadsheet review, bulk editing |
+| Gherkin | `rules_<domain>.feature` | BDD test scenarios for QA teams |
+
+## When to Generate
+
+- After business rules are generated in Phase 2 (Step 12)
+- On demand when an architect requests rule exports
+
+## Process
+
+For each domain discovered via Glob on `.magellan/domains/*/`:
+
+1. Use the Read tool to read `.magellan/domains/<domain>/summary.json` for the
+   domain overview.
+2. Use Glob on `.magellan/domains/<domain>/entities/*.json` to discover entities,
+   then Read the entities tagged with `business_rule`.
+3. Read the existing `business_rules.md` from the deliverables directory for
+   the classification (HARD/SOFT/QUESTIONABLE) and condition/action pairs.
+4. 
Model each rule as a structured object (see Rule Structure below).
+5. Generate all four export formats from the structured rules.
+6. Write each file immediately after generating it.
+
+## Rule Structure
+
+Model each rule internally before generating exports:
+
+```json
+{
+  "rule_id": "BR-FIN-001",
+  "domain": "financial_management_payments",
+  "name": "Invoice Manager Approval Threshold",
+  "classification": "HARD",
+  "condition": "invoice_amount > 15000",
+  "action": "require_manager_approval",
+  "decision_table": null,
+  "source_entity": "billing:invoice_approval",
+  "source_document": "CBBLKBOOK.cblle",
+  "source_quote": "IF WS-INV-AMT > 15000 PERFORM MANAGER-APPROVAL-PARA",
+  "confidence": 0.85,
+  "notes": "Threshold value conflicts with QA manual ($10k) — see C-001",
+  "tags": ["financial", "approval", "threshold"]
+}
+```
+
+For rules with multiple conditions, use a decision table:
+
+```json
+{
+  "rule_id": "BR-SAL-015",
+  "name": "Sale Reversal Eligibility",
+  "classification": "HARD",
+  "decision_table": {
+    "inputs": ["sale_age_hours", "settlement_status", "title_transferred"],
+    "output": "reversal_allowed",
+    "rules": [
+      {"conditions": ["<= 24", "pending", "false"], "result": "yes"},
+      {"conditions": ["<= 24", "pending", "true"], "result": "manual_review"},
+      {"conditions": ["> 24", "*", "*"], "result": "no"}
+    ]
+  }
+}
+```
+
+## DMN XML Format (`rules_<domain>.dmn`)
+
+Generate valid DMN 1.3 XML. Each rule becomes a decision element. Decision
+tables become `decisionTable` elements with input and output columns.
+
+```xml
+<definitions xmlns="https://www.omg.org/spec/DMN/20191111/MODEL/"
+             id="rules_billing" name="Billing Business Rules"
+             namespace="magellan">
+  <decision id="BR-FIN-001" name="Invoice Manager Approval Threshold">
+    <description>HARD — Source: billing:invoice_approval (CBBLKBOOK.cblle)
+      Confidence: 0.85</description>
+    <decisionTable id="BR-FIN-001_dt" hitPolicy="UNIQUE">
+      <input id="BR-FIN-001_input">
+        <inputExpression typeRef="number">
+          <text>invoice_amount</text>
+        </inputExpression>
+      </input>
+      <output id="BR-FIN-001_output" typeRef="string"/>
+      <rule id="BR-FIN-001_rule1">
+        <inputEntry><text>&gt; 15000</text></inputEntry>
+        <outputEntry><text>"require_manager_approval"</text></outputEntry>
+      </rule>
+      <rule id="BR-FIN-001_rule2">
+        <inputEntry><text>&lt;= 15000</text></inputEntry>
+        <outputEntry><text>"auto_approve"</text></outputEntry>
+      </rule>
+    </decisionTable>
+  </decision>
+</definitions>
+```
+
+Rules:
+- Use DMN 1.3 namespace (`https://www.omg.org/spec/DMN/20191111/MODEL/`)
+- Each rule is a `<decision>` element with the rule_id as its id attribute
+- Simple condition/action rules become single-row decision tables
+- Multi-condition rules use their decision_table structure directly
+- Include classification and source in the `<description>` element
+- Escape XML special characters (`>`, `<`, `&`) as entities (`&gt;`, `&lt;`, `&amp;`)
+
+## JSON Format (`rules_<domain>.json`)
+
+Generate a JSON export with run metadata and a `rules` array of rule objects
+compatible with lightweight rule engines:
+
+```json
+{
+  "domain": "billing",
+  "generated": "2026-02-23T10:00:00Z",
+  "rule_count": 25,
+  "distribution": {
+    "HARD": 8,
+    "SOFT": 12,
+    "QUESTIONABLE": 5
+  },
+  "rules": [
+    {
+      "rule_id": "BR-FIN-001",
+      "name": "Invoice Manager Approval Threshold",
+      "classification": "HARD",
+      "condition": {
+        "field": "invoice_amount",
+        "operator": ">",
+        "value": 15000
+      },
+      "action": {
+        "type": "require_manager_approval"
+      },
+      "source_entity": "billing:invoice_approval",
+      "source_document": "CBBLKBOOK.cblle",
+      "confidence": 0.85,
+      "tags": ["financial", "approval", "threshold"]
+    }
+  ]
+}
+```
+
+For decision table rules, use a `conditions` array instead of a single condition:
+
+```json
+{
+  "rule_id": "BR-SAL-015",
+  "name": "Sale Reversal Eligibility",
+  "classification": "HARD",
+  "conditions": [
+    {"field": "sale_age_hours", "operator": "<=", "value": 24},
+    {"field": "settlement_status", "operator": "==", "value": "pending"},
+    {"field": "title_transferred", "operator": "==", "value": false}
+  ],
+  "action": {
+    "type": "allow_reversal"
+  }
+}
+```
+
+## CSV Format (`rules_<domain>.csv`)
+
+Standard CSV with headers. 
Opens cleanly in Excel/Google Sheets:
+
+```csv
+rule_id,name,classification,condition,action,source_entity,source_document,confidence,tags,notes
+BR-FIN-001,Invoice Manager Approval Threshold,HARD,"invoice_amount > 15000",require_manager_approval,billing:invoice_approval,CBBLKBOOK.cblle,0.85,"financial;approval;threshold","Threshold conflicts with QA manual — see C-001"
+```
+
+Rules:
+- Quote fields that contain commas, quotes, or newlines
+- Use semicolons to separate tags within the tags field
+- Decision table rules expand to one row per rule combination
+- Include a header row
+
+## Gherkin Format (`rules_<domain>.feature`)
+
+Generate BDD scenarios for every business rule. Gherkin bridges the gap between
+business analysts who define rules and QA engineers who validate them.
+
+```gherkin
+@domain:billing
+Feature: Billing Domain Business Rules
+
+  Business rules extracted from the Magellan knowledge graph.
+  Source: .magellan/domains/billing/deliverables/business_rules.md
+
+  @classification:HARD
+  @confidence:0.85
+  @rule:BR-FIN-001
+  Scenario: Invoice above threshold requires manager approval
+    Given an invoice with amount 20000
+    When the invoice is submitted for processing
+    Then manager approval should be required
+
+  @classification:HARD
+  @confidence:0.85
+  @rule:BR-FIN-001
+  Scenario: Invoice below threshold is auto-approved
+    Given an invoice with amount 10000
+    When the invoice is submitted for processing
+    Then the invoice should be auto-approved
+
+  @classification:HARD
+  @confidence:0.85
+  @rule:BR-FIN-001
+  Scenario Outline: Invoice approval threshold boundary testing
+    Given an invoice with amount <amount>
+    When the invoice is submitted for processing
+    Then the result should be <outcome>
+
+    Examples:
+      | amount | outcome           |
+      | 14999  | auto-approved     |
+      | 15000  | auto-approved     |
+      | 15001  | requires-approval |
+```
+
+Gherkin generation rules:
+
+- **Confidence score visible**: tagged on every scenario (`@confidence:0.85`).
Teams can filter to only test high-confidence rules first.
+- **Classification as tag**: `@classification:HARD` allows running only HARD rules.
+- **Source traceability**: feature description cites the source document and entity.
+- **Boundary tests**: numeric threshold rules auto-generate Scenario Outlines with
+  boundary values (below, at, above the threshold).
+- **Decision table rules**: generate Scenario Outlines with Examples tables matching
+  the decision table rows.
+- **QUESTIONABLE rules**: tag scenarios with `@needs-review` so QA knows to validate
+  the rule itself, not just its behavior.
+- **Condition/action mapping**: Given = set up the condition, When = trigger the
+  evaluation, Then = assert the action.
+
+## Critical: Use Built-in Tools for Reading
+
+- ALL KG data reads MUST use Claude's built-in tools:
+  - **Discover domains**: Glob on `.magellan/domains/*/`
+  - **Discover entities**: Glob on `.magellan/domains/<domain>/entities/*.json`
+  - **Read entity details**: Read tool on `.magellan/domains/<domain>/entities/<entity_id>.json`
+  - **Read domain summaries**: Read tool on `.magellan/domains/<domain>/summary.json`
+- Read `business_rules.md` from the deliverables directory using the Read tool
+  (it's a generated artifact, not KG data).
+- Write export files using the Write tool (same pattern as other generated
+  artifacts in deliverables/).
+
+## What You Do NOT Do
+
+- Do not invent rules. Only export rules that exist in the KG.
+- Do not guess at conditions or actions. If a rule's condition can't be extracted
+  as a structured expression, use the natural language description and note it.
+- Do not generate invalid XML. Escape all special characters in DMN output.
+- Do not skip QUESTIONABLE rules. Export all rules with their classification —
+  consumers decide what to use.
+- Do not omit source traceability. Every exported rule must reference its KG
+  entity and source document.
+- Do not combine domains. Generate separate export files per domain.
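The CSV quoting rules above are easiest to satisfy with a standard CSV writer rather than string concatenation. A minimal sketch, assuming the flat rule fields from the Rule Structure section (the `rules_to_csv` helper is illustrative; `csv.QUOTE_MINIMAL` quotes exactly the fields containing commas, quotes, or newlines):

```python
import csv
import io

FIELDS = ["rule_id", "name", "classification", "condition", "action",
          "source_entity", "source_document", "confidence", "tags", "notes"]

def rules_to_csv(rules: list[dict]) -> str:
    """Render structured rules as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, quoting=csv.QUOTE_MINIMAL)
    writer.writeheader()
    for rule in rules:
        row = dict(rule)
        # Semicolons separate tags within the single tags field.
        row["tags"] = ";".join(rule.get("tags", []))
        writer.writerow({k: row.get(k, "") for k in FIELDS})
    return buf.getvalue()

rules = [{
    "rule_id": "BR-FIN-001",
    "name": "Invoice Manager Approval Threshold",
    "classification": "HARD",
    "condition": "invoice_amount > 15000",
    "action": "require_manager_approval",
    "source_entity": "billing:invoice_approval",
    "source_document": "CBBLKBOOK.cblle",
    "confidence": 0.85,
    "tags": ["financial", "approval", "threshold"],
    "notes": "Threshold conflicts with QA manual — see C-001",
}]
print(rules_to_csv(rules))
```

Decision-table rules would expand to one `writerow` call per rule combination before this step.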
diff --git a/partner-built/magellan/skills/summarization/SKILL.md b/partner-built/magellan/skills/summarization/SKILL.md
new file mode 100644
index 00000000..2f43c8b9
--- /dev/null
+++ b/partner-built/magellan/skills/summarization/SKILL.md
@@ -0,0 +1,139 @@
+---
+name: summarization
+description: Synthesize domain narratives from knowledge graph entities. Identifies hub entities and produces summary.json per domain. Use after graph building to create readable overviews.
+---
+
+# Domain Summarization (Stage 2c)
+
+## Critical: Use Built-in Tools for All Operations
+
+You MUST use Claude's built-in tools for reading and writing:
+- Glob on `.magellan/domains/*/` to discover domains
+- Glob on `.magellan/domains/<domain>/entities/*.json` to discover entities
+- Read tool on entity files to read entity details
+- Read tool on `.magellan/domains/<domain>/relationships.json` for edge counts (hub detection)
+- Read tool on `.magellan/domains/<domain>/open_questions.json` and `.magellan/domains/<domain>/contradictions.json` for counts
+- Read tool on `.magellan/domains/<domain>/summary.json` to check existing summaries
+- Write tool to `.magellan/domains/<domain>/summary.json` for each domain's summary
+- Read/Write tools on `.magellan/state.json` to update state with summary entity counts
+
+Do NOT compress summaries into index.json.
+Each domain MUST get its own `summary.json` written via the Write tool.
+
+You produce a master narrative for each domain by identifying hub entities and
+synthesizing a coherent summary. This bridges the gap between hundreds of individual
+entity files and the high-level understanding a model or architect needs.
+
+## When to Run
+
+- After Stage 2b (cross-domain linking) when a domain's entity count has changed by
+  more than 10% since the last summary
+- Before Phase 2 (design generation) for all domains
+- On demand when an architect requests regeneration
+
+## Process
+
+1. Read `.magellan/state.json` using the Read tool to get `last_summary_entity_counts`.
+
+2. 
For each domain (discovered via Glob on `.magellan/domains/*/`):
+   a. Count current entities using Glob on `.magellan/domains/<domain>/entities/*.json`.
+   b. Compare against `last_summary_entity_counts[domain]` from state.json.
+   c. If the count changed by more than 10% or no summary exists, regenerate.
+
+3. For domains that need regeneration:
+   a. Read all entity files and relationships for the domain using the Read tool.
+   b. Calculate hub scores.
+   c. Read the top hub entities in detail.
+   d. Synthesize the domain narrative.
+   e. Write `summary.json` to `.magellan/domains/<domain>/summary.json` using the Write tool.
+   f. Update `.magellan/state.json` with current entity count for the domain using
+      the Read tool (to get current state) then the Write tool (to save updated state).
+
+## Hub Detection
+
+Hub entities are the most important concepts in a domain — the ones everything else
+clusters around. Identify them using:
+
+```
+hub_score = relationship_count * entity_weight
+```
+
+Where `relationship_count` is the total inbound + outbound edges for the entity,
+and `entity_weight` is the weight field on the entity.
+
+Exclusion rule: entities with weight below 0.5 are excluded from hub candidacy
+entirely. They are infrastructure or utility concepts, not business hubs.
+
+Example:
+- `Invoice_Generation` (weight 0.9, 43 connections) -> hub_score = 38.7 -> hub
+- `DateFormatter` (weight 0.3, 200 connections) -> excluded (weight < 0.5)
+
+Select the top 10-15 hub entities per domain.
+
+## Narrative Writing
+
+The `narrative` field in `summary.json` is the most important output. 
It must:
+
+- Explain the domain in plain language, as if briefing an architect who has never
+  seen this system
+- Start with the core business process (what does this domain DO)
+- Describe the key entities and how they relate
+- Mention known risks, contradictions, and contested entities
+- Reference open questions that affect this domain
+- Be 3-8 paragraphs — long enough to be useful, short enough to fit in a context window
+
+Structure the narrative:
+1. Overview: what this domain is responsible for
+2. Core processes: the main workflows and how they operate
+3. Key entities: the hubs and what they do
+4. Integrations: how this domain connects to other domains
+5. Risks and open items: contradictions, open questions, contested facts
+
+## summary.json Format
+
+```json
+{
+  "domain": "billing",
+  "generated_at": "2026-03-20T10:00:00Z",
+  "entity_count": 487,
+  "hub_entities": 12,
+  "narrative": "The billing domain centers on Invoice Generation, a four-state lifecycle...",
+  "hub_summaries": [
+    {
+      "entity_id": "billing:invoice_generation",
+      "name": "Invoice Generation",
+      "hub_score": 38.7,
+      "connected_entities": 43,
+      "summary": "Core billing process. Four states with a MANUAL_REVIEW exception...",
+      "open_questions": 2,
+      "contradictions": 1
+    }
+  ],
+  "open_question_count": 8,
+  "contradiction_count": 3,
+  "cross_domain_connections": ["title", "transportation"]
+}
+```
+
+## Cross-Domain Connections
+
+Read `.magellan/cross_domain.json` using the Read tool for edges involving entities
+in this domain. List the other domains this domain connects to in the
+`cross_domain_connections` field. Mention these connections in the narrative.
+
+## Open Questions and Contradictions
+
+Read `.magellan/domains/<domain>/open_questions.json` and
+`.magellan/domains/<domain>/contradictions.json` using the Read tool. Count how many
+relate to this domain. Include the counts in the summary and mention the most critical
+ones in the narrative.
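The hub-detection rule from earlier in this skill (hub_score = relationship_count * entity_weight, with the 0.5 weight floor) can be sketched as follows, assuming entities expose `id` and `weight` fields and edges expose `from`/`to` as in the file-conventions schemas:

```python
def select_hubs(entities: list[dict], edges: list[dict], top_n: int = 15) -> list[tuple[str, float]]:
    """Score entities by hub_score = relationship_count * entity_weight.

    relationship_count is total inbound + outbound edges. Entities with
    weight below 0.5 are excluded from hub candidacy entirely.
    """
    degree: dict[str, int] = {}
    for edge in edges:
        degree[edge["from"]] = degree.get(edge["from"], 0) + 1
        degree[edge["to"]] = degree.get(edge["to"], 0) + 1
    scored = [
        (ent["id"], degree.get(ent["id"], 0) * ent["weight"])
        for ent in entities
        if ent["weight"] >= 0.5  # exclusion rule: weight floor
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Hypothetical mini-graph: date_formatter is well-connected but low-weight.
entities = [
    {"id": "billing:invoice_generation", "weight": 0.9},
    {"id": "billing:date_formatter", "weight": 0.3},
    {"id": "billing:settlement_service", "weight": 0.6},
]
edges = [
    {"from": "billing:invoice_generation", "to": "billing:settlement_service"},
    {"from": "billing:invoice_generation", "to": "billing:date_formatter"},
    {"from": "billing:settlement_service", "to": "billing:date_formatter"},
]
print(select_hubs(entities, edges))
```

`billing:date_formatter` never appears in the result despite having the same degree as the hubs, which is exactly the behavior the exclusion rule is after.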
+ +## Updating State + +After writing each domain's summary, update `.magellan/state.json`: + +1. Read `.magellan/state.json` using the Read tool. +2. Update `last_summary_entity_counts[domain]` to the current entity count. +3. Write the updated state back using the Write tool. + +This enables the 10% change threshold check on the next run.