Add CAML interactive article system with published npm packages#1156
Implement end-to-end support for rich, interactive scrollytelling articles stored as Readme.CAML documents within a corpus. This is the document convention approach (like GitHub's README): the latest version of Readme.CAML is always used.

Backend:
- Add text/markdown MIME type with no-op MarkdownParser that skips NLP pipeline (no PAWLs, no sentence annotations, no embeddings)
- Add title filter to DocumentFilter for exact-match document queries
- Detect .caml/.md files by extension during upload

Frontend - CAML Parser (frontend/src/caml/parser/):
- Pure function parseCaml(string) -> CamlDocument JSON IR
- Pass 1: YAML frontmatter extraction + ::: fence tokenization
- Pass 2: Type-specific block parsing (cards, pills, tabs, timeline, CTA, signup, corpus-stats, annotation-embed)
- Supports nested :::: fences for tab sub-blocks

Frontend - CAML Renderer (frontend/src/caml/renderer/):
- CamlArticle top-level renderer with Hero, Chapter, Footer
- Block components: Cards (grid), Pills (flex row), Tabs (interactive), Timeline (vertical), CTA (buttons), Signup, CorpusStats
- Prose blocks render through MarkdownMessageRenderer for @ mention chip support (reuses existing mention system)
- OS Legal design system styling with dark/light chapter themes

Frontend - Article View & Editor:
- New ?view=article route in CorpusHome orchestrator
- CorpusArticleView queries for Readme.CAML by title, fetches content, parses and renders
- Landing view shows "Read the article" link when Readme.CAML exists
- CamlArticleEditor: split-pane editor (CAML source + live preview) with create/update via UploadDocument mutation
- Edit button in article toolbar for quick access

https://claude.ai/code/session_01VA1iypRHHhLPU48aGaygHA
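The first parser pass described above (frontmatter extraction plus ::: fence tokenization) can be pictured with the rough sketch below. It is not the parseCaml implementation from this PR: the SketchBlock and SketchDocument types and the parseSketch function are hypothetical names, the frontmatter handling is assumed to be flat key: value pairs only, and nested :::: fences are deliberately left out. Unclosed fences are flushed as prose, matching the behaviour a later commit in this thread describes.

```typescript
// Hypothetical, simplified types; the real CamlDocument IR is not shown in this thread.
interface SketchBlock {
  type: string; // e.g. "cards", "pills", or "prose" for plain text
  lines: string[]; // raw body lines, left for a second pass to interpret
}

interface SketchDocument {
  frontmatter: Record<string, string>;
  blocks: SketchBlock[];
}

function parseSketch(source: string): SketchDocument {
  const lines = source.split(/\r?\n/);
  const frontmatter: Record<string, string> = {};
  let i = 0;

  // Pass 1a: flat "key: value" frontmatter between leading --- markers.
  if (lines.length > 0 && lines[0].trim() === "---") {
    i = 1;
    while (i < lines.length && lines[i].trim() !== "---") {
      const match = /^\s*([\w-]+):\s*(.*)$/.exec(lines[i]);
      if (match) frontmatter[match[1]] = match[2].trim();
      i++;
    }
    i++; // skip the closing ---
  }

  // Pass 1b: tokenize ::: fences; everything outside a fence is prose.
  const blocks: SketchBlock[] = [];
  let current: SketchBlock | null = null;
  let prose: string[] = [];

  const flushProse = (): void => {
    if (prose.some((l) => l.trim() !== "")) {
      blocks.push({ type: "prose", lines: prose });
    }
    prose = [];
  };

  for (; i < lines.length; i++) {
    const line = lines[i];
    const open = /^:::\s*([\w-]+)\s*$/.exec(line);
    if (current === null && open) {
      flushProse();
      current = { type: open[1], lines: [] };
    } else if (current !== null && line.trim() === ":::") {
      blocks.push(current);
      current = null;
    } else if (current !== null) {
      current.lines.push(line);
    } else {
      prose.push(line);
    }
  }

  // An unclosed fence is kept as prose rather than silently dropped.
  if (current !== null) {
    blocks.push({ type: "prose", lines: current.lines });
  }
  flushProse();
  return { frontmatter, blocks };
}
```

Keeping this pass a pure string-to-data function is what lets the editor re-run it on every keystroke for the live preview without side effects.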
Code Review: CAML Interactive Article System

This is a well-thought-out feature with clean architecture (two-pass parser, pure function design, pipeline bypass via no-op parser). The overall approach is solid. Below is feedback organized by severity.

🔴 Critical / Bugs

1. Duplicate
Either:
2. Deprecated const base64Content = btoa(unescape(encodeURIComponent(content)));
const bytes = new TextEncoder().encode(content);
const base64Content = btoa(String.fromCharCode(...bytes));
// or for large files: btoa(Array.from(bytes, b => String.fromCharCode(b)).join(""))🟠 Architecture / Design Concerns3. The parser calls 4. Title-based article discovery is fragile The 5. Fragile CSS class-name targeting in const StyledModalWrapper = styled.div`
[class*="modal-content"],
[class*="ModalContent"],
[role="dialog"] > div {
width: 95vw !important; ...This approach is brittle — it will silently break if 🟡 Code Quality6. Magic color constants in
7. if (raw === "true") return true as unknown as string;
if (raw === "false") return false as unknown as string;The return type is declared 8. id: `chapter-${Math.random().toString(36).slice(2, 8)}`Random IDs are regenerated on every 9. interface CamlTimelineItem {
side: string; // used as legend lookup key
}
🟡 Performance10. Extra GraphQL query on every corpus landing page
🟡 Missing Tests11. No unit tests for the CAML parser The 12. No backend test for A test verifying that a 🟡 Accessibility13. Tabs have no keyboard navigation
14. Using ✅ What's Done Well
|
Tests:
- CamlArticle.ct.tsx: 11 tests covering full article rendering, hero, cards, pills, interactive tabs, timeline, CTA, dark themes, pullquotes, empty document, and corpus stats blocks
- CamlArticleEditor.ct.tsx: 3 tests covering new article template, live preview updates, and close behavior
- CorpusArticleView.ct.tsx: 2 tests covering empty state (no article) and toolbar with back/edit buttons

All tests capture docScreenshots for automated documentation.

Also adds CAML v2 spec at docs/superpowers/specs/2026-03-24-caml-v2-spec.md documenting planned features: AI citation (cite-me/cite-all), annotation embeds, live corpus stats, signup notifications, editor enhancements, and VS Code extension.

https://claude.ai/code/session_01VA1iypRHHhLPU48aGaygHA
- Fix CamlHeroRenderer to handle non-array title defensively (prevents crash when YAML parser produces string instead of array)
- Add MemoryRouter to CamlArticleTestWrapper (required by MarkdownMessageRenderer's useNavigate hook)
- Fix strict mode violations in test assertions: use .first() and role-based selectors for text that appears in multiple blocks
- Simplify editor live preview test to avoid YAML parser edge case
- All 16 tests pass: 11 CamlArticle + 3 CamlArticleEditor + 2 CorpusArticleView

Screenshots captured: caml--article--full-render, caml--article--minimal, caml--article-view--empty-state, caml--article-view--toolbar, caml--cards--grid-render, caml--chapter--dark-gradient, caml--corpus-stats--with-data, caml--cta--buttons, caml--editor--live-preview, caml--editor--new-article, caml--hero--with-stats, caml--pills--with-status, caml--prose--pullquote, caml--tabs--compliance-active, caml--tabs--risk-active, caml--timeline--with-legend

https://claude.ai/code/session_01VA1iypRHHhLPU48aGaygHA
|
PR Review: CAML Interactive Article System Overall this is a well-structured, ambitious feature with good test coverage and clean architecture (pure parser function, typed IR, renderer separation). The backend integration is minimal and targeted. Below are the issues to address before merging. BUGS AND CORRECTNESS
SECURITY
ARCHITECTURE
TEST COVERAGE
MINOR ISSUES
|
PR Review: CAML Interactive Article System

This is a well-structured, feature-rich PR. The architecture is clean - pure parser function, typed IR, composable renderer - and the test coverage for a feature of this size is solid. The security guard (isSafeHref) is a nice touch. A few issues worth addressing before merge:

Critical: Apollo cache keyArgs mismatch
Files: frontend/src/graphql/cache.ts, frontend/tests/CamlArticleEditorTestWrapper.tsx, frontend/tests/CorpusArticleViewTestWrapper.tsx
The main cache.ts adds title to the documents keyArgs, but GET_CORPUS_ARTICLE filters by both inCorpusWithId AND title. Since inCorpusWithId is not in the production keyArgs, two different corpuses querying for Readme.CAML will share the same cache bucket - the second corpus can receive the first corpus article result. This is the exact bug class described in CLAUDE.md pitfall 15. The test wrappers already use inCorpusWithId and title as keyArgs, which highlights the divergence from production. Either add inCorpusWithId to the main documents keyArgs, or give this query its own field policy.

Important: Unconditional query on every corpus landing page
File: frontend/src/components/corpuses/CorpusHome/CorpusLandingView.tsx (line ~132)
GET_CORPUS_ARTICLE fires with no skip guard on every landing page render, adding a round-trip for all corpus users regardless of whether an article exists. Consider lazy-loading or at minimum adding a skip guard. The cache issue above amplifies this.

Important: Magic colors violate project conventions
Files: CamlArticleEditor.tsx, CorpusArticleView.tsx
Both files use hardcoded hex literals (#e2e8f0, #fafbfc, #f8fafc, #64748b, #94a3b8, #475569) instead of OS_LEGAL_COLORS tokens. Per CLAUDE.md, all hardcoded values should use constants. The renderer's styles.ts demonstrates the correct pattern throughout.

Bug: ConfirmModal noAction is a no-op
File: frontend/src/components/corpuses/CamlArticleEditor.tsx (line ~397)
The noAction prop is an empty function. If ConfirmModal calls noAction when the user clicks Keep editing, the confirmation dialog stays open permanently. Check whether the cancel button triggers noAction or toggleModal - if the former, this should be setShowCloseConfirm(false). Other usages of ConfirmModal in the codebase do not use an empty no-op here.

Minor: Duplicated external-link detection
Files: frontend/src/caml/renderer/CamlBlocks.tsx (line ~334), frontend/src/caml/renderer/CamlFooter.tsx (line ~25)
The href.startsWith("http") check for target="_blank" is duplicated verbatim in both files. Consider extracting an isExternalHref helper into safeHref.ts alongside isSafeHref.

Minor: Redundant articleStats useMemo
File: frontend/src/components/corpuses/CorpusHome/CorpusArticleView.tsx (lines ~390-402)
The memo mirrors stats with no transformation. Pass stats directly to CamlArticle and remove the intermediate memo.

Missing: CHANGELOG.md update
Per CLAUDE.md conventions, significant new features require a changelog entry. This PR introduces a new MIME type, parser, document convention, and ~4400 lines of frontend.

Observation: No backend tests for MarkdownParser or MIME detection
The MarkdownParser and text/markdown detection in document_mutations.py have no test coverage. A minimal test verifying that .caml/.md uploads bypass NLP and correctly populate txt_extract_file would protect against regressions.

Positive notes
The cache keyArgs issue is the only one that could cause silent data correctness bugs in production (wrong corpus article served from cache). Everything else is polish. |
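The keyArgs collision described in this review can be illustrated with a toy model of how a list of key arguments partitions cached results. This is only a sketch: cacheBucket is a hypothetical function, not Apollo Client's implementation, and the corpus IDs are made up. It shows why keying on title alone puts two corpuses in one bucket, and why adding inCorpusWithId separates them.

```typescript
// Simplified model of cache partitioning by keyArgs.
// NOT Apollo Client's actual algorithm, only an illustration of the bug class.
type Variables = Record<string, string | undefined>;

function cacheBucket(field: string, keyArgs: string[], vars: Variables): string {
  const picked: Record<string, string> = {};
  // Sort so the bucket name does not depend on the order keyArgs were listed in.
  for (const name of [...keyArgs].sort()) {
    const value = vars[name];
    if (value !== undefined) picked[name] = value;
  }
  return `${field}:${JSON.stringify(picked)}`;
}

// Hypothetical corpus IDs, both asking for the same article title.
const corpusA: Variables = { inCorpusWithId: "corpus-a", title: "Readme.CAML" };
const corpusB: Variables = { inCorpusWithId: "corpus-b", title: "Readme.CAML" };

// Only `title` in keyArgs: both corpuses land in the same bucket.
const collide =
  cacheBucket("documents", ["title"], corpusA) ===
  cacheBucket("documents", ["title"], corpusB);

// Adding `inCorpusWithId` gives each corpus its own bucket.
const separated =
  cacheBucket("documents", ["inCorpusWithId", "title"], corpusA) !==
  cacheBucket("documents", ["inCorpusWithId", "title"], corpusB);
```

In this model collide and separated both evaluate to true, which mirrors the review's two suggested remedies: widen the shared keyArgs, or give the article query a field policy of its own.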
Codecov Report

✅ All modified and coverable lines are covered by tests.
…active-features-1Mvfr
… fixes
- Fix XSS vulnerability: add isSafeHref guard for CTA and footer hrefs, rejecting javascript: and other dangerous URL protocols
- Fix Apollo cache keyArgs: add 'title' to documents relayStylePagination to prevent GET_CORPUS_ARTICLE from sharing cache with unrelated queries
- Replace deprecated unescape() with TextEncoder in CamlArticleEditor
- Use deterministic positional IDs for chapters instead of Math.random(), preventing unnecessary React unmount/remount on every re-parse
- Handle unclosed fences gracefully by flushing as prose instead of silently dropping content
- Remove unnecessary type casts in parseYamlValue
- Consolidate TEXT_MIMETYPES into constants/document_processing.py (single source of truth, imported by versioning.py and models.py)
- Add explicit UTF-8 decoding in MarkdownParser for non-ASCII content
- Replace 30+ hardcoded hex colors in CAML styles.ts with OS_LEGAL_COLORS/OS_LEGAL_TYPOGRAPHY design tokens
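For the unescape() replacement listed above, one possible shape of the TextEncoder-based encoding is sketched here. The function name toBase64Utf8 is hypothetical and the code actually committed to CamlArticleEditor is not shown in this thread. The sketch builds the binary string byte by byte instead of spreading the whole array into String.fromCharCode, which sidesteps argument-count limits on large files.

```typescript
// Hypothetical helper: UTF-8 encode a string, then base64 it without unescape().
function toBase64Utf8(content: string): string {
  const bytes = new TextEncoder().encode(content);
  let binary = "";
  // Append one byte at a time; avoids String.fromCharCode(...bytes),
  // which can exceed the engine's argument limit for large inputs.
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}
```

Because every character in the intermediate string is a single byte (0 to 255), btoa never sees a code point it cannot encode, which is the failure mode the older btoa(unescape(encodeURIComponent(...))) idiom was working around.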
…active-features-1Mvfr
…bled signup
- Allow fragment (#) URLs in isSafeHref guard for in-page anchor links
- Replace window.confirm with ConfirmModal component for unsaved changes dialog in CamlArticleEditor, matching project conventions
- Mark SignupBlock button as disabled with proper styling since the signup action is not yet implemented
…ternal href check
- Replace all hardcoded hex color literals in CamlArticleEditor.tsx and CorpusArticleView.tsx with OS_LEGAL_COLORS design tokens
- Fix YAML frontmatter parser bug where nested keys (e.g., hero.kicker) were silently dropped because content used trimEnd() instead of full trim, causing the key-value regex to fail on indented lines
- Extract isExternalHref() helper into safeHref.ts and use it in CamlBlocks.tsx and CamlFooter.tsx (DRY)
- Remove redundant articleStats useMemo in CorpusArticleView.tsx
- Add JSDoc to CamlTimelineItem.side documenting legend lookup contract
- Update CHANGELOG.md with CAML feature summary
- 23 tests for parseCaml: frontmatter extraction, chapter parsing, all block types (cards, pills, tabs, timeline, CTA, signup, corpus-stats), and edge cases (empty input, unclosed fences, unknown block types, whitespace-only body)
- 11 tests for safeHref: isSafeHref (XSS prevention for javascript:, data:, vbscript: protocols) and isExternalHref helper
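The behaviour those safeHref tests describe could look roughly like the sketch below. It is an independent illustration, not the contents of safeHref.ts: the allow-list of schemes is an assumption, and this version is stricter than the href.startsWith("http") check quoted earlier in the thread. It also rejects protocol-relative URLs, which a later review flags as slipping through the original guard.

```typescript
// Hypothetical re-implementation for illustration; the real safeHref.ts is not shown here.
// Assumed allow-list: the actual set of permitted schemes may differ.
const SAFE_SCHEMES = ["http:", "https:", "mailto:"];

function isSafeHref(href: string): boolean {
  // Browsers ignore tabs/newlines inside a scheme, so strip control chars and spaces first.
  const value = href.replace(/[\u0000-\u0020\u007f]+/g, "");
  if (value === "") return false;
  // Protocol-relative URLs ("//host/path") can point off-site.
  if (value.startsWith("//")) return false;
  // In-page anchors and site-relative paths carry no scheme.
  if (value.startsWith("#") || value.startsWith("/")) return true;
  const scheme = /^([a-zA-Z][a-zA-Z0-9+.-]*:)/.exec(value);
  if (scheme === null) return true; // relative path such as "docs/intro"
  return SAFE_SCHEMES.indexOf(scheme[1].toLowerCase()) !== -1;
}

function isExternalHref(href: string): boolean {
  // Only absolute http(s) URLs count as external (candidates for target="_blank").
  return /^https?:\/\//i.test(href.trim());
}
```

An allow-list is used rather than a block-list of javascript:, data:, and vbscript: so that any scheme not explicitly permitted is rejected by default.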
Cover all branches: successful parse, missing txt file, bytes vs string from storage, and None description fallback.
7. Magic hex values remain in styles.ts and CorpusArticleView.tsx Per CLAUDE.md rule 4, hardcoded values must live in constants files. Several slipped through: "#94a3b8" inline in TimelineDot fallback, rgba(0,0,0,0.06) in CardItem hover shadow, rgba(255,255,255,0.95) in ArticleToolbar backdrop, and the inline style prop in CorpusArticleView empty state. |
PR Review: CAML Interactive Article SystemThis is a well-scoped feature. The two-pass parser architecture is clean, the Bugs1. Missing import in The diff removes the local 2. CAML template ships with a parse failure
3. Misleading "Parsing article..." state on CAML parse error in When Security4. Protocol-relative URLs pass the The current regex allows Performance5. Extra GraphQL round-trip on every corpus landing page load
6. CAML file content not cached across in-session navigations Every visit to Code Quality7. Magic hex values remain in Per CLAUDE.md rule 4, hardcoded values must live in constants files. Several slipped through: 8. Hand-rolled YAML parser is a maintenance liability
9.
10. The field stores a legend category ( 11. Uploading Minor
Summary
Most critical before merge: 1 (potential
|
Replace the in-tree frontend/src/caml/ parser and renderer with the standalone @os-legal/caml (parser) and @os-legal/caml-react (renderer) npm packages, linked from ../../os-legal-caml/.
- Add link dependencies and resolution for @os-legal/caml
- Update CamlArticleEditor.tsx and CorpusArticleView.tsx imports
- Wrap CamlArticle in CamlThemeProvider, pass renderMarkdown prop
- Delete the old frontend/src/caml/ directory (15 files)
The caml-react npm package declares workspace:* as a dep on @os-legal/caml (a packaging artifact), so add a resolutions override to point it at the published 0.0.1. Also add defaultCamlTheme to ThemeProvider's merged theme to satisfy the DefaultTheme augmentation now visible from the npm types.
Pass totalDocs, totalAnnotations, and totalThreads from CorpusHome down into CorpusArticleView so the CAML article renderer receives live stats. Also add totalThreads to the CorpusHomeProps and CorpusDashboardProps stats types to match the full shape returned by GET_CORPUS_STATS.
Adds jurisdiction map (Chapter 4) and case-tracker (Chapter 5) chapters to SAMPLE_CAML_DOCUMENT, plus Playwright tests with docScreenshot calls for both new block types.
PR Review: CAML Interactive Article SystemThis is a substantial, well-architected feature. The two-pass parser design, centralization of Bugs / Correctness1.
const { data: articleData } = useQuery<GetCorpusArticleOutput, GetCorpusArticleInput>(
GET_CORPUS_ARTICLE,
{
variables: articleVars,
skip: !corpus.id, // corpus always has an id here, but guards against blank renders
}
);More importantly, consider whether it's worth polling for article existence on every corpus landing page. If 2. Both binary files resolve to the same blob hash ( 3. This is hardcoded in Security4. The component does // Guard: only fetch from same origin or known storage domain
const url = new URL(articleDoc.txtExtractFile, window.location.origin);
if (url.origin !== window.location.origin && !ALLOWED_STORAGE_ORIGINS.has(url.origin)) {
throw new Error("Unexpected article URL origin");
}Or use the existing 5. MIME detection is extension-only, not content-based
Code Quality6. Fragile CSS glob selectors in [class*="modal-content"],
[class*="ModalContent"],
[role="dialog"] > div {
width: 95vw !important;
...
}These 7. Plan/spec files committed to main branch
8. Missing backend tests for the new filter and MIME detection The
9. The diff adds Minor / Nits
Summary
The core feature is solid. The main asks are: fix the duplicate screenshot, add a |
|
PR Review: CAML Interactive Article System

Overall this is a well-structured feature with good test coverage and clean separation of concerns. The backend additions are minimal and correct. A few issues worth addressing before merge.

Bugs / Issues

1. Triple-fetching GET_CORPUS_ARTICLE on landing page
When a corpus has an article, the landing page fires GET_CORPUS_ARTICLE three times in parallel: CorpusHome.tsx (hasArticle detection), CorpusLandingView.tsx (show/hide Read article button), and CorpusArticleView.tsx (actual content fetch). Apollo deduplicates concurrent identical queries, but the first two are structurally redundant. Since CorpusHome already has hasArticle and controls whether onViewArticle is passed down, the check in CorpusLandingView can be removed. Passing onViewArticle only when hasArticle is true is sufficient to hide/show the button without an extra query.

2. Missing AbortController in fetch useEffects
Both CorpusArticleView.tsx and CamlArticleEditor.tsx fetch from articleDoc.txtExtractFile without an abort controller. If the component unmounts mid-fetch, the state setter fires on an unmounted component. React 18 suppresses the warning but the callback still executes. Both effects should return a cleanup function calling controller.abort().

3. Silent fetch failure in CamlArticleEditor
When fetching existing article content fails, the editor silently falls back to the blank CAML_TEMPLATE with only a console.error. The existing article is replaced by the template with no explanation. A toast.error() should surface this.
|
Code Quality

4. Hardcoded rgba values
Per CLAUDE.md, hardcoded hex/rgba values should use semantic tokens from osLegalStyles.ts. Two places introduce raw rgba: CorpusHome.tsx FloatingControls uses rgba(255, 255, 255, 0.95) and rgba(0, 0, 0, 0.12), and CorpusArticleView.tsx ArticleToolbar uses rgba(255, 255, 255, 0.95). These should be added to OS_LEGAL_COLORS (e.g. glassSurface, shadowMd) rather than inlined.

5. Plan docs committed to repo
docs/superpowers/plans/2026-03-25-caml-npm-extraction.md (1279 lines) and docs/superpowers/plans/2026-03-26-caml-npm-migration.md are agentic implementation plans for extracting CAML into a separate npm repo. These are internal work artifacts rather than user-facing documentation. Are these intentionally checked in? A GitHub issue would track the future work with less repo bloat.

Backend

6. Silent None return on missing txt_extract_file
MarkdownParser._parse_document_impl logs at ERROR level and returns None when txt_extract_file is missing, leaving the document with no content in the export dict. A more descriptive message including the document ID and expected file path would make these failures faster to debug.

Minor
Overall the architecture is sound, the npm extraction design is clean, and the 19 screenshot tests are a good addition. The hardcoded colors, triple-fetch, and silent failure in the editor are the items worth addressing before merge. |
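Items 2 and 3 of this review (abort on unmount, and surfacing fetch failures instead of swallowing them) can be combined in one small helper. The sketch below is framework-free so it stays self-contained: loadArticleText and the TextFetcher type are hypothetical names, and the fetcher is injected rather than calling the global fetch directly. In a component, the returned function would be what the useEffect returns as its cleanup, and onError is where a toast.error() call would go.

```typescript
// Hypothetical helper; the fetcher is injected so the sketch has no runtime dependencies.
type TextFetcher = (url: string, signal: AbortSignal) => Promise<string>;

function loadArticleText(
  url: string,
  fetchImpl: TextFetcher,
  onLoaded: (text: string) => void,
  onError: (error: unknown) => void
): () => void {
  const controller = new AbortController();

  fetchImpl(url, controller.signal)
    .then((text) => {
      // Skip the state update if the caller has already cancelled.
      if (!controller.signal.aborted) onLoaded(text);
    })
    .catch((error: unknown) => {
      // Aborts are expected on unmount; only real failures are reported.
      if (!controller.signal.aborted) onError(error);
    });

  // Cleanup function: call on unmount to cancel the in-flight request.
  return () => controller.abort();
}
```

A browser-side fetcher for this helper could be as simple as (url, signal) => fetch(url, { signal }).then((r) => r.text()), with the non-OK response handling left to the caller.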
Code ReviewThis is a substantial, well-structured PR. The overall architecture is sound: using npm packages for the CAML parser/renderer, centralizing Bugs / Potential Breakage1. The diff removes the local from opencontractserver.constants.document_processing import TEXT_MIMETYPES2.
3. The Architecture / Design Concerns4. Duplicate When a corpus has an article and the landing view is rendered, both 5. CSS [class*="modal-content"],
[class*="ModalContent"],
[role="dialog"] > div {
width: 95vw !important;
...This is fragile: it will silently break or mis-apply if 6.
Code Quality7. Planning docs committed to the repo
8. // cache.ts
"title",Adding 9. logger.error(f"No txt file found for document {doc_id}")
10. The Testing11. Backend tests checkbox is unchecked in the test plan
The new 12. No test for the MIME detection logic in The extension-based Minor
Overall this is a well-executed feature that follows existing patterns (routing system, constants centralization, test wrappers). The items above are worth addressing before merge, with items 1–3 being the most critical. |
Summary
- @os-legal/caml (parser) and @os-legal/caml-react (renderer) from npm, with full theme integration
- corpus-stats CAML blocks now receive real data (documents, annotations, threads) from the GraphQL corpusStats query
- UPLOAD_DOCUMENT mutation
- .caml files that stores raw text for frontend rendering

Test plan
- yarn test:ct --reporter=list -g "CamlArticle|CorpusArticleView")
- npx tsc --noEmit)
- docker compose -f test.yml run django pytest -n 4 --dist loadscope)