Skip to content

Add CAML interactive article system with published npm packages#1156

Open
JSv4 wants to merge 27 commits intomainfrom
claude/markdown-interactive-features-1Mvfr
Open

Add CAML interactive article system with published npm packages#1156
JSv4 wants to merge 27 commits intomainfrom
claude/markdown-interactive-features-1Mvfr

Conversation

@JSv4
Copy link
Collaborator

@JSv4 JSv4 commented Mar 24, 2026

Summary

  • CAML rendering system: Adds Corpus Article Markup Language support for rich, interactive corpus articles (Readme.CAML documents) with hero sections, cards, pills, tabs, timelines, CTA buttons, and more
  • Published npm packages: Uses @os-legal/caml (parser) and @os-legal/caml-react (renderer) from npm, with full theme integration
  • New block types: US jurisdiction map (categorical + heatmap modes) and case history timeline (court progression tracker with outcome color-coding)
  • Corpus stats wiring: corpus-stats CAML blocks now receive real data (documents, annotations, threads) from the GraphQL corpusStats query
  • Editor: Full-screen split-pane CAML editor with live preview, template with all block types, and save-to-corpus via UPLOAD_DOCUMENT mutation
  • Backend: No-op markdown parser for .caml files that stores raw text for frontend rendering
  • 19 Playwright screenshot tests covering all block types with auto-generated documentation screenshots

Test plan

  • All 19 CAML Playwright component tests pass (yarn test:ct --reporter=list -g "CamlArticle|CorpusArticleView")
  • TypeScript compiles cleanly (npx tsc --noEmit)
  • Pre-commit hooks pass
  • Screenshots generated for map, case-history, editor, and all existing block types
  • Backend tests pass (docker compose -f test.yml run django pytest -n 4 --dist loadscope)
  • Manual: Create a new corpus, click "Article" view, verify editor opens with template
  • Manual: Save article, verify it renders with all block types in the article view

Implement end-to-end support for rich, interactive scrollytelling articles
stored as Readme.CAML documents within a corpus. This is the document
convention approach (like GitHub's README) — the latest version of
Readme.CAML is always used.

Backend:
- Add text/markdown MIME type with no-op MarkdownParser that skips NLP
  pipeline (no PAWLs, no sentence annotations, no embeddings)
- Add title filter to DocumentFilter for exact-match document queries
- Detect .caml/.md files by extension during upload

Frontend - CAML Parser (frontend/src/caml/parser/):
- Pure function parseCaml(string) -> CamlDocument JSON IR
- Pass 1: YAML frontmatter extraction + ::: fence tokenization
- Pass 2: Type-specific block parsing (cards, pills, tabs, timeline,
  CTA, signup, corpus-stats, annotation-embed)
- Supports nested :::: fences for tab sub-blocks

Frontend - CAML Renderer (frontend/src/caml/renderer/):
- CamlArticle top-level renderer with Hero, Chapter, Footer
- Block components: Cards (grid), Pills (flex row), Tabs (interactive),
  Timeline (vertical), CTA (buttons), Signup, CorpusStats
- Prose blocks render through MarkdownMessageRenderer for @ mention
  chip support (reuses existing mention system)
- OS Legal design system styling with dark/light chapter themes

Frontend - Article View & Editor:
- New ?view=article route in CorpusHome orchestrator
- CorpusArticleView queries for Readme.CAML by title, fetches content,
  parses and renders
- Landing view shows "Read the article" link when Readme.CAML exists
- CamlArticleEditor: split-pane editor (CAML source + live preview)
  with create/update via UploadDocument mutation
- Edit button in article toolbar for quick access

https://claude.ai/code/session_01VA1iypRHHhLPU48aGaygHA
@claude
Copy link

claude bot commented Mar 24, 2026

Code Review: CAML Interactive Article System

This is a well-thought-out feature with clean architecture (two-pass parser, pure function design, pipeline bypass via no-op parser). The overall approach is solid. Below is feedback organized by severity.


🔴 Critical / Bugs

1. Duplicate Readme.CAML documents on repeated saves

CamlArticleEditor.tsx saves via UPLOAD_DOCUMENT with addToCorpusId, which creates a new document on every save rather than updating an existing one. Since GET_CORPUS_ARTICLE uses first: 1 without explicit ordering, repeated saves will accumulate multiple Readme.CAML documents in the corpus and the query may return a stale (oldest) version rather than the latest.

Either:

  • Use an UpdateDocument mutation when articleDoc already exists, or
  • Add explicit ordering (e.g., orderBy: "-modified") to GET_CORPUS_ARTICLE, or
  • Deduplicate/replace on upload in the backend

2. Deprecated unescape() in save path (CamlArticleEditor.tsx)

const base64Content = btoa(unescape(encodeURIComponent(content)));

unescape() is deprecated and can corrupt non-ASCII characters in some environments. Replace with:

const bytes = new TextEncoder().encode(content);
const base64Content = btoa(String.fromCharCode(...bytes));
// or for large files: btoa(Array.from(bytes, b => String.fromCharCode(b)).join(""))

🟠 Architecture / Design Concerns

3. MarkdownParser reads txt_extract_file rather than the raw upload file

The parser calls document.txt_extract_file.name to retrieve content, but txt_extract_file is populated by versioning.py before the parser runs (because text/markdown is in TEXT_MIMETYPES). This creates an undocumented ordering dependency between versioning and parsing. If the versioning pipeline changes, _parse_document_impl could receive a document where txt_extract_file isn't yet populated. A comment explaining this invariant would help, or consider reading from the raw pdf_file/upload instead.

4. Title-based article discovery is fragile

The Readme.CAML naming convention (like GitHub README) is a nice analogy but relies on a title__exact filter match. If a user uploads any document titled "Readme.CAML" that is not CAML-formatted, it will be treated as the corpus article and may fail to parse silently. Consider adding a MIME-type filter (fileType: "md") to GET_CORPUS_ARTICLE to be more robust.

5. Fragile CSS class-name targeting in CamlArticleEditor.tsx

const StyledModalWrapper = styled.div`
  [class*="modal-content"],
  [class*="ModalContent"],
  [role="dialog"] > div {
    width: 95vw !important; ...

This approach is brittle — it will silently break if @os-legal/ui renames internal classes. It also uses !important throughout. Better to use the Modal component's own size/className API if it supports it, or request that @os-legal/ui expose a size prop.


🟡 Code Quality

6. Magic color constants in styles.ts (violates CLAUDE.md "No magic numbers")

styles.ts contains ~30+ hardcoded hex values (#0f766e, #94a3b8, #64748b, #0f172a, etc.) inline. CLAUDE.md specifies that hardcoded values belong in frontend/src/assets/configurations/constants.ts. These should at minimum be extracted to named tokens at the top of the file, or better, reference existing design tokens from osLegalStyles.

7. parseYamlValue return type hack

if (raw === "true") return true as unknown as string;
if (raw === "false") return false as unknown as string;

The return type is declared string | number | boolean but then cast through unknown. The upstream uses any anyway (const result: any = {}), so the type is not being enforced. Either lean into unknown as the return type or fix the calling code to accept boolean. The eslint-disable @typescript-eslint/no-explicit-any on the result object is a smell that the YAML frontmatter shape should be typed properly (or use js-yaml as a dev dependency).

8. Math.random() for chapter IDs

id: `chapter-${Math.random().toString(36).slice(2, 8)}`

Random IDs are regenerated on every parseCaml() call (including every keypress in the editor preview). This means React keys change on every render, causing unnecessary unmounting/remounting of chapter components. Use a deterministic ID (e.g., index-based chapter-0, chapter-1, or hash of content).

9. CamlTimelineItem.side should be narrowly typed

interface CamlTimelineItem {
  side: string;  // used as legend lookup key
}

side is used for legend color lookup via colorMap.get(item.side.toLowerCase()). This could be string (open) since it matches legend labels, but the current documentation doesn't clarify the expected values. A comment explaining the relationship to legend labels would help.


🟡 Performance

10. Extra GraphQL query on every corpus landing page

CorpusLandingView now fires GET_CORPUS_ARTICLE on every render to detect whether the article link should appear. This adds a network round-trip to the critical path for all corpus landing pages, even those without articles. Consider:

  • Cache-only/network-and-cache Apollo fetch policy, or
  • Lifting the query to a parent that already knows about the corpus, or
  • Deferring until user interaction

🟡 Missing Tests

11. No unit tests for the CAML parser

The parseCaml function has significant complexity (custom YAML parser, fence tokenizer, multiple block parsers). Edge cases like unclosed fences, invalid YAML, nested ::: ::: blocks, and malformed card items are not tested. Since this is a pure function, it's very easy to unit test.

12. No backend test for MarkdownParser

A test verifying that a .caml or .md upload: (a) gets MIME type text/markdown, (b) bypasses NLP pipeline, and (c) stores content in txt_extract_file would catch regressions.


🟡 Accessibility

13. Tabs have no keyboard navigation

TabsBlock handles onClick only. WCAG 2.1 requires tabs to be navigable via ArrowLeft/ArrowRight keys. Also missing: role="tablist", role="tab", aria-selected, aria-controls, role="tabpanel" on the panel.

14. window.confirm() for unsaved-changes guard

Using window.confirm() blocks the main thread, is not styleable, and is inconsistent with the rest of the app. Consider a small inline warning banner or a styled confirmation modal instead.


✅ What's Done Well

  • Two-pass parser (tokenizer → block parsers) is clean and testable
  • Pure function design for parseCaml makes preview/testing straightforward
  • No-op parser approach with pipeline bypass is correct — avoids unnecessary NLP work
  • Proper rel="noopener noreferrer" on external links in CTA and Footer
  • CAML_ARTICLE_FILENAME constant in constants.ts (correct placement)
  • useMemo for article vars and parsed document (avoids unnecessary re-parses)
  • Unsaved-changes guard in editor is a nice UX touch
  • Consistent barrel exports pattern (index.ts files)

Tests:
- CamlArticle.ct.tsx: 11 tests covering full article rendering, hero,
  cards, pills, interactive tabs, timeline, CTA, dark themes, pullquotes,
  empty document, and corpus stats blocks
- CamlArticleEditor.ct.tsx: 3 tests covering new article template,
  live preview updates, and close behavior
- CorpusArticleView.ct.tsx: 2 tests covering empty state (no article)
  and toolbar with back/edit buttons

All tests capture docScreenshots for automated documentation.

Also adds CAML v2 spec at docs/superpowers/specs/2026-03-24-caml-v2-spec.md
documenting planned features: AI citation (cite-me/cite-all), annotation
embeds, live corpus stats, signup notifications, editor enhancements,
and VS Code extension.

https://claude.ai/code/session_01VA1iypRHHhLPU48aGaygHA
Comment on lines +17 to +20
import {
CamlArticleTestWrapper,
SAMPLE_CAML_DOCUMENT,
} from "./CamlArticleTestWrapper";
@claude
Copy link

claude bot commented Mar 24, 2026

Test

- Fix CamlHeroRenderer to handle non-array title defensively (prevents
  crash when YAML parser produces string instead of array)
- Add MemoryRouter to CamlArticleTestWrapper (required by
  MarkdownMessageRenderer's useNavigate hook)
- Fix strict mode violations in test assertions: use .first() and
  role-based selectors for text that appears in multiple blocks
- Simplify editor live preview test to avoid YAML parser edge case
- All 16 tests pass: 11 CamlArticle + 3 CamlArticleEditor + 2 CorpusArticleView

Screenshots captured:
  caml--article--full-render, caml--article--minimal,
  caml--article-view--empty-state, caml--article-view--toolbar,
  caml--cards--grid-render, caml--chapter--dark-gradient,
  caml--corpus-stats--with-data, caml--cta--buttons,
  caml--editor--live-preview, caml--editor--new-article,
  caml--hero--with-stats, caml--pills--with-status,
  caml--prose--pullquote, caml--tabs--compliance-active,
  caml--tabs--risk-active, caml--timeline--with-legend

https://claude.ai/code/session_01VA1iypRHHhLPU48aGaygHA
@claude
Copy link

claude bot commented Mar 24, 2026

PR Review: CAML Interactive Article System

Overall this is a well-structured, ambitious feature with good test coverage and clean architecture (pure parser function, typed IR, renderer separation). The backend integration is minimal and targeted. Below are the issues to address before merging.


BUGS AND CORRECTNESS

  1. Deprecated unescape() in base64 encoding (CamlArticleEditor.tsx) -- The pattern btoa(unescape(encodeURIComponent(content))) uses the deprecated unescape() function which is removed in strict mode environments. Non-ASCII characters will produce garbled output. Use TextEncoder: const bytes = new TextEncoder().encode(content); const base64Content = btoa(Array.from(bytes, b => String.fromCharCode(b)).join(''));

  2. Unstable random IDs break React reconciliation (tokenizer.ts) -- Math.random() IDs are generated during parseCaml(), which runs inside useMemo on every editor keystroke. Chapter ids change on every re-parse, causing React to unmount/remount all CamlChapterRenderer instances and destroying tab selection state. Use positional IDs (e.g. chapter-N) which are stable across re-parses.

  3. Unclosed fence silently drops content (tokenizer.ts) -- A fence without a closing ::: causes all content from that point to EOF to be silently dropped. The flushProse() at the end of tokenizeFences() only fires for prose, not for an open fence. Add a fallback that treats unclosed fences as prose to prevent data loss.

  4. fetch() for txtExtractFile may fail on private corpora (CorpusArticleView.tsx, CamlArticleEditor.tsx) -- Both components fetch article content via bare fetch(articleDoc.txtExtractFile). If the storage backend serves files through Django (not a CDN with signed URLs), cross-origin requests will not include session cookies and private corpus articles will return 403. Either serve CAML content through a GraphQL field, ensure txtExtractFile URLs are publicly accessible, or add credentials: include to the fetch options. Additionally, if fetch fails in CamlArticleEditor the template is silently loaded, risking overwriting the existing article -- needs an explicit error state.

  5. parseYamlValue type coercion defeats type safety (tokenizer.ts) -- The return type declares string | number | boolean but every non-string branch casts through as unknown as string. Remove the casts and let TypeScript enforce the union type properly.


SECURITY

  1. CTA href accepts javascript: URLs (CamlBlocks.tsx) -- The CTA block renders author-provided href values directly into anchor elements. The check item.href.startsWith('http') only controls target=blank -- a javascript: URL renders as a plain anchor without rel guards and executes on click. CAML articles are authored by corpus admins so the attack surface is limited, but URL validation should be added in the parser or renderer.

ARCHITECTURE

  1. TEXT_MIMETYPES defined in two places (DRY violation) -- Both opencontractserver/documents/versioning.py and opencontractserver/corpuses/models.py define TEXT_MIMETYPES locally. This PR correctly updates both, but the duplication will bite again on the next MIME type addition. One should import from the other. Pre-existing issue but this PR is the right time to fix it.

  2. Fragile modal CSS overrides (CamlArticleEditor.tsx) -- StyledModalWrapper uses class-name substring selectors with !important to override modal dimensions. These silently break if @os-legal/ui renames its internal CSS classes. The Modal component should expose a size or className prop instead.

  3. Extra network request on every corpus landing view -- CorpusLandingView now always fires GET_CORPUS_ARTICLE even for corpora that will never have a Readme.CAML. Apollo caches this (one request per corpus per session) but it adds a query to an already query-heavy page. Consider a skip condition or a corpus-level metadata flag.

  4. window.confirm() for unsaved changes (CamlArticleEditor.tsx) -- window.confirm() is blocked in some browser iframe contexts and is inconsistent with the app UX. Replace with the modal confirmation pattern used elsewhere.


TEST COVERAGE

  1. No backend tests for new parser and mutation changes -- Per CLAUDE.md, backend changes need tests. Currently untested: MarkdownParser._parse_document_impl (happy path, missing txt_extract_file, invalid file); extension detection in document_mutations.py (.caml, .md, .markdown, mixed-case .CAML, no extension); new title filter on DocumentFilter.

MINOR ISSUES

  1. V2 spec file should not ship in this PR -- docs/superpowers/specs/2026-03-24-caml-v2-spec.md describes explicitly deferred features. Shipping forward-looking design docs alongside v1 creates confusion about what is implemented vs. planned. Move to a GitHub issue or a separate draft PR.

  2. Magic numbers in styles.ts -- Hex colors appear 30+ times across the styles file. Per CLAUDE.md: no magic numbers, use constants files. Reference OS_LEGAL_COLORS tokens or extract to local constants at the top of the file.

  3. CamlTimelineItem.side has an implicit undocumented contract -- side is typed as string but used as a legend color lookup key -- its value must match a legend.label exactly (case-insensitively). Document this on the type or enforce it in the parser.

  4. Array index keys throughout renderer -- key={i} is used throughout block renderers. A composite key like block.type+index is safer if items can ever be reordered.

@claude
Copy link

claude bot commented Mar 24, 2026

Deleting test comment

@claude
Copy link

claude bot commented Mar 24, 2026

Deleting test

@claude
Copy link

claude bot commented Mar 24, 2026

deleting test

@claude
Copy link

claude bot commented Mar 24, 2026

deleting

1 similar comment
@claude
Copy link

claude bot commented Mar 24, 2026

deleting

@claude
Copy link

claude bot commented Mar 24, 2026

PR Review: CAML Interactive Article System

This is a well-structured, feature-rich PR. The architecture is clean - pure parser function, typed IR, composable renderer - and the test coverage for a feature of this size is solid. The security guard (isSafeHref) is a nice touch. A few issues worth addressing before merge:

Critical: Apollo cache keyArgs mismatch

Files: frontend/src/graphql/cache.ts, frontend/tests/CamlArticleEditorTestWrapper.tsx, frontend/tests/CorpusArticleViewTestWrapper.tsx

The main cache.ts adds title to the documents keyArgs, but GET_CORPUS_ARTICLE filters by both inCorpusWithId AND title. Since inCorpusWithId is not in the production keyArgs, two different corpuses querying for Readme.CAML will share the same cache bucket - the second corpus can receive the first corpus article result. This is the exact bug class described in CLAUDE.md pitfall 15.

The test wrappers already use inCorpusWithId and title as keyArgs, which highlights the divergence from production. Either add inCorpusWithId to the main documents keyArgs, or give this query its own field policy.

Important: Unconditional query on every corpus landing page

File: frontend/src/components/corpuses/CorpusHome/CorpusLandingView.tsx (line ~132)

GET_CORPUS_ARTICLE fires with no skip guard on every landing page render, adding a round-trip for all corpus users regardless of whether an article exists. Consider lazy-loading or at minimum adding a skip guard. The cache issue above amplifies this.

Important: Magic colors violate project conventions

Files: CamlArticleEditor.tsx, CorpusArticleView.tsx

Both files use hardcoded hex literals (#e2e8f0, #fafbfc, #f8fafc, #64748b, #94a3b8, #475569) instead of OS_LEGAL_COLORS tokens. Per CLAUDE.md, all hardcoded values should use constants. The renderer's styles.ts demonstrates the correct pattern throughout.

Bug: ConfirmModal noAction is a no-op

File: frontend/src/components/corpuses/CamlArticleEditor.tsx (line ~397)

The noAction prop is an empty function. If ConfirmModal calls noAction when the user clicks Keep editing, the confirmation dialog stays open permanently. Check whether the cancel button triggers noAction or toggleModal - if the former, this should be setShowCloseConfirm(false). Other usages of ConfirmModal in the codebase do not use an empty no-op here.

Minor: Duplicated external-link detection

Files: frontend/src/caml/renderer/CamlBlocks.tsx (line ~334), frontend/src/caml/renderer/CamlFooter.tsx (line ~25)

The href.startsWith("http") check for target="_blank" is duplicated verbatim in both files. Consider extracting an isExternalHref helper into safeHref.ts alongside isSafeHref.

Minor: Redundant articleStats useMemo

File: frontend/src/components/corpuses/CorpusHome/CorpusArticleView.tsx (lines ~390-402)

The memo mirrors stats with no transformation. Pass stats directly to CamlArticle and remove the intermediate memo.

Missing: CHANGELOG.md update

Per CLAUDE.md conventions, significant new features require a changelog entry. This PR introduces a new MIME type, parser, document convention, and ~4400 lines of frontend.

Observation: No backend tests for MarkdownParser or MIME detection

The MarkdownParser and text/markdown detection in document_mutations.py have no test coverage. A minimal test verifying that .caml/.md uploads bypass NLP and correctly populate txt_extract_file would protect against regressions.

Positive notes

  • Two-pass parser architecture (tokenize then type-dispatch) is clean and easy to extend for v2.
  • safeHref.ts XSS guard is correct and well-documented.
  • Moving TEXT_MIMETYPES to opencontractserver/constants/document_processing.py as a single source of truth is a good DRY improvement.
  • Test wrappers correctly use separate InMemoryCache instances (per CLAUDE.md pitfall 8).
  • The v2 spec document is a useful forward reference for reviewers.

The cache keyArgs issue is the only one that could cause silent data correctness bugs in production (wrong corpus article served from cache). Everything else is polish.

@codecov
Copy link

codecov bot commented Mar 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

JSv4 added 7 commits March 23, 2026 22:53
… fixes

- Fix XSS vulnerability: add isSafeHref guard for CTA and footer hrefs,
  rejecting javascript: and other dangerous URL protocols
- Fix Apollo cache keyArgs: add 'title' to documents relayStylePagination
  to prevent GET_CORPUS_ARTICLE from sharing cache with unrelated queries
- Replace deprecated unescape() with TextEncoder in CamlArticleEditor
- Use deterministic positional IDs for chapters instead of Math.random(),
  preventing unnecessary React unmount/remount on every re-parse
- Handle unclosed fences gracefully by flushing as prose instead of
  silently dropping content
- Remove unnecessary type casts in parseYamlValue
- Consolidate TEXT_MIMETYPES into constants/document_processing.py
  (single source of truth, imported by versioning.py and models.py)
- Add explicit UTF-8 decoding in MarkdownParser for non-ASCII content
- Replace 30+ hardcoded hex colors in CAML styles.ts with
  OS_LEGAL_COLORS/OS_LEGAL_TYPOGRAPHY design tokens
…bled signup

- Allow fragment (#) URLs in isSafeHref guard for in-page anchor links
- Replace window.confirm with ConfirmModal component for unsaved changes
  dialog in CamlArticleEditor, matching project conventions
- Mark SignupBlock button as disabled with proper styling since the
  signup action is not yet implemented
…ternal href check

- Replace all hardcoded hex color literals in CamlArticleEditor.tsx and
  CorpusArticleView.tsx with OS_LEGAL_COLORS design tokens
- Fix YAML frontmatter parser bug where nested keys (e.g., hero.kicker)
  were silently dropped because content used trimEnd() instead of full
  trim, causing the key-value regex to fail on indented lines
- Extract isExternalHref() helper into safeHref.ts and use it in
  CamlBlocks.tsx and CamlFooter.tsx (DRY)
- Remove redundant articleStats useMemo in CorpusArticleView.tsx
- Add JSDoc to CamlTimelineItem.side documenting legend lookup contract
- Update CHANGELOG.md with CAML feature summary
- 23 tests for parseCaml: frontmatter extraction, chapter parsing,
  all block types (cards, pills, tabs, timeline, CTA, signup,
  corpus-stats), and edge cases (empty input, unclosed fences,
  unknown block types, whitespace-only body)
- 11 tests for safeHref: isSafeHref (XSS prevention for javascript:,
  data:, vbscript: protocols) and isExternalHref helper
Cover all branches: successful parse, missing txt file, bytes vs
string from storage, and None description fallback.
@claude
Copy link

claude bot commented Mar 25, 2026

test comment please ignore

@claude
Copy link

claude bot commented Mar 25, 2026

PR review test - multiline
line 2 here

@claude
Copy link

claude bot commented Mar 25, 2026

line 1
line 2

@claude
Copy link

claude bot commented Mar 25, 2026

line 1

Heading 2

Heading 3

@claude
Copy link

claude bot commented Mar 25, 2026

7. Magic hex values remain in styles.ts and CorpusArticleView.tsx

Per CLAUDE.md rule 4, hardcoded values must live in constants files. Several slipped through: "#94a3b8" inline in TimelineDot fallback, rgba(0,0,0,0.06) in CardItem hover shadow, rgba(255,255,255,0.95) in ArticleToolbar backdrop, and the inline style prop in CorpusArticleView empty state.

@claude
Copy link

claude bot commented Mar 25, 2026

7. Magic hex values remain in styles.ts and CorpusArticleView.tsx

Per CLAUDE.md rule 4, hardcoded values must live in constants files. Several slipped through: "#94a3b8" inline in TimelineDot fallback.

@claude
Copy link

claude bot commented Mar 25, 2026

test line
"#94a3b8" inline test

@claude
Copy link

claude bot commented Mar 25, 2026

line with backtick: "#94a3b8" inline test
next line has hash: #94a3b8

@claude
Copy link

claude bot commented Mar 25, 2026

Several slipped through: "#94a3b8" inline in TimelineDot fallback, rgba(0,0,0,0.06) in CardItem hover shadow, rgba(255,255,255,0.95) in ArticleToolbar backdrop, and the inline style prop in CorpusArticleView empty state.

@claude
Copy link

claude bot commented Mar 25, 2026

CAML_TEMPLATE in CamlArticleEditor.tsx nests a ::: cards block inside a ::: chapter block.

Security

@claude
Copy link

claude bot commented Mar 25, 2026

CAML_TEMPLATE in CamlArticleEditor.tsx nests a ::: cards block inside a ::: chapter block. The test file documents the known limitation: when blocks appear inside a chapter, the closing fence is consumed.

Security

@claude
Copy link

claude bot commented Mar 25, 2026

the blocks closing fence is consumed as the chapter close.

Security

@claude
Copy link

claude bot commented Mar 25, 2026

line without apostrophe

Security

@claude
Copy link

claude bot commented Mar 25, 2026


PR Review: CAML Interactive Article System

This is a well-scoped feature. The two-pass parser architecture is clean, the safeHref XSS guard is correct, and the test coverage (34 unit tests + Playwright component tests) is solid for a 5K-line addition. Below are issues to address before merging.


Bugs

1. Missing import in corpuses/models.py — potential NameError at runtime

The diff removes the local TEXT_MIMETYPES class attribute from the Corpus model but does not show an import being added from opencontractserver.constants.document_processing. If import_content or any other method in that class references TEXT_MIMETYPES, it will raise a NameError at runtime. Please verify the import is present in the parts of the file not shown in the diff.

2. CAML template ships with a parse failure

CAML_TEMPLATE in CamlArticleEditor.tsx nests a ::: cards block inside a ::: chapter block. The test file documents the known limitation: when blocks appear inside a chapter, the block closing fence is consumed as the chapter close. The default template will silently mangle the cards block on first load, producing a broken live preview. Fix the tokenizer to handle nested same-depth fences, or change the template to use top-level blocks only.

3. Misleading "Parsing article..." state on CAML parse error in CorpusArticleView

When camlContent is non-null but parsedDocument is null (parse threw), the component shows "Parsing article..." indefinitely. Add an explicit error state branch for when camlContent !== null && parsedDocument === null.


Security

4. Protocol-relative URLs pass the isSafeHref guard

The current regex allows //evil.com because it matches any string starting with /. In a browser, protocol-relative URLs inherit the page protocol and navigate to an external domain. isExternalHref will not catch them either (only checks for http), so they skip rel="noopener noreferrer". Fix with a negative lookahead to reject strings starting with // while still allowing single-slash relative paths.


Performance

5. Extra GraphQL round-trip on every corpus landing page load

CorpusLandingView unconditionally fires GET_CORPUS_ARTICLE on mount for every corpus, including those that will never have a CAML article. This adds a network round-trip to the most-visited page. Consider adding a boolean hasArticle field to the existing corpus details query (computed server-side), or fetching lazily only when the user navigates toward article-related UI.

6. CAML file content not cached across in-session navigations

Every visit to ?view=article re-fetches the .caml file via fetch(). Since the storage URL is stable, a simple useRef guard per txtExtractFile URL would avoid redundant fetches within the same session.


Code Quality

7. Magic hex values remain in styles.ts and CorpusArticleView.tsx

Per CLAUDE.md rule 4, hardcoded values must live in constants files. Several slipped through: "#94a3b8" inline in TimelineDot fallback, rgba(0,0,0,0.06) in CardItem hover shadow, rgba(255,255,255,0.95) in ArticleToolbar backdrop, and the inline style prop in CorpusArticleView empty state.

8. Hand-rolled YAML parser is a maintenance liability

parseYamlFrontmatter is ~100 lines and already had one bug fixed during development (trimEnd() vs trimEnd().trimStart()). It silently drops literal block scalars (|), anchors, and aliases. js-yaml is 11 KB minified+gzipped and would eliminate this surface area entirely. If zero extra dependencies is a hard constraint, the parser needs a clearly documented scope and a much more exhaustive edge-case test suite.

9. !important CSS overrides in CamlArticleEditor are brittle

StyledModalWrapper overrides internal class names of @os-legal/ui Modal with !important. This will break silently on any library update. Use the modal documented size props instead.

10. side field on CamlTimelineItem is a confusing name

The field stores a legend category ("executed", "amended"), not a left/right position. category or legendKey would be clearer. This is a new type with no existing consumers, so now is the cheapest time to rename.

11. Readme.CAML lookup is case-sensitive with no user guidance

Uploading readme.caml processes correctly (MIME detection is extension-based) but the article never appears in the UI because the query matches title: "Readme.CAML" exactly. Add a normalization step in the upload mutation when the extension is .caml, or prominently document this constraint.


Minor

  • Index-based key={i} throughout all block renderers. For chapter blocks, chapter.id is already available and should be used as the key.
  • MarkdownParser returns None on missing txt file; this may leave the document permanently unprocessed. Returning a minimal valid OpenContractDocExport would be safer.
  • The v2 spec file (docs/superpowers/specs/2026-03-24-caml-v2-spec.md) would be more discoverable under docs/architecture/ or docs/roadmap/.

Summary

Category Count
Bugs 3
Security 1
Performance 2
Code quality 5
Minor 3

Most critical before merge: 1 (potential NameError in models.py), 2 (broken default editor template), 4 (protocol-relative URL bypass in safeHref), 5 (extra query on every corpus landing page). The rest can be addressed in follow-up PRs.

JSv4 added 12 commits March 25, 2026 00:40
Replace the in-tree frontend/src/caml/ parser and renderer with the
standalone @os-legal/caml (parser) and @os-legal/caml-react (renderer)
npm packages, linked from ../../os-legal-caml/.

- Add link dependencies and resolution for @os-legal/caml
- Update CamlArticleEditor.tsx and CorpusArticleView.tsx imports
- Wrap CamlArticle in CamlThemeProvider, pass renderMarkdown prop
- Delete the old frontend/src/caml/ directory (15 files)
The caml-react npm package declares workspace:* as a dep on @os-legal/caml
(a packaging artifact), so add a resolutions override to point it at the
published 0.0.1. Also add defaultCamlTheme to ThemeProvider's merged theme
to satisfy the DefaultTheme augmentation now visible from the npm types.
Pass totalDocs, totalAnnotations, and totalThreads from CorpusHome down
into CorpusArticleView so the CAML article renderer receives live stats.
Also add totalThreads to the CorpusHomeProps and CorpusDashboardProps
stats types to match the full shape returned by GET_CORPUS_STATS.
Adds jurisdiction map (Chapter 4) and case-tracker (Chapter 5) chapters
to SAMPLE_CAML_DOCUMENT, plus Playwright tests with docScreenshot calls
for both new block types.
@JSv4 JSv4 changed the title Add CAML interactive article system for corpus storytelling Add CAML interactive article system with published npm packages Mar 27, 2026
@claude
Copy link

claude bot commented Mar 27, 2026

PR Review: CAML Interactive Article System

This is a substantial, well-architected feature. The two-pass parser design, centralization of TEXT_MIMETYPES, no-op parser strategy, and npm extraction plan are all solid decisions. Below are the issues worth addressing before merge, organized by severity.


Bugs / Correctness

1. CorpusLandingView fires GET_CORPUS_ARTICLE without a skip guard

CorpusLandingView.tsx runs the article existence query unconditionally on every corpus landing page, even when onViewArticle is not provided. This adds an extra GraphQL round-trip on every corpus view for the lifetime of the app. Add a skip option:

const { data: articleData } = useQuery<GetCorpusArticleOutput, GetCorpusArticleInput>(
  GET_CORPUS_ARTICLE,
  {
    variables: articleVars,
    skip: !corpus.id,  // corpus always has an id here, but guards against blank renders
  }
);

More importantly, consider whether it's worth polling for article existence on every corpus landing page. If Readme.CAML is rare, this is wasted network traffic. An alternative is to only query when the user is an editor/admin, or to expose hasReadmeCaml as a field on CorpusType from the backend.

2. caml--article-view--empty-state.png and caml--article-view--toolbar.png are identical

Both binary files resolve to the same blob hash (0409d2207), which means the toolbar screenshot isn't capturing a distinct visual state. The docScreenshot for the toolbar state needs to be called after the toolbar becomes visible/distinct, or the test should be fixed to actually produce different output.

3. MarkdownParser returns "page_count": 1 for all markdown files

This is hardcoded in _parse_document_impl. While "1" is a reasonable default for the UI, it could cause subtle issues if any downstream code uses page_count for pagination or range checks on .caml files with many sections. Worth noting in a comment that this is intentional.


Security

4. CorpusArticleView fetches txtExtractFile directly via browser fetch

The component does fetch(articleDoc.txtExtractFile) to load CAML content. If the storage backend returns signed/temporary URLs this is fine, but if the URL can be influenced by user-controlled data from the GraphQL response and the URL is not validated before fetching, this could be a path for SSRF-adjacent issues in unusual configurations. At minimum, validate the URL is a relative path or from an expected origin before fetching:

// Guard: only fetch from same origin or known storage domain
const url = new URL(articleDoc.txtExtractFile, window.location.origin);
if (url.origin !== window.location.origin && !ALLOWED_STORAGE_ORIGINS.has(url.origin)) {
  throw new Error("Unexpected article URL origin");
}

Or use the existing safeHref utility if it covers this case.

5. MIME detection is extension-only, not content-based

document_mutations.py detects .caml/.md files purely by extension, inside the is_plaintext_content() branch. This is already partially safe (binary files fail the plaintext check), but a file named evil.md with a crafted plaintext payload would get text/markdown MIME and skip NLP processing. Document this assumption explicitly in a comment so future reviewers understand the security model.


Code Quality

6. Fragile CSS glob selectors in CamlArticleEditor

[class*="modal-content"],
[class*="ModalContent"],
[role="dialog"] > div {
  width: 95vw !important;
  ...
}

These [class*=...] selectors are brittle and will silently break if the @os-legal/ui modal's internal class names change. The !important overrides also suppress any future intentional styling. Pass sizing props to the Modal component directly, or add a fullscreen / size="xl" prop to Modal upstream.

7. Plan/spec files committed to main branch

docs/superpowers/plans/2026-03-25-caml-npm-extraction.md (1279 lines) and docs/superpowers/specs/ are detailed implementation planning documents. These are good to have somewhere, but shipping them in the main branch of the production repo creates maintenance burden (they'll go stale immediately after the npm extraction is done). Consider moving them to a wiki, a separate planning repo, or a docs/architecture/ directory with a clear "this is historical context" header.

8. Missing backend tests for the new filter and MIME detection

The title filter added to DocumentFilter and the .md/.caml MIME detection in document_mutations.py have no backend test coverage. Given the project has a test suite, add at minimum:

  • A test that uploading a .md file assigns text/markdown MIME
  • A test that the title filter returns the correct document and doesn't leak cross-user results

9. CorpusHome props interface: stats was already present

The diff adds totalThreads: number to the stats interface — good. But the stats prop itself doesn't appear to be declared in the CorpusHomeProps interface addition in the diff (only in CorpusQueryView). Verify stats is properly typed in CorpusHomeProps and not implicitly any.


Minor / Nits

  • GET_CORPUS_ARTICLE exposes creator.email: This field is returned in the query response but appears unused in the renderer. Drop it unless it's needed (reduces data transfer and avoids leaking user emails to the client unnecessarily).
  • application/txt in TEXT_MIMETYPES: The constant includes "application/txt" which is non-standard (the correct MIME is "text/plain"). This was in the old code too, but now that it's centralized, it's worth auditing whether any file actually arrives with this MIME type and removing it if not.
  • parseCaml called in CorpusLandingView?: The landing view imports CAML_ARTICLE_FILENAME from constants — good. Confirm this constant matches exactly what MarkdownParser stores as the document title (i.e., the filename becomes the document title via existing upload logic).

Summary

Category Count
Bugs/Correctness 3
Security 2
Code quality 4
Minor/nits 3

The core feature is solid. The main asks are: fix the duplicate screenshot, add a skip guard on the article existence query (or move detection server-side), replace the CSS glob selectors with proper modal props, and add backend tests for the filter/MIME changes.

@claude
Copy link

claude bot commented Mar 27, 2026

test comment

@claude
Copy link

claude bot commented Mar 27, 2026

placeholder for full review

@claude
Copy link

claude bot commented Mar 27, 2026

PR Review: CAML Interactive Article System - see full review in thread

@claude
Copy link

claude bot commented Mar 27, 2026

PR Review: CAML Interactive Article System

Overall this is a well-structured feature with good test coverage and clean separation of concerns. The backend additions are minimal and correct. A few issues worth addressing before merge.

Bugs / Issues

1. Triple-fetching GET_CORPUS_ARTICLE on landing page

When a corpus has an article, the landing page fires GET_CORPUS_ARTICLE three times in parallel: CorpusHome.tsx (hasArticle detection), CorpusLandingView.tsx (show/hide Read article button), and CorpusArticleView.tsx (actual content fetch). Apollo deduplicates concurrent identical queries, but the first two are structurally redundant. Since CorpusHome already has hasArticle and controls whether onViewArticle is passed down, the check in CorpusLandingView can be removed. Passing onViewArticle only when hasArticle is true is sufficient to hide/show the button without an extra query.

2. Missing AbortController in fetch useEffects

Both CorpusArticleView.tsx and CamlArticleEditor.tsx fetch from articleDoc.txtExtractFile without an abort controller. If the component unmounts mid-fetch, the state setter fires on an unmounted component. React 18 suppresses the warning but the callback still executes. Both effects should return a cleanup function calling controller.abort().

3. Silent fetch failure in CamlArticleEditor

When fetching existing article content fails, the editor silently falls back to the blank CAML_TEMPLATE with only a console.error. The existing article is replaced by the template with no explanation. A toast.error() should surface this.

@claude
Copy link

claude bot commented Mar 27, 2026

Code Quality

4. Hardcoded rgba values

Per CLAUDE.md, hardcoded hex/rgba values should use semantic tokens from osLegalStyles.ts. Two places introduce raw rgba: CorpusHome.tsx FloatingControls uses rgba(255, 255, 255, 0.95) and rgba(0, 0, 0, 0.12), and CorpusArticleView.tsx ArticleToolbar uses rgba(255, 255, 255, 0.95). These should be added to OS_LEGAL_COLORS (e.g. glassSurface, shadowMd) rather than inlined.

5. Plan docs committed to repo

docs/superpowers/plans/2026-03-25-caml-npm-extraction.md (1279 lines) and docs/superpowers/plans/2026-03-26-caml-npm-migration.md are agentic implementation plans for extracting CAML into a separate npm repo. These are internal work artifacts rather than user-facing documentation. Are these intentionally checked in? A GitHub issue would track the future work with less repo bloat.

Backend

6. Silent None return on missing txt_extract_file

MarkdownParser._parse_document_impl logs at ERROR level and returns None when txt_extract_file is missing, leaving the document with no content in the export dict. A more descriptive message including the document ID and expected file path would make these failures faster to debug.

Minor

  • CorpusArticleView.tsx renders a persistent Parsing article... state when parsedDocument is null but articleDoc exists (e.g. parse failure). This spins forever and should fall through to an error/empty state instead.
  • The backend test calls _parse_document_impl directly which is fine for unit testing, but a smoke test through the public parse_document entrypoint would catch any FileTypeEnum.MD lookup/dispatch issues.
  • Adding title to keyArgs for documents in cache.ts is correct and necessary for the filtered query to cache independently.

Overall the architecture is sound, the npm extraction design is clean, and the 19 screenshot tests are a good addition. The hardcoded colors, triple-fetch, and silent failure in the editor are the items worth addressing before merge.

@claude
Copy link

claude bot commented Mar 27, 2026

Code Review

This is a substantial, well-structured PR. The overall architecture is sound: using npm packages for the CAML parser/renderer, centralizing TEXT_MIMETYPES, and wiring the article view through the existing routing system are all good choices. Below are findings roughly ordered by severity.


Bugs / Potential Breakage

1. corpuses/models.py removes TEXT_MIMETYPES without adding an import

The diff removes the local TEXT_MIMETYPES class attribute from models.py but adds no corresponding import from opencontractserver.constants.document_processing. If import_content references TEXT_MIMETYPES (which it likely does for deciding whether to write content to txt_extract_file), this will raise a NameError at runtime. Please verify — and if it's used, add:

from opencontractserver.constants.document_processing import TEXT_MIMETYPES

2. MarkdownParser depends on txt_extract_file already existing

oc_markdown_parser.py reads from document.txt_extract_file.name and returns None if it's absent. This is only safe if the versioning/upload pipeline writes the file to txt_extract_file before dispatching the parser. Confirm that text/markdown documents are handled in the same pre-parser path as text/plain (i.e., in versioning.pyTEXT_MIMETYPES inclusion there looks correct). A comment in _parse_document_impl explaining this precondition would help future maintainers.

3. CorpusHome now requires stats.totalThreads but this wasn't previously in the prop type

The CorpusHomeProps.stats type now includes totalThreads: number. Any call site that passes a stats object without that key will fail TypeScript compilation. Please verify all call sites in Corpuses.tsx and any other parent components pass totalThreads.


Architecture / Design Concerns

4. Duplicate GET_CORPUS_ARTICLE queries

When a corpus has an article and the landing view is rendered, both CorpusHome.tsx and CorpusLandingView.tsx independently issue GET_CORPUS_ARTICLE with identical variables. Apollo's cache deduplicates in-flight requests, but the intent is unclear and it adds cognitive overhead. Consider lifting the query to CorpusHome and passing the result down as a prop (or using the cached result via cache-only policy in CorpusLandingView).

5. CSS !important overrides with class-name pattern matching in CamlArticleEditor

[class*="modal-content"],
[class*="ModalContent"],
[role="dialog"] > div {
  width: 95vw !important;
  ...

This is fragile: it will silently break or mis-apply if @os-legal/ui's modal class names change. If the Modal component exposes a className or style prop, prefer that. If not, consider filing a feature request upstream and using a narrower selector in the interim.

6. txtExtractFile URL for private corpora

GET_CORPUS_ARTICLE returns txtExtractFile — which is a storage URL. If the corpus is private and the storage backend uses authenticated or signed URLs, the frontend needs to ensure those URLs are accessed with proper credentials. If they're presigned S3 URLs they expire; if they're plain paths they may be accessible unauthenticated. This is worth a comment or a review of how txtExtractFile is consumed in CorpusArticleView (a fetch() call to that URL).


Code Quality

7. Planning docs committed to the repo

docs/superpowers/plans/2026-03-25-caml-npm-extraction.md (1,279 lines) and docs/superpowers/plans/2026-03-26-caml-npm-migration.md are agentic implementation plans. Committing them makes sense if they're intended as developer documentation, but their format is primarily machine-directed (> For agentic workers: REQUIRED SUB-SKILL...). Consider whether these belong in the repo at all, or at minimum, whether the agentic-worker boilerplate should be stripped for the committed version.

8. title added to documents cache keyArgs is a broad change

// cache.ts
"title",

Adding title to keyArgs for the documents relay pagination means any existing query that previously omitted title now gets a separate cache bucket. This is correct for GET_CORPUS_ARTICLE, but if any component that queries documents without a title filter already has cached results, those results will be invalidated on next render. This is a low-risk-but-non-zero cache-thrashing concern worth testing manually.

9. MarkdownParser uses f-string logging with user-controlled data

logger.error(f"No txt file found for document {doc_id}")

doc_id here is an integer from the pipeline, not user input, so there's no injection risk — but consider logger.error("No txt file found for document %s", doc_id) to follow standard %-style logging (which defers string formatting until the message is actually emitted).

10. CAML_ARTICLE_FILENAME = "Readme.CAML" is title-matched against documents

The GET_CORPUS_ARTICLE query uses title: $title with title_Contains NOT enabled — it uses exact match. This is correct and intentional, but the filter was added to DocumentFilter as ["exact", "contains"]. The contains variant exposes a new case-sensitive substring search on document titles. Ensure this doesn't open any unintended information disclosure for private documents visible via the documents query.


Testing

11. Backend tests checkbox is unchecked in the test plan

  • Backend tests pass (docker compose -f test.yml run django pytest -n 4 --dist loadscope)

The new test_markdown_parser.py is well-written and covers the key cases (happy path, missing file, bytes vs str from storage). Please confirm the suite passes before merge — especially since TEXT_MIMETYPES consolidation touches the import path.

12. No test for the MIME detection logic in document_mutations.py

The extension-based text/markdown detection (.caml, .md, .markdown) is new logic that isn't covered by the backend test suite. A test verifying that uploading a file named Readme.CAML sets file_type = "text/markdown" would close this gap.


Minor

  • MarkdownParser log at logger.error level when txt_extract_file.name is falsy feels too loud — this could also happen on a legitimate edge case like a draft document. logger.warning may be more appropriate.
  • The resolution pinning @os-legal/caml: "^0.0.1" in package.json is unusual for a direct dependency — resolutions are typically used for transitive conflicts. Worth confirming this is intentional.
  • caml--article-view--empty-state.png and caml--article-view--toolbar.png appear to be identical binary files (same SHA 0409d2207). If the toolbar screenshot is supposed to show a different state, the test may not be reaching it.

Overall this is a well-executed feature that follows existing patterns (routing system, constants centralization, test wrappers). The items above are worth addressing before merge, with items 1–3 being the most critical.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants