
feat: Measurement Validator — Divergence Classifier & Multi-Language Support (Phase 1 + 2) #3

Draft
Copilot wants to merge 2 commits into main from copilot/add-divergence-classifier-multi-language-support

Copilot AI commented Apr 4, 2026

Adds a new src/measurement-validator/ subsystem that compares Pretext canvas-based line measurements against DOM Range-API measurements, classifies root causes of divergences, and validates across 5 language groups.

Core modules

  • types.ts — MeasurementSample, MeasurementResult, DivergenceAnalysis, FixtureSample, TestSuiteReport, tolerance config
  • dom-adapter.ts — Range-API DOM adapter; extracts per-line text + widths without synthetic reflow
  • comparator.ts — Runs layoutWithLines() against DOM metrics; assigns pass/minor/major/critical per line
  • report-generator.ts — JSON + console formatters for single results and suite aggregates
  • classifier.ts — Priority-ordered root-cause detector; both async (with DOM font-fallback check) and sync variants
  • test-suite.ts — Multi-language runner with per-LanguageGroup stats aggregation
  • index.ts — Public API surface
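
As a rough illustration of the per-line pass/minor/major/critical assignment that comparator.ts is described as doing, here is a minimal sketch. The names `Tolerance`, `DEFAULT_TOLERANCE`, `classifyLine`, and `worstSeverity`, and every threshold value, are assumptions made for this example; the PR's actual tolerance config lives in types.ts and is not shown here.

```typescript
// Hypothetical sketch: names and thresholds are assumptions, not the PR's values.
type Severity = 'pass' | 'minor' | 'major' | 'critical'

interface Tolerance {
  pass: number  // largest absolute width delta (px) still counted as a pass
  minor: number // largest delta counted as minor
  major: number // largest delta counted as major; anything above is critical
}

const DEFAULT_TOLERANCE: Tolerance = { pass: 0.5, minor: 2, major: 5 }

// Classify one line by the absolute difference between the two width measurements.
function classifyLine(
  pretextWidth: number,
  domWidth: number,
  tol: Tolerance = DEFAULT_TOLERANCE
): Severity {
  const delta = Math.abs(pretextWidth - domWidth)
  if (delta <= tol.pass) return 'pass'
  if (delta <= tol.minor) return 'minor'
  if (delta <= tol.major) return 'major'
  return 'critical'
}

// The overall severity of a sample is taken to be its worst line.
const SEVERITY_ORDER: Severity[] = ['pass', 'minor', 'major', 'critical']

function worstSeverity(lines: Severity[]): Severity {
  let worst: Severity = 'pass'
  for (const s of lines) {
    if (SEVERITY_ORDER.indexOf(s) > SEVERITY_ORDER.indexOf(worst)) worst = s
  }
  return worst
}
```

Taking the worst line as the overall result is one possible design; an average-based rule would hide a single badly broken line.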

Classifier detection chain

Strategies fire in priority order; first match wins:

| Priority | rootCause | Mechanism | Confidence |
|---|---|---|---|
| 1 | font_fallback | Re-measure with serif; compare totals | 0.90 |
| 2 | bidi_shaping | RTL Unicode range check | 0.85 |
| 3 | emoji_rendering | \p{Emoji_Presentation} | 0.75 |
| 4 | browser_quirk | system-ui / variable font / Safari UA | 0.60 |
| 5 | unknown | Fallback | 0.30 |

const result = comparator.compare({ text: 'مرحباً', font: '16px Arial', maxWidth: 300, lineHeight: 20 })
const analysis = await classifyDivergence(result, adapter)
// { rootCause: 'bidi_shaping', confidence: 0.85, recommendation: 'RTL text detected...' }
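
The "first match wins" ordering can be sketched as a list of strategy functions tried in sequence. This is only an approximation of the sync variant: it omits the DOM-based font-fallback check, reduces the browser-quirk check to a `system-ui` test, and uses a hand-picked RTL range. The names `Strategy`, `Verdict`, and `classifySync` are hypothetical; the confidence values are the ones listed above.

```typescript
// Hypothetical sketch of a priority-ordered detector chain (sync, no DOM access).
type RootCause = 'font_fallback' | 'bidi_shaping' | 'emoji_rendering' | 'browser_quirk' | 'unknown'

interface Verdict { rootCause: RootCause; confidence: number }
interface ClassifierInput { text: string; font: string }

// A strategy returns a verdict when it matches, or null to defer to the next one.
type Strategy = (input: ClassifierInput) => Verdict | null

// Assumed RTL coverage: Hebrew through Arabic Extended-A, plus presentation forms.
const RTL_CHARS = /[\u0590-\u08FF\uFB1D-\uFDFF\uFE70-\uFEFC]/
// Built with the RegExp constructor so the property escape is only parsed at runtime.
const EMOJI_CHARS = new RegExp('\\p{Emoji_Presentation}', 'u')

const strategies: Strategy[] = [
  ({ text }) => (RTL_CHARS.test(text) ? { rootCause: 'bidi_shaping', confidence: 0.85 } : null),
  ({ text }) => (EMOJI_CHARS.test(text) ? { rootCause: 'emoji_rendering', confidence: 0.75 } : null),
  ({ font }) => (/system-ui/i.test(font) ? { rootCause: 'browser_quirk', confidence: 0.6 } : null),
]

function classifySync(input: ClassifierInput): Verdict {
  for (const strategy of strategies) {
    const verdict = strategy(input)
    if (verdict) return verdict // first match wins; later strategies never run
  }
  return { rootCause: 'unknown', confidence: 0.3 }
}
```

Because the array order encodes priority, a string containing both Arabic text and an emoji is reported as `bidi_shaping` under this sketch.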

Test fixtures (46 samples)

  • english-samples.json — LTR (EN/ES/FR/DE)
  • rtl-samples.json — Arabic, Hebrew, Urdu
  • cjk-samples.json — Chinese, Japanese, Korean (incl. keep-all mode)
  • complex-script-samples.json — Thai, Myanmar, Khmer
  • mixed-bidi-samples.json — Mixed RTL+LTR

Tests & docs

  • test/measurement-validator.test.ts — comparator + report-generator unit tests (fake DOM adapter, no browser required)
  • test/classifier.test.ts — per-strategy unit tests, priority ordering, output shape validation
  • docs/measurement-validator.md — API reference
  • docs/classifier-guide.md — per-cause examples and confidence interpretation
  • docs/language-matrix.md — known divergences and browser compat per language group
Original prompt

Phase 2: Divergence Classifier & Multi-Language Support

OBJECTIVE

Implement intelligent root cause detection for measurement divergences and add support for multiple languages. This builds on Phase 1's foundation.

WHAT WE'RE BUILDING

Core Components

1. Divergence Classifier (src/measurement-validator/classifier.ts)

Identifies WHY measurements diverge between Pretext and DOM.

Detection Strategies:

├─ Font Fallback Detection
│  ├─ Measure with specified font
│  ├─ Measure with system fallback (serif/sans-serif)
│  └─ Compare to detect if font loaded
│
├─ Bidi Text Detection
│  ├─ Check for RTL characters (Arabic, Hebrew, etc.)
│  ├─ Verify segLevels available in Pretext
│  └─ Flag if visual order mismatch detected
│
├─ Emoji Detection
│  ├─ Check for emoji codepoints
│  ├─ Measure emoji vs text separately
│  └─ Note browser-specific rendering
│
├─ Browser-Specific Quirks
│  ├─ Safari kerning differences
│  ├─ Chrome vs Firefox rendering
│  └─ OS-specific font rendering (macOS vs Windows)
│
└─ Variable Font Detection
   ├─ Check for font-variation-settings
   └─ Alert if not supported by canvas
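
The font-fallback strategy (measure with the requested font, measure with a generic fallback, compare) might look like the sketch below. `MeasureFn` is a hypothetical stand-in for whatever canvas- or DOM-backed measurement the adapter provides, and `toGenericFont` is a simplified parser for the CSS font shorthand, not a complete one. Equal widths are treated as a hint that the requested font did not load; this heuristic gives a false positive when the requested font happens to be the system's default serif.

```typescript
// Hypothetical sketch: MeasureFn, toGenericFont and detectFontFallback are illustrative names.
type MeasureFn = (text: string, font: string) => number

interface FontFallbackCheck {
  detected: boolean
  requestedWidth: number
  fallbackWidth: number | null // null when the font string could not be parsed
}

// Replace the family list in a shorthand like "bold 16px Arial" with a generic family.
// Returns null when no size token is found, so the caller can skip the comparison.
function toGenericFont(font: string, generic: string = 'serif'): string | null {
  const match = /^(.*?\d+(?:\.\d+)?(?:px|pt|em|rem|%)(?:\/\S+)?)\s+.+$/.exec(font)
  return match ? `${match[1]} ${generic}` : null
}

function detectFontFallback(
  measure: MeasureFn,
  text: string,
  font: string,
  epsilon: number = 0.01
): FontFallbackCheck {
  const requestedWidth = measure(text, font)
  const fallbackFont = toGenericFont(font)
  if (fallbackFont === null) {
    return { detected: false, requestedWidth, fallbackWidth: null }
  }
  const fallbackWidth = measure(text, fallbackFont)
  // Identical widths suggest the requested font silently resolved to the fallback.
  const detected = Math.abs(requestedWidth - fallbackWidth) <= epsilon
  return { detected, requestedWidth, fallbackWidth }
}
```

Injecting the measurement function keeps the check testable without a browser, in the same spirit as the fake DOM adapter used by the unit tests.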

Output:

interface DivergenceAnalysis {
  detected: boolean
  severity: 'minor' | 'major' | 'critical'
  rootCause?: 
    | 'font_fallback'
    | 'bidi_shaping'
    | 'emoji_rendering'
    | 'browser_quirk'
    | 'variable_font'
    | 'unknown'
  confidence: number // 0-1
  recommendation: string
  details: Record<string, any>
}

2. Multi-Language Support

Expand from English-only to 6+ language groups:

├─ LTR Simple (English, Spanish, French)
│  └─ Existing tests ✓
│
├─ RTL Languages (Arabic, Hebrew, Urdu)
│  ├─ New: test-fixtures-rtl.json
│  ├─ New: bidi detection in classifier
│  └─ New: segLevel validation
│
├─ CJK Languages (Chinese, Japanese, Korean)
│  ├─ New: test-fixtures-cjk.json
│  ├─ Handle word-break: keep-all
│  └─ Test line-breaking differences
│
├─ Complex Scripts (Thai, Myanmar, Khmer)
│  ├─ New: test-fixtures-complex.json
│  ├─ Cluster-based measurement
│  └─ Browser rendering differences
│
└─ Mixed Bidi (English + Arabic in same text)
   ├─ New: test-fixtures-mixed.json
   └─ Validate visual order handling
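
One way to route a sample into the groups above is to test for Unicode scripts. The sketch below is an assumption about how such bucketing could work, not the PR's implementation: the `LanguageGroup` member names are invented for this example, and the fixtures in this PR are already grouped by file, so runtime detection may not be needed at all.

```typescript
// Hypothetical sketch: the group names and detection rules are assumptions.
type LanguageGroup = 'ltr' | 'rtl' | 'cjk' | 'complex' | 'mixed-bidi'

// RegExp constructor is used so the Unicode property escapes are parsed at runtime.
const RTL_SCRIPT = new RegExp('[\\p{Script=Arabic}\\p{Script=Hebrew}]', 'u')
const CJK_SCRIPT = new RegExp(
  '[\\p{Script=Han}\\p{Script=Hiragana}\\p{Script=Katakana}\\p{Script=Hangul}]',
  'u'
)
const COMPLEX_SCRIPT = new RegExp('[\\p{Script=Thai}\\p{Script=Myanmar}\\p{Script=Khmer}]', 'u')
const LATIN_SCRIPT = new RegExp('\\p{Script=Latin}', 'u')

function detectLanguageGroup(text: string): LanguageGroup {
  const hasRtl = RTL_SCRIPT.test(text)
  // RTL and Latin letters in the same string is treated as mixed bidi.
  if (hasRtl && LATIN_SCRIPT.test(text)) return 'mixed-bidi'
  if (hasRtl) return 'rtl'
  if (CJK_SCRIPT.test(text)) return 'cjk'
  if (COMPLEX_SCRIPT.test(text)) return 'complex'
  return 'ltr' // default bucket, including text with no letters at all
}
```

Urdu falls into the RTL bucket here because it is written in the Arabic script.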

3. Enhanced Test Suite (src/measurement-validator/test-suite.ts)

  • Run corpus validation across language groups
  • Aggregate statistics per language
  • Generate cross-language reports
  • Performance tracking per language
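
Per-language aggregation could be sketched as a single pass over sample outcomes. The `SampleOutcome` and `GroupStats` shapes and the `aggregateByLanguage` name are assumptions made for illustration; the PR's TestSuiteReport type may be structured differently.

```typescript
// Hypothetical sketch: these shapes are assumptions, not the PR's TestSuiteReport.
type Severity = 'pass' | 'minor' | 'major' | 'critical'

interface SampleOutcome {
  language: string   // language-group key, e.g. 'rtl' or 'cjk'
  severity: Severity
  durationMs: number // time spent measuring and comparing this sample
}

interface GroupStats {
  total: number
  passed: number
  passRate: number       // passed / total
  meanDurationMs: number
}

function aggregateByLanguage(outcomes: SampleOutcome[]): Record<string, GroupStats> {
  const stats: Record<string, GroupStats> = {}
  for (const o of outcomes) {
    const g = stats[o.language] ?? { total: 0, passed: 0, passRate: 0, meanDurationMs: 0 }
    g.total += 1
    if (o.severity === 'pass') g.passed += 1
    g.meanDurationMs += o.durationMs // running sum for now; divided by total below
    stats[o.language] = g
  }
  for (const key of Object.keys(stats)) {
    const g = stats[key]
    if (!g) continue
    g.passRate = g.passed / g.total
    g.meanDurationMs = g.meanDurationMs / g.total
  }
  return stats
}
```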

4. Multi-Language Fixtures

test/fixtures/
├─ english-samples.json       (Phase 1 ✓)
├─ rtl-samples.json           (Phase 2 NEW)
├─ cjk-samples.json           (Phase 2 NEW)
├─ complex-script-samples.json (Phase 2 NEW)
└─ mixed-bidi-samples.json    (Phase 2 NEW)
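
A fixture file could be validated on load with a small type guard. The field names `text`, `font`, `maxWidth`, and `lineHeight` are taken from the `comparator.compare()` call in the PR description; whether FixtureSample carries exactly these fields is an assumption, and `parseFixtures` is a hypothetical helper.

```typescript
// Hypothetical sketch: the exact FixtureSample shape is assumed from the compare() example.
interface FixtureSample {
  text: string
  font: string      // CSS font shorthand, e.g. '16px Arial'
  maxWidth: number  // px
  lineHeight: number // px
}

function isFixtureSample(value: unknown): value is FixtureSample {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  return (
    typeof v['text'] === 'string' &&
    typeof v['font'] === 'string' &&
    typeof v['maxWidth'] === 'number' &&
    typeof v['lineHeight'] === 'number'
  )
}

// Parse a fixture file's contents, failing loudly on the first malformed entry.
function parseFixtures(json: string): FixtureSample[] {
  const data: unknown = JSON.parse(json)
  if (!Array.isArray(data)) throw new Error('fixture file must contain a JSON array')
  const samples: FixtureSample[] = []
  data.forEach((entry: unknown, index: number) => {
    if (!isFixtureSample(entry)) throw new Error(`invalid fixture at index ${index}`)
    samples.push(entry)
  })
  return samples
}
```

Throwing on a malformed entry, rather than skipping it, keeps a typo in a fixture file from silently shrinking the corpus.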

5. Enhanced Documentation

  • Language support matrix
  • Known divergences per language
  • Workarounds and recommendations
  • Browser compatibility matrix

Directory Structure

src/measurement-validator/
├── types.ts                  (Phase 1 ✓)
├── dom-adapter.ts            (Phase 1 ✓)
├── comparator.ts             (Phase 1 ✓)
├── report-generator.ts       (Phase 1 ✓)
├── classifier.ts             (Phase 2 NEW)
├── test-suite.ts             (Phase 2 NEW - enhanced)
└── index.ts                  (Phase 1 ✓)

test/
├── measurement-validator.test.ts (Phase 1 ✓)
├── classifier.test.ts            (Phase 2 NEW)
└── fixtures/
    ├── english-samples.json        (Phase 1 ✓)
    ├── rtl-samples.json            (Phase 2 NEW)
    ├── cjk-samples.json            (Phase 2 NEW)
    ├── complex-script-samples.json (Phase 2 NEW)
    └── mixed-bidi-samples.json     (Phase 2 NEW)

docs/
├─ measurement-validator.md   (Phase 1 ✓)
├─ classifier-guide.md        (Phase 2 NEW)
└─ language-matrix.md         (Phase 2 NEW)

FILES TO CREATE (Phase 2)

1. src/measurement-validator/classifier.ts

Root cause detection with 5 different strategies.

2. src/measurement-validator/test-suite.ts

Enhanced test runner with multi-language support.

3. test/classifier.test.ts

Unit tests for each detection strategy.

4. test/fixtures/rtl-samples.json

Arabic, Hebrew, Urdu test cases.

5. test/fixtures/cjk-samples.json

Chinese, Japanese, Korean test cases.

6. test/fixtures/complex-script-samples.json

Thai, Myanmar, Khmer test cases.

7. test/fixtures/mixed-bidi-samples.json

Mixed RTL/LTR test cases.

8. docs/classifier-guide.md

How to use and interpret classifier results.

9. docs/language-matrix.md

Language support and known issues per language.

IMPLEMENTATION DETAILS

Classifier Algorithm

async function classifyDivergence(
  result: MeasurementResult,
  sample: MeasurementSample
): Promise<DivergenceAnalysis> {
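  // 1. Return immediately if no divergence
  if (result.overallSeverity === 'pass') {
    // DivergenceAnalysis requires confidence, recommendation and details as well;
    // the placeholder values here are assumptions.
    return { detected: false, severity: 'minor', confidence: 1, recommendation: '', details: {} }
  }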

  // 2. Check font fallback (highest priority)
  const fontFallback = await detectFontFallback(sample)
  if (fontFallback.detected) {
...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

*This pull request was created from Copilot chat.*

…language support

Agent-Logs-Url: https://github.com/Himaan1998Y/pretext/sessions/ca665dce-f115-4eb3-87c1-8ca621c70083

Co-authored-by: Himaan1998Y <210527591+Himaan1998Y@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Implement divergence classifier and multi-language support" to "feat: Measurement Validator — Divergence Classifier & Multi-Language Support (Phase 1 + 2)" Apr 4, 2026
Copilot AI requested a review from Himaan1998Y April 4, 2026 20:10