feat: Measurement Validator — Divergence Classifier & Multi-Language Support (Phase 1 + 2) by Copilot · Pull Request #3 · Himaan1998Y/pretext

Copilot · 2026-04-04T19:56:44Z

Adds a new src/measurement-validator/ subsystem that compares Pretext canvas-based line measurements against DOM Range-API measurements, classifies root causes of divergences, and validates across 5 language groups.

Core modules

types.ts — MeasurementSample, MeasurementResult, DivergenceAnalysis, FixtureSample, TestSuiteReport, tolerance config
dom-adapter.ts — Range-API DOM adapter; extracts per-line text + widths without synthetic reflow
comparator.ts — Runs layoutWithLines() against DOM metrics; assigns pass/minor/major/critical per line
report-generator.ts — JSON + console formatters for single results and suite aggregates
classifier.ts — Priority-ordered root-cause detector; both async (with DOM font-fallback check) and sync variants
test-suite.ts — Multi-language runner with per-LanguageGroup stats aggregation
index.ts — Public API surface
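
The pass/minor/major/critical assignment in comparator.ts can be pictured as a threshold check on the per-line width difference. The sketch below is illustrative only: the `Tolerance` shape, its field names, and the pixel thresholds are assumptions and are not taken from types.ts.

```typescript
type Severity = 'pass' | 'minor' | 'major' | 'critical'

// Hypothetical tolerance config; the real field names and thresholds live in types.ts.
interface Tolerance {
  passPx: number
  minorPx: number
  majorPx: number
}

// Assumed thresholds, chosen only for illustration.
const DEFAULT_TOLERANCE: Tolerance = { passPx: 0.5, minorPx: 2, majorPx: 5 }

// Map the absolute width difference of one line to a severity bucket.
function classifyLine(
  pretextWidth: number,
  domWidth: number,
  tol: Tolerance = DEFAULT_TOLERANCE
): Severity {
  const delta = Math.abs(pretextWidth - domWidth)
  if (delta <= tol.passPx) return 'pass'
  if (delta <= tol.minorPx) return 'minor'
  if (delta <= tol.majorPx) return 'major'
  return 'critical'
}

// The overall result is the worst severity seen on any line.
function worstSeverity(lines: Severity[]): Severity {
  const order: Severity[] = ['pass', 'minor', 'major', 'critical']
  let worst = 0
  for (const s of lines) worst = Math.max(worst, order.indexOf(s))
  return order[worst] ?? 'pass'
}
```

Taking the worst per-line severity as the overall severity is one plausible aggregation; a mean or percentile would hide single-line outliers.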

Classifier detection chain

Strategies fire in priority order; first match wins:

| Priority | `rootCause` | Mechanism | Confidence |
|---|---|---|---|
| 1 | `font_fallback` | Re-measure with `serif`; compare totals | 0.90 |
| 2 | `bidi_shaping` | RTL Unicode range check | 0.85 |
| 3 | `emoji_rendering` | `\p{Emoji_Presentation}` | 0.75 |
| 4 | `browser_quirk` | `system-ui` / variable font / Safari UA | 0.60 |
| 5 | `unknown` | Fallback | 0.30 |

const result = comparator.compare({ text: 'مرحباً', font: '16px Arial', maxWidth: 300, lineHeight: 20 })
const analysis = await classifyDivergence(result, adapter)
// { rootCause: 'bidi_shaping', confidence: 0.85, recommendation: 'RTL text detected...' }
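
The "first match wins" ordering can be sketched as a list of strategies walked in priority order. This is a simplified stand-in, not the code in classifier.ts: the `Strategy` shape and `firstMatch` are hypothetical names, the font-fallback check is stubbed out because it needs a re-measure, and the RTL range is a rough approximation. The confidence values are the ones listed in the table.

```typescript
type RootCause = 'font_fallback' | 'bidi_shaping' | 'emoji_rendering' | 'browser_quirk' | 'unknown'

// Hypothetical strategy shape; the real detectors also consult the DOM adapter.
interface Strategy {
  rootCause: RootCause
  confidence: number
  matches: (text: string, font: string) => boolean
}

// Built from a string so the Unicode property escape is resolved at runtime.
const EMOJI = new RegExp('\\p{Emoji_Presentation}', 'u')

// Ordered by priority. Predicates are simplified assumptions.
const STRATEGIES: Strategy[] = [
  // Stub: a real check would re-measure with a generic family and compare widths.
  { rootCause: 'font_fallback', confidence: 0.9, matches: () => false },
  // Rough RTL check covering the Hebrew and Arabic blocks and their neighbours.
  { rootCause: 'bidi_shaping', confidence: 0.85, matches: (text) => /[\u0590-\u08FF]/.test(text) },
  { rootCause: 'emoji_rendering', confidence: 0.75, matches: (text) => EMOJI.test(text) },
  { rootCause: 'browser_quirk', confidence: 0.6, matches: (_text, font) => font.indexOf('system-ui') !== -1 },
]

// Return the first strategy that matches, or the 'unknown' fallback.
function firstMatch(text: string, font: string): { rootCause: RootCause; confidence: number } {
  for (const s of STRATEGIES) {
    if (s.matches(text, font)) return { rootCause: s.rootCause, confidence: s.confidence }
  }
  return { rootCause: 'unknown', confidence: 0.3 }
}
```

Because the list is ordered, text that contains both Arabic and emoji is reported as `bidi_shaping`; the lower-priority match is never evaluated.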

Test fixtures (46 samples)

english-samples.json — LTR (EN/ES/FR/DE)
rtl-samples.json — Arabic, Hebrew, Urdu
cjk-samples.json — Chinese, Japanese, Korean (incl. keep-all mode)
complex-script-samples.json — Thai, Myanmar, Khmer
mixed-bidi-samples.json — Mixed RTL+LTR

Tests & docs

test/measurement-validator.test.ts — comparator + report-generator unit tests (fake DOM adapter, no browser required)
test/classifier.test.ts — per-strategy unit tests, priority ordering, output shape validation
docs/measurement-validator.md — API reference
docs/classifier-guide.md — per-cause examples and confidence interpretation
docs/language-matrix.md — known divergences and browser compat per language group

Original prompt

Phase 2: Divergence Classifier & Multi-Language Support

OBJECTIVE

Implement intelligent root cause detection for measurement divergences and add support for multiple languages. This builds on Phase 1's foundation.

WHAT WE'RE BUILDING

Core Components

1. Divergence Classifier (`src/measurement-validator/classifier.ts`)

Identifies WHY measurements diverge between Pretext and DOM.

Detection Strategies:

├─ Font Fallback Detection
│  ├─ Measure with specified font
│  ├─ Measure with system fallback (serif/sans-serif)
│  └─ Compare to detect if font loaded
│
├─ Bidi Text Detection
│  ├─ Check for RTL characters (Arabic, Hebrew, etc.)
│  ├─ Verify segLevels available in Pretext
│  └─ Flag if visual order mismatch detected
│
├─ Emoji Detection
│  ├─ Check for emoji codepoints
│  ├─ Measure emoji vs text separately
│  └─ Note browser-specific rendering
│
├─ Browser-Specific Quirks
│  ├─ Safari kerning differences
│  ├─ Chrome vs Firefox rendering
│  └─ OS-specific font rendering (macOS vs Windows)
│
└─ Variable Font Detection
   ├─ Check for font-variation-settings
   └─ Alert if not supported by canvas
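
The two text-only checks in the tree above (RTL characters and emoji codepoints) need no DOM access, so they are easy to sketch in isolation. The helper names `hasRtl` and `splitEmojiRuns` are hypothetical, and the character ranges are an approximation rather than a full bidi-class lookup. Splitting into runs is one way to support the "measure emoji vs text separately" step.

```typescript
// Built from strings so the Unicode property escapes are resolved at runtime.
const EMOJI_RUN = new RegExp('(\\p{Emoji_Presentation}+)', 'u')
const EMOJI_ONLY = new RegExp('^\\p{Emoji_Presentation}+$', 'u')

// Approximate RTL coverage: Hebrew/Arabic blocks plus their presentation forms.
const RTL_CHAR = /[\u0590-\u08FF\uFB1D-\uFDFF\uFE70-\uFEFC]/

interface TextRun {
  text: string
  kind: 'emoji' | 'text'
}

function hasRtl(text: string): boolean {
  return RTL_CHAR.test(text)
}

// Split text into alternating emoji and non-emoji runs so each can be measured on its own.
function splitEmojiRuns(text: string): TextRun[] {
  return text
    .split(EMOJI_RUN) // the capture group keeps the emoji runs in the result
    .filter((part) => part.length > 0)
    .map((part): TextRun => ({ text: part, kind: EMOJI_ONLY.test(part) ? 'emoji' : 'text' }))
}
```

`Emoji_Presentation` only covers codepoints that render as emoji by default. Sequences that rely on a variation selector or a zero-width joiner would need extra handling that this sketch omits.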

Output:

interface DivergenceAnalysis {
  detected: boolean
  severity: 'minor' | 'major' | 'critical'
  rootCause?: 
    | 'font_fallback'
    | 'bidi_shaping'
    | 'emoji_rendering'
    | 'browser_quirk'
    | 'variable_font'
    | 'unknown'
  confidence: number // 0-1
  recommendation: string
  details: Record<string, any>
}
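
To show how one strategy might populate part of this shape, here is a minimal font-fallback check, assuming an injected measuring function. `MeasureFn`, `FontFallbackCheck`, `withGenericFamily`, and the 0.01px epsilon are all hypothetical; in a browser the measurer could wrap a canvas 2D context by setting `ctx.font` and reading `ctx.measureText(text).width`.

```typescript
// Hypothetical measurer signature, injected so the check can run without a browser.
type MeasureFn = (text: string, font: string) => number

interface FontFallbackCheck {
  detected: boolean
  confidence: number // 0-1
  details: { requestedWidth: number; fallbackWidth: number }
}

// Swap the family list of a CSS font shorthand ('16px Arial') for a generic family.
// Simplification: only px sizes are recognized; returns null for anything else.
function withGenericFamily(font: string, generic: string): string | null {
  const match = /^(.*?\d+(?:\.\d+)?px)\s+.+$/.exec(font)
  return match ? match[1] + ' ' + generic : null
}

// If the requested font measures the same as the generic family alone,
// the requested font probably did not load and the fallback was used instead.
function detectFontFallback(
  text: string,
  font: string,
  measure: MeasureFn,
  epsilonPx: number = 0.01 // assumed tolerance
): FontFallbackCheck {
  const fallbackFont = withGenericFamily(font, 'serif')
  const requestedWidth = measure(text, font)
  const fallbackWidth = fallbackFont === null ? NaN : measure(text, fallbackFont)
  // Comparisons with NaN are false, so an unparseable font never reports a fallback.
  const detected = Math.abs(requestedWidth - fallbackWidth) <= epsilonPx
  return {
    detected,
    confidence: detected ? 0.9 : 0,
    details: { requestedWidth, fallbackWidth },
  }
}
```

Equal widths are only a heuristic: a font whose metrics happen to match the generic family for the sampled text would be flagged incorrectly, which is one reason to report a confidence below 1.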

2. Multi-Language Support

Expand from English-only to 6+ language groups:

├─ LTR Simple (English, Spanish, French)
│  └─ Existing tests ✓
│
├─ RTL Languages (Arabic, Hebrew, Urdu)
│  ├─ New: test-fixtures-rtl.json
│  ├─ New: bidi detection in classifier
│  └─ New: segLevel validation
│
├─ CJK Languages (Chinese, Japanese, Korean)
│  ├─ New: test-fixtures-cjk.json
│  ├─ Handle word-break: keep-all
│  └─ Test line-breaking differences
│
├─ Complex Scripts (Thai, Myanmar, Khmer)
│  ├─ New: test-fixtures-complex.json
│  ├─ Cluster-based measurement
│  └─ Browser rendering differences
│
└─ Mixed Bidi (English + Arabic in same text)
   ├─ New: test-fixtures-mixed.json
   └─ Validate visual order handling
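
One way to route a sample to the groups listed above is to test it against Unicode script properties. This is a sketch under assumptions: the `LanguageGroup` member names and `detectLanguageGroup` are hypothetical, and a fixture file may well declare its group explicitly instead of inferring it from the text.

```typescript
// Hypothetical group names; the real LanguageGroup type is defined in the PR.
type LanguageGroup = 'ltr' | 'rtl' | 'cjk' | 'complex' | 'mixed-bidi'

// Build a regex matching any character in the given Unicode scripts.
// Constructed from a string so the property escapes are resolved at runtime.
function scriptPattern(names: string[]): RegExp {
  return new RegExp(names.map((n) => '\\p{Script=' + n + '}').join('|'), 'u')
}

const RTL = scriptPattern(['Arabic', 'Hebrew']) // Urdu is written in Arabic script
const CJK = scriptPattern(['Han', 'Hiragana', 'Katakana', 'Hangul'])
const COMPLEX = scriptPattern(['Thai', 'Myanmar', 'Khmer'])
const LATIN = scriptPattern(['Latin'])

function detectLanguageGroup(text: string): LanguageGroup {
  const rtl = RTL.test(text)
  if (rtl && LATIN.test(text)) return 'mixed-bidi'
  if (rtl) return 'rtl'
  if (CJK.test(text)) return 'cjk'
  if (COMPLEX.test(text)) return 'complex'
  return 'ltr'
}
```

The check order matters: RTL is tested first so that Arabic text with embedded Latin words lands in the mixed-bidi group rather than in either single-direction group.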

3. Enhanced Test Suite (`src/measurement-validator/test-suite.ts`)

Run corpus validation across language groups
Aggregate statistics per language
Generate cross-language reports
Performance tracking per language
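
The per-language aggregation can be sketched as a single pass that accumulates counts and timings per group. `SampleOutcome`, `GroupStats`, and `aggregateByGroup` are hypothetical names used for illustration; the actual report types are defined in types.ts.

```typescript
type Severity = 'pass' | 'minor' | 'major' | 'critical'

// Hypothetical per-sample outcome fed into the aggregation.
interface SampleOutcome {
  group: string
  severity: Severity
  durationMs: number
}

interface GroupStats {
  total: number
  passed: number
  passRate: number // passed / total
  meanDurationMs: number
}

function aggregateByGroup(outcomes: SampleOutcome[]): Record<string, GroupStats> {
  // First pass: accumulate raw sums per group.
  const sums: Record<string, { total: number; passed: number; ms: number }> = {}
  for (const o of outcomes) {
    const acc = sums[o.group] ?? { total: 0, passed: 0, ms: 0 }
    acc.total += 1
    if (o.severity === 'pass') acc.passed += 1
    acc.ms += o.durationMs
    sums[o.group] = acc
  }
  // Second pass: derive rates and means from the sums.
  const stats: Record<string, GroupStats> = {}
  for (const group of Object.keys(sums)) {
    const acc = sums[group]
    if (!acc) continue
    stats[group] = {
      total: acc.total,
      passed: acc.passed,
      passRate: acc.passed / acc.total,
      meanDurationMs: acc.ms / acc.total,
    }
  }
  return stats
}
```

Keeping raw sums and deriving the rates at the end avoids the drift that comes from updating a running mean on every sample.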

4. Multi-Language Fixtures

test/fixtures/
├─ english-samples.json       (Phase 1 ✓)
├─ rtl-samples.json           (Phase 2 NEW)
├─ cjk-samples.json           (Phase 2 NEW)
├─ complex-script-samples.json (Phase 2 NEW)
└─ mixed-bidi-samples.json    (Phase 2 NEW)

5. Enhanced Documentation

Language support matrix
Known divergences per language
Workarounds and recommendations
Browser compatibility matrix

Directory Structure

src/measurement-validator/
├── types.ts                  (Phase 1 ✓)
├── dom-adapter.ts            (Phase 1 ✓)
├── comparator.ts             (Phase 1 ✓)
├── report-generator.ts       (Phase 1 ✓)
├── classifier.ts             (Phase 2 NEW)
├── test-suite.ts             (Phase 2 NEW - enhanced)
└── index.ts                  (Phase 1 ✓)

test/
├── measurement-validator.test.ts (Phase 1 ✓)
├── classifier.test.ts            (Phase 2 NEW)
└── fixtures/
    ├── english-samples.json        (Phase 1 ✓)
    ├── rtl-samples.json            (Phase 2 NEW)
    ├── cjk-samples.json            (Phase 2 NEW)
    ├── complex-script-samples.json (Phase 2 NEW)
    └── mixed-bidi-samples.json     (Phase 2 NEW)

docs/
├─ measurement-validator.md   (Phase 1 ✓)
├─ classifier-guide.md        (Phase 2 NEW)
└─ language-matrix.md         (Phase 2 NEW)

FILES TO CREATE (Phase 2)

1. src/measurement-validator/classifier.ts

Root cause detection with 5 different strategies.

2. src/measurement-validator/test-suite.ts

Enhanced test runner with multi-language support.

3. test/classifier.test.ts

Unit tests for each detection strategy.

4. test/fixtures/rtl-samples.json

Arabic, Hebrew, Urdu test cases.

5. test/fixtures/cjk-samples.json

Chinese, Japanese, Korean test cases.

6. test/fixtures/complex-script-samples.json

Thai, Myanmar, Khmer test cases.

7. test/fixtures/mixed-bidi-samples.json

Mixed RTL/LTR test cases.

8. docs/classifier-guide.md

How to use and interpret classifier results.

9. docs/language-matrix.md

Language support and known issues per language.

IMPLEMENTATION DETAILS

Classifier Algorithm

async function classifyDivergence(
  result: MeasurementResult,
  sample: MeasurementSample
): Promise<DivergenceAnalysis> {
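  // 1. Return immediately if no divergence
  if (result.overallSeverity === 'pass') {
    // DivergenceAnalysis requires confidence, recommendation and details;
    // the values below are placeholders for the no-divergence case.
    return { detected: false, severity: 'minor', confidence: 1, recommendation: '', details: {} }
  }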

  // 2. Check font fallback (highest priority)
  const fontFallback = await detectFontFallback(sample)
  if (fontFallback.detected) {
...

*This pull request was created from Copilot chat.*

Initial plan

c5ee6d0

Copilot AI assigned Copilot and Himaan1998Y Apr 4, 2026

Copilot started work on behalf of Himaan1998Y April 4, 2026 19:56 View session

feat: add measurement validator with divergence classifier and multi-…

e3054bc

…language support Agent-Logs-Url: https://github.com/Himaan1998Y/pretext/sessions/ca665dce-f115-4eb3-87c1-8ca621c70083 Co-authored-by: Himaan1998Y <210527591+Himaan1998Y@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Implement divergence classifier and multi-language support~~ feat: Measurement Validator — Divergence Classifier & Multi-Language Support (Phase 1 + 2) Apr 4, 2026

Copilot finished work on behalf of Himaan1998Y April 4, 2026 20:10

Copilot AI requested a review from Himaan1998Y April 4, 2026 20:10

Copilot wants to merge 2 commits into main from copilot/add-divergence-classifier-multi-language-support
