Semantic processing


Colin Greenstreet edited this page Feb 8, 2026 · 4 revisions

Metadata

Author: Claude Opus 4.5 in Claude Ottoman Turkish Project, prompted and edited by Colin Greenstreet | Wiki entry created: Wednesday, February 4th 2026 | Wiki entry modified: Sunday, February 8th 2026

Version: v1.1

Version history:

v1.0 (4 February 2026): Initial draft.
v1.1 (8 February 2026): Minor formatting edits; added a collapsible metadata section.

Validation: This wiki entry requires validation by Ottoman Turkish scholars

Stage two: Semantic processing skill files (V3-T)

These operate at Stage 2 of the two-stage pipeline. They run on Anthropic Claude (either Opus 4.5 or Opus 4.6) at higher-creativity settings (Temperature ~1.0, Thinking High, Top-P 0.95). Their purpose is transliteration, translation, Named Entity Recognition, and contextual analysis. All inherit a common three-layer output structure: diplomatic transliteration → literal English → modernized English.
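The skill files themselves are Markdown prompts, not code. As an illustration only, the three-layer output structure can be pictured as one record per source line. The class name, field names and completeness check below are hypothetical and are not taken from the V3-T file. The sample values are illustrative.

```python
from dataclasses import dataclass, asdict

@dataclass
class ThreeLayerLine:
    """One source line carried through the three V3-T output layers (hypothetical schema)."""
    line_no: int
    diplomatic: str  # layer 1: diplomatic transliteration, source spelling preserved
    literal: str     # layer 2: literal English, close to source word order
    modernized: str  # layer 3: modernized, idiomatic English

    def layers(self) -> list:
        """Return the three layers in the order the pipeline produces them."""
        return [self.diplomatic, self.literal, self.modernized]

    def is_complete(self) -> bool:
        """True when every layer contains some non-whitespace text."""
        return all(layer.strip() for layer in self.layers())

# Illustrative values only: the gazette title from its masthead.
masthead = ThreeLayerLine(
    line_no=1,
    diplomatic="Takvîm-i Vekâyi",
    literal="calendar of events",
    modernized="The Calendar of Events",
)
print(asdict(masthead))
print(masthead.is_complete())  # True
```

Keeping the layers as separate fields, rather than one block of text, makes it easy to check that no layer was silently skipped.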

V3-T (Base Protocol)

File: HTR_V3-T_Transliteration_Translation.md
Name: V3-T v1.0
Date created: December 26, 2025
Created by: Colin Greenstreet
Tested: Takvîm-i Vekâyi Issues 1, 181, and 185 (the first Ottoman government gazette, 1831–1840). Source: Internet Archive. Multiple pages were processed, including the masthead and the right-hand and left-hand columns of all five pages of Issue 1. Also tested on Şanizade Tarihi P9 and Vekâyi-i Devlet-i Aliye Vol. 6 Scan 321.

What the skill file does:
The foundation of the entire V3-T family. Defines the role ("Ottoman Turkish Philologist specializing in early 19th-century official gazette language"), the complete transliteration system (consonants with emphatics, vowels with length, izafet marking, Arabic definite article, sağır nun), and the three-layer translation methodology. Establishes the NER categories (persons, locations, dates, vessels, organizations, religious communities, titles/honorifics) and summary translation requirements. Includes special handling for defective spellings, Arabic quotations, ambiguous readings, and technical terminology. Defines the six-step workflow and quality standards.

Relationship to other skill files:

No parent — this is the root of the V3-T family
Children: V3-T-C, V3-T-Newspaper, V3-T-E-5035-Chronicle-Variant
Grandchildren: V3-T-C-Personal, V3-T-C17 (both via V3-T-C)
What it establishes for descendants: Transliteration tables, three-layer translation structure, NER framework, quality standards. All descendants inherit these and extend for specific genres/periods.
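The parent/child relationships recorded on this page form a simple tree. A minimal sketch, assuming a hand-maintained dictionary of parent links (the PARENT table and lineage function are hypothetical helpers, not project files), shows how a descendant's inheritance chain can be traced back to V3-T. V3-T-Quranic is left out because it is only a conceptual child with an independent transliteration system.

```python
# Parent links as recorded on this wiki page (hypothetical helper table).
PARENT = {
    "V3-T": None,
    "V3-T-C": "V3-T",
    "V3-T-C-Personal": "V3-T-C",
    "V3-T-C17": "V3-T-C",
    "V3-T-Newspaper v1.0": "V3-T",
    "V3-T-Newspaper v1.1": "V3-T-Newspaper v1.0",
    "V3-T-E-5035-Chronicle-Variant v1.0": "V3-T",
    "V3-T-E-5035-Chronicle-Variant v1.1": "V3-T-E-5035-Chronicle-Variant v1.0",
}

def lineage(name: str) -> list[str]:
    """Return the chain from a skill file up to the root V3-T file."""
    chain = []
    current = name
    while current is not None:
        chain.append(current)
        current = PARENT[current]  # KeyError for names not on this page
    return chain

print(" -> ".join(lineage("V3-T-C17")))  # V3-T-C17 -> V3-T-C -> V3-T
```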

V3-T-C (Correspondence)

File: HTR_V3-T-Correspondence_Transliteration_Translation.md
Name: V3-T-C v1.0
Date created: December 28, 2025
Created by: Colin Greenstreet
Tested: Letter 1904, Pages 1 and 2 (late Ottoman bureaucratic correspondence from İSAM Kütüphanesi Arşivi, Sadâret correspondence). Two versions of the analysis JSON (V1 and V2) were produced for each page.

What the skill file does:
Adapts V3-T for late Ottoman bureaucratic correspondence (c. 1850–1922). Adds: role as "Ottoman Turkish Diplomatic Specialist"; document structure analysis using classical diplomatic formulae (invocatio, intitulatio, inscriptio, narratio, dispositio, sanctio, corroboratio, datatio, subscriptio); common formulaic expressions for Sublime Porte correspondence; title and honorific register (Imperial through Religious ranks); expanded NER categories replacing "vessels" with "offices/departments," "document types," and "legal/administrative terms"; handling for Hijri/Rumi/Miladi triple dating; mixed script zones (printed letterhead + handwritten body); rik'a script conventions.
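The nine diplomatic parts listed above conventionally appear in that order, though individual letters often omit some of them. As a hedged sketch (the function is hypothetical and not part of V3-T-C), a post-processing step could flag detected sections that appear out of canonical order while ignoring parts that are simply absent:

```python
# Canonical order of the classical diplomatic parts named in V3-T-C.
DIPLOMATIC_PARTS = [
    "invocatio", "intitulatio", "inscriptio", "narratio", "dispositio",
    "sanctio", "corroboratio", "datatio", "subscriptio",
]

def out_of_order(detected: list[str]) -> list[tuple[str, str]]:
    """Return adjacent detected parts whose order contradicts the canonical order.

    Missing parts are allowed; unrecognized labels are skipped.
    """
    rank = {part: i for i, part in enumerate(DIPLOMATIC_PARTS)}
    known = [p for p in detected if p in rank]
    return [(a, b) for a, b in zip(known, known[1:]) if rank[a] > rank[b]]

print(out_of_order(["inscriptio", "narratio", "invocatio", "datatio"]))
# [('narratio', 'invocatio')]
```

A flagged pair is a prompt for human review, not proof of a misreading: real documents do not always follow the textbook sequence.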

Relationship to other skill files:

Parent: V3-T
Children: V3-T-C-Personal, V3-T-C17
Key adaptation: Shifts from gazette announcement prose to epistolary administrative register; adds document structure analysis as a distinct output section.

V3-T-C-Personal (Personal Correspondence)

File: HTR_V3-T-C-Personal_Transliteration_Translation.md
Name: V3-T-C-Personal v1.0
Date created: December 30, 2025
Created by: Colin Greenstreet
Tested: HHP 1716-1 Page 9 — personal/family letter from the Hüseyin Hilmi Paşa Documents at İSAM Kütüphanesi Arşivi. Analysis JSON V2 produced.

What the skill file does:
Adapts V3-T-C for personal and family correspondence (c. 1850–1922). Major departures from V3-T-C: replaces official document structure with personal epistolary conventions (selâm/du'â, hitâb, hâl sorma, haber, rica, selâm gönderme, hitâm); comprehensive formulaic expression tables organized by function (opening, health inquiry, news/update, closing, signature); kinship and relationship terminology (vâlide/ana, peder/baba, oğul, kız, birâder, hemşîre, etc.); NER categories replace "offices" with "family relationships," "life events," "greetings conveyed," and "material items"; adds "relationship_context" and "historical_context" to output JSON; special handling for informal spellings, emotional language, fragmentary letters, multiple hands, and date estimation.
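Only two output keys are named for V3-T-C-Personal on this page: relationship_context and historical_context. The skeleton below is a hypothetical layout showing where they might sit alongside the personal-correspondence NER categories. Every other key name and all placeholder values are assumptions, not the skill file's actual schema.

```python
import json

def personal_letter_skeleton(document_id: str) -> dict:
    """Hypothetical empty analysis record for one personal letter."""
    return {
        "document_id": document_id,
        "lines": [],  # one three-layer record per source line
        "entities": {
            "persons": [],
            "family_relationships": [],
            "life_events": [],
            "greetings_conveyed": [],
            "material_items": [],
        },
        "relationship_context": "",  # named on this page: writer/recipient relationship
        "historical_context": "",    # named on this page: period and dating evidence
    }

record = personal_letter_skeleton("HHP 1716-1 p9")
print(json.dumps(record, indent=2, ensure_ascii=False))
```

Using ensure_ascii=False keeps Ottoman diacritics readable in the saved JSON instead of escaping them.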

Relationship to other skill files:

Parent: V3-T-C
Grandparent: V3-T
Key adaptation: Shifts from formal bureaucratic register to intimate/familial register; foregrounds family network reconstruction as an analytical goal.

V3-T-C17 (Classical Ottoman, c. 1500–1700)

File: HTR_V3-T-C17_Transliteration_Translation.md
Name: V3-T-C17 v1.0
Date created: January 28, 2026
Created by: Colin Greenstreet
Tested: Leipzig B.or.290/01 p51 — a 17th-century Dîvân-ı Hümâyûn (Imperial Council) document, probably a berât (appointment patent) from the reign of Sultan Murad IV (c. 1623–1640), held at Leipzig University Library. Analysis JSON and cross-reference report produced. The V3-T-C protocol had earlier been applied ad hoc to this document (Experiment 159) before V3-T-C17 was formally created.

What the skill file does:
Adapts V3-T-C backward in time for classical Ottoman imperial documents (c. 1500–1700). Key differences from V3-T-C: expects nesih, ta'lîk, dîvânî, or siyâkat scripts rather than rik'a; classical high chancery register rather than post-Tanzimat bureaucratic; document types include fermân, berât, hükm, and register entries rather than tezkire and 'arż; higher Arabic/Persian lexical density; Hijri-only dating (pre-Rumi calendar); detailed document type classification system; classical formulaic expressions ("buyurdum ki," "sen ki ... sın," "şöyle bilesin"); confidence scoring system (HIGH/MEDIUM/LOW per line) acknowledging the inherently lower baseline for archaic material; archaic vocabulary reference table; period-specific context notes for Murad IV reign and provincial administration.
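V3-T-C17 works with Hijri dates only. The skill file does not prescribe a conversion method. As an illustrative cross-check, a Hijri date can be converted with the arithmetic (tabular) Islamic calendar. This is an assumption of the sketch: tabular dates can differ by a day or two from those in documents, which followed observed lunar months, so treat results as approximate. The function name is hypothetical.

```python
import math
from datetime import date

# date(1, 1, 1) has ordinal 1 and Julian Day Number 1721426.
JDN_MINUS_ORDINAL = 1721425

def hijri_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a tabular Islamic calendar date to a proleptic Gregorian date."""
    jdn = (day
           + math.ceil(29.5 * (month - 1))  # months alternate 30 and 29 days
           + (year - 1) * 354               # common-year length
           + (3 + 11 * year) // 30          # leap days in the 30-year cycle
           + 1948439)                       # JDN of 1 Muharram 1 AH, minus 1
    return date.fromordinal(jdn - JDN_MINUS_ORDINAL)

print(hijri_to_gregorian(1400, 1, 1))  # 1979-11-21
```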

Relationship to other skill files:

Parent: V3-T-C
Grandparent: V3-T
Key adaptation: Extends the correspondence framework backward by two centuries; introduces systematic confidence scoring; adds classical diplomatic document typology.
Genesis: Created out of practical necessity during Experiment 159, when V3-T-C proved inadequate for pre-Tanzimat material.

V3-T-Newspaper v1.0

File: V3-T-Newspaper_Skill_File.md
Name: V3-T-Newspaper v1.0
Date created: January 1, 2026
Created by: Colin Greenstreet
Tested: Peyam newspaper, Page 1, Experiment 156. A Second Constitutional Period newspaper (c. 1908–1918). Source: Internet Archive. Analysis JSON produced.

What the skill file does:
Extends V3-T for Ottoman political journalism of the Second Constitutional Period (1908–1918). Key additions beyond V3-T: punctuation handling (newspapers use Ottoman punctuation marks that the Takvîm-i Vekâyi largely does not); section header transliteration (Siyâsiyât, Dâhilî, Hâricî, etc.); numeral transliteration; foreign word/neologism handling (French borrowings like pârlâmento, Ottoman political neologisms like meşrûṭiyet); expanded NER with "constitutional references" and "political concepts" categories; triple calendar system with detailed conversion requirements and complete month-name tables (Rumi and Hijri); political context reference including key events timeline (1908), major political actors table (Abdülhamid II through Kâmil Paşa), and political party table (CUP, Ahrâr, İttihâd-ı Muhammedî); article summary replacing V3-T's "summary translation" with article type identification and editorial context; special handling for constitutional article citations, rhetorical questions, quotations, and editorial voice shifts.
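For the Rumi side of the triple calendar, here is a minimal sketch. It assumes the Julian-based Rumi (Mali) calendar in use from 1840 until February 1917 (years 1256 to 1332), in which the year began on 1 Mart and ran 584 years behind the Julian year. The helper names are hypothetical, and the later Gregorian-based Rumi years are deliberately out of scope.

```python
from datetime import date

def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a Julian-calendar date (standard integer algorithm)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

def rumi_to_gregorian(rumi_year: int, month: int, day: int) -> date:
    """Convert a Julian-based Rumi date to Gregorian.

    `month` uses Julian numbering (1 = Kânûn-ı sânî/January, 3 = Mart/March).
    Mart to Kânûn-ı evvel fall in Julian year rumi_year + 584;
    Kânûn-ı sânî and Şubat fall in the following Julian year.
    """
    if not 1256 <= rumi_year <= 1332:
        raise ValueError("sketch covers only the Julian-based Rumi years 1256-1332")
    julian_year = rumi_year + 584 if month >= 3 else rumi_year + 585
    # 1721425 = JDN minus Python ordinal for proleptic Gregorian dates.
    return date.fromordinal(julian_to_jdn(julian_year, month, day) - 1721425)

print(rumi_to_gregorian(1324, 7, 10))  # 1908-07-23
```

For example, 10 Temmuz 1324 (month 7) converts to 23 July 1908, 13 days after the Julian date.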

Relationship to other skill files:

Parent: V3-T
Upstream pair: V3-S-Newspaper (visual capture)
Key adaptation: Shifts from gazette to newspaper journalism; adds constitutional-era political knowledge base; introduces punctuation handling and triple-calendar conversion.

V3-T-Newspaper v1.1

File: V3-T-Newspaper_Skill_File_v1_1.md
Name: V3-T-Newspaper v1.1
Date created: January 4, 2026
Created by: Colin Greenstreet
Tested: Encoding fix only. Content functionally identical to v1.0.

What the skill file does:
Addresses UTF-8 mojibake corruption that affected the v1.0 file in Claude Projects: some Perso-Arabic characters and diacritical marks were corrupted during upload. Version 1.1 is the corrected file, although the encoding problem was systemic and was not fully resolved within the file itself.
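Mojibake of this kind usually arises when UTF-8 bytes are decoded as Latin-1 or Windows-1252. For example, a combining macron below (U+0331) reappears as "Ì±". Here is a minimal repair sketch, assuming a single round of Latin-1 mis-decoding. Windows-1252 damage or repeated rounds need more care, for instance with a dedicated library such as ftfy. The function name is hypothetical.

```python
def repair_latin1_mojibake(text: str) -> str:
    """Undo one round of UTF-8 bytes that were mis-decoded as Latin-1.

    If the text cannot be re-encoded as Latin-1 (e.g. genuine Arabic script
    or Turkish ş), or the bytes are not valid UTF-8 (e.g. a clean â),
    the input is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

print(repair_latin1_mojibake("\u00d8\u00ab"))  # ث
```

The fallback matters: running the repair over a whole file should leave already-correct lines alone rather than corrupt them a second time.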

Relationship to other skill files:

Parent: V3-T-Newspaper v1.0
Nature of change: Maintenance/encoding fix, not substantive protocol revision.

V3-T-E-5035-Chronicle-Variant v1.0

File: V3-T-E-5035-Chronicle-Variant.md
Name: V3-T-E-5035-Chronicle-Variant v1.0
Date created: January 29, 2026
Created by: Colin Greenstreet (for Nabil)
Tested: E.5035 manuscript (a bound codex in nesih script; archive reference E.5035, repository not specified in file). Period TBD pending content analysis. Initial visual assessment noted professional scribal hand, ~27–30 lines per page, possible reference to Sultan Korkud. Also incorporates lessons from earlier Şanizade Tarihi P9 processing (analysis JSON V2).

What the skill file does:
Adapts V3-T for chronicle/historiographical manuscripts. Key additions: role as "Ottoman Turkish Chronicle Specialist"; chronicle genre conventions (târîh, gazavâtnâme, tercüme-i hâl); embedded verse analysis (Ottoman Turkish, Persian, Arabic poetry within prose); manuscript-specific observations (scribal features, ink condition, marginalia, catchwords, rubrication); register spectrum from literary-historiographical to administrative; flexible content assessment framework for genre identification when document type is initially unknown; explicit protocol evolution notes for renaming after genre confirmation.

Relationship to other skill files:

Parent: V3-T (with additional input from Şanizade Tarihi processing experience)
Child: V3-T-E-5035-Chronicle-Variant v1.1
Key innovation: First skill file designed for an unknown/unclassified document, with built-in flexibility for genre discovery during processing.
Naming note: "E-5035" in the name is a specific archive reference; the file acknowledges it should be renamed to something generic (e.g., "V3-T-Chronicle") after genre confirmation.

V3-T-E-5035-Chronicle-Variant v1.1

File: V3-T-E-5035-Chronicle-Variant_V1_1_29012026.md
Name: V3-T-E-5035-Chronicle-Variant v1.1
Date created: January 29, 2026 (same day as v1.0)
Created by: Colin Greenstreet (for Nabil)
Tested: E.5035 processing revealed the document contained İkrâr/Ṣorgu (interrogation/deposition record) genre — a genre not anticipated in v1.0.

What the skill file does:
Adds İkrâr/Ṣorgu (interrogation record) as an explicit genre category after the E.5035 processing unexpectedly identified this genre. Adds: visual markers for triage detection of interrogation records; interrogation-specific processing guidance; legal/administrative formulae; expanded register options to include legal/administrative prose alongside literary-historiographical prose.

Relationship to other skill files:

Parent: V3-T-E-5035-Chronicle-Variant v1.0
Key lesson: Documents resist pre-classification; the skill file must accommodate genre surprises discovered during processing.
Cross-reference: V3-Triage v1.0/v1.1 now includes İkrâr/Ṣorgu detection markers based on this experience.

V3-T-Quranic

File: V3-T-Quranic_Semantic_Processing.md
Name: V3-T-Quranic v1.0
Date created: January 19, 2026
Created by: Colin Greenstreet
Tested: Quranic Arabic passages posted on LinkedIn by Michael Erdman, Head, Middle Eastern and Central Asian Collections, The British Library. The Quranic text required a distinct transliteration system (Arabic phonology rather than Ottoman Turkish phonology) and verification against the standard Medina muṣḥaf.

What the skill file does:
A standalone variant for Quranic Arabic text, and the only V3-T skill file not designed specifically for Ottoman Turkish. Role as "Quranic Arabic Scholar"; Quranic identification protocol (Surah/Ayah matching against standard text); transliteration uses Arabic scholarly conventions (th, dh, kh, sh, q rather than Ottoman s̱, ż, ḫ, ş, ḳ); tajwīd-marked transliteration; translation provides both literal English and "interpretive English" following traditional Sunni tafsīr; NER categories are entirely distinct (divine names, prophets, angels, places, peoples, concepts, scripture); three-tier confidence scoring (visual capture, Quranic match, transliteration); special handling for basmala, variant readings (qirā'āt), calligraphic arrangements, and fragmentary text.
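This page does not say how the three confidence tiers are combined. One simple rule, assumed here for illustration only, is that a passage is no more reliable than its weakest tier. The HIGH/MEDIUM/LOW labels are borrowed from V3-T-C17, and the names below are hypothetical.

```python
from enum import IntEnum

class Confidence(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

def overall_confidence(visual_capture: Confidence,
                       quranic_match: Confidence,
                       transliteration: Confidence) -> Confidence:
    """Assumed aggregation rule: overall confidence equals the weakest tier."""
    return min(visual_capture, quranic_match, transliteration)

print(overall_confidence(Confidence.HIGH, Confidence.MEDIUM, Confidence.LOW).name)  # LOW
```

Taking the minimum is conservative: a confident Surah/Ayah match cannot mask a doubtful visual capture.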

Relationship to other skill files:

Conceptual parent: V3-T (inherits the three-layer structure principle)
Transliteration system: INDEPENDENT. It uses Arabic scholarly conventions, not Ottoman Turkish conventions. This is the key distinction: the same letter ث is transliterated "th" in V3-T-Quranic but "s̱" in all Ottoman V3-T variants.
Upstream: References "V3-S-Naskh" for visual capture, a "ghost" protocol that is referenced but does not exist in the project.
Use case: Activated when Ottoman manuscripts or documents contain embedded Quranic passages that require Arabic rather than Ottoman Turkish phonological treatment.
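The transliteration split can be made concrete with a small lookup table. Here is a minimal sketch covering four of the consonants contrasted above: ث, خ, ش and ق. The table and function are hypothetical and are not the skill files' own tables.

```python
# (Ottoman V3-T variants, V3-T-Quranic) for four consonants listed on this page.
SYSTEMS = {
    "ث": ("s\u0331", "th"),  # s followed by combining macron below (s̱)
    "خ": ("ḫ", "kh"),
    "ش": ("ş", "sh"),
    "ق": ("ḳ", "q"),
}

def transliterate_letter(letter: str, quranic: bool) -> str:
    """Return the transliteration of one consonant for the active skill file."""
    ottoman, arabic = SYSTEMS[letter]
    return arabic if quranic else ottoman

print(transliterate_letter("ث", quranic=True))   # th
print(transliterate_letter("ث", quranic=False))  # s̱
```

Selecting the system with one flag mirrors the use case above: the same manuscript can route an embedded Quranic passage to Arabic conventions while the surrounding Ottoman prose keeps the V3-T table.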

Last updated: 8 February 2026 · v1.1

ottoman-archive wiki