Metadata
Author: Colin Greenstreet | Landing page read.me created Wednesday, 4 February 2026
Version: v1.3
Version history:
- v1.0 (4 February 2026): Initial draft
- v1.1 (8 February 2026): Significantly expanded read.me; added collapsing metadata feature; added contact details; removed demo skill files section; made minor edits to text
- v1.2 (9 February 2026): Added wiki call-out button
- v1.3 (10 February 2026): Changed contact details; minor edits to text
Validation: This repository structure has been validated by Colin Greenstreet
My name is Colin Greenstreet. I am a research public historian, convenor of the ai + history collaboratory, and founder of MarineLives.
I write a Substack called Generative Lives, and have published a series of articles about the application of large language models to machine transcription of printed documents and manuscripts.
I am working collaboratively with a small group of Ottoman historians.
My most recent Substack article looks at a number of the languages of the Ottoman Empire (Ottoman Turkish, Albanian, Bulgarian, Greek, and Armenian), as well as modern Turkish:
Greenstreet, Colin, 'Opening the Ottoman Archive: You want to do granular research on the Ottoman Empire and its successor states, but don't read Ottoman Turkish. Do you throw up your hands or look for a new set of powerful tools?', Generative Lives, 4 January 2026.
I do not read Ottoman Turkish or any other Ottoman-related language, but I do read English, German, French, and a little Spanish. I have thirteen years of history tech experience (NLP, NER, machine transcription) and was an early adopter of the precursor of Transkribus (CNN/RNN-based machine transcription). For the last two years I have focused on the application of large language models to historical research, and I have given keynote talks on LLM-related topics at The National Archives (TNA) and the Institute of Historical Research (IHR).
My key innovations in the field of medium- and low-resource languages (including Ottoman Turkish):
- Two-stage processing of printed documents and handwritten manuscripts in Ottoman Turkish: Visual Capture + Semantic Processing
- Development and use of Markdown skill files for visual capture and semantic processing, tailored to script, document type, and genre
You can read about the various skill files we have under development in our public wiki.
The concept of "skill files" was developed by Anthropic and has been rapidly adopted by other large language model providers, such as Google (Gemini) and OpenAI (ChatGPT). Anthropic publishes a formal definition of its skill file format, as well as a public collection of its own skill files, which you may wish to explore.
Our plan is to develop Anthropic-compliant skill files to support the visual capture and semantic processing of Ottoman Turkish. At a later stage we intend to extend this work to Albanian, Bulgarian, Greek, and Armenian.
If you would like to try out one or more of these skill files, simply get in touch by email [colin.greenstreet@marinelives.org]. I will give you a copy of the skill files that interest you, explain how they work, and place them in the context of the two-step visual capture + semantic processing workflow I have developed. I will also support you as you test them on your own Ottoman Turkish documents. The more we can test these skill files on real handwritten (and printed) documents, the more robust we can make them, and the better we can document their applicability by script type, document type, and genre. Everything we develop in terms of skill files, together with related documentation, will be made available to the Commons.
You can explore three of our skill files in the skill-files-demo public repository.
The ottoman-archive GitHub organisation is a platform for developing, together with like-minded historians, the ideas set out in the Opening the Ottoman Archive Substack article. Ottoman history and language specialists and general historians alike are welcomed with open arms.
Each collaborating historian has access to a private, password-protected GitHub repository for their specific research interests, hosted by our ottoman-archive GitHub organisation.
Preferred file formats are .png (for image files) and .md (Markdown, for skill files).
You can browse colin-yournamehere-htr, a mocked-up public example of such a private repository. It contains images, skill files, visual capture output, and semantic processing output for the masthead and the right-hand (RH) and left-hand (LH) columns of issue one of Takvim-i Vekayi, the letterpress-printed Ottoman Turkish government gazette.
Contact colin.greenstreet@marinelives.org if you would like to discuss collaboration.
Last updated: 10 February 2026 · v1.3