Identification of texts/paragraphs #1
Description
Dear Friends, thank you!
I have seen p n=[0-9]+ as identifiers in the texts.
While I assume the UHJ has a more elaborate identification system (I guess it works on the basis of a "word" index of the authentic originals), it would also be helpful to create a language-independent identification system for the Holy Scriptures.
That is, it would be helpful to add unique identifiers to the paragraphs.
It may serve well in verifying the completeness and accuracy of translations, building concordances, avoiding double translation, ensuring accurate references and citations, etc.
Checking current practice:
- As for paper(ish) collections, many secondary literature documents already use identifiers in one way or another, yet they often refer to "pages" of paper publications, which makes it hard to put quotations into context, since the exact edition has to be consulted.
- As for electronic references, I have checked all recent BADI projects and found multiple, non-standardized ways of referencing. Some of them may be usable, yet a common, standardized way could build bridges between the parallel efforts of the various teams.
As for the content, I am aware that the paragraph structure may not be the same in Persian and Arabic vs. the English version.
Based on this analysis (and years of practice in identifying multilingual, cross-linked corpora), I dare to propose:
- collecting a comprehensive list of the texts this TEI conversion project would target, selected from Dr. Phelps's inventory (notated below as BREF-XY, without the exact number taken from Dr. Phelps's collection)
- a proper hierarchical identification of "parts" (e.g. BREF-HW_P for the Persian Hidden Words, or BREF-KA_QA), using meaningful acronyms or numbers where applicable, as in BREF-GL_IV for Gleanings (though Roman numerals may be changed to Arabic ones)
- a unique, possibly hierarchical, identification of numbered paragraphs where numbering is available, e.g. BREF-HW_P_1
- adding artificial numbers where the original is not usually numbered, e.g. BREF-KA_12 for the main text (as is usual in legal texts, where strictly increasing numbers denote provisions), and BREF-KA_QA_n1_q and BREF-KA_QA_n1_a for questions and answers
- alternate identifications may coexist through concordance tables, such as word-based ones (for static, never-changing authentic texts) and paper-based ones (bound to a certain edition, with an exact reference to the original paper edition, including page and line number)
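To make the proposed notation easier to discuss, here is a minimal Python sketch of how such identifiers could be parsed and normalized. The grammar `BREF-<WORK>[_<PART>]_<NUMBER>[_<q|a>]`, the class `ParagraphId`, and the function `parse_paragraph_id` are my own assumptions for illustration, not an agreed format. The sketch also assumes Roman numerals have already been converted to Arabic ones, and it drops the optional `n` prefix when writing an identifier back out.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical grammar, assumed for this sketch only:
#   BREF-<WORK>[_<PART>]_<NUMBER>[_<q|a>]
_ID_RE = re.compile(
    r"^BREF-(?P<work>[A-Z]+)"   # work acronym, e.g. HW, KA
    r"(?:_(?P<part>[A-Z]+))?"   # optional part, e.g. P, QA
    r"_n?(?P<number>[0-9]+)"    # paragraph number, optional 'n' prefix
    r"(?:_(?P<role>[qa]))?$"    # optional question/answer marker
)


@dataclass(frozen=True)
class ParagraphId:
    work: str
    part: Optional[str]
    number: int
    role: Optional[str] = None

    def __str__(self) -> str:
        # Canonical form: the optional 'n' prefix is not written back.
        pieces = [self.work]
        if self.part:
            pieces.append(self.part)
        pieces.append(str(self.number))
        if self.role:
            pieces.append(self.role)
        return "BREF-" + "_".join(pieces)


def parse_paragraph_id(text: str) -> ParagraphId:
    """Split an identifier into its hierarchical components."""
    match = _ID_RE.match(text)
    if match is None:
        raise ValueError(f"not a recognised paragraph identifier: {text!r}")
    return ParagraphId(
        work=match["work"],
        part=match["part"],
        number=int(match["number"]),
        role=match["role"],
    )
```

With this sketch, `parse_paragraph_id("BREF-HW_P_1")` yields the work `HW`, the part `P` and the number `1`, while an identifier that still carries a Roman numeral, such as `BREF-GL_IV`, is rejected with a `ValueError`.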
(By the way, it would be great either to omit identifiers completely or to mark them distinctly where a paragraph is NOT authentic, e.g. editorial text or explanations given by other parties, such as prefaces and footnotes, since these might be temporary. Most obviously, all comments and notes of Shoghi Effendi, even where the original Author is different, should be identified with unambiguous markup that clearly specifies they are not part of the original text or its translation.)
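As a rough illustration of why the authentic/non-authentic distinction matters for completeness checks, the following Python sketch compares a translation against the original by identifier and skips paragraphs flagged as non-authentic. The `Paragraph` class, its `authentic` flag, and the function `missing_paragraphs` are hypothetical names chosen for this example. Any real project would likely carry this information in the TEI markup itself.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Paragraph:
    ident: str              # e.g. "BREF-HW_P_1"
    text: str
    authentic: bool = True  # False for prefaces, footnotes, editorial notes


def missing_paragraphs(
    original: Iterable[Paragraph], translation: Iterable[Paragraph]
) -> List[str]:
    """Return identifiers of authentic original paragraphs that the
    translation lacks, in the order they appear in the original."""
    translated = {p.ident for p in translation}
    return [
        p.ident
        for p in original
        if p.authentic and p.ident not in translated
    ]
```

For example, if the original holds `BREF-HW_P_1`, `BREF-HW_P_2` and one editorial note flagged `authentic=False`, and the translation holds only `BREF-HW_P_1`, the function reports `BREF-HW_P_2` as missing and ignores the note.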
I have participated in creating such language-independent identification systems and in maintaining large corpora in multiple languages, and I am experienced in XML/JS/VB/Go/Java, in case my participation in this project would be helpful. I have a view of the Bahá'í corpus, though never a deep enough one.
To streamline my efforts in related tasks, besides a common and universal identification of paragraphs, it would also be great (one day) to define and publish a TBX and term markup recommendation. It would serve national translations well by keeping terminology coherent, and would also help in preparing material for devotional meetings and meditation. (FYI, although I am aware of the 2015 letter of the UHJ, I have manually aligned terms between HU/EN/ZH(!) for The Hidden Words, for use in study circles.)
With love,
A