decide about structure of data

So far, we have the following recurring structures (many more will be added; this is just a first overview):

name | data type | content
--- | --- | ---
character | character list | character in Big5 version
simplified_character | character list | character in simplified version; can be produced automatically, or ignored if not present in the original data
pinyin | character list | character reading in pinyin
doculect | character list, word list | doculect as given in the source
source | character list, word list, structure list | source (only to be used if there are multiple sources per dataset; otherwise specified in the metadata)
reading | character list | original reading as given in the source (maybe consider replacing with "value")
segments | character list, word list | segmented reading, following the CLPA specs
structure | character list, word list | the context description, that is, the phonetic/phonological structure of a given string (used for context determination)
concept | word list | the concept, which is then also linked to the Concepticon
concepticon_id | word list | obligatory if there is a concept in the data
gloss | character list, structure list | not obligatory for a character list, as the character is the main gloss there
value | word list, structure list | reading for a given word (the main value), or the content of a structural feature in a structure list, as retrieved from the source
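
To make the table easier to check against real datasets, it could be kept as a small machine-readable registry. The following is only a sketch under assumptions: the names `FIELDS` and `check_columns` are hypothetical and not part of any existing code, and the only rules encoded are the ones stated in the table (which list types a field applies to, and that `concepticon_id` is obligatory once `concept` is present).

```python
# Field registry transcribed from the table above:
# field name -> list types the field is defined for.
FIELDS = {
    "character": {"character list"},
    "simplified_character": {"character list"},
    "pinyin": {"character list"},
    "doculect": {"character list", "word list"},
    "source": {"character list", "word list", "structure list"},
    "reading": {"character list"},
    "segments": {"character list", "word list"},
    "structure": {"character list", "word list"},
    "concept": {"word list"},
    "concepticon_id": {"word list"},
    "gloss": {"character list", "structure list"},
    "value": {"word list", "structure list"},
}


def check_columns(list_type, columns):
    """Return a list of problems found in the column names of one dataset.

    `list_type` is one of "character list", "word list", "structure list";
    `columns` is the list of column names used in the dataset.
    """
    problems = []
    for col in columns:
        if col not in FIELDS:
            problems.append(f"unknown field: {col}")
        elif list_type not in FIELDS[col]:
            problems.append(f"field {col} is not defined for a {list_type}")
    # Rule from the table: concepticon_id is obligatory if concept is present.
    if "concept" in columns and "concepticon_id" not in columns:
        problems.append("concepticon_id is obligatory when concept is present")
    return problems
```

For example, `check_columns("word list", ["doculect", "concept", "value"])` would report the missing `concepticon_id`, while a character list with `character`, `pinyin`, `reading` and `segments` would pass without problems. Fields added to the table later would only need a new entry in the registry.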

It is important to regularize the treatment of these values. The values added to all datasets are the refined segmented readings in CLPA, built on top of the other readings. Beyond that, there are values that may be useful for data checking, such as "sampa", and further ones such as the běnzì. All of this needs to be organized and structured.
