Statistical prediction of which common gloss is most appropriate in context

We aggregate the most common glosses for each word and show the most common one as the primary suggestion. A couple of ideas to improve this:
* Feed the top glosses to an LLM with the verse as context to see if it can improve acceptance rates.
* Cluster tuples of glosses across languages to predict which gloss is most likely to be appropriate. This would work by scoring pairs of glosses across languages, and then algorithmically picking the gloss that correlates most strongly with how the word has already been glossed in other languages.
```
Spanish Gloss A, English Gloss 1 - 0.9 // When a word is glossed with English Gloss 1, predict Spanish Gloss A
Spanish Gloss A, English Gloss 2 - 0.1
Spanish Gloss B, English Gloss 1 - 0.3
Spanish Gloss B, English Gloss 2 - 0.7 // When a word is glossed with English Gloss 2, predict Spanish Gloss B
```
* We could improve sense clustering by clustering at the lemma level rather than the lemma+morphology level. This should yield better sense clusters. Then, for each language, use the surrounding context and morphology information to predict precise glosses. This might perform better when a language has many grammatical forms for the same sense, because it can predict the right form, and it is more likely to fail by getting the grammar wrong than by getting the sense wrong, which should be easier to correct.
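
As a rough illustration of the cross-language pair scoring idea, here is a minimal Python sketch. It is not the project's implementation: the function names `score_gloss_pairs` and `predict_gloss` are hypothetical, and the score used here is a simple conditional relative frequency (how often a target-language gloss was chosen when a given reference-language gloss was chosen for the same word occurrence). The correlation measure described above may well be something different.

```python
from collections import Counter, defaultdict


def score_gloss_pairs(observations):
    """Score (reference gloss, target gloss) pairs for a single word.

    `observations` is an iterable of (reference_gloss, target_gloss) tuples,
    one per occurrence of the word that is already glossed in both languages.
    The score is an assumed stand-in for "correlation": the fraction of
    occurrences with that reference gloss that also received the target gloss.
    """
    observations = list(observations)
    pair_counts = Counter(observations)
    ref_totals = Counter(ref for ref, _ in observations)

    scores = defaultdict(dict)
    for (ref, target), count in pair_counts.items():
        scores[ref][target] = count / ref_totals[ref]
    return dict(scores)


def predict_gloss(scores, reference_gloss, fallback=None):
    """Pick the target gloss with the highest score for a reference gloss.

    Returns `fallback` (for example, the overall most common gloss) when the
    reference gloss has never been seen alongside a target gloss.
    """
    candidates = scores.get(reference_gloss)
    if not candidates:
        return fallback
    return max(candidates, key=candidates.get)


# Made-up observations for one word: (English gloss, Spanish gloss).
observations = (
    [("English Gloss 1", "Spanish Gloss A")] * 9
    + [("English Gloss 1", "Spanish Gloss B")] * 1
    + [("English Gloss 2", "Spanish Gloss A")] * 3
    + [("English Gloss 2", "Spanish Gloss B")] * 7
)
scores = score_gloss_pairs(observations)
suggestion = predict_gloss(scores, "English Gloss 1", fallback="Spanish Gloss A")
```

With more than one reference language, the per-language scores would need to be combined somehow (for instance by multiplying or averaging them); that choice is left open here.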
