Description
Reproduce script
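from botok import WordTokenizer  # assumption: `wt` is a botok WordTokenizer, inferred from the token fields printed below
wt = WordTokenizer()
tokens = wt.tokenize("རིན་ཆེན་མིའི")
print(tokens)
Output: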
[text: "རིན་ཆེན་"
text_cleaned: "རིན་ཆེན་"
text_unaffixed: "རིན་ཆེན་"
syls: ["རིན", "ཆེན"]
pos: OTHER
lemma: རིན་ཆེན་
senses: | pos: OTHER, freq: 22841, affixed: False, lemma: རིན་ཆེན་ |
char_types: |CONS|VOW|CONS|TSEK|CONS|VOW|CONS|TSEK|
chunk_type: TEXT
freq: 22841
syls_idx: [[0, 1, 2], [4, 5, 6]]
syls_start_end: [{'start': 0, 'end': 4}, {'start': 4, 'end': 8}]
start: 0
len: 8
, text: "མི"
text_cleaned: "མི"
text_unaffixed: "མི"
syls: ["མི"]
pos: PART
lemma: མི་
senses: | pos: PART, freq: 883801, affixed: True, lemma: མི་ |
char_types: |CONS|VOW|
chunk_type: TEXT
freq: 883801
affix_host: True
syls_idx: [[0, 1]]
syls_start_end: [{'start': 0, 'end': 2}]
start: 8
len: 2
, text: "འི"
text_cleaned: "འི་"
text_unaffixed: "འི་"
syls: ["འི"]
pos: PART
lemma: གི་
senses: | lemma: གི་ |
char_types: |CONS|VOW|
chunk_type: TEXT
affix: True
syls_idx: [[0, 1]]
syls_start_end: [{'start': 2, 'end': 4}]
start: 10
len: 2
]
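The offsets in the output above can be sanity-checked with a small script. This is only a sketch: the dicts are transcribed by hand from the printed tokens, `check_offsets` is a hypothetical helper (not part of any library), and it assumes that `syls_start_end` is meant to be relative to the token it belongs to, as it is for the first two tokens.

```python
# Values transcribed from the printed output; plain dicts stand in for token objects.
tokens = [
    {"start": 0, "len": 8, "syls_start_end": [{"start": 0, "end": 4}, {"start": 4, "end": 8}]},
    {"start": 8, "len": 2, "syls_start_end": [{"start": 0, "end": 2}]},
    {"start": 10, "len": 2, "syls_start_end": [{"start": 2, "end": 4}]},
]


def check_offsets(tokens):
    """Return a list of inconsistencies found in token offsets (hypothetical helper)."""
    problems = []
    expected_start = tokens[0]["start"] if tokens else 0
    for i, tok in enumerate(tokens):
        # Each token should begin where the previous one ended.
        if tok["start"] != expected_start:
            problems.append(f"token {i}: start {tok['start']}, expected {expected_start}")
        # Assumption: syllable spans are relative to the token, so they must fit in its length.
        for span in tok["syls_start_end"]:
            if span["end"] > tok["len"]:
                problems.append(
                    f"token {i}: syllable span {span['start']}-{span['end']} "
                    f"exceeds token length {tok['len']}"
                )
        expected_start = tok["start"] + tok["len"]
    return problems


problems = check_offsets(tokens)
for p in problems:
    print(p)
```

Under that assumption, the only entry flagged is the affix token, whose span `{'start': 2, 'end': 4}` looks relative to the unsplit མིའི rather than to the 2-character token འི. The same token also shows `text_cleaned: "འི་"` with a trailing tsek that `text` lacks.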