GitHub - OpenPecha/transfer-text-segments

Transfer text segment annotations to the original text

Usage

Install using pip.

pip install git+https://github.com/OpenPecha/transfer-text-segments.git

Import

from transfer_text_segments import transfer_text

Transfer_text

original_text_with_annotation = transfer_text(Original_text_location, Predicted_tsv_location, Column_name)

Original_text_location := path to the original text file

Predicted_tsv_location := path to the TSV file containing the predicted segments

Column_name := name of the column in the predicted TSV that holds the text ('sentence' in the example below, where the other columns are labelled by index, starting from 0)
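Since both locations are passed as plain strings, it can help to check the inputs before starting a transfer. The following is a minimal sketch, assuming a UTF-8, tab-separated file; `check_inputs` is a hypothetical helper written for illustration and is not part of transfer_text_segments.

```python
import csv
from pathlib import Path


def check_inputs(original_text_location, predicted_tsv_location):
    """Hypothetical pre-flight check (not part of transfer_text_segments).

    Returns the number of tab-separated columns found in the TSV.
    """
    original = Path(original_text_location)
    tsv = Path(predicted_tsv_location)
    for path in (original, tsv):
        if not path.is_file():
            raise FileNotFoundError(f"missing input file: {path}")

    # Collect the distinct row widths, skipping completely empty lines.
    with tsv.open(encoding="utf-8", newline="") as handle:
        widths = {len(row) for row in csv.reader(handle, delimiter="\t") if row}

    if not widths:
        raise ValueError(f"no rows found in {tsv}")
    if len(widths) != 1:
        raise ValueError(f"inconsistent column counts in {tsv}: {sorted(widths)}")
    return widths.pop()
```

For a TSV shaped like the sample in the example below (file name, two timestamps, text), this check would return 4.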

Example

original_text = "sample/test_orginal_1.txt"
# file contains: མར་པ་ལོ་ཙཱ་བ་ཆོས་ཀྱི་བློ་གྲོས། མར་པ་ཆོས་ཀྱི་བློ་གྲོས་ནི། ཕྱི་ལོ་ ༡༠༡༡ ལོར་བྲག་ཕུག་ཆུ་ཁྱེར་ཞེས་པའི་ས་གནས་སུ་སྐུ་འཁྲུངས།

predicted_tsv = "sample/test_transcription_1.tsv"
# file contains:
#Marpa-ep1-001.wav	1.9	3.65	མར་པ་ཆོས་ཀྱི་ལོ་འགྲོས
#Marpa-ep1-002.wav	5.4	7.2	མར་པ་ཆོས་གོི་སློ་གྲོད་ནི།
#Marpa-ep1-003.wav	7.75	9.65	སཤི་ལོ་རྒྱ་མད་བཅུགས་རྗིས་སལོར་དུ།
#Marpa-ep1-004.wav	9.9	12.55	བགྲ་བུགས་ཆུ་སྐྱིར་ཞེས་བྱ་བའི་ས་ནས་སུ་འདགུ་ཕྲུང༌།

column = 'sentence'
# 0                  1     2      sentence
# Marpa-ep1-001.wav  1.9   3.65   མར་པ་ཆོས་ཀྱི་ལོ་འགྲོས
# Marpa-ep1-002.wav  5.4   7.2    མར་པ་ཆོས་གོི་སློ་གྲོད་ནི།
# Marpa-ep1-003.wav  7.75  9.65   སཤི་ལོ་རྒྱ་མད་བཅུགས་རྗིས་སལོར་དུ།
# Marpa-ep1-004.wav  9.9   12.55  བགྲ་བུགས་ཆུ་སྐྱིར་ཞེས་བྱ་བའི་ས་ནས་སུ་འདགུ་ཕྲུང༌།

result = transfer_text(original_text, predicted_tsv, column)

extracted text from tsv file..
extracted text from original file..
Annotation transfer started...
Mapping annotations to tofu-IDs
[INFO] Computing diffs ...
[INFO] Diff computed!
Transfering annotations...

print(result)
# Note: transfer_text returns a DataFrame
              0     1      2            sentence
0  Marpa-ep1-001.wav  1.90   3.65  མར་པ་ལོ་ཙཱ་བ་ཆོས་ཀྱི་བློ་གྲོས།
1  Marpa-ep1-002.wav  5.40   7.20  མར་པ་ཆོས་ཀྱི་བློ་གྲོས་ནི།
2  Marpa-ep1-003.wav  7.75   9.65  ཕྱི་ལོ་ ༡༠༡༡ ལོར་
3  Marpa-ep1-004.wav  9.90  12.55  བྲག་ཕུག་ཆུ་ཁྱེར་ཞེས་པའི་ས་གནས་སུ་སྐུ་འཁྲུངས།
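The log output in the example mentions computing diffs between the two texts. The package's internals are not shown here, so the following is not its implementation. It is a minimal sketch of how a diff-based transfer of segment boundaries could work, using Python's standard `difflib`. The names `map_offset` and `transfer_segments` are hypothetical, and the English sample strings are made up for illustration.

```python
from difflib import SequenceMatcher


def map_offset(opcodes, pos):
    """Map a character offset in the noisy text to an offset in the original."""
    for tag, i1, i2, j1, j2 in opcodes:
        if i1 <= pos < i2:
            # Inside a matching run the offset carries over exactly;
            # inside a changed run, snap to the start of that run.
            return j1 + (pos - i1) if tag == "equal" else j1
    # pos is at the end of the noisy text: map to the end of the original.
    return opcodes[-1][4] if opcodes else 0


def transfer_segments(original, noisy_segments):
    """Cut `original` at the points that align with the noisy segment breaks."""
    if not noisy_segments:
        return []
    noisy = "".join(noisy_segments)
    matcher = SequenceMatcher(None, noisy, original, autojunk=False)
    opcodes = matcher.get_opcodes()

    # Map the end offset of every segment except the last onto the original.
    cuts, total = [], 0
    for segment in noisy_segments[:-1]:
        total += len(segment)
        cuts.append(map_offset(opcodes, total))

    bounds = [0] + cuts + [len(original)]
    return [original[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    clean = "the quick brown fox jumps over the lazy dog"
    noisy = ["the quik brown ", "fox jumsp over ", "the lazy dgo"]
    print(transfer_segments(clean, noisy))
    # ['the quick brown ', 'fox jumps over ', 'the lazy dog']
```

In this sketch the returned pieces always concatenate back to the original text, so nothing from the original is dropped; only the positions of the cuts depend on the noisy segments.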
