Add text splitting into small parts #3

@AigizK

The current version ignores the H1-H5 headers added by the user. But when a book is translated, the text of chapter 1 is still the text of chapter 1 in the other language.
You can use this fact to split a big text into small parts.
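
A minimal sketch of the header-based split, assuming the user marks headers Markdown-style with one to five `#` characters (the issue does not say how H1-H5 are actually stored) and that both texts contain the same headers in the same order. The names `split_by_headers` and `pair_chunks` are hypothetical, not part of the project:

```python
import re

# Assumption: H1-H5 headers are written as 1-5 '#' characters followed by text.
HEADER_RE = re.compile(r"^#{1,5}\s+\S")

def split_by_headers(text):
    """Split text into chunks; every header line starts a new chunk."""
    chunks = [[]]
    for line in text.splitlines():
        # Start a new chunk at a header, unless the current chunk is still empty.
        if HEADER_RE.match(line) and chunks[-1]:
            chunks.append([])
        chunks[-1].append(line)
    return ["\n".join(chunk) for chunk in chunks if chunk]

def pair_chunks(original, translated):
    """Pair the n-th chunk of the original with the n-th chunk of the translation."""
    left = split_by_headers(original)
    right = split_by_headers(translated)
    if len(left) != len(right):
        raise ValueError(
            f"header count mismatch: {len(left)} chunks vs {len(right)} chunks"
        )
    return list(zip(left, right))
```

Each pair can then be aligned on its own, which keeps the sentence lists short. Text before the first header, if any, becomes its own chunk.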

Next idea: try to split a big text into small blocks automatically.
Select a few sentences from the original text (for example, 10 sentences) and, in a loop, try to find the matching block in the translated text.

You can use the following pseudocode:

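# original_embeddings / translated_embeddings: one vector per sentence
left_start = 100
left_block = original_embeddings[left_start:left_start + 10]
scores = {}
for i in range(50, 150):
    right_candidate = translated_embeddings[i:i + 10]
    # sum of cosine similarities of the sentence pairs (higher = better match)
    scores[i] = sum(cosine_similarity(l, r) for l, r in zip(left_block, right_candidate))

# start index of the best-matching block in the translated text
right_start = max(scores, key=scores.get)

left_text_split_index = left_start
right_text_split_index = right_start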
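
The search above could be made runnable roughly as follows. This is only a sketch: it assumes each sentence has already been turned into a vector by some multilingual embedding model (not shown here), and `cosine_similarity` and `find_split` are hypothetical helper names. It maximizes cosine similarity, which is equivalent to minimizing cosine distance.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def find_split(orig_vecs, trans_vecs, left_start, block=10,
               search_from=0, search_to=None):
    """Return (left_index, right_index): where the block of original sentences
    starts, and where the best-matching block of translated sentences starts."""
    left = orig_vecs[left_start:left_start + block]
    if not left:
        raise ValueError("left_start is outside the original text")
    if search_to is None:
        search_to = len(trans_vecs)
    # Only consider windows that fit entirely inside the translated text.
    last = min(search_to, len(trans_vecs) - len(left) + 1)
    best_index, best_score = None, -math.inf
    for i in range(max(search_from, 0), last):
        candidate = trans_vecs[i:i + len(left)]
        score = sum(cosine_similarity(l, r) for l, r in zip(left, candidate))
        if score > best_score:
            best_index, best_score = i, score
    if best_index is None:
        raise ValueError("search window is empty")
    return left_start, best_index
```

With the numbers from the pseudocode, the call would be `find_split(original_embeddings, translated_embeddings, 100, block=10, search_from=50, search_to=150)`. Limiting the search window keeps the cost low and avoids false matches far from the expected position.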
