Processing outline

main.tex
processed text: all LaTeX macros removed or replaced with narratable text
1. (later) all non-text objects (figures, tables, formulas, ...) sent to video pipeline
chunked text: split the text into sentences
Narrator: sort chunks by length descending
split the sorted sentences into batches of constant* size
feed the batches one-by-one through TTS model
each batch now has a corresponding waveform for each chunk and a duration of the waveform
1. we can use duration to sync subtitles
2. (later) and lay out the video content
once all batches are processed, we restore original order of sentences
concatenate all the waveforms into a single waveform
save waveform as audio file
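
The Narrator steps above can be sketched in plain Python. This is a minimal sketch, not the actual implementation: `fake_tts`, `narrate`, and the `sample_rate` parameter are hypothetical names, and the stand-in model simply emits one sample per character so the example runs without a real TTS model.

```python
from typing import Callable, List, Sequence, Tuple

Waveform = List[float]


def fake_tts(batch: Sequence[str]) -> List[Waveform]:
    """Hypothetical stand-in for a TTS model: one sample per character."""
    return [[float(len(sentence))] * len(sentence) for sentence in batch]


def narrate(
    sentences: Sequence[str],
    batch_size: int,
    tts: Callable[[Sequence[str]], List[Waveform]] = fake_tts,
    sample_rate: int = 1,
) -> Tuple[Waveform, List[float]]:
    # Keep each sentence's original position, then sort by length descending
    # so that sentences of similar length end up in the same batch.
    indexed = sorted(enumerate(sentences), key=lambda p: len(p[1]), reverse=True)

    results: List[Waveform] = [[] for _ in sentences]
    for start in range(0, len(indexed), batch_size):
        batch = indexed[start:start + batch_size]
        waveforms = tts([text for _, text in batch])
        # Writing each waveform back at its original index restores the order.
        for (idx, _), waveform in zip(batch, waveforms):
            results[idx] = waveform

    # Per-sentence durations (in seconds) can later drive subtitle timing.
    durations = [len(w) / sample_rate for w in results]
    # Concatenate all waveforms into a single waveform.
    audio = [sample for w in results for sample in w]
    return audio, durations


sentences = ["Hi.", "A much longer sentence.", "Medium one."]
audio, durations = narrate(sentences, batch_size=2)
```

Storing results by original index avoids a second sort at the end. Writing `audio` to a file is left out here, since the output format is not specified in the outline.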

main.tex
Spark: read file (or read text from Kafka stream?)
LatexParser: all LaTeX macros removed or replaced with narratable text
1. (later) all non-text objects (figures, tables, formulas, ...) sent to video pipeline
Chunker: split the text into sentences; attach to every chunk the index of its original position
sort chunks by length descending
split the sorted sentences into batches of constant text volume (= sentence size * batch size)
partition the data by text volume. each batch goes to one worker
each worker feeds the batches one-by-one through TTS model and appends results to DataFrame
each batch now has a corresponding waveform for each chunk and a duration of the waveform
1. we can use duration to sync subtitles
2. (later) and lay out the video content
once all batches are processed, we restore original order of sentences
concatenate all the waveforms into a single waveform
save waveform as audio file
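
Batching by constant text volume could look roughly like the sketch below. It assumes one reading of "text volume": the length of the longest sentence in a batch times the number of sentences in it, i.e. the padded size of the batch. `batch_by_volume` and `max_volume` are hypothetical names, and the Spark partitioning step is not shown.

```python
from typing import List, Sequence, Tuple

IndexedChunk = Tuple[int, str]  # (original index, sentence text)


def batch_by_volume(sentences: Sequence[str], max_volume: int) -> List[List[IndexedChunk]]:
    # Attach the original index to every chunk, then sort by length descending.
    indexed = sorted(enumerate(sentences), key=lambda p: len(p[1]), reverse=True)

    batches: List[List[IndexedChunk]] = []
    current: List[IndexedChunk] = []
    for idx, text in indexed:
        # After a descending sort the first sentence in a batch is the longest,
        # so the padded volume is len(first sentence) * number of sentences.
        if current and len(current[0][1]) * (len(current) + 1) > max_volume:
            batches.append(current)
            current = []
        current.append((idx, text))
    if current:
        batches.append(current)
    return batches


sentences = ["aaaaaaaa", "bbbb", "cccc", "dd", "dd", "dd", "dd"]
batches = batch_by_volume(sentences, max_volume=8)
batch_indices = [[idx for idx, _ in batch] for batch in batches]

# Sorting the flattened chunks by index restores the original order.
restored = [text for _, text in sorted(chunk for batch in batches for chunk in batch)]
```

With this rule, long sentences form small batches and short sentences form large ones, so each batch handed to a worker carries a similar amount of work. A sentence longer than `max_volume` still gets a batch of its own.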
