Processing outline
Michael Berger edited this page Jan 4, 2025 · 1 revision
- main.tex
- processed text: all LaTeX macros removed or replaced with narratable text
- (later) all non-text objects (figures, tables, formulas, ...) sent to the video pipeline
- chunked text: split the text into sentences
- Narrator: sort chunks by length descending
- split the sorted sentences into batches of constant* size
- feed the batches one by one through the TTS model
- each chunk in a batch now has a corresponding waveform and that waveform's duration
- we can use the durations to sync subtitles
- (later) and to lay out the video content
- once all batches are processed, we restore original order of sentences
- concatenate all the waveforms into a single waveform
- save waveform as audio file
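
The single-process steps above could be sketched roughly as follows. This is only an illustration under assumptions: `fake_tts` is a hypothetical stand-in for the real TTS model (it returns one zero-valued sample per character), the regex sentence splitter is deliberately naive, and `batch_size` and `sample_rate` are made-up parameters. Writing the result to an audio file is left out.

```python
import re
from typing import Callable, List, Optional, Sequence, Tuple

Waveform = List[float]


def split_sentences(text: str) -> List[str]:
    """Naive chunker: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def fake_tts(batch: Sequence[str]) -> List[Waveform]:
    """Hypothetical TTS stand-in: one silent sample per input character."""
    return [[0.0] * len(sentence) for sentence in batch]


def narrate(
    text: str,
    batch_size: int = 2,
    tts: Callable[[Sequence[str]], List[Waveform]] = fake_tts,
    sample_rate: int = 1,
) -> Tuple[Waveform, List[float]]:
    sentences = split_sentences(text)

    # Sort sentence positions by sentence length, longest first.
    order = sorted(
        range(len(sentences)), key=lambda i: len(sentences[i]), reverse=True
    )

    waveforms: List[Optional[Waveform]] = [None] * len(sentences)
    durations: List[float] = [0.0] * len(sentences)

    # Feed fixed-size batches through the TTS model one by one.
    for start in range(0, len(order), batch_size):
        positions = order[start:start + batch_size]
        outputs = tts([sentences[i] for i in positions])
        for i, wave in zip(positions, outputs):
            # Storing by original position restores the sentence order.
            waveforms[i] = wave
            durations[i] = len(wave) / sample_rate

    # Concatenate all per-sentence waveforms into a single waveform.
    audio = [sample for wave in waveforms if wave is not None for sample in wave]
    return audio, durations
```

The per-sentence `durations` come back in the original sentence order, so they can be accumulated into start times for subtitles.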
- main.tex
- Spark: read file (or read text from kafka stream?)
- LatexParser: all LaTeX macros removed or replaced with narratable text
- (later) all non-text objects (figures, tables, formulas, ...) sent to the video pipeline
- Chunker: split the text into sentences; to every chunk attach the index of its original position
- sort chunks by length descending
- split the sorted sentences into batches of constant text volume (= sentence size * batch size)
- partition the data by text volume; each batch goes to one worker
- each worker feeds its batches one by one through the TTS model and appends the results to the DataFrame
- each chunk in a batch now has a corresponding waveform and that waveform's duration
- we can use the durations to sync subtitles
- (later) and to lay out the video content
- once all batches are processed, we restore original order of sentences
- concatenate all the waveforms into a single waveform
- save waveform as audio file