
Processing outline

Michael Berger edited this page Jan 4, 2025 · 1 revision

Vanilla process

  1. main.tex
  2. processed text: all LaTeX macros are removed or replaced with narratable text
    1. (later) all non-text objects (figures, tables, formulas, ...) are sent to the video pipeline
  3. chunked text: split the text into sentences
  4. Narrator: sort chunks by length descending
  5. split the sorted sentences into batches of constant size
  6. feed the batches one-by-one through TTS model
  7. each batch now has a corresponding waveform for each chunk and a duration of the waveform
    1. we can use the duration to sync subtitles
    2. (later) and to lay out the video content
  8. once all batches are processed, we restore original order of sentences
  9. concatenate all the waveforms into a single waveform
  10. save waveform as audio file
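
The steps above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: `chunk_sentences`, `fake_tts`, and `narrate` are hypothetical names, the sentence splitter is a naive regex, and `fake_tts` is a placeholder that returns one silent sample per character in place of a real TTS model.

```python
import re
from typing import List, Tuple


def chunk_sentences(text: str) -> List[str]:
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def fake_tts(batch: List[str], sample_rate: int = 10) -> List[Tuple[List[float], float]]:
    """Hypothetical stand-in for the TTS model.

    Returns (waveform, duration in seconds) per sentence, using one
    silent sample per character so the result is easy to check.
    """
    results = []
    for sentence in batch:
        waveform = [0.0] * len(sentence)
        results.append((waveform, len(waveform) / sample_rate))
    return results


def narrate(text: str, batch_size: int = 2) -> Tuple[List[float], List[float]]:
    chunks = chunk_sentences(text)
    # Remember each chunk's original index, then sort by length descending.
    indexed = sorted(enumerate(chunks), key=lambda p: len(p[1]), reverse=True)
    # Split the sorted chunks into batches of constant size.
    batches = [indexed[i:i + batch_size] for i in range(0, len(indexed), batch_size)]
    results = {}
    for batch in batches:
        outputs = fake_tts([sentence for _, sentence in batch])
        for (idx, _), out in zip(batch, outputs):
            results[idx] = out
    # Restore the original sentence order and concatenate the waveforms.
    ordered = [results[i] for i in range(len(chunks))]
    waveform = [sample for wf, _ in ordered for sample in wf]
    durations = [duration for _, duration in ordered]
    return waveform, durations
```

The per-sentence durations come back in original order, so cumulative sums of them give subtitle start times. Writing the final waveform to an audio file is left out of the sketch.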

Spark process

  1. main.tex
  2. Spark: read file (or read text from kafka stream?)
  3. LatexParser: all LaTeX macros are removed or replaced with narratable text
    1. (later) all non-text objects (figures, tables, formulas, ...) are sent to the video pipeline
  4. Chunker: split the text into sentences and attach to every chunk the index of its original position
  5. sort chunks by length descending
  6. split the sorted sentences into batches of constant text volume (= sentence size * batch size)
  7. partition the data by text volume; each batch goes to one worker
  8. each worker feeds the batches one-by-one through TTS model and appends results to DataFrame
  9. each batch now has a corresponding waveform for each chunk and a duration of the waveform
    1. we can use the duration to sync subtitles
    2. (later) and to lay out the video content
  10. once all batches are processed, we restore original order of sentences
  11. concatenate all the waveforms into a single waveform
  12. save waveform as audio file
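
The volume-based batching in steps 5 to 7 can be sketched without Spark. The sketch below is plain Python under stated assumptions: `batch_by_volume`, `assign_to_workers`, and `max_volume` are hypothetical names, text volume is taken to be the longest sentence in the batch times the number of sentences in it, and workers are filled round-robin. In an actual Spark job the batch id could serve as the partitioning key instead.

```python
from typing import List, Tuple

Chunk = Tuple[int, str]  # (original index, sentence)


def batch_by_volume(chunks: List[Chunk], max_volume: int) -> List[List[Chunk]]:
    """Greedily group length-sorted chunks so that
    (longest sentence in batch) * (batch size) stays within max_volume."""
    ordered = sorted(chunks, key=lambda c: len(c[1]), reverse=True)
    batches: List[List[Chunk]] = []
    current: List[Chunk] = []
    for chunk in ordered:
        # Chunks are sorted descending, so the first one in a batch is the longest.
        longest = len(current[0][1]) if current else len(chunk[1])
        if current and longest * (len(current) + 1) > max_volume:
            batches.append(current)
            current = []
        current.append(chunk)
    if current:
        batches.append(current)
    return batches


def assign_to_workers(batches: List[List[Chunk]], num_workers: int) -> List[List[List[Chunk]]]:
    """Round-robin assignment: worker w receives batches w, w + n, w + 2n, ..."""
    return [batches[w::num_workers] for w in range(num_workers)]
```

Long sentences end up in small batches and short sentences in large ones, which keeps the amount of text per batch roughly even. A sentence longer than `max_volume` still gets a batch of its own. Because every chunk carries its original index, the original order can be restored after the workers finish.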
