This program transforms .wav files into .json files containing the words, in a specific language, that include certain selected characters (the filter).
It uses the whisper-timestamped library to process the files and apply this filter (one or more characters). The program
detects the words that contain these characters and creates a .json file with the detected words, including timestamp attributes for each word.
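As a rough illustration of the filtering step, the sketch below assumes a transcription result shaped like whisper-timestamped's output (a list of segments, each holding words with text, start and end in seconds) and keeps the words that contain at least one of the filter strings. The sample data, the punctuation stripping and the "any filter matches" rule are assumptions for illustration, not taken from marcador_elan.py.

```python
import json

# Assumption: a result shaped like whisper-timestamped's output,
# {"segments": [{"words": [{"text", "start", "end", "confidence"}, ...]}]}.
# The values below are made up for illustration only.
SAMPLE_RESULT = {
    "segments": [
        {
            "words": [
                {"text": "Hola,", "start": 0.10, "end": 0.42, "confidence": 0.97},
                {"text": "buenos", "start": 0.50, "end": 0.81, "confidence": 0.95},
                {"text": "días", "start": 0.85, "end": 1.20, "confidence": 0.93},
            ]
        }
    ]
}


def filter_words(result, filters):
    """Return the words that contain at least one of the filter strings.

    Matching is case-insensitive and ignores surrounding punctuation;
    "any filter matches" is an assumed rule, not necessarily the program's.
    """
    needles = [f.lower() for f in filters]
    matches = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            # Strip spaces and punctuation attached to tokens, e.g. "Hola," -> "hola".
            token = word["text"].strip(" .,;:!?¡¿\"'").lower()
            if any(needle in token for needle in needles):
                matches.append(
                    {"text": token, "start": word["start"], "end": word["end"]}
                )
    return matches


if __name__ == "__main__":
    words = filter_words(SAMPLE_RESULT, ["s", "d"])
    # Timestamps are kept per word, as in the .json output described above.
    print(json.dumps(words, ensure_ascii=False, indent=2))
```

With the filters s and d, only "buenos" and "días" are kept, each with its own start and end times.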
- Python 3.9 or newer
- Clone this repo (or download it as a zip):
git clone https://github.com/Klefur/Elan-Marker.git
- Install the whisper-timestamped library:
pip3 install git+https://github.com/linto-ai/whisper-timestamped
- Install ffmpeg:
- On Ubuntu or Debian:
sudo apt update && sudo apt install ffmpeg
- On Arch Linux:
sudo pacman -S ffmpeg
- On macOS using Homebrew (https://brew.sh/):
brew install ffmpeg
- On Windows using Chocolatey (https://chocolatey.org/):
choco install ffmpeg
- On Windows using Scoop (https://scoop.sh/):
scoop install ffmpeg
- Install ONNX Runtime and torchaudio:
pip3 install onnxruntime torchaudio
- Install an audio backend for torchaudio:
- SoundFile for Windows
pip install soundfile
- SoX for Linux/macOS
pip install sox
- Install moviepy:
pip install moviepy
- Install pympi-ling:
pip install pympi-ling
Move all files to be processed into the input folder.
The .mp4 files will be automatically transformed into .wav files. To avoid the conversion, use the flag --use_wav True
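The repository lists moviepy for this conversion. As a swapped-in alternative that relies only on the ffmpeg binary already required above, the following sketch builds an ffmpeg command that extracts mono 16 kHz audio (the sample rate Whisper works with) from each .mp4. The temp folder and the clip.mp4 to clip.wav naming rule are hypothetical, not the program's actual behaviour.

```python
import shlex
import subprocess
from pathlib import Path


def wav_path_for(video_path, temp_dir):
    """Hypothetical naming rule: input/clip.mp4 -> <temp_dir>/clip.wav."""
    return Path(temp_dir) / (Path(video_path).stem + ".wav")


def ffmpeg_extract_cmd(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that writes mono audio at sample_rate Hz."""
    return [
        "ffmpeg",
        "-y",                     # overwrite an existing output file
        "-i", str(video_path),    # input .mp4
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample (Whisper works at 16 kHz)
        str(wav_path),            # .wav container inferred from the extension
    ]


def pending_videos(input_folder):
    """Return the .mp4 files in input_folder, sorted; [] if the folder is missing."""
    folder = Path(input_folder)
    if not folder.is_dir():
        return []
    return sorted(folder.glob("*.mp4"))


if __name__ == "__main__":
    temp_dir = Path("temp")  # hypothetical folder for intermediate .wav files
    for video in pending_videos("input"):
        temp_dir.mkdir(exist_ok=True)
        cmd = ffmpeg_extract_cmd(video, wav_path_for(video, temp_dir))
        print(shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

Keeping the command builder separate from the subprocess call makes it easy to print or inspect the exact ffmpeg invocation before running it.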
Open a terminal in the cloned repository, for example with the cd command:
cd ./{path}/Elan-Marker
Then the following command runs the program and marks on the timeline the words that contain the letters 's' and 'd':
python ./marcador_elan.py --filters s d
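Since the matches end up as marks on an ELAN timeline and pympi-ling is among the dependencies, one plausible way to do that step is sketched below: word timestamps in seconds are converted to the integer milliseconds ELAN uses, then optionally written with pympi-ling's Eaf class. The tier name "marks" and both helper names are hypothetical; the actual program may organise its tiers differently.

```python
def to_elan_annotations(words, tier="marks"):
    """Turn matched words with times in seconds into (tier, start_ms, end_ms, value)."""
    annotations = []
    for word in words:
        start = int(round(word["start"] * 1000))
        end = int(round(word["end"] * 1000))
        if end <= start:  # avoid zero-length annotations, which pympi-ling rejects
            end = start + 1
        annotations.append((tier, start, end, word["text"]))
    return annotations


def write_eaf(annotations, wav_path, eaf_path):
    """Write the annotations to an .eaf file, assuming pympi-ling's Eaf API."""
    from pympi.Elan import Eaf  # imported here so the conversion works without pympi

    eaf = Eaf()
    eaf.add_linked_file(str(wav_path), mimetype="audio/x-wav")
    for tier in sorted({ann[0] for ann in annotations}):
        eaf.add_tier(tier)
    for tier, start, end, value in annotations:
        eaf.add_annotation(tier, start, end, value)
    eaf.to_file(str(eaf_path))


if __name__ == "__main__":
    matched = [{"text": "buenos", "start": 0.50, "end": 0.81}]
    print(to_elan_annotations(matched))
```

The pympi import is deferred so that the millisecond conversion can be reused or tested on its own.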
--filters: List of strings to filter (use lowercase)
python ./marcador_elan.py --filters s d asa
--input_folder: Folder with the input files
python ./marcador_elan.py --input_folder mp4_folder
--output_folder: Folder for the output files
python ./marcador_elan.py --output_folder elan_folder
--save_temp: Save temporary files
python ./marcador_elan.py --save_temp
--use_wav: Skip the .mp4 to .wav conversion
python ./marcador_elan.py --use_wav
--name_model: Select the Whisper model
python ./marcador_elan.py --name_model medium
--language: Select the language of the audio (use --help to see the list; default: Spanish)
python ./marcador_elan.py --language en
The generated files will be in the output folder.
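The options above could be declared with argparse roughly as follows. This is a hypothetical parser, not the repository's code: the default folder names and the "es" language code are guesses based on the text, the model default is left unset because it is not documented, and --use_wav is treated as a bare switch as in its usage line (the earlier "--use_wav True" form suggests the real flag may take a value). Filters are lowercased on input, which enforces the "use lowercase" advice.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options of marcador_elan.py."""
    parser = argparse.ArgumentParser(prog="marcador_elan.py")
    parser.add_argument("--filters", nargs="+", type=str.lower, default=[],
                        help="strings to look for inside words (lowercased)")
    parser.add_argument("--input_folder", default="input",    # assumed default
                        help="folder with the input files")
    parser.add_argument("--output_folder", default="output",  # assumed default
                        help="folder for the output files")
    parser.add_argument("--save_temp", action="store_true",
                        help="keep temporary files")
    parser.add_argument("--use_wav", action="store_true",
                        help="skip the .mp4 to .wav conversion")
    parser.add_argument("--name_model", default=None,  # real default not documented
                        help="Whisper model name, e.g. medium")
    parser.add_argument("--language", default="es",    # assumed code for Spanish
                        help="language of the audio")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

For example, build_parser().parse_args(["--filters", "S", "d"]).filters gives ['s', 'd'].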
- whisper-timestamped: Multilingual Automatic Speech Recognition with word-level timestamps and confidence (License AGPL-3.0).
- whisper: Whisper speech recognition (License MIT).
- dtw-python: Dynamic Time Warping (License GPL v3).
- json-to-elan: Tools and scripts for working with ELAN (License Apache-2.0).
Lucas Mesías | Joaquín Saldivia | Nicolás Aguilera
If you use this in your research, please cite the repository as the source:
@misc{mesias2023marcadorelan,
author = {Mesías, Lucas and Saldivia, Joaquín and Aguilera, Nicolás},
month = {6},
title = {Marcador-elan},
url = {https://github.com/Klefur/Marcador-Elan/},
year = {2023}
}
Whisper-timestamped:
@misc{lintoai2023whispertimestamped,
title={whisper-timestamped},
author={Louradour, J{\'e}r{\^o}me},
journal={GitHub repository},
year={2023},
publisher={GitHub},
howpublished = {\url{https://github.com/linto-ai/whisper-timestamped}}
}
OpenAI Whisper paper:
@article{radford2022robust,
title={Robust speech recognition via large-scale weak supervision},
author={Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
journal={arXiv preprint arXiv:2212.04356},
year={2022}
}
Dynamic Time Warping:
@article{JSSv031i07,
title={Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package},
author={Giorgino, Toni},
journal={Journal of Statistical Software},
year={2009},
volume={31},
number={7},
doi={10.18637/jss.v031.i07}
}