User-service communication
- Install and check requirements
- Clone the repository: git clone https://github.com/Dan1lD/conference_speech2text && cd conference_speech2text
- (optional, if you want to make your server public) Set your nginx configuration (host) in the file services/Web/nginx/nginx.conf. Specify your server_name (search the web for "nginx configuration" if you need details).
- Build the images: docker-compose build
- Start the services: docker-compose up
- Open http://127.0.0.1 in your web browser (or http:// + server_name if you did the optional nginx step). This is the website for end users.
Software:
- Docker
- nvidia-docker
- docker-compose
- nvidia drivers
- CUDA
Hardware:
- NVIDIA graphics card
- 20 GB RAM
- 30 GB of free disk space
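A quick way to check part of these requirements before building is sketched below. It is only an illustration: the binary names (in particular nvidia-smi as a proxy for installed NVIDIA drivers) and the helper names are our assumptions, and the script does not check RAM or CUDA.

```python
import shutil

# Executables looked up on PATH. The names are assumptions: nvidia-smi is used
# here only as a proxy for installed NVIDIA drivers.
REQUIRED_TOOLS = ("docker", "docker-compose", "nvidia-smi")
MIN_FREE_DISK_GB = 30  # taken from the hardware list above


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def free_disk_gb(path="."):
    """Return free disk space at `path` in gigabytes (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    for tool in missing_tools():
        print(f"missing: {tool}")
    if free_disk_gb() < MIN_FREE_DISK_GB:
        print(f"less than {MIN_FREE_DISK_GB} GB of free disk space")
```

Run it from the repository directory; no output means the listed tools were found and enough disk space is available.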
The website provides a graphical interface for wav file transcription and management. It runs on a Python Django backend and a JavaScript + HTML + CSS frontend.
- Voice transcription from an uploaded wav file
- Search records by keywords
- Uploading of wav files by the user
- File storage: voice record (wav) and transcription (txt)
Main menu. Here the user can upload a wav file to transcribe. The menu at the top right gives access to the main pages and to the local keyword search over records.
Record cards. This is a list of cards for uploaded voice records. The user can open the full transcription by clicking on a card. Every card shows a word cloud of keywords, the title of the record, the first few words of the transcription, the upload date and time, links for downloading the wav and txt files, and a button to delete the record.
Record transcription. Here the user can see the full transcription, the uploaded file name, and the same items as in the record card. The recognized text can be edited.
Mobile site view. The site supports mobile users.
- python
- django
- matplotlib
- wordcloud
- javascript
- bootstrap
- jquery
- modernizr
- HTML
- CSS
We perform voice (wav) to text conversion using the open source speech recognition toolkit "VOSK". For the code, see services/S2T/app/app.py.
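For orientation, a minimal sketch of how a wav file is typically fed to VOSK is shown below. It is not a copy of services/S2T/app/app.py: the function names and the model_path argument are hypothetical, and we assume the input is mono 16-bit PCM, which is what the standard VOSK examples expect.

```python
import json
import wave


def is_mono_pcm16(wf):
    """True if an open wave.Wave_read is mono, 16-bit, uncompressed PCM."""
    return (wf.getnchannels() == 1 and wf.getsampwidth() == 2
            and wf.getcomptype() == "NONE")


def join_results(json_chunks):
    """Join the "text" fields of VOSK JSON result strings, skipping empty ones."""
    texts = (json.loads(chunk).get("text", "") for chunk in json_chunks)
    return " ".join(t for t in texts if t)


def transcribe(wav_path, model_path):
    """Transcribe a wav file; model_path is a hypothetical VOSK model directory."""
    from vosk import Model, KaldiRecognizer  # lazy import: optional dependency

    with wave.open(wav_path, "rb") as wf:
        if not is_mono_pcm16(wf):
            raise ValueError("expected a mono 16-bit PCM wav file")
        recognizer = KaldiRecognizer(Model(model_path), wf.getframerate())
        chunks = []
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True once an utterance is complete.
            if recognizer.AcceptWaveform(data):
                chunks.append(recognizer.Result())
        chunks.append(recognizer.FinalResult())
    return join_results(chunks)
```

Reading the audio in small chunks keeps memory use flat for long conference recordings; the per-utterance results are joined into one string at the end.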
We add punctuation marks to the converted text using a modified "Neuro-comma" model. For the code, see services/S2T/app/app.py.
Neuro-comma is a language model based on the Transformer architecture.
We extract keywords from the converted text for quick navigation between records.
Extraction is performed in 4 steps:
- Reduce the words in the text to their base forms (lemmas) using DeepPavlov, an open source conversational AI framework, so that different forms of the same word are not repeated among the keywords
- Calculate embeddings of the words using the sentence-transformers framework
- Select the most popular words, ranked by the number of word pairs with similar embeddings in which they take part
- Keep only unique instances of each word.
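The four steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: lemmatize and embed are caller-supplied stand-ins for DeepPavlov and sentence-transformers, and top_k and threshold are hypothetical parameters whose values we picked for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extract_keywords(words, lemmatize, embed, top_k=5, threshold=0.7):
    """Rank words by how many other words have a similar embedding.

    lemmatize: callable word -> base form (stand-in for DeepPavlov).
    embed: callable list[str] -> list of vectors (stand-in for sentence-transformers).
    top_k and threshold are illustrative values, not the project's settings.
    """
    lemmas = [lemmatize(w) for w in words]  # step 1: base forms
    vectors = embed(lemmas)                 # step 2: embeddings
    # Step 3: for each occurrence, count the other occurrences that are similar.
    scores = {}
    for i, lemma in enumerate(lemmas):
        similar = sum(
            1 for j in range(len(lemmas))
            if j != i and cosine(vectors[i], vectors[j]) >= threshold
        )
        scores[lemma] = max(scores.get(lemma, 0), similar)
    # Step 4: dict keys are already unique; keep the top_k by score.
    ranked = sorted(scores, key=lambda lemma: -scores[lemma])
    return ranked[:top_k]
```

With the real libraries, embed could wrap the encode method of a SentenceTransformer model; which model to load is left open here.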





