Conference speech2text

User-service communication

Usage

1. Install and check the requirements (see Requirements below).
2. git clone https://github.com/Dan1lD/conference_speech2text && cd conference_speech2text
   2.1. (Optional, if you want to make your server public.) Set your nginx configuration (host) in the file services/Web/nginx/nginx.conf by specifying your server_name. See the nginx documentation for details.
3. docker-compose build
4. docker-compose up
5. Open http://127.0.0.1 in your web browser (or http:// + server_name if you did step 2.1). This is the website for end users.

Requirements

Software:

Docker
nvidia-docker
docker-compose
nvidia drivers
CUDA

Hardware:

NVIDIA graphics card
20 GB RAM
30 GB of free disk space

Services description

Web-site

The website provides a graphical interface for wav file transcription and management. It runs on a Python Django backend with a JavaScript + HTML + CSS frontend.

Available features:

Voice transcription from an uploaded wav file
Searching records by keywords
Uploading wav files by the user
File storage: voice record (wav) and transcription (txt)

Screenshots

Main menu. Here the user can upload a wav file for transcription. The menu at the top right gives access to the main pages and to a local keyword search over the records.

Record cards. This is a list of cards for the uploaded voice records. The user can open the full transcription by clicking on a card. Every card shows a word cloud of keywords, the record title, the first few words of the transcription, the upload date and time, links to download the wav and txt files, and a button to delete the record.

Record transcription. Here the user can see the full transcription, the uploaded file name, and the same items as in the record card. The recognized text can be edited.

Mobile site view. The site supports mobile users.

Stack used:

python
- django
- matplotlib
- wordcloud
javascript
- bootstrap
- jquery
- modernizr
HTML
CSS

Voice-to-text conversion

We convert voice (wav) to text using the open source speech recognition toolkit "VOSK". For the code, see services/S2T/app/app.py.
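
The following is a minimal sketch of wav transcription with the VOSK Python bindings, not the code from services/S2T/app/app.py. It assumes a mono 16-bit PCM wav file and a VOSK model already downloaded to a local directory. The function names transcribe and join_results, the chunk size, and both path parameters are hypothetical and chosen only for illustration.

```python
import json
import wave


def join_results(json_chunks):
    """Concatenate the "text" fields of VOSK JSON result strings."""
    texts = (json.loads(chunk).get("text", "") for chunk in json_chunks)
    return " ".join(t for t in texts if t)


def transcribe(wav_path, model_path):
    """Transcribe a mono 16-bit PCM wav file (assumed input format)."""
    # Imported lazily so join_results can be used without vosk installed.
    from vosk import Model, KaldiRecognizer

    chunks = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(Model(model_path), wf.getframerate())
        while True:
            data = wf.readframes(4000)  # chunk size is an arbitrary choice
            if not data:
                break
            # AcceptWaveform returns True when an utterance is complete.
            if rec.AcceptWaveform(data):
                chunks.append(rec.Result())
        # Flush whatever audio is still buffered in the recognizer.
        chunks.append(rec.FinalResult())
    return join_results(chunks)
```

Feeding the audio in chunks keeps memory use flat for long conference recordings, and each completed utterance arrives as a JSON string whose "text" field holds the recognized words.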

Adding punctuation

We add punctuation marks to the converted text using a modified "Neuro-comma" model, a language model based on the Transformer architecture. For the code, see services/S2T/app/app.py.
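
To illustrate the idea of punctuation restoration, here is a small sketch of the post-processing step only. It assumes, hypothetically, that the model emits one label per word naming the mark that follows it. The label names ("O", "COMMA", "PERIOD", "QUESTION") and the function apply_punctuation are assumptions for this example and are not taken from the Neuro-comma code or from this repository.

```python
# Hypothetical label set: one label per word, naming the mark after it.
MARKS = {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}


def apply_punctuation(words, labels):
    """Attach predicted marks to words and capitalize sentence starts."""
    out = []
    capitalize = True  # the first word starts a sentence
    for word, label in zip(words, labels):
        if capitalize:
            word = word[:1].upper() + word[1:]
        mark = MARKS[label]
        out.append(word + mark)
        # The next word starts a new sentence after "." or "?".
        capitalize = mark in {".", "?"}
    return " ".join(out)
```

For example, the words "hello how are you" with labels COMMA, O, O, QUESTION would be rendered as "Hello, how are you?".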

Keyword extraction

We extract keywords from the converted text for quick navigation between records.

Extraction is performed in 4 steps:

Reduce the words in the text to their base forms (lemmas) using the open source conversational AI framework DeepPavlov, so that different forms of the same word are not repeated among the keywords
Calculate word embeddings using the sentence-transformers framework
Select the most popular words, ranked by the number of word pairs with similar embeddings that they take part in
Keep only unique instances of each word.
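
The last two steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the repository's implementation: the lemmas are assumed to come from an earlier lemmatization step, the vectors from an embedding model (for instance the encode method of a sentence-transformers model), and the function names, the similarity threshold, and top_k are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_keywords(lemmas, vectors, threshold=0.8, top_k=5):
    """Rank word occurrences by similar-pair count, then deduplicate."""
    n = len(lemmas)
    pair_counts = [0] * n
    # Step 3: count, for every occurrence, the pairs with similar embeddings.
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                pair_counts[i] += 1
                pair_counts[j] += 1
    # Stable sort: ties keep their original text order.
    ranked = sorted(range(n), key=lambda i: -pair_counts[i])
    # Step 4: keep only the first instance of each word.
    keywords = []
    for i in ranked:
        if len(keywords) >= top_k:
            break
        if lemmas[i] not in keywords:
            keywords.append(lemmas[i])
    return keywords
```

The pairwise loop is quadratic in the number of words, which is acceptable for a sketch; for long transcripts a vectorized similarity matrix would be the more practical choice.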
