Thesaurus Visualization

Current available notebooks

ThesaurusVisualization.ipynb Is a demo on how to use the library (English language)

Tutorial.ipynb Shows how to train the model

MultiLang.ipynb Multi languages demo (French, German, Russian and Arabic)

back.ipynb Shows how to prepare and process the data for training

How to install and run

Install the requirements:

pip install -r requirements.txt

cd into source directory:

%cd source

Import the library:

from thesaurus import Thesaurus

Create an object and specify the language:

obj = Thesaurus(lang='eng')

Current supported languages are:

English eng
French fra
German deu
Arabic ara
Russian ru

Chose one or more files to process as a foreground:

text1 = obj.read_txt('../data/texts/covid19.txt')
text2 = obj.read_txt('../data/texts/descartes.txt')

texts = dict()

foreground_names = []

foreground_name1 = 'Covid 19'
texts[foreground_name1] = [text1]
foreground_names.append(foreground_name1)

foreground_name2 = 'Discours de la méthode, René Descartes'
texts[foreground_name2] = [text2]
foreground_names.append(foreground_name2)

processed_foregrounds = obj.process_foreground(foreground_names, texts)

Import the background files:

background_embeds, background_words = obj.import_background()

Note: If embeddings aren't available for the background then they will be auto generated

Visualize the texts:

print("Go to https://nbviewer.org/github/DinarZayahov/thesaurus/blob/master/ThesaurusVisualization.ipynb 
       to see the outputs of the notebook as the GitHub doesn't render dynamic output")
obj.show_map(background_embeds, background_words, foreground_names, processed_foregrounds)

Download embeddings on Colab:

After you change directory to '/source' in google colab run the current cell, this cell downloads the files into their folders:

!gdown 1pJnTdw5qyitFNVC_FLfmjhNdEBk0fH0N -O ../data/back_embeds/
!gdown 1BLHIb82kTFuSzkyEbe1f044UGojXuhV8 -O ../data/back_embeds/
!gdown 1WrU9i5BiCckDebYAcOZu83ZL1UD9kvM2 -O ../data/back_embeds/

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Thesaurus Visualization

Current available notebooks

How to install and run

Download embeddings on Colab:

About

Uh oh!

Releases

Packages

Contributors 4

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 63 Commits
data		data
source		source
.gitignore		.gitignore
MultiLang.ipynb		MultiLang.ipynb
README.md		README.md
ThesaurusVisualization.ipynb		ThesaurusVisualization.ipynb
Tutorial.ipynb		Tutorial.ipynb
back.ipynb		back.ipynb
requirements.txt		requirements.txt

DinarZayahov/thesaurus

Folders and files

Latest commit

History

Repository files navigation

Thesaurus Visualization

Current available notebooks

How to install and run

Download embeddings on Colab:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Uh oh!

Languages

Packages