The PyICU package requires the ICU library to be installed.
# Debian/Ubuntu
sudo apt install libicu-dev

This project uses Poetry as a package manager. Please refer to the Poetry installation docs, or run the following to install it:
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/install-poetry.py | python -

Create a pandas dataframe with columns:
- word_label_text
- word_description_text
- word_concept_id
- word_label_id
- word_description_id
- score
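Before pickling, it can help to check that a dataframe carries the columns listed above. The following is a minimal sketch; `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names that are not part of this project, and treating all six columns as mandatory is an assumption.

```python
# Hypothetical helper, not part of this project: reports which of the
# columns listed above are absent from a dataframe's columns.
REQUIRED_COLUMNS = [
    'word_label_text',
    'word_description_text',
    'word_concept_id',
    'word_label_id',
    'word_description_id',
    'score',
]


def missing_columns(columns):
    """Return the required column names not found in `columns`, in list order."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]
```

With a pandas dataframe named `df`, this could be called as `missing_columns(df.columns)`; an empty list means every listed column is present.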
Example for when your words are in a text file, one word per line:
import pandas as pd

word_list = []
with open('words') as wordlist_file:
    # Each row: lowercased word, empty description, and the line index reused as all three IDs.
    for i, word in enumerate(wordlist_file):
        word_list.append([word.strip().lower(), '', i, i, i])

words_dataframe = pd.DataFrame(data=word_list,
                               columns=['word_label_text',
                                        'word_description_text',
                                        'word_concept_id',
                                        'word_label_id',
                                        'word_description_id'])
# Write the dataframe as a gzip-compressed pickle.
words_dataframe.to_pickle('wordlist.pickle.gzip', compression='gzip', protocol=5)

poetry run python3 ./run.py

Branch ab/experiment-memory-usage:
- EMPTY data structure, English, 540k words: 3.3 MiB
- FILLED data structure, English, 540k words: 273.63 MiB ~= 530 B per word
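The per-word figure follows from dividing the filled size by the word count. Below is a short sketch of that arithmetic using only the numbers quoted above; the variable names are illustrative, and "540k" is assumed to mean exactly 540,000 words.

```python
# Numbers quoted in the measurements above.
FILLED_MIB = 273.63
WORD_COUNT = 540_000  # assumption: "540k" taken as exactly 540,000

# 1 MiB = 1024 * 1024 bytes.
filled_bytes = FILLED_MIB * 1024 * 1024
bytes_per_word = filled_bytes / WORD_COUNT

print(f"{bytes_per_word:.0f} B per word")  # prints "531 B per word"
```

The result of about 531 B agrees with the rounded figure of roughly 530 B per word.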
- List all upgradable dependencies:
poetry show --outdated --latest
- Increase versions in pyproject.toml
poetry lock && poetry install

Not many unit tests have been written yet.
poetry run python3 -m pytest