# Evaluation of text embedding techniques
- Create and switch to the virtual environment:

  ```bash
  cd text_evaluation
  make create_environment
  conda activate text_evaluation
  make requirements
  ```
- Explore the notebooks in the `notebooks` directory
## Project Organization

- `LICENSE`
- `Makefile` - top-level makefile. Type `make` for a list of valid commands
- `README.md` - this file
- `data` - Data directory. Often symlinked to a filesystem with lots of space
- `data/raw` - Raw (immutable) hash-verified downloads
- `data/interim` - Extracted and interim data representations
- `data/processed` - The final, canonical data sets for modeling
- `docs` - A default Sphinx project; see sphinx-doc.org for details
- `models` - Trained and serialized models, model predictions, or model summaries
- `models/trained` - Trained models
- `models/output` - Predictions and transformations from the trained models
- `notebooks` - Jupyter notebooks. Naming convention is a number (for ordering), the creator's initials, and a short `-`-delimited description, e.g. `1.0-jqp-initial-data-exploration`
- `references` - Data dictionaries, manuals, and all other explanatory materials
- `reports` - Generated analysis as HTML, PDF, LaTeX, etc.
- `reports/figures` - Generated graphics and figures to be used in reporting
- `reports/tables` - Generated data tables to be used in reporting
- `reports/summary` - Generated summary information to be used in reporting
- `requirements.txt` - (if using pip+virtualenv) The requirements file for reproducing the analysis environment, e.g. generated with `pip freeze > requirements.txt`
- `environment.yml` - (if using conda) The YAML file for reproducing the analysis environment
- `setup.py` - Turns the contents of `src` into a pip-installable Python module (`pip install -e .`) so it can be imported in Python code
- `src` - Source code for use in this project
- `src/__init__.py` - Makes `src` a Python module
- `src/data` - Scripts to fetch or generate data. In particular:
- `src/data/make_dataset.py` - Run with `python -m src.data.make_dataset fetch` or `python -m src.data.make_dataset process`
- `src/analysis` - Scripts to turn datasets into output products
- `src/models` - Scripts to train models and then use trained models to make predictions, e.g. `predict_model.py`, `train_model.py`
- `tox.ini` - tox file with settings for running tox; see tox.testrun.org
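The notebook naming convention described above (number, creator's initials, `-`-delimited description) can be checked mechanically. A minimal sketch — the regex and function name are illustrative and not part of this project's code:

```python
import re

# Matches <number>-<initials>-<short-description>.ipynb,
# e.g. "1.0-jqp-initial-data-exploration.ipynb"
NOTEBOOK_NAME = re.compile(
    r"^\d+(\.\d+)*"          # ordering number, e.g. 1 or 1.0
    r"-[a-z]+"               # creator's initials
    r"(-[a-z0-9]+)+"         # dash-delimited description
    r"\.ipynb$"
)

def is_valid_notebook_name(filename):
    """Return True if filename follows the naming convention."""
    return NOTEBOOK_NAME.match(filename) is not None
```

Such a check could be run over the `notebooks` directory as a lightweight lint step.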
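`src/data/make_dataset.py` exposes `fetch` and `process` subcommands. The project's actual implementation is not shown here, but the subcommand dispatch pattern it implies can be sketched with `argparse`; the helper functions and their return values below are placeholders, not the project's real logic:

```python
import argparse

def fetch():
    """Placeholder: download raw, hash-verified data into data/raw."""
    return "fetched"

def process():
    """Placeholder: build the final datasets in data/processed."""
    return "processed"

def main(argv=None):
    parser = argparse.ArgumentParser(description="Fetch or process project data")
    subparsers = parser.add_subparsers(dest="command", required=True)
    subparsers.add_parser("fetch", help="download raw data")
    subparsers.add_parser("process", help="build the final datasets")
    args = parser.parse_args(argv)
    # Dispatch to the function matching the chosen subcommand
    return {"fetch": fetch, "process": process}[args.command]()

if __name__ == "__main__":
    main()
```

Running the module with `python -m src.data.make_dataset fetch` would then invoke `fetch()`, and `process` would invoke `process()`.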
This project was built using cookiecutter-easydata, an experimental fork of [cookiecutter-data-science](https://github.com/drivendata/cookiecutter-data-science) aimed at making your data science workflow reproducible.