MIREI is a research workspace that builds encoder/decoder text-embedding models under matched conditions, tracks shared training pipelines, and benchmarks their performance differences.
All MIREI checkpoints are gathered in the Hugging Face collection: MIREI Collection.
- Install with uv
    git clone https://github.com/iamtatsuki05/MIREI.git
    cd MIREI
    uv sync
    uv sync --group cuda    # alternatively, also install the cuda dependency group
- Run with uv
    uv run python ...
- Install with Docker
    git clone https://github.com/iamtatsuki05/MIREI.git
    cd MIREI
    docker compose up -d --build <service name (e.g. python-cpu)>
- Connect
    docker compose exec <service name (e.g. python-cpu)> bash
- Disconnect
    exit
- Access with a browser at http://localhost:8888/lab
- Starting
    docker compose start
- Stopping
    docker compose stop
./
├── .dockerignore
├── .git
├── .gitattributes
├── .github
├── .gitignore
├── .pre-commit-config.yaml
├── README.md
├── README_JA.md
├── compose.yaml
├── config
├── data
│   ├── datasets
│   ├── misc
│   ├── models
│   ├── outputs
│   └── raw
├── docker
│   ├── cpu
│   └── gpu
├── docs
├── notebooks
├── uv.lock
├── pyproject.toml
├── scripts
│   ├── README.md
│   ├── README_JA.md
│   └── constract_llm
│       ├── README.md
│       ├── README_JA.md
│       ├── dataset
│       ├── model
│       ├── tokenizer
│       └── train
│           ├── README.md
│           ├── README_JA.md
│           ├── ft
│           └── pt
├── src
│   ├── __init__.py
│   └── mirei
│       ├── common
│       ├── config
│       ├── env.py
│       └── constract_llm
└── tests
    └── mirei
This project includes scripts for building and training language models (LLMs). For details, see the following READMEs:
- Scripts Overview: overview of the basic scripts
- Language Model Construction Scripts: scripts for building language models
- Training Scripts: scripts for pre-training and fine-tuning
    - Pre-training Scripts: scripts for MLM (masked language modeling) and MNTP (masked next token prediction) pre-training
    - Fine-tuning Scripts: scripts for Sentence Transformer fine-tuning
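The two pre-training objectives listed above differ mainly in where the label for a masked token is placed. The following is a minimal, framework-free sketch of label construction for both, not MIREI's implementation. It assumes the common PyTorch / Hugging Face convention of -100 for positions that carry no loss, masks exactly the positions passed in (omitting BERT's 80/10/10 replacement rule), and follows the MNTP formulation from LLM2Vec, where a masked token at position i is predicted from the output at position i - 1. The functions `mlm_example` and `mntp_example` and the token IDs are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: PyTorch / Hugging Face convention, -100 means "no loss at this position".
IGNORE_INDEX = -100


def mlm_example(
    tokens: Sequence[int], masked: Sequence[int], mask_id: int
) -> Tuple[List[int], List[int]]:
    """MLM (encoder-style): each masked position predicts its own original token."""
    inputs = list(tokens)
    labels = [IGNORE_INDEX] * len(tokens)
    for i in masked:
        inputs[i] = mask_id
        labels[i] = tokens[i]
    return inputs, labels


def mntp_example(
    tokens: Sequence[int], masked: Sequence[int], mask_id: int
) -> Tuple[List[int], List[int]]:
    """MNTP (decoder-style): the label for masked position i is stored at position i - 1."""
    inputs = list(tokens)
    labels = [IGNORE_INDEX] * len(tokens)
    for i in masked:
        if i == 0:
            continue  # position 0 has no predecessor, so it is left unmasked
        inputs[i] = mask_id
        labels[i - 1] = tokens[i]
    return inputs, labels


if __name__ == "__main__":
    tokens = [101, 7, 8, 9, 102]  # hypothetical token IDs
    print(mlm_example(tokens, masked=[2, 3], mask_id=0))
    # ([101, 7, 0, 0, 102], [-100, -100, 8, 9, -100])
    print(mntp_example(tokens, masked=[2, 3], mask_id=0))
    # ([101, 7, 0, 0, 102], [-100, 8, 9, -100, -100])
```

The explicit shift in `mntp_example` is only there to make the alignment visible. Hugging Face causal-LM heads already shift labels internally (the logits at position i - 1 are scored against the label at position i), so with such a model the unshifted `mlm_example`-style labels yield the same MNTP loss.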
