MIREI: Matched Investigation of Representation Embedding Insights

MIREI is a research workspace that builds encoder/decoder text-embedding models under matched conditions, tracks shared training pipelines, and benchmarks their performance differences.

Hugging Face Collection

All MIREI checkpoints are gathered in the Hugging Face collection: MIREI Collection.

How to operate uv

setup

Install with:

git clone https://github.com/iamtatsuki05/MIREI.git

uv configuration

uv sync
uv sync --group cuda

run script

uv run python ...
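As an illustration of `uv run python`, here is a minimal, hypothetical sanity-check script. The file name `check_env.py` is an assumption and is not part of the repository. The script also assumes PyTorch is among the project's dependencies, which this README does not state. It only imports torch if it is installed, so it can help confirm whether `uv sync --group cuda` produced a CUDA-enabled environment.

```python
"""check_env.py (hypothetical): run with `uv run python check_env.py`."""
import importlib.util
import platform


def environment_report() -> dict[str, object]:
    """Report the interpreter version and whether torch (and CUDA) is usable."""
    report: dict[str, object] = {
        "python": platform.python_version(),
        # Assumption: PyTorch is a project dependency; check without importing it.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the script also works without torch

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` prints `False` on a GPU machine, the environment was probably synced without the `cuda` group.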

How to operate docker

setup

Install with:

git clone https://github.com/iamtatsuki05/MIREI.git

docker configuration

docker compose up -d --build <service name (e.g. python-cpu)>

Connect to and disconnect from docker

Connect:

docker compose exec <service name (e.g. python-cpu)> bash

Disconnect:

exit

Using jupyterlab

Open http://localhost:8888/lab in a browser.
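If the page does not load, a quick way to tell whether anything is listening yet is to probe the URL from Python. This is a hypothetical helper built only on the standard library, not part of the repository. It bypasses any configured HTTP proxy because the server is expected on the local machine, and it treats an HTTP error status (for example, an authentication error) as "server is up".

```python
import urllib.error
import urllib.request


def jupyterlab_is_up(url: str = "http://localhost:8888/lab", timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP requests at the JupyterLab URL."""
    # Bypass proxies: the container publishes the port on localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is running.
        return True
    except OSError:
        # URLError (a subclass of OSError), connection refused, or timeout.
        return False


if __name__ == "__main__":
    print("JupyterLab reachable:", jupyterlab_is_up())
```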

Starting and Stopping Containers

Start:

docker compose start

Stop:

docker compose stop
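The Docker lifecycle commands above can also be scripted. The sketch below is a hypothetical helper, not part of the repository. It maps each step to the documented `docker compose` arguments, defaults to the example service name `python-cpu` from this README (the GPU service name is not given here, so pass it explicitly), and only prints the command unless `dry_run=False`.

```python
import shlex
import subprocess

# Hypothetical mapping of the documented steps to `docker compose` arguments.
_ACTIONS = {
    "build": lambda svc: ["up", "-d", "--build", svc],
    "connect": lambda svc: ["exec", svc, "bash"],
    "start": lambda svc: ["start"],
    "stop": lambda svc: ["stop"],
}


def compose_args(action: str, service: str = "python-cpu") -> list[str]:
    """Return the full argument list for one lifecycle step."""
    if action not in _ACTIONS:
        raise ValueError(f"unknown action {action!r}; expected one of {sorted(_ACTIONS)}")
    return ["docker", "compose", *_ACTIONS[action](service)]


def run_compose(action: str, service: str = "python-cpu", dry_run: bool = True) -> str:
    """Run the step (unless dry_run) and return it as a shell-quoted string."""
    args = compose_args(action, service)
    if not dry_run:
        subprocess.run(args, check=True)
    return shlex.join(args)


if __name__ == "__main__":
    print(run_compose("build"))  # dry run: only prints the command
```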

Directory structure

./
├── .dockerignore
├── .git
├── .gitattributes
├── .github
├── .gitignore
├── .pre-commit-config.yaml
├── README.md
├── README_JA.md
├── compose.yml
├── config
├── data
│   ├── datasets
│   ├── misc
│   ├── models
│   ├── outputs
│   └── raw
├── docker
│   ├── cpu
│   └── gpu
├── docs
├── notebooks
├── uv.lock
├── pyproject.toml
├── scripts
│   ├── README.md
│   ├── README_JA.md
│   └── constract_llm
│       ├── README.md
│       ├── README_JA.md
│       ├── dataset
│       ├── model
│       ├── tokenizer
│       └── train
│           ├── README.md
│           ├── README_JA.md
│           ├── ft
│           └── pt
├── src
│   ├── __init__.py
│   └── mirei
│       ├── common
│       ├── config
│       ├── env.py
│       └── constract_llm
└── tests
    └── mirei
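To check that a checkout matches the layout above, a small hypothetical script can list the missing directories. The list below is an assumed subset of the tree. The `data/` subdirectories are left out on the assumption that they may be created locally rather than tracked by git.

```python
from __future__ import annotations

from pathlib import Path

# Assumed subset of the directory tree above (data/ subdirectories omitted).
EXPECTED_DIRS = [
    "config",
    "docker/cpu",
    "docker/gpu",
    "docs",
    "notebooks",
    "scripts/constract_llm",
    "src/mirei",
    "tests/mirei",
]


def missing_paths(root: str | Path = ".") -> list[str]:
    """Return the expected directories that do not exist under root."""
    base = Path(root)
    return [rel for rel in EXPECTED_DIRS if not (base / rel).is_dir()]


if __name__ == "__main__":
    print(missing_paths())
```

Run from the repository root; an empty list means every listed directory is present.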

Scripts

This project includes various scripts related to building and training language models (LLMs). For more details, please refer to the following READMEs:

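- Scripts Overview (scripts/README.md) - Overview of the basic scripts
- Language Model Construction Scripts (scripts/constract_llm/README.md) - Scripts related to language model construction
- Training Scripts (scripts/constract_llm/train/README.md) - Scripts for pre-training and fine-tuning
  - Pre-training Scripts - Scripts for MLM (masked language modeling) and MNTP (masked next token prediction) pre-training
  - Fine-tuning Scripts - Scripts for Sentence Transformer fine-tuning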
