


iamtatsuki05/MIREI


MIREI: Matched Investigation of Representation Embedding Insights

English / 日本語

MIREI is a research workspace that builds encoder/decoder text-embedding models under matched conditions, tracks shared training pipelines, and benchmarks their performance differences.

[Figure: MIREI concept overview]

Hugging Face Collection

All MIREI checkpoints are gathered in the Hugging Face collection: MIREI Collection.

How to use uv

Setup

  1. Clone the repository: git clone https://github.com/iamtatsuki05/MIREI.git

uv configuration

  1. Install the dependencies: uv sync
  2. To also install the cuda dependency group (e.g. for GPU environments): uv sync --group cuda

Run a script

uv run python ...
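As a quick check that `uv sync` produced a working environment, a script along these lines could be run with `uv run python check_env.py`. This is a minimal sketch: the file name `check_env.py` and the `environment_report` helper are hypothetical, and the CUDA check assumes the project uses PyTorch, which this README does not state. If torch is not installed, the script simply reports it as absent.

```python
"""check_env.py: hypothetical sanity check, e.g. `uv run python check_env.py`."""
import importlib.util
import platform
import sys


def environment_report() -> dict:
    """Collect basic facts about the interpreter that uv launched."""
    report = {
        "python": platform.python_version(),
        "executable": sys.executable,
        # True when running inside a virtual environment such as uv's .venv
        "in_venv": sys.prefix != sys.base_prefix,
        # Assumption: PyTorch is a project dependency; only checked, never required
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported only when it is actually installed

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

After `uv sync --group cuda` on a GPU machine, `cuda_available` would be expected to print `True` if the assumption about PyTorch holds.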

How to use Docker

Setup

  1. Clone the repository: git clone https://github.com/iamtatsuki05/MIREI.git

Docker configuration

  1. Build and start the container: docker compose up -d --build <service name> (e.g. python-cpu)

Connecting to and disconnecting from Docker

  1. Connect: docker compose exec <service name> bash (e.g. python-cpu)
  2. Disconnect: exit

Using JupyterLab

  1. Open http://localhost:8888/lab in a browser.

Starting and stopping containers

  1. Start: docker compose start
  2. Stop: docker compose stop

Directory structure

./
├── .dockerignore
├── .git
├── .gitattributes
├── .github
├── .gitignore
├── .pre-commit-config.yaml
├── README.md
├── README_JA.md
├── compose.yaml
├── config
├── data
│   ├── datasets
│   ├── misc
│   ├── models
│   ├── outputs
│   └── raw
├── docker
│   ├── cpu
│   └── gpu
├── docs
├── notebooks
├── uv.lock
├── pyproject.toml
├── scripts
│   ├── README.md
│   ├── README_JA.md
│   └── constract_llm
│       ├── README.md
│       ├── README_JA.md
│       ├── dataset
│       ├── model
│       ├── tokenizer
│       └── train
│           ├── README.md
│           ├── README_JA.md
│           ├── ft
│           └── pt
├── src
│   ├── __init__.py
│   └── mirei
│       ├── common
│       ├── config
│       ├── env.py
│       └── constract_llm
└── tests
    └── mirei
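The tree above can double as a checklist after cloning. The following sketch lists which of a subset of the expected entries are missing under a given root; the `missing_paths` helper and the `EXPECTED_PATHS` list are hypothetical and not part of the repository.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical subset of the layout shown in the tree (relative to the repo root).
EXPECTED_PATHS = [
    "pyproject.toml",
    "uv.lock",
    "compose.yaml",
    "config",
    "data/datasets",
    "data/misc",
    "data/models",
    "data/outputs",
    "data/raw",
    "scripts/constract_llm",
    "src/mirei",
    "tests/mirei",
]


def missing_paths(root: Path | str = ".") -> list[str]:
    """Return the expected entries (files or directories) absent under root."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    print(missing_paths("."))
```

Run from the repository root (e.g. `uv run python check_layout.py`, again a hypothetical file name), an empty list means every listed entry exists.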

Scripts

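This project includes various scripts for building and training large language models (LLMs). For details, see the READMEs under scripts/ (scripts/README.md, scripts/constract_llm/README.md, and scripts/constract_llm/train/README.md, each with a Japanese README_JA.md counterpart).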
