A comprehensive platform for language model inference, evaluation, and deployment with multi-modal capabilities.
LLM ML Lab is a full-featured platform for deploying, serving, and evaluating large language models. The platform consists of multiple components that work together to provide a complete solution for language model infrastructure:
- Inference Service - Python-based service for model execution and API endpoints
- UI - React-based user interface for interacting with the services
```
/llmmllab
├── inference/        # Python-based inference services
│   ├── evaluation/   # Model benchmarking and evaluation tools
│   ├── server/       # REST and gRPC API services
│   └── runner/       # Model execution and pipeline management
├── ui/               # React-based frontend
│   ├── public/       # Static assets
│   └── src/          # React components and application logic
├── proto/            # Protocol buffer definitions
├── docs/             # Documentation
└── schemas/          # Common schema definitions
```
- Multi-Modal Support: Text generation, image generation, and embeddings
- Multiple API Interfaces: REST and gRPC endpoints
- Model Management: Add, configure, and switch between models
- Memory Optimization: Automatic memory management and resource allocation
- Performance Monitoring: Logging and metrics collection
- Session Management: User sessions and conversation context
- Scalable Architecture: Components can be deployed independently
- WebSocket Support: Real-time communication for chat and status updates
- RabbitMQ Integration: Message queuing for asynchronous processing
- Context Extension: Sophisticated system to extend LLM context windows
- Schema Validation: YAML schemas for type-safety and consistency
The platform uses a hierarchical configuration system that separates system administration from user preferences:
- System Configuration: Infrastructure settings (ports, databases, logging) managed by operators
- User Configuration: Workflow and tool preferences customizable per user via UI
- Schema-Driven: YAML schemas automatically generate Python models and TypeScript types
Key configuration areas:
- Workflow Settings: Caching, streaming, timeouts, multi-agent capabilities
- Tool Management: Selection thresholds, generation preferences, execution settings
- Memory & Context: Retrieval settings, circuit breakers, model profiles
See Configuration Architecture for detailed documentation.
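To illustrate how the hierarchy resolves in practice, the sketch below layers per-user preferences over system defaults; the keys, values, and function name are illustrative assumptions, not the platform's actual configuration schema.

```python
from copy import deepcopy

# Illustrative defaults only; real settings live in the system configuration.
SYSTEM_DEFAULTS = {
    "workflow": {"streaming": True, "timeout_seconds": 120},
    "tools": {"selection_threshold": 0.75},
}

def resolve_config(system: dict, user_overrides: dict) -> dict:
    """Recursively layer per-user overrides on top of system defaults."""
    merged = deepcopy(system)
    for key, value in user_overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = resolve_config(merged[key], value)
        else:
            merged[key] = value
    return merged

# A user raising their workflow timeout via the UI overrides only that key.
print(resolve_config(SYSTEM_DEFAULTS, {"workflow": {"timeout_seconds": 300}}))
```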
Each component has its own detailed README with specific instructions:
- Inference Services - API services and model execution
- UI Application - User interface for interacting with the services
- YAML Schemas - Data structure definitions
- Context Extension Architecture - LLM context window extension system
- Dynamic Tool Generation - Tool generation for model execution
- Configuration Architecture - Hierarchical configuration system
- Composer Configuration Architecture - Configuration management rules for composer components
- Multi-Tier User Config Caching - In-memory → Redis → Database caching system
The inference runner module includes comprehensive pipeline support for all model types. For developers building custom pipelines or working with existing ones:
- Pipeline Documentation Overview - Complete guide to all available pipeline documentation
- Pipeline Implementation Guide - Comprehensive step-by-step guide for implementing custom pipelines
- Pipeline API Reference - Complete API documentation for all pipeline interfaces
- Runner Architecture Overhaul - Recent improvements including streaming architecture and pipeline-specific processing
The pipeline system supports all major AI workflows (text generation, embeddings, image generation, and multimodal interactions) with advanced features such as circuit breakers, memory optimization, and real-time streaming.
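For orientation only, a custom pipeline generally takes a request in and yields output incrementally; the class and method names below are assumptions for illustration, so consult the Pipeline Implementation Guide for the runner's real interface.

```python
from typing import AsyncIterator

# Hypothetical shape of a streaming text pipeline; the real base class and
# registration mechanism are defined in inference/runner.
class EchoPipeline:
    async def run(self, prompt: str) -> AsyncIterator[str]:
        """Yield output token by token, as a real pipeline would stream model output."""
        for token in prompt.split():
            yield token
```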
- Python 3.12+ (for inference services)
- Node.js 18+ (for UI)
- Docker and Docker Compose (required only for containerized deployment)
- CUDA-compatible GPU (recommended for performance)
The simplest way to get started is using Docker Compose:
```bash
# Clone the repository
git clone https://github.com/LongStoryMedia/llmmllab.git
cd llmmllab

# Start all services
docker-compose up -d
```

This will start all the necessary services and make them available on their respective ports.
For development or custom deployments, you can set up each component separately:
- Set up inference services:

  ```bash
  cd inference
  ./setup_environments.sh
  ```

- Set up UI application:

  ```bash
  cd ui
  npm install
  npm run dev
  ```
See the individual component READMEs for more detailed instructions.
The platform uses YAML schemas to define data contracts and automatically generate Python models and TypeScript types.
```bash
# Generate Python and TypeScript models from YAML schemas
./regenerate_models.sh

# Language-specific generation
./regenerate_models.sh python      # Generate only Python models
./regenerate_models.sh typescript  # Generate only TypeScript models
```

To add a new schema:

- Create new YAML schema in `schemas/[name].yaml`
- Generate Python model: `schema2code schemas/[name].yaml -l python -o inference/models/[name].py`
- Generate TypeScript types: `schema2code schemas/[name].yaml -l typescript -o ui/src/types/[name].ts`
The schema2code tool automatically updates `__init__.py` with exports and maintains type consistency across the platform.
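As a rough illustration (not the literal output of schema2code), a model generated from a hypothetical schemas/chat_message.yaml might resemble a plain typed class:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical example of generated code; schema2code's real output format,
# class name, and fields are determined by the YAML schema.
@dataclass
class ChatMessage:
    role: str
    content: str
    conversation_id: Optional[str] = None
```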
When modifying APIs or data structures:
- Update the relevant YAML schema first
- Run generation commands to update models
- Test the changes with the generated types
- Generated files: `inference/models/*.py`, `ui/src/types/*.ts`
- Avoid Duplication: If an enum or structure is used in multiple schemas, extract it to a separate schema file
- Use $ref: Reference shared schemas using `$ref: "shared_schema.yaml"` instead of copying definitions
- Single Source of Truth: Each data structure should be defined exactly once
- Example: Instead of duplicating the computational requirements enum, create `computational_requirement.yaml` and reference it
For more details on the schema architecture, see Intent Analysis Architecture.
The system follows a microservice architecture where components communicate through well-defined APIs:
- The UI makes direct requests to the Inference Services
- WebSockets provide real-time communication for chat, image generation, and status updates
- RabbitMQ handles asynchronous processing for computationally intensive tasks
- PostgreSQL provides persistent storage for user data, conversations, and configurations
The platform uses RabbitMQ as a message broker for:
- Task Queuing: Managing computationally intensive tasks like image generation
- Load Balancing: Distributing tasks across multiple worker instances
- Priority Processing: Handling high-priority requests ahead of others
- Failure Recovery: Ensuring tasks are not lost if a worker fails
Configuration is defined in schemas/rabbitmq_config.yaml.
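As a hedged sketch of what enqueuing a task could look like from Python using the pika client (the queue name, priority ceiling, and payload fields below are assumptions; the real topology comes from schemas/rabbitmq_config.yaml):

```python
import json
import pika  # third-party RabbitMQ client

# Assumed queue name and payload shape, for illustration only.
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="image_generation", durable=True,
                      arguments={"x-max-priority": 10})

task = {"prompt": "a lighthouse at dusk", "width": 1024, "height": 1024}
channel.basic_publish(
    exchange="",
    routing_key="image_generation",
    body=json.dumps(task),
    # delivery_mode=2 marks the message persistent; priority ranks it in the queue
    properties=pika.BasicProperties(delivery_mode=2, priority=5),
)
connection.close()
```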
Real-time communication is handled through WebSocket connections for:
- Chat Streaming: Streaming token-by-token responses for chat completions
- Image Generation Status: Real-time updates on image generation progress
- System Status: Updates on model loading, resource availability, and errors
WebSocket schemas are defined in schemas/web_socket_connection.yaml and related files.
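A minimal client sketch, assuming a hypothetical /ws/chat endpoint and event fields; the authoritative message formats are the WebSocket schemas referenced above.

```python
import asyncio
import json
import websockets  # third-party WebSocket client

async def stream_chat(prompt: str) -> None:
    # Endpoint path and message fields are assumptions for illustration.
    async with websockets.connect("ws://localhost:8000/ws/chat") as ws:
        await ws.send(json.dumps({"type": "chat", "prompt": prompt}))
        async for raw in ws:
            event = json.loads(raw)
            if event.get("type") == "token":
                print(event["content"], end="", flush=True)
            elif event.get("type") == "done":
                break

asyncio.run(stream_chat("Hello, world"))
```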
The platform includes a sophisticated Context Extension System (documented in context_extension.md) that:
- Extends LLM Context Windows: Overcomes token limitations of models
- Semantic Memory: Retrieves relevant conversation history
- External Search: Incorporates real-time web knowledge
- Hierarchical Summarization: Compresses conversation context intelligently
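Conceptually, the extension step assembles retrieved memory, a running summary, and external results into a single prompt within the model's token budget; the function below is a simplified sketch with assumed helper callables, not the implementation described in context_extension.md.

```python
# Simplified conceptual sketch; the retrieval, search, and summarization helpers
# are passed in as callables and are assumptions for illustration.
def build_extended_context(query: str, history: list[str],
                           retrieve, summarize, search,
                           token_budget: int) -> str:
    summary = summarize(history)         # hierarchical summary of older turns
    relevant = retrieve(query, history)  # semantic memory lookup
    external = search(query)             # real-time web knowledge
    parts = [summary, *relevant, *external]
    # Naive budget enforcement; a real implementation counts tokens, not characters.
    return "\n".join(parts)[: token_budget * 4]
```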
Version history and release notes are maintained in docs/releases/. See the CHANGELOG for a detailed history of changes across versions.