LFM2 - Liquid Foundation Model 2 (Minimal Implementation)


This is a minimal, open-source implementation of the Liquid Foundation Model 2 (LFM2) architecture described in Liquid AI's blog post. Since there is no official open-source implementation available, this repository provides a PyTorch implementation of the core architecture for research and educational purposes.

Features

  • Hybrid architecture combining short-range convolutions with grouped query attention
  • 16 blocks: 10 LIV convolution blocks + 6 GQA blocks
  • Double-gated short-range convolutions for efficient local processing
  • Grouped Query Attention (GQA) for efficient global attention
  • SwiGLU activation functions and RMSNorm normalization (a minimal sketch of both follows this list)
  • Rotary Positional Embeddings (RoPE)
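
For reference, here is a minimal sketch of the RMSNorm and SwiGLU blocks as they are commonly defined in LLaMA-style models; the layer names (w_gate, w_up, w_down) are illustrative and may not match this repository's internals:

import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescale by the RMS, no mean-centering or bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """SiLU-gated feed-forward network (SwiGLU)."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))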

Model Sizes

The implementation supports three model sizes, summarized in the sketch after this list:

  • 350M parameters (768 hidden size)
  • 700M parameters (1024 hidden size)
  • 1.2B parameters (1536 hidden size)
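
A hypothetical size table built only from the numbers above; create_lfm2_model presumably resolves model_size to a configuration along these lines:

# Hypothetical mapping; the hidden sizes are taken from the list above.
LFM2_HIDDEN_SIZES = {
    "350M": 768,
    "700M": 1024,
    "1.2B": 1536,
}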

Installation

pip3 install -U lfm2 

Quick Start

import torch
from lfm2.main import create_lfm2_model

# Create a model
model = create_lfm2_model(
    model_size="700M",  # Choose from: "350M", "700M", "1.2B"
    vocab_size=32768,
    max_seq_length=32768,
    verbose=True
)

# Example forward pass
batch_size = 2
seq_length = 32
input_ids = torch.randint(0, model.config.vocab_size, (batch_size, seq_length))

# Generate outputs
with torch.no_grad():
    outputs = model(input_ids)
    logits = outputs["logits"]
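
Building on the forward pass above, a minimal greedy-decoding loop might look like the following. It assumes only that model(input_ids) returns a dict whose "logits" entry has shape (batch, seq_len, vocab_size), and it recomputes the full sequence each step rather than using the cache:

def greedy_generate(model, input_ids, max_new_tokens=16):
    # Naive greedy decoding sketch: no KV cache, full recompute each step.
    model.eval()
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(input_ids)["logits"]       # (batch, seq, vocab)
        next_token = logits[:, -1, :].argmax(dim=-1)  # most likely next token
        input_ids = torch.cat([input_ids, next_token.unsqueeze(-1)], dim=-1)
    return input_ids

generated = greedy_generate(model, input_ids, max_new_tokens=8)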

Architecture Details

LIV Convolution Blocks

The model uses Linear Input-Varying (LIV) convolution blocks that combine double-gating with short-range convolutions, sketched in pseudocode as:

def lfm2_conv(x):
    B, C, x = linear(x)    # input projection
    x = B*x                # gating (gate depends on input)
    x = conv(x)            # short conv
    x = C*x                # gating
    x = linear(x)
    return x
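
A runnable PyTorch sketch of this block under stated assumptions: the kernel size, the depthwise causal convolution, and the fused three-way input projection are choices the pseudocode above leaves open, so they may differ from this repository's exact code.

import torch
import torch.nn as nn

class ShortConvBlock(nn.Module):
    """Double-gated short-range convolution following the pseudocode above.

    Kernel size and projection layout are assumptions, not the repo's exact code.
    """
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.in_proj = nn.Linear(dim, 3 * dim)  # emits gates B, C and features x
        # Depthwise convolution over the sequence axis; left-padded to stay causal.
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              padding=kernel_size - 1, groups=dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (batch, seq, dim)
        B, C, v = self.in_proj(x).chunk(3, dim=-1)
        v = B * v                                # first input-dependent gate
        v = self.conv(v.transpose(1, 2))[..., : x.shape[1]].transpose(1, 2)
        v = C * v                                # second gate
        return self.out_proj(v)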

Grouped Query Attention

The model implements Grouped Query Attention (GQA) for efficient global attention: several query heads share each key/value head, which shrinks the KV cache and reduces compute while maintaining model quality.
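
A minimal sketch of that sharing, with illustrative head counts that need not match this repository's configuration (RoPE is omitted for brevity):

import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    """Minimal GQA: n_kv_heads < n_heads, each KV head shared by a group of Q heads."""
    def __init__(self, dim: int, n_heads: int = 8, n_kv_heads: int = 2):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x):                        # x: (batch, seq, dim)
        b, s, _ = x.shape
        q = self.q_proj(x).view(b, s, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, s, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, s, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Repeat each KV head so it covers its group of query heads.
        groups = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(groups, dim=1)
        v = v.repeat_interleave(groups, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, s, -1))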

Usage Examples

Check example.py for detailed usage examples, including the following (a hedged sketch of item 2 appears after this list):

  1. Basic forward pass
  2. Forward pass with attention masks
  3. Forward pass with caching
  4. Forward pass with all outputs
  5. Forward pass with custom position IDs
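
As a taste of item 2, a forward pass with a padding mask might look like this; the attention_mask keyword is an assumption about the model's signature, so consult example.py for the exact API:

# Hypothetical attention-mask usage; the keyword name is an assumption.
attention_mask = torch.ones(batch_size, seq_length, dtype=torch.long)
attention_mask[:, -4:] = 0  # treat the last four positions as padding

with torch.no_grad():
    outputs = model(input_ids, attention_mask=attention_mask)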

Citation

If you use this implementation in your research, please cite:

@misc{lfm2_minimal,
  author = {Kye Gomez},
  title = {LFM2: Minimal Implementation of Liquid Foundation Model 2},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/kyegomez/LFM2}
}

Disclaimer

This is an unofficial, minimal implementation based on publicly available information about the LFM2 architecture. It is not affiliated with or endorsed by Liquid AI. The implementation may differ from the original model in various aspects.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
