LangExtract

A Rust library for extracting structured data from unstructured text using Large Language Models (LLMs), featuring precise source attribution and interactive visualization capabilities.

Features

🚀 Multiple LLM Support: DeepSeek, OpenAI, and Ollama models
📝 Structured Extraction: Extract entities with attributes and relationships
🎯 Precise Attribution: Track exact source positions for every extraction
🔄 Flexible Formats: YAML and JSON output
📊 Interactive Visualization: Generate HTML visualizations of results
⚡ Async/Await: Built with modern Rust async patterns
🛡️ Type Safety: Leverage Rust's type system for reliable extractions

Quick Start

Installation

Add this to your Cargo.toml:

[dependencies]
langextract = "0.1"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }

Basic Usage

use langextract::{
    annotation::Annotator,
    data::{Document, FormatType},
    inference::DeepSeekLanguageModel,
    prompting::PromptTemplateStructured,
    resolver::Resolver,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Set up your API key
    let api_key = std::env::var("DEEPSEEK_API_KEY")?;

    // 2. Create a prompt template
    let prompt = PromptTemplateStructured {
        description: "Extract names of people mentioned in the text.".to_string(),
        examples: vec![],
    };

    // 3. Initialize the model
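    let model = DeepSeekLanguageModel::new(
        None,                   // Model ID (use the default)
        api_key,
        None,                   // Base URL (use the default)
        Some(FormatType::Yaml), // Output format
        Some(0.1),              // Temperature
        Some(1),                // Max workers
        None,                   // Extra kwargs
    )?;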

    // 4. Create annotator and resolver
    let annotator = Annotator::new(model, prompt, FormatType::Yaml, None, true);
    let resolver = Resolver::new(true, None, None, false);

    // 5. Process your text
    let text = "Alice met Bob at the coffee shop. Charlie joined them later.";
    let document = Document::new(text.to_string(), Some("example".to_string()), None);

    let results = annotator.annotate_documents(
        vec![document], &resolver, 1000, 1, true, 1, None
    )?;

    // 6. Use the results
    if let Some(extractions) = &results[0].extractions {
        for extraction in extractions {
            println!("Found: {}", extraction.extraction_text);
        }
    }

    Ok(())
}

Examples

We provide several examples to help you get started:

🌟 Getting Started

The simplest possible example - perfect for beginners!

export DEEPSEEK_API_KEY="your-api-key-here"
cargo run --example getting_started

📝 Simple Extraction

Basic entity extraction with minimal setup:

cargo run --example simple_extraction

🎭 Character Extraction

Advanced example with detailed prompts and attributes:

cargo run --example character_extraction

See the examples/ directory for complete code and detailed documentation.

Supported Models

DeepSeek

let model = DeepSeekLanguageModel::new(
    Some("deepseek-chat".to_string()),
    api_key,
    None, // Use default base URL
    Some(FormatType::Yaml),
    Some(0.1), // Temperature
    Some(1),   // Max workers
    None,      // Extra kwargs
)?;

OpenAI

let model = OpenAILanguageModel::new(
    Some("gpt-4".to_string()),
    api_key,
    None, // Use default base URL
    None, // Organization
    Some(FormatType::Json),
    Some(0.1),
    Some(1),
    None,
)?;

Ollama

let model = OllamaLanguageModel::new(
    "llama2:latest",
    Some("http://localhost:11434".to_string()),
    Some("json".to_string()),
    None,
    None,
);

API Documentation

Core Components

Annotator: Main interface for text processing
Document: Represents input text with metadata
Extraction: Structured output with source attribution
Resolver: Converts LLM output to structured data
PromptTemplateStructured: Prompt template holding a task description and few-shot examples
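
To make "source attribution" concrete, here is a minimal sketch of the idea using only the standard library. It is not the library's Resolver or Extraction implementation; the `Span` type and `locate` function are hypothetical names introduced for illustration, and a real aligner would likely also handle fuzzy or partial matches.

```rust
/// Hypothetical stand-in for an extraction with source attribution.
/// This is NOT the library's own `Extraction` type.
#[derive(Debug, PartialEq)]
struct Span {
    text: String,
    start: usize, // byte offset of the first byte of the match
    end: usize,   // byte offset one past the last byte of the match
}

/// Locate each extracted string in `source`, scanning left to right so that
/// repeated mentions map to successive occurrences. Strings that cannot be
/// found after the current cursor yield `None`.
fn locate(source: &str, extracted: &[&str]) -> Vec<Option<Span>> {
    let mut cursor = 0;
    extracted
        .iter()
        .map(|needle| {
            // `find` returns an offset relative to the slice, so add `cursor`.
            let offset = source[cursor..].find(*needle)?;
            let start = cursor + offset;
            let end = start + needle.len();
            cursor = end;
            Some(Span { text: (*needle).to_string(), start, end })
        })
        .collect()
}
```

Slicing the source with a returned span (`&source[span.start..span.end]`) gives back the extracted text, which is the property that makes highlighting in a visualization possible.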

Output Formats

LangExtract supports both YAML and JSON output formats:

// YAML format (default)
let annotator = Annotator::new(model, prompt, FormatType::Yaml, None, true);

// JSON format
let annotator = Annotator::new(model, prompt, FormatType::Json, None, true);

Environment Variables

| Variable | Description | Required |
|---|---|---|
| `DEEPSEEK_API_KEY` | DeepSeek API key | For DeepSeek models |
| `OPENAI_API_KEY` | OpenAI API key | For OpenAI models |
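
A missing key otherwise surfaces as a generic `VarError`, so it can help to check the variable up front. The sketch below uses only `std::env`; `require_key` is a hypothetical helper, not part of LangExtract.

```rust
use std::env;

/// Hypothetical helper (not part of LangExtract): read a required API key and
/// turn a missing, empty, or malformed variable into a message that names it.
fn require_key(name: &str) -> Result<String, String> {
    match env::var(name) {
        Ok(value) if !value.trim().is_empty() => Ok(value),
        Ok(_) => Err(format!("{name} is set but empty")),
        Err(env::VarError::NotPresent) => Err(format!("{name} is not set")),
        Err(env::VarError::NotUnicode(_)) => Err(format!("{name} is not valid Unicode")),
    }
}
```

Calling `require_key("DEEPSEEK_API_KEY")?` at the top of a `main` that returns `Result<(), Box<dyn std::error::Error>>` works because `String` converts into a boxed error.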

Error Handling

LangExtract uses custom error types for clear error reporting:

use langextract::inference::InferenceOutputError;

match annotator.annotate_documents(documents, &resolver, 1000, 1, true, 1, None) {
    Ok(results) => println!("Success: {} documents processed", results.len()),
    Err(InferenceOutputError { message }) => eprintln!("Error: {}", message),
}
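
When the caller only needs to propagate failures, the `?` operator is often enough. The following self-contained sketch shows the pattern with a stand-in error type. `SketchError`, `annotate`, and `run` are hypothetical names; whether `InferenceOutputError` itself implements `std::error::Error` is an assumption you should check against the crate documentation.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for a library error that carries a `message` field.
/// This is NOT the library's `InferenceOutputError`.
#[derive(Debug)]
struct SketchError {
    message: String,
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "inference failed: {}", self.message)
    }
}

impl Error for SketchError {}

/// Hypothetical fallible step standing in for an annotation call.
/// It rejects empty input and otherwise counts whitespace-separated words.
fn annotate(text: &str) -> Result<usize, SketchError> {
    if text.is_empty() {
        return Err(SketchError { message: "empty document".to_string() });
    }
    Ok(text.split_whitespace().count())
}

/// `?` converts `SketchError` into `Box<dyn Error>` automatically.
fn run(text: &str) -> Result<usize, Box<dyn Error>> {
    let words = annotate(text)?;
    Ok(words)
}
```

Matching on the concrete error, as in the example above, remains the better choice when the caller wants to react to the message rather than just pass it along.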

Contributing

We welcome contributions! Please see our Contributing Guide for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Inspired by the Python langextract library, this Rust implementation brings type safety and performance to structured text extraction.
