libmagic-rs

A pure-Rust implementation of libmagic, the library that powers the file command for identifying file types. This project provides a memory-safe, efficient alternative to the C-based libmagic library.

Note

This is a clean-room implementation inspired by the original libmagic project. We respect and acknowledge the original work by Ian Darwin and the current maintainers led by Christos Zoulas.

Project Status

Active Development (Phase 1 MVP) - The core file identification pipeline is functional. You can identify common file types using text magic files today.

Current Metrics:

17,000+ lines of Rust code
650+ tests covering the parser, evaluator, I/O, and CLI
Zero unsafe code with memory safety guarantees
Zero warnings with strict clippy linting

What Works Today

File type identification - Identify files using text magic file databases
Text and JSON output - Text by default, JSON via the --json flag
Custom magic files - Use --magic-file to specify your own rules
Memory-mapped I/O - Efficient file reading with bounds checking
Hierarchical rule matching - Full nested rule evaluation
Platform detection - Automatic magic file discovery on Unix systems
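
As a rough illustration of magic file discovery, the sketch below returns the first candidate path that exists. The function name first_existing is hypothetical, and the actual search order used by rmagic is not documented in this README; the two paths mentioned elsewhere in this document (/usr/share/misc/magic and /usr/share/file/magic/Magdir/) would be natural candidates.

```rust
use std::path::{Path, PathBuf};

/// Return the first candidate path that exists on this system.
/// Hypothetical helper; not part of the libmagic-rs API.
fn first_existing(candidates: &[&str]) -> Option<PathBuf> {
    candidates
        .iter()
        .map(|c| Path::new(*c))
        .find(|p| p.exists())
        .map(|p| p.to_path_buf())
}
```

Calling first_existing(&["/usr/share/misc/magic", "/usr/share/file/magic/Magdir/"]) yields None on systems that have neither path, which is where a --magic-file override or built-in rules would come in.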

In Progress (Phase 1 Completion)

Multiple file support - Process multiple files in one command
Stdin input - Pipe data via rmagic -
Built-in fallback rules - Work without external magic files via --use-builtin
Magdir directory loading - Load all files from a magic directory
Compatibility testing - Validation against GNU file command output

Phase 1 Goals

95%+ compatibility with GNU file for common file types
85% test coverage across all modules
Complete documentation with rustdoc and mdbook site

Overview

libmagic-rs is designed to replace libmagic with a safe, efficient Rust implementation. Its design goals:

Memory Safety: Pure Rust with no unsafe code (except in vetted dependencies)
Performance: Uses memory-mapped I/O for efficient file reading
Compatibility: Supports common magic file syntax (offsets, types, operators, nesting)
Extensibility: Designed for modern use cases (PE resources, Mach-O, Go build info)
Multiple Output Formats: Classic text output and structured JSON

Features

Core Capabilities

Parse text magic files (DSL for byte-level file type detection)
Evaluate magic rules against file buffers to identify file types
Absolute offset specifications (indirect/relative in Phase 2)
Multiple data types: byte, short, long, quad, string
Hierarchical rule evaluation with proper nesting
Memory-mapped file I/O for efficient processing
Confidence scoring based on match depth
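
To make the text magic format concrete, here is a minimal sketch of splitting one rule line into its fields: the leading '>' characters give the nesting level, followed by offset, type, test, and message. RawRule, split_rule_line, and next_field are hypothetical names for illustration; the project's real parser is nom-based and handles cases this sketch ignores (escaped whitespace in the test field, !: directives).

```rust
/// One line of a text magic file, split into raw fields (hypothetical type).
#[derive(Debug, PartialEq)]
struct RawRule<'a> {
    level: u32,       // number of leading '>' characters
    offset: &'a str,  // offset field, still unparsed
    typ: &'a str,     // type field, e.g. "byte" or "string"
    test: &'a str,    // operator and value, still unparsed
    message: &'a str, // remainder of the line, possibly empty
}

/// Return the next whitespace-delimited field and the remaining text.
fn next_field<'a>(s: &'a str) -> Option<(&'a str, &'a str)> {
    let s = s.trim_start();
    if s.is_empty() {
        return None;
    }
    let end = s.find(char::is_whitespace).unwrap_or(s.len());
    Some((&s[..end], &s[end..]))
}

/// Split one line into fields. Returns None for blank lines, comments,
/// and lines with fewer than three fields.
fn split_rule_line<'a>(line: &'a str) -> Option<RawRule<'a>> {
    let trimmed = line.trim_end();
    if trimmed.is_empty() || trimmed.starts_with('#') {
        return None;
    }
    // '>' is one byte, so the length difference is the nesting level.
    let body = trimmed.trim_start_matches('>');
    let level = (trimmed.len() - body.len()) as u32;

    let (offset, rest) = next_field(body)?;
    let (typ, rest) = next_field(rest)?;
    let (test, rest) = next_field(rest)?;
    Some(RawRule { level, offset, typ, test, message: rest.trim_start() })
}
```

For example, the line ">4 byte 1 32-bit" splits into level 1, offset "4", type "byte", test "1", and message "32-bit".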

Output Formats

Text Output (Default):

ELF 64-bit LSB executable, x86-64, version 1 (SYSV)

JSON Output:

{
  "filename": "example.bin",
  "matches": [
    {
      "text": "ELF 64-bit LSB executable",
      "offset": 0,
      "value": "7f454c46",
      "tags": [
        "executable",
        "elf"
      ],
      "score": 90,
      "mime_type": "application/x-executable"
    }
  ],
  "metadata": {
    "file_size": 8192,
    "evaluation_time_ms": 2.3,
    "rules_evaluated": 45
  }
}
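
The "value" field in the example above reads as the matched bytes in lowercase hex: 7f454c46 is the four-byte ELF magic (0x7f followed by ASCII "ELF"). A minimal sketch of that encoding, assuming this interpretation is correct; to_hex is a hypothetical helper, not a documented libmagic-rs function:

```rust
/// Render bytes as lowercase hex, two digits per byte.
/// Hypothetical helper illustrating the assumed "value" encoding.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}
```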

Quick Start

Installation

# Clone the repository
git clone https://github.com/EvilBit-Labs/libmagic-rs.git
cd libmagic-rs

# Build the project
cargo build --release

# Run tests
cargo test

CLI Usage

# Basic file identification
./target/release/rmagic file.bin

# JSON output with metadata
./target/release/rmagic file.bin --json

# Use custom magic file
./target/release/rmagic file.bin --magic-file custom.magic

Note

Multiple file support (rmagic file1.bin file2.bin) and stdin input (cat file | rmagic -) are planned for Phase 1 completion.

Library Usage

use libmagic_rs::MagicDatabase;

// Box<dyn Error> assumes the crate's error type implements std::error::Error
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load magic rules from a text magic file
    let db = MagicDatabase::load_from_file("/usr/share/misc/magic")?;

    // Identify file type
    let result = db.evaluate_file("example.bin")?;
    println!("File type: {}", result.description);
    println!("Confidence: {:.0}%", result.confidence * 100.0);

    // Or evaluate an in-memory buffer
    let buffer = std::fs::read("example.bin")?;
    let result = db.evaluate_buffer(&buffer)?;
    if let Some(mime) = result.mime_type {
        println!("MIME type: {}", mime);
    }

    Ok(())
}

Note

The library currently supports text-format magic files, following the OpenBSD approach of parsing the text format directly. Binary .mgc format support is planned for Phase 2.

Architecture

The project follows a parser-evaluator architecture:

Magic File  → Parser        → AST ────────────┐
                                              ↓
Target File → Memory Mapper → File Buffer → Evaluator → Match Results → Output Formatter

Core Modules

Parser (src/parser/): Magic file DSL parsing into Abstract Syntax Tree
- ast.rs: Core AST data structures
- grammar.rs: nom-based parsing components
- mod.rs: Parser interface with text magic file support
Evaluator (src/evaluator/): Rule evaluation engine
- Offset resolution (absolute offsets supported, indirect in Phase 2)
- Type interpretation with endianness handling
- Comparison and bitwise operations
- Confidence scoring based on match depth
Output (src/output/): Result formatting
- Text formatter (GNU file compatible)
- JSON formatter with metadata
IO (src/io/): File access utilities
- Memory-mapped file buffers with FileBuffer
- Safe bounds checking with comprehensive error handling
- Resource management with RAII patterns
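
The combination of bounds checking and endianness handling described above can be sketched with the standard library alone. This is an illustration over a plain byte slice, not the project's FileBuffer API; Endianness appears in the data structures below, but its variant names here and the functions read_u16 and read_u32 are assumptions.

```rust
/// Byte order for multi-byte reads (variant names are assumed).
#[derive(Clone, Copy)]
enum Endianness {
    Little,
    Big,
}

/// Read a u16 at `offset`, or None if the read would run past the buffer.
fn read_u16(buf: &[u8], offset: usize, endian: Endianness) -> Option<u16> {
    // checked_add guards against overflow on huge offsets;
    // slice::get returns None instead of panicking when out of range.
    let end = offset.checked_add(2)?;
    let b = buf.get(offset..end)?;
    Some(match endian {
        Endianness::Little => u16::from_le_bytes([b[0], b[1]]),
        Endianness::Big => u16::from_be_bytes([b[0], b[1]]),
    })
}

/// Read a u32 at `offset`, or None if the read would run past the buffer.
fn read_u32(buf: &[u8], offset: usize, endian: Endianness) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let b = buf.get(offset..end)?;
    let bytes = [b[0], b[1], b[2], b[3]];
    Some(match endian {
        Endianness::Little => u32::from_le_bytes(bytes),
        Endianness::Big => u32::from_be_bytes(bytes),
    })
}
```

Returning Option rather than panicking means a truncated file simply fails to match the rule, which is the behavior a file identifier wants for corrupted input.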

Key Data Structures

pub struct MagicRule {
    pub offset: OffsetSpec,
    pub typ: TypeKind,
    pub op: Operator,
    pub value: Value,
    pub message: String,
    pub children: Vec<MagicRule>,
    pub level: u32,
}

pub enum OffsetSpec {
    Absolute(i64),
    Indirect {
        base_offset: i64,
        pointer_type: TypeKind,
        adjustment: i64,
        endian: Endianness,
    },
    Relative(i64),
    FromEnd(i64),
}

pub enum TypeKind {
    Byte,
    Short { endian: Endianness, signed: bool },
    Long { endian: Endianness, signed: bool },
    String { max_length: Option<usize> },
}

pub enum Value {
    Uint(u64),
    Int(i64),
    Bytes(Vec<u8>),
    String(String),
}
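
To show how the children field drives hierarchical evaluation, here is a deliberately simplified sketch. SimpleRule is a hypothetical stand-in for MagicRule that supports only absolute offsets and exact byte comparison; the real evaluator also handles types, operators, and confidence scoring, none of which is modeled here.

```rust
/// Simplified stand-in for MagicRule: absolute offset, exact byte match.
struct SimpleRule {
    offset: usize,
    expected: Vec<u8>,
    message: String,
    children: Vec<SimpleRule>,
}

/// Convenience constructor for the sketch.
fn rule(offset: usize, expected: &[u8], message: &str, children: Vec<SimpleRule>) -> SimpleRule {
    SimpleRule {
        offset,
        expected: expected.to_vec(),
        message: message.to_string(),
        children,
    }
}

/// True if the buffer holds `expected` at `offset`; out-of-range reads
/// count as a non-match rather than an error.
fn matches(rule: &SimpleRule, buf: &[u8]) -> bool {
    rule.offset
        .checked_add(rule.expected.len())
        .and_then(|end| buf.get(rule.offset..end))
        .map_or(false, |slice| slice == rule.expected.as_slice())
}

/// Depth-first evaluation: a rule's children are tried only when the
/// rule itself matched. Messages of matching rules are appended to `out`.
fn evaluate(rules: &[SimpleRule], buf: &[u8], out: &mut Vec<String>) {
    for rule in rules {
        if matches(rule, buf) {
            out.push(rule.message.clone());
            evaluate(&rule.children, buf, out);
        }
    }
}
```

With a top-level rule for the ELF magic at offset 0 and child rules on the class byte at offset 4, a 64-bit ELF header collects the parent message followed by the matching child's message, while a non-ELF buffer collects nothing.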

Development

Prerequisites

Rust 1.85+ (2024 edition)
Cargo
Git

Building

# Development build
cargo build

# Release build with optimizations
cargo build --release

# Check without building
cargo check

Testing

# Run all tests (650+ tests)
cargo test

# Run with nextest (faster test runner)
cargo nextest run

# Run specific test module
cargo test parser::grammar::tests
cargo test parser::ast::tests

# Test with coverage reporting
cargo llvm-cov --html

# Run compatibility tests against GNU file
cargo test --test compatibility

Current Test Coverage:

650+ tests covering parser, evaluator, I/O, and CLI components
Parser testing for numbers, offsets, operators, values, and rule hierarchies
Evaluator testing for rule matching and confidence scoring
I/O testing for FileBuffer, memory mapping, and error handling
CLI testing for argument parsing and output formatting
Compatibility testing against GNU file command output
Target: >85% test coverage for Phase 1 completion

Compatibility Testing

We track compatibility with the original file project by testing against its complete test suite. The aim is output identical to the original libmagic library; the Phase 1 target is 95%+ agreement for common file types.

The compatibility test suite includes:

All test files from the original file project
Expected output validation against GNU file command
Performance regression testing
Edge case handling verification

Code Quality

# Format code
cargo fmt

# Lint code (strict mode)
cargo clippy -- -D warnings

# Generate documentation
cargo doc --open

# Run benchmarks
cargo bench

Project Structure

libmagic-rs/
├── Cargo.toml              # Project manifest and dependencies
├── src/
│   ├── lib.rs              # Library root and public API
│   ├── main.rs             # CLI binary entry point
│   ├── parser/              # Magic file parser module
│   ├── evaluator/           # Rule evaluation engine
│   ├── output/              # Output formatting
│   ├── io/                  # Memory-mapped file I/O
│   └── error.rs             # Error types and handling
├── tests/                   # Integration tests
├── benches/                 # Performance benchmarks
├── magic/                   # Magic file databases
└── docs/                    # Documentation

Performance

The implementation includes:

Memory-mapped I/O: Efficient file access without loading entire files
Zero-copy operations: Minimize allocations during evaluation
Early termination: Stop evaluation at first match when appropriate

Planned optimizations (Phase 2+):

Aho-Corasick indexing for fast multi-pattern string search
Compiled rule caching for repeated use
Performance benchmarking against libmagic

Benchmarks

Performance targets (Phase 3):

Performance within 10% of libmagic, or better
Memory usage comparable to libmagic
Fast startup with large magic databases

Compatibility

Magic File Support

Supported (Phase 1):

Text magic file format (the stable, documented format)
Hierarchical rule nesting with indentation levels
Absolute offset specifications
Core types: byte, short, long, quad, string
Core operators: =, !=, &, <, >
Endianness handling for multi-byte types
Magdir-style directory loading
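
The five core operators can be sketched as a single comparison function over unsigned values. The Operator variant names and the apply function are assumptions for illustration (the README names the Operator type but not its variants), and signed comparison is left out. The semantics follow the magic file format: the value read from the file is on the left, the value from the rule on the right, and & requires every bit set in the rule value to be set in the file value.

```rust
/// Comparison operators from the supported list (variant names assumed).
#[derive(Clone, Copy)]
enum Operator {
    Equal,    // =
    NotEqual, // !=
    BitAnd,   // &
    Less,     // <
    Greater,  // >
}

/// Compare the value read from the file (`actual`) against the value
/// given in the rule (`expected`). Unsigned only in this sketch.
fn apply(op: Operator, actual: u64, expected: u64) -> bool {
    match op {
        Operator::Equal => actual == expected,
        Operator::NotEqual => actual != expected,
        // All bits set in `expected` must also be set in `actual`.
        Operator::BitAnd => (actual & expected) == expected,
        Operator::Less => actual < expected,
        Operator::Greater => actual > expected,
    }
}
```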

Phase 2:

Binary .mgc compiled format
Indirect offset resolution
Regex patterns

Text-First Approach

libmagic-rs follows the OpenBSD approach: parse text magic files directly, prioritizing simplicity and correctness over binary format complexity. This is the same strategy used by OpenBSD's file implementation and other successful reimplementations like PolyFile.

Why text format first?

Text magic format is stable across libmagic versions
Binary .mgc has version lock-in issues (format changes between releases)
Simpler codebase (~1,500 lines vs ~3,000 for binary parsing)
Easier debugging and testing

Migration from libmagic

The library provides a migration path from C-based libmagic:

Similar API patterns where possible
Compatibility testing with GNU file command results
Text magic files work unchanged from system installations

Security

Memory Safety: No unsafe code except in vetted dependencies
Bounds Checking: All buffer access protected by bounds checking
Safe File Handling: Graceful handling of truncated/corrupted files
Fuzzing Integration: Robustness testing with malformed inputs

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes
Run tests and ensure they pass (cargo test)
Run clippy to check for issues (cargo clippy -- -D warnings)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Development Guidelines

Follow Rust naming conventions
Add tests for new functionality
Update documentation for API changes
Ensure all code passes cargo clippy -- -D warnings
Maintain >85% test coverage

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Roadmap

Phase 1: MVP (v0.1) - Current Focus

Core Infrastructure (Complete):

Core AST data structures with comprehensive serialization
Magic file parser for text format with hierarchical rules
Rule evaluation engine with confidence scoring
Memory-mapped file I/O with FileBuffer
Text and JSON output formatters
CLI with --json and --magic-file flags
Comprehensive error handling

In Progress:

Multiple file support in CLI
Stdin input support (rmagic -)
Built-in fallback rules (--use-builtin)
Magdir directory loading (load all files from /usr/share/file/magic/Magdir/)
Strength calculation (libmagic's !:strength parsing)
Complete rustdoc and mdbook documentation

Success Criteria:

95%+ compatibility with GNU file for common types (ELF, PE, ZIP, JPEG, PNG, PDF)
85% test coverage

Phase 2: Enhanced Features (v0.2)

Binary .mgc format support (deferred per OpenBSD approach)
Indirect offset resolution
Regex support with binary-safe matching
Compiled rule caching for faster startup
Additional operators and type support
Aho-Corasick string indexing

Phase 3: Performance & Compatibility (v0.3)

Performance optimizations and benchmarking
Full libmagic syntax compatibility
PE/Mach-O/ELF format-specific detection
Go build info extraction

Phase 4: Production Ready (v1.0)

Stable API with semver guarantees
Migration guide from C libmagic
Performance parity validation
Fuzzing and security testing
crates.io publication

Support

Documentation: Project Documentation
Issues: GitHub Issues
Discussions: GitHub Discussions

Acknowledgments

Ian Darwin for the original file command and libmagic implementation
Christos Zoulas and the current libmagic maintainers
The original libmagic project for establishing the magic file format standard
Rust community for excellent tooling and ecosystem
Contributors and testers who help improve the project
