Lupin CLI

Lupin CLI is a command-line toolkit designed to verify the authenticity of digital documents using state-of-the-art image forensics, metadata inspection, and LLM-powered text analysis. It brings together a collection of proven forensic algorithms and modern AI techniques into a single, unified, scriptable interface.

The tool is built for investigators, journalists, researchers, archivists, and security teams who need a fast and reliable way to assess whether images or documents have been manipulated. Lupin CLI performs deep analysis at multiple levels: compression artifacts, sensor noise patterns, lighting consistency, metadata coherence, and cross-checked AI summaries that consolidate all findings into a human-readable verdict.

Installation

# Install UV if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and install
git clone https://github.com/your-repo/lupin-cli.git
cd lupin-cli
make install

# (Optional) Configure LLM for AI-powered summaries
lupin bootstrap

Quick Start

# Analyze an image
lupin check image.jpg

# Analyze a directory recursively
lupin check ./documents/ --recursive

# Export results to CSV
lupin check *.jpg --output results.csv

Analyzers

Lupin CLI uses multiple forensic techniques to detect image manipulation. Each analyzer focuses on a specific aspect of image authenticity:

| Analyzer | Method | What It Detects | Best For |
|---|---|---|---|
| Metadata | EXIF/XMP analysis | Missing camera data, editing software traces, timestamp anomalies | Detecting edited photos, identifying source |
| ELA | Error Level Analysis | Compression artifact inconsistencies from local edits | JPEG manipulation, splicing, cloning |
| JPEG Ghost | Multi-quality recompression | Regions saved at different JPEG qualities | Detecting pasted content from other JPEGs |
| Double JPEG | DCT coefficient analysis | Images compressed multiple times | Re-saved/edited JPEGs |
| Copy-Move | Keypoint matching (ORB) | Duplicated/cloned regions within the image | Object removal, duplication fraud |
| PRNU | Sensor noise pattern analysis | Inconsistent camera sensor fingerprints | Spliced regions from different cameras |
| Interpolation | Frequency domain analysis | Resampling artifacts from resize/rotate | Scaled or rotated paste operations |
| Noise Inconsistency | Block-wise noise variance | Regions with different noise characteristics | Composited images from multiple sources |
| Chromatic Aberration | Color channel misalignment | Inconsistent lens distortion patterns | Spliced content from different lenses |
| Shadow/Lighting | Gradient and shadow analysis | Inconsistent light direction or shadow color | Composited objects with wrong lighting |
| Visual Inconsistency | Edge, texture, color analysis | Abrupt visual discontinuities | General compositing and local edits |
| LLM Summary | AI cross-analysis | Synthesizes all results into a verdict | Human-readable final assessment |
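
To give a concrete feel for one of these techniques, the short Python sketch below illustrates the core idea of Error Level Analysis: re-save the image at a known JPEG quality, diff it against the original, and amplify the residual so locally edited regions stand out. This is a simplified illustration using Pillow, not Lupin's implementation, and the quality/scale values are arbitrary.

# Minimal Error Level Analysis (ELA) illustration -- not Lupin's implementation.
import io
from PIL import Image, ImageChops

def ela_map(path, quality=90, scale=15):
    original = Image.open(path).convert("RGB")
    # Re-save at a fixed JPEG quality...
    buffer = io.BytesIO()
    original.save(buffer, "JPEG", quality=quality)
    buffer.seek(0)
    resaved = Image.open(buffer)
    # ...then diff against the original; edited regions tend to recompress differently.
    residual = ImageChops.difference(original, resaved)
    # Amplify the residual so subtle compression inconsistencies become visible.
    return residual.point(lambda value: min(255, value * scale))

ela_map("image.jpg").save("image_ela.png")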

Detection Strategy

The analyzers work together using complementary approaches:

┌─────────────────────────────────────────────────────────────────────────┐
│                         IMAGE AUTHENTICITY                              │
├─────────────────────────────────────────────────────────────────────────┤
│  COMPRESSION ARTIFACTS          SENSOR/NOISE PATTERNS                   │
│  ├── ELA                        ├── PRNU                                │
│  ├── JPEG Ghost                 ├── Noise Inconsistency                 │
│  └── Double JPEG                └── Chromatic Aberration                │
├─────────────────────────────────────────────────────────────────────────┤
│  GEOMETRIC TRANSFORMS           VISUAL CONSISTENCY                      │
│  ├── Interpolation              ├── Visual Inconsistency                │
│  └── Copy-Move                  └── Shadow/Lighting                     │
├─────────────────────────────────────────────────────────────────────────┤
│  METADATA                       AI SYNTHESIS                            │
│  └── EXIF/XMP Analysis          └── LLM Summary                         │
└─────────────────────────────────────────────────────────────────────────┘

Scoring: Each analyzer produces a score (0.0 = manipulated, 1.0 = authentic) and a confidence level. The overall score is calculated from both the per-analyzer confidence and fixed reliability weights:

| Reliability | Analyzers | Weight |
|---|---|---|
| High | Metadata, ELA, Copy-Move, Double JPEG | 0.85 - 1.0 |
| Medium | Interpolation, PRNU, Visual Inconsistency, Chromatic Aberration | 0.6 - 0.7 |
| Lower | JPEG Ghost, Noise Inconsistency, Shadow/Lighting | 0.4 - 0.5 |

This weighting ensures that analyzers prone to false positives on authentic images don't unfairly drag down the overall score.

Interpretation:

  • 0.77 - 1.0: Likely authentic
  • 0.35 - 0.77: Uncertain, requires review
  • 0.0 - 0.35: Likely manipulated
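
To illustrate how such a weighting scheme can combine per-analyzer results, the sketch below computes a confidence- and reliability-weighted average and maps it onto the ranges above. The exact formula and per-analyzer weights Lupin uses may differ; the numbers here are only placed within the ranges from the reliability table.

# Hypothetical aggregation sketch -- the exact formula Lupin uses may differ.
RELIABILITY = {
    # High reliability (0.85 - 1.0)
    "Metadata": 1.0, "ELA": 0.95, "Copy-Move": 0.9, "Double JPEG": 0.85,
    # Medium reliability (0.6 - 0.7)
    "Interpolation": 0.7, "PRNU": 0.65, "Visual Inconsistency": 0.65, "Chromatic Aberration": 0.6,
    # Lower reliability (0.4 - 0.5)
    "JPEG Ghost": 0.5, "Noise Inconsistency": 0.45, "Shadow/Lighting": 0.4,
}

def overall_score(results):
    """results: list of (analyzer, score, confidence) tuples, each value in [0.0, 1.0]."""
    weights = [confidence * RELIABILITY.get(name, 0.5) for name, _, confidence in results]
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * score for w, (_, score, _) in zip(weights, results)) / total

def verdict(score):
    # Thresholds taken from the interpretation ranges above.
    if score >= 0.77:
        return "LIKELY AUTHENTIC"
    if score >= 0.35:
        return "UNCERTAIN"
    return "LIKELY MANIPULATED"

print(verdict(overall_score([("ELA", 0.30, 0.8), ("Metadata", 0.55, 0.9)])))  # -> UNCERTAIN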

Understanding the Output

When you export results to CSV (--output results.csv), Lupin creates a spreadsheet you can open in Excel, Google Sheets, or any spreadsheet application.

CSV Columns Explained

| Column | What It Means |
|---|---|
| file_path | The file that was analyzed |
| file_hash | SHA-256 hash of the file contents (for verification/deduplication) |
| file_type | Type of file (image, pdf, docx) |
| is_embedded | "Yes" if this is an image extracted from a PDF/DOCX |
| parent_file | If embedded, which document contained this image |
| method | Which analyzer produced this row (or "FINAL" for the summary) |
| method_score | Score from 0.0 to 1.0 (higher = more likely authentic) |
| method_confidence | How confident the analyzer is (0.0 to 1.0) |
| method_details | Technical details about what was found |
| verdict | Only in FINAL rows: "LIKELY AUTHENTIC", "UNCERTAIN", or "LIKELY MANIPULATED" |
| llm_verdict | The AI's assessment (if an LLM is configured) |
| llm_reasoning | The AI's explanation of its verdict |
| llm_provider | Which AI service was used |
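
If you prefer to triage an export programmatically instead of in a spreadsheet, the short sketch below uses Python's standard csv module and the column names above to print one verdict line per file; the filtering logic is just one way to surface the FINAL rows.

# Print the per-file verdicts from a Lupin CSV export (column names as documented above).
import csv

with open("results.csv", newline="") as handle:
    for row in csv.DictReader(handle):
        if row["method"] == "FINAL":  # FINAL rows carry the overall verdict for each file
            print(f'{row["file_path"]}: {row["verdict"]} (score {row["method_score"]})')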

How to Read the Results

  1. Look at FINAL rows first - Filter the spreadsheet to show only rows where method = "FINAL". These give you the overall verdict for each file.

  2. Check the verdict column - This is your quick answer:

    • LIKELY AUTHENTIC - No strong signs of manipulation detected
    • UNCERTAIN - Some anomalies found; manual review recommended
    • LIKELY MANIPULATED - Multiple indicators suggest the image was edited
  3. Review individual analyzer scores - If a file is marked UNCERTAIN or LIKELY MANIPULATED, look at which analyzers gave low scores to understand what type of manipulation may have occurred.

  4. Consider the context - A low score on one analyzer isn't proof of manipulation. Look for patterns across multiple analyzers.

Example: Reading a Result

file_path: photo.jpg
method: FINAL
method_score: 0.42
verdict: UNCERTAIN
llm_verdict: LIKELY MANIPULATED
llm_reasoning: "ELA shows compression inconsistencies in the upper-left region.
                Metadata indicates the image was processed with Photoshop."

This tells you:

  • The overall score (0.42) falls in the uncertain range
  • The AI detected specific issues with compression and metadata
  • You should examine the image more closely, particularly the upper-left area

Tips for Non-Technical Users

  • Start with the verdict - Don't get overwhelmed by numbers; the verdict column gives you the bottom line
  • Trust patterns, not single scores - One low score might be a false positive; multiple low scores are more significant
  • LLM reasoning is your friend - If configured, the AI explanation translates technical findings into plain language
  • When in doubt, flag for review - UNCERTAIN means exactly that; get a second opinion from an expert

Important Caveats

Absence of evidence is not evidence of absence. A "LIKELY AUTHENTIC" verdict means no manipulation was detected by these specific techniques - it does not guarantee the image is unaltered. Sophisticated edits, AI-generated images, or manipulations outside the scope of these analyzers may go undetected.

Extraordinary claims require extraordinary evidence. If you're making serious accusations based on analysis results, ensure you have strong, corroborating evidence from multiple sources. A single tool's output - no matter how sophisticated - should be one piece of a larger investigation, not the sole basis for conclusions. Always consider alternative explanations and seek expert validation for high-stakes decisions.

Configuration

Quick Setup

lupin bootstrap

This interactive wizard configures your LLM provider (Anthropic, OpenAI, or Ollama) and saves your settings to .env.

To verify your configuration:

lupin config

Manual Configuration

# LLM Provider (anthropic, openai, or ollama)
export LUPIN_LLM_PROVIDER=anthropic
export LUPIN_ANTHROPIC_API_KEY=your_key

# Or for local Ollama
export LUPIN_LLM_PROVIDER=ollama
export LUPIN_OLLAMA_HOST=http://localhost:11434
export LUPIN_OLLAMA_MODEL=llama3.2

GPU Acceleration (Optional)

# Install GPU support (requires NVIDIA GPU + CUDA 12.x)
uv sync --extra gpu

GPU provides 2-3x speedup for Visual Inconsistency and PRNU analysis. The tool automatically falls back to CPU if GPU is unavailable.

Development

# Install dev dependencies
make dev

# Run tests
make test

# Run linter
make lint

# Format code
make format

Supported File Types

  • Images: Any format recognized by your system's file type detection, including:
    • Common formats: JPEG, PNG, GIF, BMP, TIFF, WebP, AVIF
    • Professional formats: PSD, EPS, PCX, TGA, DDS
    • Scientific formats: FITS, HDF5
    • Other formats: ICO, ICNS, PPM, PGM, PBM, SGI, QOI, etc.
  • Documents: PDF, DOCX, DOC

Limitations

  • Image forensics works best on JPEG images
  • Text analysis requires LLM API access
  • PRNU analysis is most effective on images from the same camera
  • Results are indicators, not definitive proof

Why the Name "Lupin"?

Inspired by Arsène Lupin, the master of disguise in French literature, this tool aims to reveal what is hidden behind the surface: uncovering manipulations, inconsistencies, and digital "disguises" that might go unnoticed during manual inspection.

License

MIT License
