An end-to-end pipeline for firmware analysis and cryptographic function detection. Static analysis (Ghidra), dynamic tracing (Qiling), and feature extraction to produce datasets for binary analysis.

pointblank-club/vestigo

Vestigo: Firmware analysis & crypto-detection pipeline

Vestigo is a collection of tools, scripts, and services that automates four stages: (1) producing cross-compiled test binaries, (2) statically and dynamically analyzing firmware and binaries, (3) extracting ML-ready features, and (4) producing datasets and inference results for cryptographic-function detection. The repo combines headless Ghidra-based extraction, Qiling-based dynamic tracing, a dataset generation pipeline (including optional LLM-assisted labeling), and a small backend and frontend for web access.

This README gives a concise, practical overview and quickstart so you can get the pipeline running and contribute.

Key project goals

  • Extract function-level and trace-level features suitable for ML
  • Provide utilities for static (Ghidra) and dynamic (Qiling) analysis
  • Offer scripts to build training CSVs and run inference
  • Provide a backend API and frontend for file upload and analysis
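To make "function-level features" concrete, here is a minimal sketch of one kind of feature a crypto detector could use: flags for well-known algorithm constants found in a function's raw bytes. It is an illustration only and is not taken from the Vestigo code base; `constant_features` and the feature names are hypothetical.

```python
import struct

# Public algorithm constants; their presence in code or data hints at crypto.
AES_SBOX_PREFIX = bytes.fromhex("637c777bf26b6fc5")  # first 8 AES S-box entries
WORD_CONSTANTS = {
    "sha256_h0": 0x6A09E667,  # SHA-256 initial hash value H0
    "sha256_k0": 0x428A2F98,  # SHA-256 round constant K[0]
    "md5_a0": 0x67452301,     # MD5 initial state word A (also SHA-1 h0)
}


def constant_features(code: bytes) -> dict:
    """Return 0/1 flags for known crypto constants found in `code`."""
    features = {"aes_sbox": int(AES_SBOX_PREFIX in code)}
    for name, value in WORD_CONSTANTS.items():
        # Firmware may be little- or big-endian, so check both encodings.
        little = struct.pack("<I", value)
        big = struct.pack(">I", value)
        features[name] = int(little in code or big in code)
    return features
```

Flags like these are cheap to compute but easy to defeat (constants can be split or computed at runtime), which is one reason to combine them with trace-level features.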

Quick facts / highlights

  • Languages: Python (main tooling & backend), TypeScript/React frontend
  • Major folders: ghidra_scripts, qiling_analysis, ml, backend, frontend
  • Important entry points:
    • generate_dataset.py - create ML CSVs from Ghidra JSONs
    • analyzer.py, bare_metal.py, main.py - orchestrate analysis flows
    • factory/builder.py - cross-compile sources across arch/opt matrix
    • qiling_analysis/ - dynamic tracing & batch extraction pipeline
    • backend/ - FastAPI backend with analysis endpoints

Quick Setup

1. Automated Installation

./setup.sh
source activate_vestigo.sh

What gets installed:

  • Python environment with all dependencies (FastAPI, Qiling, ML libraries)
  • Ghidra headless analyzer (/opt/ghidra)
  • Qiling framework + rootfs
  • Cross-compiler toolchains (ARM, MIPS, AArch64)
  • Container runtime (Podman/Docker)

Options for setup.sh: --minimal | --skip-ghidra | --skip-ml | --help

2. Manual Steps Required

Frontend (Node.js 18+):

# Install Node.js for your OS, then:
cd frontend && npm install && cd ..

Database (PostgreSQL):

# Option A: Local
sudo apt install postgresql && sudo -u postgres createdb vestigo

# Option B: Cloud (https://neon.tech - recommended)
# Get connection string and add to .env

Configure .env:

DATABASE_URL=postgresql://user:pass@host:5432/vestigo
OPENAI_API_KEY=sk-your-key-here  # Get from platform.openai.com
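A quick sanity check of the .env file can catch typos before the backend starts. The sketch below is a minimal, standard-library-only example; `parse_env` and `check_env` are hypothetical helpers, not part of Vestigo, and the parsing rules (including stripping of inline `#` comments) are an assumption that may differ from the loader the backend actually uses.

```python
from urllib.parse import urlsplit


def parse_env(text: str) -> dict:
    """Parse KEY=value lines, skipping blanks and full-line comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment introduced by " #", then any quotes.
        value = value.split(" #", 1)[0].strip().strip('"').strip("'")
        values[key.strip()] = value
    return values


def check_env(values: dict) -> list:
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    url = urlsplit(values.get("DATABASE_URL", ""))
    if url.scheme not in ("postgresql", "postgres"):
        problems.append("DATABASE_URL must be a postgresql:// URL")
    elif not url.hostname or not url.path.strip("/"):
        problems.append("DATABASE_URL needs a host and a database name")
    if not values.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is missing")
    return problems
```

Usage: `check_env(parse_env(open(".env").read()))` returns an empty list when both settings look plausible.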

Initialize Database:

cd backend && prisma db push && prisma generate && cd ..

Usage

Always activate the environment first: source activate_vestigo.sh

Static Analysis (Ghidra)

python3 scripts/analyzer.py <binary>
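For orientation, a headless Ghidra run is driven by Ghidra's `analyzeHeadless` launcher. The sketch below only builds such a command line; it does not reflect what scripts/analyzer.py does internally. The script name `ExtractFeatures.py` and project name `vestigo_tmp` are hypothetical placeholders, and the default GHIDRA_HOME of /opt/ghidra is taken from the setup notes above.

```python
import os
from pathlib import Path


def headless_command(binary, project_dir, script="ExtractFeatures.py",
                     script_dir="ghidra_scripts", ghidra_home=None):
    """Build (but do not run) an analyzeHeadless argument list."""
    home = Path(ghidra_home or os.environ.get("GHIDRA_HOME", "/opt/ghidra"))
    return [
        str(home / "support" / "analyzeHeadless"),
        str(project_dir), "vestigo_tmp",   # project location and project name
        "-import", str(binary),            # binary to import and analyze
        "-scriptPath", str(script_dir),    # where Ghidra looks for scripts
        "-postScript", script,             # hypothetical script, run after analysis
        "-deleteProject",                  # discard the temporary project afterwards
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the paths are confirmed for your installation.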

Dynamic Analysis (Qiling)

python3 qiling_analysis/tests/verify_crypto.py <binary>

Generate ML Dataset

python3 scripts/generate_dataset.py --input-dir ghidra_output --output dataset.csv
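Conceptually, this step flattens per-binary JSON into one CSV row per function. The sketch below shows the idea under a loudly assumed schema: the JSON keys (`binary`, `functions`, `name`, `instruction_count`, `xor_count`, `label`) and the column set are hypothetical and will not match the real Ghidra output or the real generate_dataset.py.

```python
import csv
import io
import json

# Hypothetical column set; the real dataset has its own feature columns.
FIELDS = ["binary", "function", "instruction_count", "xor_count", "label"]


def rows_from_json(text):
    """Yield one flat row per function found in a JSON document."""
    doc = json.loads(text)
    for fn in doc.get("functions", []):
        yield {
            "binary": doc.get("binary", ""),
            "function": fn.get("name", ""),
            "instruction_count": fn.get("instruction_count", 0),
            "xor_count": fn.get("xor_count", 0),
            "label": fn.get("label", "unknown"),  # unlabeled functions stay "unknown"
        }


def to_csv(json_texts):
    """Combine several JSON documents into a single CSV string."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for text in json_texts:
        writer.writerows(rows_from_json(text))
    return out.getvalue()
```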

Batch Processing

python3 qiling_analysis/batch_extract_features.py \
    --dataset-dir ./dataset_binaries --output-dir ./results --parallel 4
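The `--parallel 4` flag suggests a worker pool over the binaries in the dataset directory. A minimal sketch of that pattern follows; `run_batch` and the `worker` callable are hypothetical and are not the implementation of batch_extract_features.py. A thread pool keeps the example short; a real emulation workload is CPU-bound and would more likely use processes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def run_batch(dataset_dir, worker, parallel=4):
    """Apply `worker(path)` to every file in `dataset_dir` concurrently.

    Returns (results, failures), both keyed by file name, so one bad
    binary does not abort the whole batch.
    """
    binaries = sorted(p for p in Path(dataset_dir).iterdir() if p.is_file())
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = {pool.submit(worker, path): path for path in binaries}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path.name] = future.result()
            except Exception as exc:  # record the error and keep going
                failures[path.name] = str(exc)
    return results, failures
```

Collecting failures separately matters in practice: emulated firmware often crashes or hangs, and a batch run should report those binaries instead of stopping.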

Cross-Compile Binaries

python3 factory/builder.py --source algorithm.c
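Building "across an arch/opt matrix" means compiling the same source once per architecture and optimization level. The sketch below generates such a matrix as a list of compiler command lines without running them. It is an assumption-laden illustration, not factory/builder.py: the compiler names are the usual Debian/Ubuntu cross-toolchain binaries, and the optimization levels, `-static` flag, and output naming scheme are hypothetical choices.

```python
from itertools import product

# Assumed toolchain names (Debian/Ubuntu cross-compiler packages).
COMPILERS = {
    "arm": "arm-linux-gnueabi-gcc",
    "aarch64": "aarch64-linux-gnu-gcc",
    "mips": "mips-linux-gnu-gcc",
}
OPT_LEVELS = ["-O0", "-O1", "-O2", "-O3", "-Os"]


def build_matrix(source, out_dir="build"):
    """Return one compiler command (argument list) per arch/opt pair."""
    stem = source.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    jobs = []
    for (arch, cc), opt in product(COMPILERS.items(), OPT_LEVELS):
        output = f"{out_dir}/{stem}_{arch}_{opt.lstrip('-')}.elf"
        jobs.append([cc, opt, "-static", source, "-o", output])
    return jobs
```

With three architectures and five optimization levels, `build_matrix("algorithm.c")` yields 15 commands. Encoding the architecture and level in the file name keeps the label recoverable later in the dataset.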

LLM Crypto Analysis

python3 qiling_analysis/tests/llm/crypto_deep_analyzer.py --strace trace.log --output analysis.json

Run Web Interface

# Backend (terminal 1)
cd backend && uvicorn main:app --reload

# Frontend (terminal 2)
cd frontend && npm run dev

Project Structure

vestigo-data/
├── setup.sh                 # Automated installation
├── activate_vestigo.sh      # Environment activation
├── backend/                 # FastAPI server
├── frontend/                # React UI
├── factory/                 # Cross-compilation tools
├── ghidra_scripts/          # Ghidra analysis scripts
├── qiling_analysis/         # Dynamic tracing pipeline
├── ml/                      # ML models and training
├── scripts/                 # Analysis orchestration
└── dataset_binaries/        # Sample binaries

Key Scripts:

  • scripts/analyzer.py - Ghidra static analysis
  • scripts/generate_dataset.py - Create ML datasets
  • qiling_analysis/tests/verify_crypto.py - Dynamic analysis
  • factory/builder.py - Cross-compilation

Troubleshooting

Issue | Solution
--- | ---
Virtual environment not found | Run ./setup.sh
Import errors | pip install -r requirements.txt -r backend/requirements.txt
Qiling rootfs missing | git clone --depth 1 https://github.com/qilingframework/rootfs.git qiling_analysis/rootfs
Ghidra not found | Set GHIDRA_HOME: export GHIDRA_HOME=/opt/ghidra
Database errors | Check DATABASE_URL in .env, run prisma generate
OpenAI quota exceeded | Check billing at platform.openai.com
Frontend won't start | cd frontend && rm -rf node_modules && npm install
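Several of these issues can be detected up front. The sketch below is a hypothetical preflight check, not a script shipped with Vestigo; it assumes the paths mentioned in this README (GHIDRA_HOME defaulting to /opt/ghidra, qiling_analysis/rootfs, frontend/node_modules) and Ghidra's standard support/analyzeHeadless launcher location.

```python
from pathlib import Path


def preflight(env, repo_root):
    """Return the names of checks that failed (empty list means all good).

    `env` is a mapping such as os.environ; `repo_root` is the checkout path.
    """
    root = Path(repo_root)
    ghidra = Path(env.get("GHIDRA_HOME", "/opt/ghidra"))
    checks = {
        "ghidra": (ghidra / "support" / "analyzeHeadless").exists(),
        "qiling_rootfs": (root / "qiling_analysis" / "rootfs").is_dir(),
        "frontend_deps": (root / "frontend" / "node_modules").is_dir(),
        "database_url": bool(env.get("DATABASE_URL")),
    }
    return [name for name, ok in checks.items() if not ok]
```

Usage: `preflight(os.environ, ".")` from the repository root; each returned name maps to a row in the table above.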

System Requirements

  • OS: Ubuntu/Debian, Fedora/RHEL, Arch, macOS
  • RAM: 8GB min, 16GB recommended
  • Disk: ~10GB
  • Python: 3.9+ (3.11 recommended)
  • Node.js: 18+ (for frontend)

Documentation

  • qiling_analysis/QUICKSTART_GUIDE.md - Dynamic analysis guide
  • CONTRIBUTING.md - Contribution guidelines

License

Apache-2.0 - See LICENSE
