Vestigo is a collection of tools, scripts, and services that automates (1) producing cross-compiled test binaries, (2) statically and dynamically analyzing firmware and binaries, (3) extracting ML-ready features, and (4) producing datasets and inference results for cryptographic-function detection. The repo combines headless Ghidra-based extraction, Qiling-based dynamic tracing, a dataset generation pipeline (including optional LLM-assisted labeling), and a small backend and frontend for web access.
This README gives a concise, practical overview and quickstart so you can get the pipeline running and contribute.
- Extract function-level and trace-level features suitable for ML
- Provide utilities for static (Ghidra) and dynamic (Qiling) analysis
- Offer scripts to build training CSVs and run inference
- Provide a backend API and frontend for file upload and analysis
- Languages: Python (main tooling & backend), TypeScript/React frontend
- Major folders: ghidra_scripts, qiling_analysis, ml, backend, frontend
- Important entry points:
  - generate_dataset.py: create ML CSVs from Ghidra JSONs
  - analyzer.py, bare_metal.py, main.py: orchestrate analysis flows
  - factory/builder.py: cross-compile sources across arch/opt matrix
  - qiling_analysis/: dynamic tracing & batch extraction pipeline
  - backend/: FastAPI backend with analysis endpoints

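To make the "arch/opt matrix" idea concrete, here is a minimal sketch of how such a matrix can be enumerated. It is not the logic of factory/builder.py: the compiler names (Debian/Ubuntu-style GCC cross-compilers), the optimization levels, the `-static` flag, and the output naming scheme are all assumptions chosen for illustration.

```python
from itertools import product
from pathlib import Path

# Assumed GCC cross-compiler names (Debian/Ubuntu package naming);
# the toolchains Vestigo actually installs may be named differently.
COMPILERS = {
    "arm": "arm-linux-gnueabi-gcc",
    "mips": "mips-linux-gnu-gcc",
    "aarch64": "aarch64-linux-gnu-gcc",
}
# Assumed optimization levels for the matrix.
OPT_LEVELS = ["-O0", "-O1", "-O2", "-O3"]


def build_matrix(source, out_dir="build"):
    """Return one compiler command (as an argv list) per arch/opt pair."""
    stem = Path(source).stem
    commands = []
    for (arch, cc), opt in product(COMPILERS.items(), OPT_LEVELS):
        # Hypothetical naming scheme: <stem>_<arch>_<opt>, e.g. algorithm_arm_O0
        output = f"{out_dir}/{stem}_{arch}_{opt.lstrip('-')}"
        commands.append([cc, opt, "-static", source, "-o", output])
    return commands


if __name__ == "__main__":
    # Only prints the commands; nothing is compiled.
    for cmd in build_matrix("algorithm.c"):
        print(" ".join(cmd))
```

With three architectures and four optimization levels this yields twelve commands per source file, which is why a single source can contribute many binaries to a dataset.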
./setup.sh
source activate_vestigo.sh

What gets installed:
- Python environment with all dependencies (FastAPI, Qiling, ML libraries)
- Ghidra headless analyzer (/opt/ghidra)
- Qiling framework + rootfs
- Cross-compiler toolchains (ARM, MIPS, AArch64)
- Container runtime (Podman/Docker)
Options: --minimal | --skip-ghidra | --skip-ml | --help
Frontend (Node.js 18+):
# Install Node.js for your OS, then:
cd frontend && npm install && cd ..

Database (PostgreSQL):
# Option A: Local
sudo apt install postgresql && sudo -u postgres createdb vestigo
# Option B: Cloud (https://neon.tech - recommended)
# Get connection string and add to .env

Configure .env:
DATABASE_URL=postgresql://user:pass@host:5432/vestigo
OPENAI_API_KEY=sk-your-key-here # Get from platform.openai.com

Initialize Database:
cd backend && prisma db push && prisma generate && cd ..

Always activate environment first: source activate_vestigo.sh
python3 scripts/analyzer.py <binary>
python3 qiling_analysis/tests/verify_crypto.py <binary>
python3 scripts/generate_dataset.py --input-dir ghidra_output --output dataset.csv
python3 qiling_analysis/batch_extract_features.py \
  --dataset-dir ./dataset_binaries --output-dir ./results --parallel 4
python3 factory/builder.py --source algorithm.c
python3 qiling_analysis/tests/llm/crypto_deep_analyzer.py --strace trace.log --output analysis.json

# Backend (terminal 1)
cd backend && uvicorn main:app --reload
# Frontend (terminal 2)
cd frontend && npm run dev

vestigo-data/
├── setup.sh # Automated installation
├── activate_vestigo.sh # Environment activation
├── backend/ # FastAPI server
├── frontend/ # React UI
├── factory/ # Cross-compilation tools
├── ghidra_scripts/ # Ghidra analysis scripts
├── qiling_analysis/ # Dynamic tracing pipeline
├── ml/ # ML models and training
├── scripts/ # Analysis orchestration
└── dataset_binaries/ # Sample binaries
Key Scripts:
- scripts/analyzer.py - Ghidra static analysis
- scripts/generate_dataset.py - Create ML datasets
- qiling_analysis/tests/verify_crypto.py - Dynamic analysis
- factory/builder.py - Cross-compilation
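
The static-analysis and dataset scripts listed above can be chained from a small driver. The sketch below only assembles the two commands with the flags shown in this README and prints them. It assumes that scripts/analyzer.py writes its JSON output into ghidra_output, which this README does not state, and the sample binary path is a placeholder.

```python
import subprocess
import sys
from pathlib import Path


def pipeline_commands(binary, ghidra_dir="ghidra_output", csv_path="dataset.csv"):
    """Build the static-analysis and dataset commands for one binary."""
    return [
        [sys.executable, "scripts/analyzer.py", str(binary)],
        # Assumption: the analyzer's JSON output ends up in ghidra_dir.
        [sys.executable, "scripts/generate_dataset.py",
         "--input-dir", ghidra_dir, "--output", csv_path],
    ]


def run_pipeline(binary, dry_run=True):
    """Print each command; unless dry_run is set, run it and stop on the first failure."""
    for cmd in pipeline_commands(binary):
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder path; dry_run=True means nothing is executed.
    run_pipeline(Path("dataset_binaries/sample.elf"), dry_run=True)
```

Passing `dry_run=False` executes the commands in order; `check=True` raises `subprocess.CalledProcessError` if a step exits with a non-zero status, so the dataset step is not run on a failed analysis.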
| Issue | Solution |
|---|---|
| Virtual environment not found | Run ./setup.sh |
| Import errors | pip install -r requirements.txt -r backend/requirements.txt |
| Qiling rootfs missing | git clone --depth 1 https://github.com/qilingframework/rootfs.git qiling_analysis/rootfs |
| Ghidra not found | Set export GHIDRA_HOME=/opt/ghidra |
| Database errors | Check DATABASE_URL in .env, run prisma generate |
| OpenAI quota exceeded | Check billing at platform.openai.com |
| Frontend won't start | cd frontend && rm -rf node_modules && npm install |
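
Several rows of the table above can be checked automatically before starting an analysis. The following is a rough diagnostic sketch, not a script shipped with the repo: the function names are hypothetical, the .env parser handles only simple KEY=VALUE lines, and GHIDRA_HOME is assumed to default to the /opt/ghidra install location mentioned earlier.

```python
import os
from pathlib import Path
from urllib.parse import urlparse


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "sk-... # Get from ...".
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


def diagnose(env, environ=os.environ, root=Path(".")):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    ghidra_home = environ.get("GHIDRA_HOME", "/opt/ghidra")
    if not Path(ghidra_home).is_dir():
        problems.append("Ghidra not found: set GHIDRA_HOME")
    if not (Path(root) / "qiling_analysis" / "rootfs").is_dir():
        problems.append("Qiling rootfs missing")
    scheme = urlparse(env.get("DATABASE_URL", "")).scheme
    if scheme not in ("postgresql", "postgres"):
        problems.append("DATABASE_URL missing or not a PostgreSQL URL")
    return problems


if __name__ == "__main__":
    for problem in diagnose(read_env_file()):
        print("-", problem)
```

Run it from the repository root so that the relative paths resolve; each reported problem maps to a row in the table.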
- OS: Ubuntu/Debian, Fedora/RHEL, Arch, macOS
- RAM: 8GB min, 16GB recommended
- Disk: ~10GB
- Python: 3.9+ (3.11 recommended)
- Node.js: 18+ (for frontend)
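
The version and disk requirements above can be verified with a short preflight check. This is an illustrative sketch rather than part of setup.sh; the function names are hypothetical, and the check covers only Python, Node.js, and free disk space in the current directory.

```python
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 9)
MIN_NODE_MAJOR = 18
MIN_FREE_GB = 10


def parse_node_major(version_text):
    """Extract the major version from `node --version` output such as 'v18.19.0'."""
    return int(version_text.strip().lstrip("v").split(".")[0])


def check_requirements():
    """Return a list of unmet requirements; an empty list means all checks passed."""
    issues = []
    if sys.version_info < MIN_PYTHON:
        issues.append("Python 3.9+ is required")
    node = shutil.which("node")
    if node is None:
        issues.append("Node.js not found (needed only for the frontend)")
    else:
        result = subprocess.run([node, "--version"], capture_output=True, text=True)
        try:
            if parse_node_major(result.stdout) < MIN_NODE_MAJOR:
                issues.append("Node.js 18+ is required for the frontend")
        except ValueError:
            issues.append("Could not parse the Node.js version")
    free_gb = shutil.disk_usage(".").free / 1e9
    if free_gb < MIN_FREE_GB:
        issues.append(f"Only {free_gb:.1f} GB free; about 10 GB is needed")
    return issues


if __name__ == "__main__":
    for issue in check_requirements():
        print("-", issue)
```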
- qiling_analysis/QUICKSTART_GUIDE.md - Dynamic analysis guide
- CONTRIBUTING.md - Contribution guidelines

Apache-2.0 - See LICENSE