Pronunciation: Yaoyorozu (yorozu). Official ASCII name: YOLOZU.
YOLOZU is an Apache-2.0-only, contract-first evaluation + tooling harness for:
- real-time monocular RGB detection
- monocular depth + 6DoF pose heads (RT-DETR-based scaffold)
- semantic segmentation utilities (dataset prep + mIoU evaluation)
- instance segmentation utilities (PNG-mask contract + mask mAP evaluation)
Recommended deployment path (canonical): PyTorch → ONNX → TensorRT (TRT).
It focuses on:
- CPU-minimum dev/tests (GPU optional)
- A stable predictions-JSON contract for evaluation (bring-your-own inference backend)
- Minimal training scaffold (RT-DETR pose) with reproducible artifacts
- Hessian-based refinement for regression head predictions (depth, rotation, offsets)
- Backend-agnostic evaluation: run inference in PyTorch / ONNXRuntime / TensorRT / C++ / Rust → export the same `predictions.json` → compare apples-to-apples.
- Unified CLI: `python3 tools/yolozu.py` wraps backends with consistent args, caching (`--cache`), and always writes run metadata (git SHA / env / GPU / config hash).
- Parity + benchmarks: backend diff stats (torch vs onnxrt vs trt) and fixed-protocol latency/FPS reports.
- Safe test-time training (Tent): norm-only updates with guard rails (non-finite/loss/update-norm stops + rollback) and reset policies (see the sketch after this list).
- AI-friendly repo surface: stable schemas + `tools/manifest.json` for tool discovery / automation.
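As a rough illustration of the Tent-style norm-only update mentioned above, here is a minimal sketch assuming a PyTorch model that returns class logits; the in-repo TTT path adds the non-finite/loss/update-norm guard rails, rollback, and reset policies, and its API may differ (see `--ttt` below and docs/ttt_protocol.md):

```python
import torch


def tent_step(model, batch, lr=1e-4):
    """One Tent-style step: minimize prediction entropy on the test batch,
    updating only normalization-layer affine parameters."""
    norm_params = []
    for module in model.modules():
        if isinstance(module, (torch.nn.BatchNorm2d, torch.nn.LayerNorm)):
            for p in module.parameters():
                p.requires_grad_(True)
                norm_params.append(p)

    optimizer = torch.optim.SGD(norm_params, lr=lr)
    logits = model(batch)  # assumption: the model returns class logits
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()

    optimizer.zero_grad()
    entropy.backward()
    if torch.isfinite(entropy):  # simplified guard rail: skip non-finite updates
        optimizer.step()
    return float(entropy)
```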
- Dataset I/O: YOLO-format images/labels + optional per-image JSON metadata.
- Stable evaluation contract: versioned predictions-JSON schema + adapter contract.
- Unified CLI: `python3 tools/yolozu.py` (`doctor`, `export`, `predict-images`, `sweep`) for research/eval workflows.
- Inference/export: `tools/export_predictions.py` (torch adapter), `tools/export_predictions_onnxrt.py`, `tools/export_predictions_trt.py`.
- Test-time adaptation options:
  - TTA: lightweight prediction-space post-transform (`--tta`).
  - TTT: pre-prediction test-time training (Tent or MIM) via `--ttt` (adapter + torch required).
- Hessian solver: per-detection iterative refinement of regression outputs (depth, rotation, offsets) using Gauss-Newton optimization (see the sketch after this list).
- Evaluation: COCO mAP conversion/eval and scenario suite reporting.
- Keypoints: YOLO pose-style keypoints in labels/predictions + PCK evaluation + optional COCO OKS mAP (`tools/eval_keypoints.py --oks`), plus parity/benchmark helpers.
- Semantic seg: dataset prep helpers + `tools/eval_segmentation.py` (mIoU / per-class IoU / `ignore_index` + optional HTML overlays).
- Instance seg: `tools/eval_instance_segmentation.py` (mask mAP from per-instance binary PNG masks + optional HTML overlays).
- Training scaffold: minimal RT-DETR pose trainer with metrics output, ONNX export, and optional SDFT-style self-distillation.
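As a toy illustration of the Gauss-Newton refinement mentioned in the Hessian solver item above (a sketch only; `residual_fn` is a placeholder and the actual solver is described in docs/hessian_solver.md):

```python
import numpy as np


def refine_depth(depth_init, residual_fn, iters=5, eps=1e-4, damping=1e-6):
    """Toy 1-D Gauss-Newton: minimize 0.5 * ||r(d)||^2 over a scalar depth d.

    residual_fn(d) -> np.ndarray of residuals (e.g. reprojection errors).
    The Jacobian is approximated with finite differences.
    """
    d = float(depth_init)
    for _ in range(iters):
        r = residual_fn(d)
        J = (residual_fn(d + eps) - r) / eps  # finite-difference Jacobian, shape (n,)
        H = float(J @ J) + damping            # Gauss-Newton approximation of the Hessian
        g = float(J @ r)                      # gradient
        step = -g / H
        if not np.isfinite(step):
            break
        d += step
    return d


# Example: pull an initial depth of 1.5 toward a pseudo-measurement of 2.0.
refined = refine_depth(1.5, lambda d: np.array([d - 2.0]))
```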
YOLOZU evaluates instance segmentation using per-instance binary PNG masks (no RLE/polygons required).
Predictions JSON (minimal):
[
{
"image": "000001.png",
"instances": [
{ "class_id": 0, "score": 0.9, "mask": "masks/000001_inst0.png" }
]
}
]
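As a rough sketch of producing this contract from your own instance masks (assuming numpy + Pillow; the file-naming scheme and 0/255 PNG convention here are illustrative assumptions):

```python
import json
from pathlib import Path

import numpy as np
from PIL import Image


def write_instance_predictions(instances, out_json, masks_dir="masks"):
    """instances: list of (image_name, class_id, score, boolean HxW mask array)."""
    masks_dir = Path(masks_dir)
    masks_dir.mkdir(parents=True, exist_ok=True)

    by_image = {}
    for idx, (image, class_id, score, mask) in enumerate(instances):
        mask_path = masks_dir / f"{Path(image).stem}_inst{idx}.png"
        # Binary PNG: 0 = background, 255 = instance.
        Image.fromarray(mask.astype(np.uint8) * 255).save(mask_path)
        by_image.setdefault(image, []).append(
            {"class_id": int(class_id), "score": float(score), "mask": str(mask_path)}
        )

    entries = [{"image": img, "instances": inst} for img, inst in by_image.items()]
    Path(out_json).write_text(json.dumps(entries, indent=2))
```

Mask paths can be relative; at evaluation time `--pred-root` tells the evaluator where to resolve them (see the demo below).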
Validate an artifact:

python3 tools/validate_instance_segmentation_predictions.py reports/instance_seg_predictions.json
Eval outputs:

- mask mAP (`map50`, `map50_95`)
- per-class AP table
- per-image diagnostics (TP/FP/FN, mean IoU) and overlay selection (`--overlay-sort {worst,best,first}`; default: `worst`)
Run the synthetic demo and render overlays/HTML:
python3 tools/eval_instance_segmentation.py \
--dataset examples/instance_seg_demo/dataset \
--split val2017 \
--predictions examples/instance_seg_demo/predictions/instance_seg_predictions.json \
--pred-root examples/instance_seg_demo/predictions \
--classes examples/instance_seg_demo/classes.txt \
--html reports/instance_seg_demo_eval.html \
--overlays-dir reports/instance_seg_demo_overlays \
--max-overlays 10

Same via the unified CLI:
python3 tools/yolozu.py eval-instance-seg --dataset examples/instance_seg_demo/dataset --split val2017 --predictions examples/instance_seg_demo/predictions/instance_seg_predictions.json --pred-root examples/instance_seg_demo/predictions --classes examples/instance_seg_demo/classes.txt --html reports/instance_seg_demo_eval.html --overlays-dir reports/instance_seg_demo_overlays --max-overlays 10

Optional: prepare COCO instance-seg dataset with per-instance PNG masks (requires pycocotools):
python3 tools/prepare_coco_instance_seg.py --coco-root /path/to/coco --split val2017 --out data/coco-instance-seg

Optional: convert COCO instance-seg predictions (RLE/polygons) into YOLOZU PNG masks (requires pycocotools):
python3 tools/convert_coco_instance_seg_predictions.py \
--predictions /path/to/coco_instance_seg_preds.json \
--instances-json /path/to/instances_val2017.json \
--output reports/instance_seg_predictions.json \
--masks-dir reports/instance_seg_masks

Start here: docs/training_inference_export.md
- Repo feature summary: docs/yolozu_spec.md
- Model/spec note: rt_detr_6dof_geom_mim_spec_en_v0_4.md
- Training / inference / export quick steps: docs/training_inference_export.md
- Hessian solver for regression refinement: docs/hessian_solver.md
- Predictions schema (stable): docs/predictions_schema.md
- Adapter contract (stable): docs/adapter_contract.md
- License policy: docs/license_policy.md
- Tools index (AI-friendly): docs/tools_index.md / tools/manifest.json
- P0 (done): Unified CLI (`torch`/`onnxruntime`/`tensorrt`) with consistent args + same output schema; always write meta (git SHA / env / GPU / seed / config hash); keep `tools/manifest.json` updated.
- P1 (done): `doctor` (deps/GPU/driver/onnxrt/TRT diagnostics) + `predict-images` (folder input → predictions JSON + overlays) + HTML report.
- P2 (partial): cache/re-run (fingerprinted runs) + sweeps (wrapper exists; expand sweeps for TTT/threshold/gate weights) + production inference cores (C++/Rust) as needed.
- Apache-2.0-only utilities and evaluation harnesses (no vendored GPL/AGPL inference code).
- CPU-first development workflow: dataset tooling, validators, scenario suite, and unit tests run without a GPU.
- Adapter interface decouples inference backend from evaluation (PyTorch/ONNXRuntime/TensorRT/custom), so you can run inference elsewhere and still score/compare locally.
- Reproducible artifacts: stable JSON reports + optional JSONL history for regressions.
- Symmetry + commonsense constraints are treated as first-class, test-covered utilities (not ad-hoc postprocess).
- Not a turnkey training repo: the in-repo `rtdetr_pose/` model is scaffolding to wire data/losses/metrics/export. It is not expected to be competitive without significant upgrades.
- No "one command" real-time inference app is shipped here. The intended flow is: bring-your-own inference backend → export predictions JSON → run evaluation/scenarios in this repo.
- TensorRT development is not macOS-friendly: engine build/export steps assume an NVIDIA stack (typically Linux). On macOS you can still do CPU-side validation and keep GPU steps for Runpod/remote.
- Backend parity is fragile: preprocessing (letterbox/RGB order), output layouts, and score calibration can dominate mAP/FPS differences more than the model itself if they drift (see the letterbox sketch after this list).
- Some tools intentionally use lightweight metrics (e.g. `yolozu.simple_map`) to avoid heavy deps; full COCOeval requires optional dependencies and the proper COCO layout.
- Large model weights/datasets are intentionally kept out of git; you need external storage and reproducible pointers.
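To make the parity caveat concrete, letterbox preprocessing typically looks like the sketch below (illustrative only; in practice use cv2.resize and whatever padding value and channel order your training/export pipelines actually used, and keep them identical across backends):

```python
import numpy as np


def letterbox(image, size=640, pad_value=114):
    """Resize with preserved aspect ratio, then pad to a square canvas.

    image: HxWx3 uint8 array. Returns (canvas, scale, (pad_x, pad_y)) so
    predictions can be mapped back to original pixel coordinates.
    """
    h, w = image.shape[:2]
    scale = min(size / h, size / w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))

    # Nearest-neighbor resize via index sampling (cv2.resize would normally be used).
    ys = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = image[ys][:, xs]

    canvas = np.full((size, size, 3), pad_value, dtype=image.dtype)
    pad_y, pad_x = (size - new_h) // 2, (size - new_w) // 2
    canvas[pad_y:pad_y + new_h, pad_x:pad_x + new_w] = resized
    return canvas, scale, (pad_x, pad_y)
```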
- Install test dependencies (CPU PyTorch is OK for local dev): `python3 -m pip install -r requirements-test.txt`
- Fetch the tiny dataset (once): `bash tools/fetch_coco128.sh`
- Run a minimal check (pytest): `pytest -q` (or: `python3 -m unittest -q`)
- GPU is supported (training/inference): install CUDA-enabled PyTorch in your environment and use `--device cuda:0`.
- CI/dev does not require GPU; many checks are CPU-friendly.
Run flows with YAML settings:
python -m yolozu train train_setting.yaml
python -m yolozu test test_setting.yaml

Or use the wrapper:
./tools/yolozu train train_setting.yaml
./tools/yolozu test test_setting.yaml

Templates: `train_setting.yaml`, `test_setting.yaml`
The minimal trainer is implemented in rtdetr_pose/tools/train_minimal.py.
Recommended usage is to set --run-dir, which writes a standard, reproducible artifact set:
- `metrics.jsonl` (+ final `metrics.json` / `metrics.csv`)
- `checkpoint.pt` (+ optional `checkpoint_bundle.pt`)
- `model.onnx` (+ `model.onnx.meta.json`)
- `run_record.json` (git SHA / platform / args)
Plot a loss curve (requires matplotlib):
python3 tools/plot_metrics.py --jsonl runs/<run>/metrics.jsonl --out reports/train_loss.png

ONNX export runs when --run-dir is set (defaulting to <run-dir>/model.onnx) or when --onnx-out is provided.
Useful flags:
- `--run-dir <dir>`
- `--onnx-out <path>`
- `--onnx-meta-out <path>`
- `--onnx-opset <int>`
- `--onnx-dynamic-hw` (dynamic H/W axes)
Base dataset format:
- Images: `images/<split>/*.(jpg|png|...)`
- Labels: `labels/<split>/*.txt` (YOLO: `class cx cy w h`, normalized)
Optional per-image metadata (JSON): labels/<split>/<image>.json
- Masks/seg: `mask_path` / `mask` / `M`
- Depth: `depth_path` / `depth` / `D_obj`
- Pose: `R_gt` / `t_gt` (or `pose`)
- Intrinsics: `K_gt` / `intrinsics` (also supports OpenCV FileStorage-style `camera_matrix: {rows, cols, data: [...]}`)
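For example, a per-image metadata file using those keys might look like the following sketch (values are placeholders; units and coordinate conventions are whatever your dataset defines):

```python
import json
from pathlib import Path

# Hypothetical example for labels/val2017/000001.json; only keys from the list above are used.
meta = {
    "mask_path": "masks/000001.png",
    "depth_path": "depth/000001.png",
    "R_gt": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],          # 3x3 rotation matrix
    "t_gt": [0.0, 0.0, 1.2],                             # translation vector
    "K_gt": [[600, 0, 320], [0, 600, 240], [0, 0, 1]],   # 3x3 camera intrinsics
}
Path("labels/val2017").mkdir(parents=True, exist_ok=True)
Path("labels/val2017/000001.json").write_text(json.dumps(meta, indent=2))
```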
Notes on units (pixels vs mm/m) and intrinsics coordinate frames:
If YOLO txt labels are missing and a mask is provided, bbox+class can be derived from masks. Details (including color/instance modes and multi-PNG-per-class options) are documented in:
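As an illustration of that derivation (a minimal numpy sketch; the in-repo logic, including color/instance modes, may differ):

```python
import numpy as np


def bbox_from_mask(mask):
    """mask: HxW array, nonzero = object. Returns (cx, cy, w, h) normalized
    to [0, 1], or None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    h, w = mask.shape
    x0, x1 = xs.min(), xs.max() + 1
    y0, y1 = ys.min(), ys.max() + 1
    return ((x0 + x1) / 2 / w, (y0 + y1) / 2 / h, (x1 - x0) / w, (y1 - y0) / h)
```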
This repo evaluates models through a stable predictions JSON format:
- Schema doc: docs/predictions_schema.md
- Machine-readable schema: schemas/predictions.schema.json
Adapters power `tools/export_predictions.py --adapter <name>` and follow the adapter contract (docs/adapter_contract.md).
If you run real inference elsewhere (PyTorch/TensorRT/etc.), you can evaluate this repo without installing heavy deps locally.
- Export predictions (in an environment where the adapter can run):
  python3 tools/export_predictions.py --adapter rtdetr_pose --checkpoint /path/to.ckpt --max-images 50 --wrap --output reports/predictions.json
- TTA (post-transform):
  python3 tools/export_predictions.py --adapter rtdetr_pose --tta --tta-seed 0 --tta-flip-prob 0.5 --wrap --output reports/predictions_tta.json
- TTT (pre-prediction test-time training; updates model weights in-memory):
  - Tent:
    python3 tools/export_predictions.py --adapter rtdetr_pose --ttt --ttt-method tent --ttt-steps 5 --ttt-lr 1e-4 --wrap --output reports/predictions_ttt_tent.json
  - MIM:
    python3 tools/export_predictions.py --adapter rtdetr_pose --ttt --ttt-method mim --ttt-steps 5 --ttt-mask-prob 0.6 --ttt-patch-size 16 --wrap --output reports/predictions_ttt_mim.json
  - Optional log: add `--ttt-log-out reports/ttt_log.json`
  - Recommended protocol + safe presets: docs/ttt_protocol.md
- Validate the JSON:
python3 tools/validate_predictions.py reports/predictions.json
- Consume predictions locally:
python3 tools/run_scenarios.py --adapter precomputed --predictions reports/predictions.json --max-images 50
Supported predictions JSON shapes:
[{"image": "...", "detections": [...]}, ...]{ "predictions": [ ... ] }{ "000000000009.jpg": [...], "/abs/path.jpg": [...] }(image -> detections)
Schema details: docs/predictions_schema.md
To compete on e2e mAP (NMS-free), evaluate detections as-is (no NMS postprocess applied).
This repo includes a COCO-style evaluator that:
- Builds COCO ground truth from YOLO-format labels
- Converts YOLOZU predictions JSON into COCO detections
- Runs COCO mAP via `pycocotools` (optional dependency)
Example (coco128 quick run):
- Export predictions (any adapter):
  python3 tools/export_predictions.py --adapter dummy --max-images 50 --wrap --output reports/predictions.json
- Evaluate mAP:
  python3 tools/eval_coco.py --dataset data/coco128 --predictions reports/predictions.json --bbox-format cxcywh_norm --max-images 50
Note: `--bbox-format cxcywh_norm` expects a bbox dict `{cx,cy,w,h}` normalized to `[0,1]` (matching the RT-DETR pose adapter bbox head).
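For reference, converting one normalized cxcywh box to COCO's absolute `[x, y, w, h]` is just the arithmetic below (a sketch of what `--bbox-format cxcywh_norm` implies, not the repo's exact code):

```python
def cxcywh_norm_to_coco(bbox, image_w, image_h):
    """bbox: {"cx","cy","w","h"} normalized to [0,1]; returns COCO [x, y, w, h] in pixels."""
    w = bbox["w"] * image_w
    h = bbox["h"] * image_h
    x = bbox["cx"] * image_w - w / 2
    y = bbox["cy"] * image_h - h / 2
    return [x, y, w, h]
```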
Reference recipe for external training runs (augment, multiscale, schedule, EMA):
docs/training_recipe_v1.md
docs/training_inference_export.md
Run a configurable sweep and emit CSV/MD tables:
docs/hpo_sweep.md
Report latency/FPS per YOLO26 bucket and archive runs over time:
docs/benchmark_latency.md
Fuse detection/template/uncertainty signals into a single score and tune weights offline (CPU-only):
docs/gate_weight_tuning.md
Reproducible engine build + parity validation steps:
docs/tensorrt_pipeline.md
This repo does not require (or vendor) any GPL/AGPL inference code.
To compare against external baselines (including YOLO26) while keeping this repo Apache-2.0-only:
- Run baseline inference in your own environment/implementation (ONNX Runtime / TensorRT / custom code).
- Export detections to YOLOZU predictions JSON (see schema below).
- (Optional) Normalize class ids using the COCO `classes.json` mapping.
- Validate + evaluate mAP in this repo:
  python3 tools/validate_predictions.py reports/predictions.json
  python3 tools/eval_coco.py --dataset /path/to/coco-yolo --split val2017 --predictions reports/predictions.json --bbox-format cxcywh_norm
Minimal predictions entry schema:
{"image": "/abs/or/rel/path.jpg", "detections": [{"class_id": 0, "score": 0.9, "bbox": {"cx": 0.5, "cy": 0.5, "w": 0.2, "h": 0.2}}]}
Optional class-id normalization (when your exporter produces COCO category_id):
python3 tools/normalize_predictions.py --input reports/predictions.json --output reports/predictions_norm.json --classes data/coco-yolo/labels/val2017/classes.json --wrap
If you have the official COCO layout (images + annotations/instances_*.json), you can generate YOLO-format labels:
python3 tools/prepare_coco_yolo.py --coco-root /path/to/coco --split val2017 --out /path/to/coco-yolo
This creates:
- `/path/to/coco-yolo/labels/val2017/*.txt` (YOLO normalized `class cx cy w h`)
- `/path/to/coco-yolo/labels/val2017/classes.json` (category_id <-> class_id mapping)
For local development, keep datasets under data/:
- Debug/smoke: `data/coco128` (already included)
- Full COCO (official): `data/coco` (your download)
- YOLO-format labels generated from official JSON: `data/coco-yolo` (your output from `tools/prepare_coco_yolo.py`)
If you export yolo26n/s/m/l/x predictions as separate JSON files (e.g. reports/pred_yolo26n.json, ...),
you can score them together:
- Protocol details: docs/yolo26_eval_protocol.md
- Run the suite:
  python3 tools/eval_suite.py --protocol yolo26 --dataset /path/to/coco-yolo --predictions-glob 'reports/pred_yolo26*.json' --output reports/eval_suite.json
- Fill in targets: baselines/yolo26_targets.json
- Validate targets: `python3 tools/validate_map_targets.py --targets baselines/yolo26_targets.json`
- Check pass/fail: `python3 tools/check_map_targets.py --suite reports/eval_suite.json --targets baselines/yolo26_targets.json --key map50_95`
- Print a table: `python3 tools/print_leaderboard.py --suite reports/eval_suite.json --targets baselines/yolo26_targets.json --key map50_95`
- Archive the run (commands + hardware + suite output): `python3 tools/import_yolo26_baseline.py --dataset /path/to/coco-yolo --predictions-glob 'reports/pred_yolo26*.json'`
If you don't have pycocotools installed yet, you can still validate/convert predictions on data/coco128:
python3 tools/export_predictions.py --adapter dummy --max-images 10 --wrap --output reports/predictions_dummy.json
python3 tools/eval_coco.py --predictions reports/predictions_dummy.json --dry-run
- Keep symmetry/commonsense logic in lightweight postprocess utilities, outside any inference graph export.
Code in this repository is licensed under the Apache License, Version 2.0. See LICENSE.