Skip to content

Latest commit

 

History

History
65 lines (54 loc) · 3.03 KB

File metadata and controls

65 lines (54 loc) · 3.03 KB

YOLOv8n-WorldV2: Object Detection Accelerator

YOLOv8n-WorldV2: Open-Vocabulary Object Detection Accelerator

A complete YOLOv8n-WorldV2 inference accelerator — all 15 modules synthesize to Verilog from the same Lean 4 Signal DSL. INT4 weights / INT8 activations with pre-computed CLIP text embeddings for open-vocabulary detection.

Architecture

160x160x3 RGB ──► Backbone (5 stages) ──► Neck (FPN+PAN) ──► Head (3 scales) ──► Detections
                     │                        │                    │
                     ├─ Conv 3x3 stem         ├─ Upsample 2x      ├─ Bbox regression
                     ├─ C2f blocks            ├─ Concat            ├─ Classification
                     └─ SPPF                  └─ C2f               └─ CLIP text dot product

Key specs:

  • Input: 160x160x3 RGB (INT8 quantized)
  • Quantization: INT4 weights (packed 2 per byte) / INT8 activations / INT32 accumulators
  • Backbone: 5 stages — Conv stem → 4x (Conv stride-2 + C2f) + SPPF, producing P3/P4/P5
  • Neck: FPN top-down (P5→P4→P3) + PAN bottom-up (N3→N4'→N5')
  • Head: Decoupled detection at 3 scales with CLIP text embedding dot product
  • All 15 modules synthesize to SystemVerilog via #synthesizeVerilog

Module Hierarchy (15 synthesizable modules)

Module Description Key technique
dequantPacked INT4→INT8 sign extension MSB check + bit concat
requantize INT32→INT8 multiply-shift-clamp BitVec.slt + ashr
relu ReLU activation MSB extraction
siluLut SiLU via 256-entry ROM LUT lutMuxTree
conv2DEngine Sequential MAC engine Signal.loop FSM
lineBuffer3x3 3-row sliding window Signal.memory
maxPool2x2 2x2 signed max pooling BitVec.slt
upsample2x 2x nearest-neighbor Counter FSM
convBnSiLU Fused Conv+BN+SiLU Composes primitives
bottleneckController 1x1→3x3 + residual FSM sequencer
c2fController Cross Stage Partial N-bottleneck loop
sppfController Spatial Pyramid Pooling 3-pass max pool
backboneController 5-stage backbone FSM Stage sequencer
neckController FPN+PAN sequencer Bidirectional path
headController 3-scale detection head Scale/branch FSM
yolov8nTop Full SoC top-level Master controller
dotProductEngine INT8 dot product for CLIP MAC accumulator

Golden Value Validation

Golden values extracted from real YOLOv8s-WorldV2 model (ultralytics) with INT4/INT8 quantization:

--- YOLOv8 Golden Value Validation ---
  [PASS] Golden value files exist (weights, biases, activations, input image)
  [PASS] INT4 weight dequantization preserves signed values
  [PASS] Cosine self-similarity = 1.0
  [PASS] Layer weight diversity check
  9/9 golden value tests pass
# Generate golden values from Python
python scripts/yolo_golden_gen.py
# Produces 207 weight files + 68 activation files (9.8MB)