High-level system flow.
Real-time computer vision has diverse applications, ranging from autonomous vehicles and industrial quality control to tracking a dog's meander around the living room. The vision pipelines behind these tasks often consist of highly parallelizable computations with strong spatial locality. However, most traditional vision pipelines still rely largely on CPU computation. This motivates the design of a GPU-analogous parallel system that efficiently exploits the structure of standard image processing workloads.
We propose PEEP (Parallel Engine for Efficient Perception), a demonstration of such a system. It has three main optimizations that exploit properties of common image processing tasks:
- Many CV pipelines are embarrassingly parallel. This means we can process multiple pixels at once by dispatching them to separate processors.
- Most CV pipelines revolve around a small, fixed set of operations, such as convolution and thresholding. We optimize our processing cores for these tasks.
- Many CV operations exhibit spatial locality; i.e., the output of computation on one pixel depends only on that pixel and its close neighbors. We can design tilecaches that enable fast reads of square regions of an input image, and prefetch the next region while the current one is being processed.
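The locality argument above can be illustrated in software. The following is a minimal Python sketch, not PEEP's hardware implementation: it assumes a 3x3 box filter, a tile size of 4, a one-pixel halo, and border clamping, and all function names (`load_tile`, `filter_tiled`, and so on) are hypothetical. It shows that a tile plus its halo contains everything needed to compute that tile's outputs, so tiles can be processed independently (and, in hardware, in parallel or with the next tile being loaded in the background).

```python
def clamp(v, lo, hi):
    return min(max(v, lo), hi)

def filter_whole(img):
    """Reference: 3x3 box filter over the full image, clamping at borders."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    total += img[clamp(y + dy, 0, h - 1)][clamp(x + dx, 0, w - 1)]
            out[y][x] = total // 9
    return out

def load_tile(img, ty, tx, tile, halo=1):
    """Copy one tile plus its halo; a software stand-in for a tilecache fill."""
    h, w = len(img), len(img[0])
    return [
        [img[clamp(y, 0, h - 1)][clamp(x, 0, w - 1)]
         for x in range(tx - halo, tx + tile + halo)]
        for y in range(ty - halo, ty + tile + halo)
    ]

def filter_tiled(img, tile=4):
    """Same filter, but each tile is computed only from its own local copy."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            local = load_tile(img, ty, tx, tile)  # independent unit of work
            for y in range(ty, min(ty + tile, h)):
                for x in range(tx, min(tx + tile, w)):
                    ly, lx = y - ty + 1, x - tx + 1  # offset by the halo
                    total = sum(local[ly + dy][lx + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1))
                    out[y][x] = total // 9
    return out

if __name__ == "__main__":
    img = [[(3 * x + 5 * y) % 17 for x in range(10)] for y in range(7)]
    print(filter_tiled(img) == filter_whole(img))
```

Because no tile reads outside its own local copy, the loop over tiles has no cross-iteration dependencies, which is the property the dispatcher and tilecaches rely on.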
PEEP integrates a dispatcher, a set of processors, a tilecache subsystem, SPI logic for reading and writing instructions and data, and interfaces to external SDRAM memory. The dispatcher coordinates data distribution to multiple processors, each optimized for common image processing operations such as convolution, thresholding, and morphological filtering; a system of multiple tilecaches allows for fast reads and writes to 2D slices of images.
Morphological operations. Demonstration of multistage pipeline. (a) one iteration of erosion; (b) two iterations; (c) three iterations; (d) four iterations.
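For readers unfamiliar with erosion, here is a small Python reference sketch of the repeated erosion that the figure demonstrates. It assumes a binary image, a 3x3 square structuring element, and background (0) outside the image; the source does not state which structuring element or border rule PEEP uses, and the names `erode` and `erode_n` are hypothetical.

```python
def erode(img):
    """One binary erosion pass: a pixel stays 1 only if its whole 3x3
    neighborhood is 1. Pixels outside the image count as 0 (assumption)."""
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[y][x] if 0 <= y < h and 0 <= x < w else 0

    return [
        [int(all(px(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
         for x in range(w)]
        for y in range(h)
    ]

def erode_n(img, n):
    """Chain n erosion passes, feeding each output into the next stage."""
    for _ in range(n):
        img = erode(img)
    return img

def count_ones(img):
    return sum(sum(row) for row in img)

if __name__ == "__main__":
    # 9x9 image containing a solid 7x7 square of ones.
    img = [[int(1 <= y <= 7 and 1 <= x <= 7) for x in range(9)] for y in range(9)]
    print([count_ones(erode_n(img, n)) for n in range(5)])
    # prints [49, 25, 9, 1, 0]
```

Each pass shrinks the square by one pixel on every side, so the 7x7 square becomes 5x5, 3x3, 1x1, and then vanishes after the fourth iteration.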
2x2 Bayer map. Demonstration of branching logic. (a) original input image; (b) dithered binary output.
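As a point of reference, ordered dithering with a 2x2 Bayer matrix can be sketched in a few lines of Python. This is an illustrative software model under stated assumptions, not the PEEP pipeline itself: it assumes 8-bit grayscale input, the standard 2x2 Bayer index matrix, and thresholds placed at the midpoints of four equal intensity bands. The name `bayer_dither` is hypothetical.

```python
# Standard 2x2 Bayer index matrix.
BAYER_2X2 = [[0, 2],
             [3, 1]]

def bayer_dither(img):
    """Map an 8-bit grayscale image to binary using position-dependent thresholds."""
    out = []
    for y, row in enumerate(img):
        out_row = []
        for x, value in enumerate(row):
            # Thresholds are 32, 96, 160, 224 for indices 0..3 (assumed scaling).
            threshold = (2 * BAYER_2X2[y % 2][x % 2] + 1) * 32
            # The per-pixel comparison is the branch each pixel takes.
            out_row.append(1 if value > threshold else 0)
        out.append(out_row)
    return out

if __name__ == "__main__":
    mid_gray = [[128, 128], [128, 128]]
    print(bayer_dither(mid_gray))
    # prints [[1, 0], [0, 1]]
```

A uniform mid-gray input turns on exactly half of the pixels in each 2x2 cell, which is how the binary output approximates intermediate intensities.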
- hardware/: Verilog source code for the PEEP accelerator
  - real/hdl/: Core hardware description files
  - sim/: Simulation testbenches and scripts
  - obj/: Compiled hardware objects
  - xdc/: Xilinx design constraints files
- exp/: Experiments with hardware implementation
  - mem/: Memory interfaces
  - spi/: SPI peripheral and I/O routing manager
  - processor/: four-stage processor implementation
  - router/: processor management subsystem
- software/: Software tools and scripts for programming and testing PEEP
  - exp/: Experiment scripts and data
  - spi/: SPI communication on laptop/teensy side. Contains pipelines, assembler and SPI streamer.