HEAD
Problem
Modern robotics and AI systems rely on vision-based perception pipelines that must handle noisy sensor input, run ML inference reliably, and expose results to downstream systems in real time. Many student projects stop at training a model and never address production deployment, latency, or system integration.
This project implements a robotics-style vision perception service: from raw image input, through sensor noise and preprocessing, to neural network inference, decision logic, and a deployed backend API.
Architecture [ Image Input ] (upload / camera) ↓ [ Sensor Noise + Preprocessing ] ↓ [ ML Inference Layer ] ├── Neural Network (from scratch) └── CNN (PyTorch) ↓ [ Decision / Control Stub ] ↓ [ FastAPI Backend ] ↓ [ Deployed API + Metrics ]
Tech Stack
Languages
Python 3.11
Machine Learning
NumPy
PyTorch
OpenCV
Backend
FastAPI
Pydantic
Uvicorn
Infrastructure
Docker(planned)
Railway / Fly.io (planned)
Data
MNIST (initial prototype)
MVP Scope (Locked) Included
Neural network implemented from scratch
CNN baseline for comparison
Image preprocessing + noise simulation
Model inference API
Simple decision/control logic stub
Inference latency metrics
Dockerized deployment
Publicly accessible API
Explicitly Excluded (No Feature Creep)
No frontend UI
No training pipeline in production
No real robot hardware
No auth system beyond basic API key (optional)
No additional datasets beyond MNIST
Goal
Demonstrate end-to-end perception system engineering for AI and robotics applications:
ML fundamentals
Production backend
Deployment
Systems thinking
Folder Structure & Purpose
app/main.py Orchestrates the application: initializes the FastAPI app, defines API routes, and connects preprocessing, inference, and control logic.
app/models/ Defines and loads machine learning models, including the from-scratch neural network and the CNN.
app/preprocessing/ Handles input preparation and sensor simulation, including image normalization, resizing, and noise injection.
app/inference/ Runs model inference, selects the appropriate model, and measures inference latency.
app/control/ Implements decision logic based on model outputs, serving as a stub for robotics-style perception-to-action pipelines.
Some modules are consolidated for simplicity in the current prototype
Logging Strategy
The system logs operational signals for debugging and performance monitoring.
Log the model used for inference (cnn or nn)
Log inference latency (milliseconds)
Log input validation errors and inference failures
Logs are written to standard output for visibility in local runs and deployed environments.
Error Categories
Input Validation Errors (Client Errors) Invalid or malformed inputs such as non-image files, incorrect shapes, or unsupported formats. Returned as HTTP 400 or 422.
Model / Inference Errors (System Errors) Failures during model loading or inference, such as missing weights or runtime errors. Returned as HTTP 500 and logged internally.
Unexpected Exceptions (Fail-Safe Errors) Unhandled edge cases or bugs. The service returns HTTP 500 while logging diagnostic information without crashing.
- From-scratch FC network on MNIST
- Test accuracy: 97.13%
- Architecture: Conv(1→32,3x3) → ReLU → MaxPool(2) → Conv(32→64,3x3) → ReLU → MaxPool(2) → FC(6455 → 10)
- Clean test accuracy: ~99.0%
- Robustness:
- Gaussian noise (std=0.3): ~98–99%
- Gaussian noise + 2px translation: ~95–97%
- Inductive bias: CNNs exploit spatial locality + weight sharing; FC baselines treat pixels independently.
- Robustness: pooling and convolution yield tolerance to local perturbations (sensor noise, small shifts).
- Perception framing: exposed as a robotics-style module:
sensor → preprocess → infer → decisionwith confidence output for downstream control.
- Train model:
python scripts/train_cnn.py - Start API:
uvicorn app.main:app --reload - Test client:
python scripts/call_api.py
- TODO: continuous image stream ingestion
- TODO: temporal smoothing across frames
- TODO: decision thresholds based on confidence
- TODO: integrate with real camera hardware c9ce2d1c6f08d9c52e67435ace4acae1ccd2456a