vision

Project Structure

vision/
|-- data/
|-- experiments/
|-- models/
    |-- cnn.py
    |-- lstm.py
    |-- pipeline.py
|-- utils/
    |-- dataset.py
    |-- transforms.py
    |-- metrics.py
|-- configs/
    |-- default.yaml
|-- train.py
|-- eval.py
|-- requirements.txt
|-- README.md

Processing Pipeline

  1. Data Acquisition
    • Input: video clips or frame sequences.
    • Each sample = [T, C, H, W].
    • Label = target coordinates (x, y) or (x, y, z).
  2. Preprocessing
    • Resize -> Normalize (ImageNet stats).
    • Augmentations: crop, blur, etc.
  3. Dataset & DataLoader
    • Dataset returns (frames, label, length).
    • DataLoader batches -> [B, T, C, H, W], [B, output_dim].
  4. CNN Feature Extraction
    • ResNet-50 extracts features per frame.
    • Options:
      • With global pooling: [B, T, 2048].
      • Without global pooling: [B, T, 100352] (spatial info kept).
  5. Temporal Modeling (LSTM)
    • Input: [B, T, D].
    • Output: last hidden state [B, H].
  6. Regression Head
    • Linear layer -> [B, output_dim].
    • output_dim = 2 for (x, y), or 3 for (x, y, z).
    • Loss: MSELoss or SmoothL1Loss.
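
The choice between the two losses in step 6 mostly comes down to how outliers are weighted. The sketch below reimplements both in plain Python for illustration only; it is not the project's training code. It assumes mean reduction and a `beta` of 1.0 for the smooth L1 variant, and the sample predictions are made up.

```python
from typing import Sequence

Coord = Sequence[float]  # one (x, y) or (x, y, z) target


def mse_loss(pred: Sequence[Coord], target: Sequence[Coord]) -> float:
    """Mean of squared errors over every coordinate in the batch."""
    diffs = [p - t for ps, ts in zip(pred, target) for p, t in zip(ps, ts)]
    return sum(d * d for d in diffs) / len(diffs)


def smooth_l1_loss(pred: Sequence[Coord], target: Sequence[Coord],
                   beta: float = 1.0) -> float:
    """Quadratic for errors below beta, linear above; mean over coordinates."""
    total, count = 0.0, 0
    for ps, ts in zip(pred, target):
        for p, t in zip(ps, ts):
            d = abs(p - t)
            total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
            count += 1
    return total / count


# Hypothetical batch of two (x, y) predictions; the second has one large error.
pred = [[0.5, 0.5], [10.0, 0.0]]
target = [[0.0, 0.0], [0.0, 0.0]]

print(mse_loss(pred, target))        # 25.125
print(smooth_l1_loss(pred, target))  # 2.4375
```

The single 10-unit error dominates the MSE value, while the smooth L1 value grows only linearly with it, which may matter if the coordinate labels are noisy.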

Shape Conventions

Stage                | Shape
---------------------|-----------------------------
Dataset sample       | [T, C, H, W], target [2/3]
Batch (DataLoader)   | [B, T, C, H, W], [B, 2/3]
CNN (per frame)      | [B*T, F, Hf, Wf]
Flatten (no pooling) | [B*T, F*Hf*Wf]
Sequence reshape     | [B, T, D]
LSTM output (final)  | [B, H]
Regression output    | [B, 2] or [B, 3]

(For ResNet-50 with 224x224 input: F=2048, Hf=Wf=7 -> D = 2048*7*7 = 100,352 without pooling; D = 2048 with global pooling.)
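
The shape bookkeeping above can be checked with a few lines of arithmetic. The helper below is a hypothetical sketch, not part of the repository. It assumes a backbone with a total stride of 32 (as in ResNet-50) and input sides that are multiples of 32. The `hidden=256` default is an arbitrary placeholder for the LSTM hidden size H.

```python
def trace_shapes(B, T, C, H, W, *, F=2048, stride=32, pooled=True,
                 hidden=256, output_dim=2):
    """Return the tensor shape expected at each pipeline stage.

    Assumes H and W are multiples of `stride`; other sizes round
    differently inside a real CNN and are not modeled here.
    """
    Hf, Wf = H // stride, W // stride
    # Global pooling collapses the spatial grid; flattening keeps it.
    D = F if pooled else F * Hf * Wf
    return {
        "batch": (B, T, C, H, W),
        "cnn_in": (B * T, C, H, W),       # frames folded into the batch axis
        "cnn_out": (B * T, F, Hf, Wf),
        "features": (B * T, D),
        "sequence": (B, T, D),            # unfolded again for the LSTM
        "lstm_final": (B, hidden),
        "regression": (B, output_dim),
    }


shapes = trace_shapes(4, 16, 3, 224, 224, pooled=False)
print(shapes["cnn_out"])   # (64, 2048, 7, 7)
print(shapes["sequence"])  # (4, 16, 100352)
```

Without pooling, the first LSTM layer sees 100,352 inputs per step, so its input weight matrix grows 49 times larger than in the pooled case.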

Quick Start

  1. Install requirements
     pip install -r requirements.txt
  2. Train
     python train.py
  3. Evaluate
     python eval.py --checkpoint experiments/latest.pth

Extensibility

  • Swap CNN backbone -> edit models/cnn.py.
  • Swap sequence model (e.g., ConvLSTM, Transformer) -> edit models/lstm.py.
  • Configure output_dim (2D or 3D coords) in configs/default.yaml.

Next Steps

  • Implement real dataset loaders with coordinate labels.
  • Add evaluation metrics (MAE, RMSE).
  • Experiment with ConvLSTM for spatiotemporal features.
  • Test real-time inference with rolling window.
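
Two of the items above can be prototyped without any framework. The sketch below is one possible starting point and not code from this repository. `mae` and `rmse` average over every coordinate, which is an assumption, since a per-sample Euclidean error would be an equally valid choice. `RollingWindow` is a hypothetical helper that keeps the most recent T frames for streaming inference.

```python
from collections import deque
from math import sqrt


def mae(pred, target):
    """Mean absolute error over every coordinate in the batch."""
    errs = [abs(p - t) for ps, ts in zip(pred, target) for p, t in zip(ps, ts)]
    return sum(errs) / len(errs)


def rmse(pred, target):
    """Root mean squared error over every coordinate in the batch."""
    sq = [(p - t) ** 2 for ps, ts in zip(pred, target) for p, t in zip(ps, ts)]
    return sqrt(sum(sq) / len(sq))


class RollingWindow:
    """Fixed-size frame buffer: yields a full window once T frames arrived."""

    def __init__(self, size):
        self.frames = deque(maxlen=size)  # oldest frame drops automatically

    def push(self, frame):
        self.frames.append(frame)
        if len(self.frames) < self.frames.maxlen:
            return None                   # not enough context yet
        return list(self.frames)          # snapshot to feed the model


window = RollingWindow(size=3)
# Integers stand in for frames here; real input would be image arrays.
outputs = [window.push(i) for i in range(5)]
print(outputs)  # [None, None, [0, 1, 2], [1, 2, 3], [2, 3, 4]]
```

Each full window would be stacked into a [1, T, C, H, W] batch before being passed to the model. Because the window advances by one frame per step, the CNN recomputes features for overlapping frames unless they are cached.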
