```
vision/
|-- data/
|-- experiments/
|-- models/
|   |-- cnn.py
|   |-- lstm.py
|   `-- pipeline.py
|-- utils/
|   |-- dataset.py
|   |-- transforms.py
|   `-- metrics.py
|-- configs/
|   `-- default.yaml
|-- train.py
|-- eval.py
|-- requirements.txt
`-- README.md
```
- Data Acquisition
  - Input: video clips or frame sequences.
  - Each sample = `[T, C, H, W]`.
  - Label = target coordinates `(x, y)` or `(x, y, z)`.
- Preprocessing
  - Resize -> Normalize (ImageNet stats).
  - Augmentations: crop, blur, etc.
- Dataset & DataLoader
  - Dataset returns `(frames, label, length)`.
  - DataLoader batches -> `[B, T, C, H, W]`, `[B, output_dim]`.
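A minimal sketch of a clip dataset with the `(frames, label, length)` contract; the in-memory random data is a stand-in for a real loader that would read frames and coordinate labels from disk:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ClipDataset(Dataset):
    def __init__(self, num_clips=8, seq_len=16, output_dim=2):
        # Stand-in data; a real loader would decode video frames from disk.
        self.frames = torch.randn(num_clips, seq_len, 3, 224, 224)  # [N, T, C, H, W]
        self.labels = torch.randn(num_clips, output_dim)            # [N, 2] for (x, y)

    def __len__(self):
        return len(self.frames)

    def __getitem__(self, idx):
        clip = self.frames[idx]                       # [T, C, H, W]
        return clip, self.labels[idx], clip.shape[0]  # (frames, label, length)

loader = DataLoader(ClipDataset(), batch_size=4)
frames, labels, lengths = next(iter(loader))
# frames: [4, 16, 3, 224, 224], labels: [4, 2], lengths: [4]
```

With fixed-length clips the `length` field is redundant, but returning it keeps the interface ready for variable-length sequences and packed LSTM input.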
- CNN Feature Extraction
  - ResNet-50 extracts features per frame.
  - Options:
    - With global pooling: `[B, T, 2048]`.
    - Without global pooling: `[B, T, 100352]` (spatial info kept).
- Temporal Modeling (LSTM)
  - Input: `[B, T, D]`.
  - Output: last hidden state `[B, H]`.
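The temporal stage could look like the following: an LSTM over the `[B, T, D]` feature sequence, with the final hidden state used as the clip summary. The hidden size is an illustrative assumption:

```python
import torch

B, T, D, H = 2, 16, 2048, 256
lstm = torch.nn.LSTM(input_size=D, hidden_size=H, batch_first=True)

features = torch.randn(B, T, D)       # [B, T, D] sequence of CNN features
output, (h_n, c_n) = lstm(features)   # output: [B, T, H]; h_n: [num_layers, B, H]
last_hidden = h_n[-1]                 # [B, H] summary of the whole sequence
```

`h_n[-1]` is equivalent to `output[:, -1]` for a single-layer, unidirectional LSTM.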
- Regression Head
  - Linear layer -> `[B, output_dim]`. `output_dim = 2` for (x, y), or `3` for (x, y, z).
  - Loss: `MSELoss` or `SmoothL1Loss`.
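The head itself is a single linear layer on the LSTM summary. A sketch with assumed sizes and random targets in place of real labels:

```python
import torch

B, H, output_dim = 4, 256, 2          # output_dim = 2 for (x, y), 3 for (x, y, z)
head = torch.nn.Linear(H, output_dim)
criterion = torch.nn.SmoothL1Loss()   # or torch.nn.MSELoss()

hidden = torch.randn(B, H)            # last LSTM hidden state
pred = head(hidden)                   # [B, output_dim] predicted coordinates
loss = criterion(pred, torch.randn(B, output_dim))
```

`SmoothL1Loss` is the more robust default when coordinate labels can be noisy; `MSELoss` penalizes outliers more heavily.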
| Stage | Shape |
|---|---|
| Dataset sample | [T, C, H, W], target [2/3] |
| Batch (DataLoader) | [B, T, C, H, W], [B, 2/3] |
| CNN (per frame) | [B*T, F, Hf, Wf] |
| Flatten (no pooling) | [B*T, F*Hf*Wf] |
| Sequence reshape | [B, T, D] |
| LSTM output (final) | [B, H] |
| Regression output | [B, 2] or [B, 3] |
(For ResNet-50: F=2048, Hf=Wf=7 -> D=100,352 without pooling)
- Install requirements: `pip install -r requirements.txt`
- Train: `python train.py`
- Evaluate: `python eval.py --checkpoint experiments/latest.pth`
- Swap CNN backbone -> edit `models/cnn.py`.
- Swap sequence model (e.g., ConvLSTM, Transformer) -> `models/temporal.py`.
- Configure `output_dim` (2D or 3D coords) in `configs/default.yaml`.
- Implement real dataset loaders with coordinate labels.
- Add evaluation metrics (MAE, RMSE).
- Experiment with ConvLSTM for spatiotemporal features.
- Test real-time inference with rolling window.
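Rolling-window inference could be sketched with a fixed-size deque of the most recent T frames, running the model on the window as each new frame arrives. The `model` argument here is hypothetical, standing in for the trained CNN+LSTM pipeline:

```python
from collections import deque

import torch

T = 16
window = deque(maxlen=T)   # keeps only the most recent T frames

def on_new_frame(frame, model):
    """frame: [C, H, W] tensor; returns a prediction once the window is full."""
    window.append(frame)
    if len(window) < T:
        return None                       # not enough temporal context yet
    clip = torch.stack(list(window))      # [T, C, H, W]
    with torch.no_grad():
        return model(clip.unsqueeze(0))   # [1, T, C, H, W] -> [1, output_dim]
```

Re-running the LSTM over the full window every frame is simple but redundant; carrying the LSTM hidden state across frames instead would avoid the repeated computation.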