A high-performance video loading library for machine learning, designed for efficient training data preparation.
## Features

**High-Performance Processing**

- Fast video frame extraction and preprocessing
- Multi-threaded and distributed processing support
- Smart caching with Redis for improved throughput
- Memory-efficient streaming with adaptive buffering

**ML Framework Integration**

- Native support for PyTorch, TensorFlow, JAX, and more
- Optimized tensor conversions and memory management
- GPU acceleration support
- Configurable output formats and color spaces

**Advanced Capabilities**

- Sophisticated augmentation pipeline
- Real-time performance monitoring
- Auto-tuning for optimal performance
- Peer-to-peer sharing for distributed setups
## Installation

RAAD Video requires Python 3.8 or later. Install via pip:

```bash
pip install raad-video
```

For a development installation with testing and code quality tools:

```bash
pip install "raad-video[dev]"
```

Requirements:

- Python 3.8+
- OpenCV dependencies
- Redis (optional, for distributed caching)
- CUDA-compatible GPU (optional, for GPU acceleration)
## Quick Start

```python
from raad import VideoDataLoader, VideoCatalog
from raad.config import ProcessingMode, StreamingConfig, FrameFormat

# Create a catalog of your videos
catalog = VideoCatalog()
catalog.add_video("video1.mp4", categories=["training"])
catalog.add_video("video2.mp4", categories=["validation"])

# Initialize the loader with optimal settings
loader = VideoDataLoader(
    catalog=catalog,
    processing_mode=ProcessingMode.MULTI_THREAD,
    frame_format=FrameFormat.TORCH,  # Output PyTorch tensors
    streaming_config=StreamingConfig(
        mode="adaptive",
        buffer_size=1000
    ),
    target_size=(224, 224),  # Resize frames
    normalize=True,          # Normalize pixel values
    device="cuda"            # Use GPU if available
)

# Get frames for training
for frames in loader.get_dataset_iterator("training"):
    # frames will be preprocessed and ready for your model
    # Shape: (batch_size, channels, height, width)
    model.train(frames)
```
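The same iterator also serves the validation split registered in the catalog above. A minimal sketch, where `model.evaluate` is only a placeholder for your own evaluation step:

```python
# Pull preprocessed batches from the "validation" category
for frames in loader.get_dataset_iterator("validation"):
    model.evaluate(frames)  # placeholder: substitute your framework's evaluation call
```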
## Distributed Processing

```python
from raad.config import DistributedConfig

# Setup distributed processing across multiple nodes
loader = VideoDataLoader(
    catalog=catalog,
    processing_mode=ProcessingMode.DISTRIBUTED,
    distributed_config=DistributedConfig(
        enabled=True,
        num_nodes=4,
        node_rank=0,
        master_addr='10.0.0.1',
        master_port=29500
    )
)
```
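Each of the four nodes presumably runs the same setup with its own `node_rank` (0 through 3) while pointing at the shared `master_addr` and `master_port`; only rank 0 is shown above.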
## Data Augmentation

```python
from raad.augmentation import (
    RandomBrightness,
    RandomContrast,
    RandomFlip,
    RandomRotation,
    ColorJitter
)

# Create a sophisticated augmentation pipeline
loader = VideoDataLoader(
    catalog=catalog,
    augmentations=[
        RandomBrightness(0.2),
        RandomContrast(0.2),
        RandomFlip(p=0.5),
        RandomRotation(degrees=15),
        ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1)
    ]
)
```

## Caching and Streaming

```python
from raad.config import CacheConfig, StreamingConfig

# Configure caching and streaming for optimal performance
loader = VideoDataLoader(
    catalog=catalog,
    cache_config=CacheConfig(
        enabled=True,
        policy="lru",
        max_size_gb=100,
        persistent=True,
        compression="lz4"
    ),
    streaming_config=StreamingConfig(
        mode="adaptive",
        buffer_size=1000,
        max_latency=0.1,
        drop_threshold=0.8
    ),
    auto_tune=True,            # Enable automatic performance tuning
    monitoring_interval=1.0,   # Monitor performance every second
    export_metrics=True
)
```

## Troubleshooting
**Memory Usage**

- Use `streaming_config` with an appropriate `buffer_size`
- Enable frame dropping with `drop_threshold` if needed
- Consider using persistent caching (see the sketch below)
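A minimal sketch of a memory-constrained configuration, reusing only parameters that appear in the examples above; the specific values are illustrative, not recommendations:

```python
from raad import VideoDataLoader
from raad.config import CacheConfig, StreamingConfig

# catalog built as in the Quick Start section
loader = VideoDataLoader(
    catalog=catalog,
    streaming_config=StreamingConfig(
        mode="adaptive",
        buffer_size=200,     # smaller buffer keeps the in-memory queue bounded
        drop_threshold=0.8   # enable frame dropping, per the tip above
    ),
    cache_config=CacheConfig(
        enabled=True,
        persistent=True,     # persistent caching, per the tip above
        max_size_gb=20       # cap cache size (illustrative value)
    )
)
```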
**Performance**

- Enable `auto_tune` for automatic optimization
- Use the appropriate `processing_mode` for your setup
- Monitor performance with `export_metrics=True`
**GPU Issues**

- Ensure CUDA is properly installed
- Set an appropriate `device` and `batch_size` (see the sketch below)
- Monitor GPU memory usage
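A small sketch of a CUDA check before constructing the loader. It assumes PyTorch is installed; `batch_size` is taken from the tip above rather than from the earlier examples, so treat that keyword as an assumption:

```python
import torch

from raad import VideoDataLoader

# Fall back to CPU when no CUDA device is visible
device = "cuda" if torch.cuda.is_available() else "cpu"

# catalog built as in the Quick Start section
loader = VideoDataLoader(
    catalog=catalog,
    device=device,
    batch_size=8,  # assumed keyword: mentioned in the tips, not shown in the examples above
)
```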
**Getting Help**

- Open an issue on GitHub
- Check the API documentation
- Join our community on Discord
## Performance Tips
1. **Streaming Mode Selection**:
   - Use `ADAPTIVE` for general training
   - Use `REAL_TIME` for time-critical applications
   - Use `PRIORITY` when certain frames are more important

2. **Caching Strategy**:
   - Enable Redis for distributed setups
   - Use local caching for single machine training
   - Configure cache size based on available RAM

3. **Processing Mode**:
   - `MULTI_THREAD` for I/O-bound workloads
   - `MULTI_PROCESS` for CPU-bound preprocessing
   - `DISTRIBUTED` for large-scale training
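As a rough illustration of how these tips combine, here is a sketch of a single-machine, CPU-bound preprocessing setup. The enum members and config fields are the ones named earlier in this README (`MULTI_PROCESS` appears only in the tips above), and the values are illustrative:

```python
from raad import VideoDataLoader
from raad.config import CacheConfig, ProcessingMode, StreamingConfig

# Single machine, CPU-heavy preprocessing: multi-process workers,
# a local LRU cache, and adaptive streaming.
# catalog built as in the Quick Start section
loader = VideoDataLoader(
    catalog=catalog,
    processing_mode=ProcessingMode.MULTI_PROCESS,
    cache_config=CacheConfig(enabled=True, policy="lru", max_size_gb=32),
    streaming_config=StreamingConfig(mode="adaptive", buffer_size=1000),
)
```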
## Contributing
Contributions are welcome! Please read our [Contributing Guide](CONTRIBUTING.md) for details on our code of conduct and the process for submitting pull requests.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.