Optimization: Downscale videos to reduce token usage

## Research: Video Resolution vs Token Usage

### Qwen3-VL Token Calculation

**Formula:** `tokens = total_pixels / 1024` (32×32 pixel block per token)

| Resolution | Pixels | Tokens per Frame |
|------------|--------|------------------|
| **1080p** (1920×1080) | 2,073,600 | ~2,025 tokens |
| **4K** (3840×2160) | 8,294,400 | ~8,100 tokens |

**4K has exactly 4× the pixels of 1080p, so it uses 4× as many tokens per frame.**
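The table can be reproduced with a few lines of Python. This is a minimal sketch of the pixel-count formula above; `tokens_per_frame` is a hypothetical helper name, and it ignores any rounding of frame dimensions to multiples of the block size that the real preprocessor may apply.

```python
def tokens_per_frame(width: int, height: int, block: int = 32) -> float:
    """Estimate visual tokens for one frame: one token per block x block pixels."""
    return (width * height) / (block * block)


# Resolutions taken from the table above.
for name, (w, h) in {"1080p": (1920, 1080), "4K": (3840, 2160)}.items():
    print(f"{name}: {tokens_per_frame(w, h):,.0f} tokens per frame")
```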

### Video-Specific Details

- **Temporal compression**: 2× (every 2 frames compressed together)
- **Default limits**: Model may resize frames to fit a pixel budget
  - Default max: ~20,480 × 32 × 32 pixels total across all frames
- **Recommended token range**: 256-16,384 per video
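To get a feel for when the pixel budget kicks in, the sketch below estimates the uniform scale factor needed to fit a clip under the default maximum. It assumes the budget applies to the raw sum of pixels over all sampled frames and that frames are shrunk uniformly in both dimensions; the actual resizing logic in the model's preprocessor may differ. `budget_scale` and `DEFAULT_PIXEL_BUDGET` are hypothetical names.

```python
import math

# Assumed default budget from the note above: ~20,480 blocks of 32x32 pixels.
DEFAULT_PIXEL_BUDGET = 20_480 * 32 * 32


def budget_scale(width: int, height: int, num_frames: int,
                 budget: int = DEFAULT_PIXEL_BUDGET) -> float:
    """Return the per-side scale factor (<= 1.0) that fits the clip in the budget."""
    total_pixels = width * height * num_frames
    if total_pixels <= budget:
        return 1.0  # already within budget, no resize needed
    # Pixel count scales with the square of the side length.
    return math.sqrt(budget / total_pixels)


print(budget_scale(1920, 1080, num_frames=20))
```

A result below 1.0 suggests the frames would be resized before tokenization, which is one more reason to downscale on the client and avoid uploading pixels that are discarded anyway.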

### Practical Impact

For a 10-second clip sampled at 2 fps (20 frames):
- **1080p**: ~40,500 tokens (with 2× temporal compression: ~20,250)
- **4K**: ~162,000 tokens (with 2× temporal compression: ~81,000)
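Per-clip totals can be recomputed for other durations and frame rates with a small helper. This is a rough sketch under the assumptions already stated in this issue (one token per 32×32 pixels, every 2 frames merged into one temporal group); `clip_tokens` is a hypothetical name, and the estimate ignores any resizing done to fit the pixel budget.

```python
import math


def clip_tokens(width: int, height: int, seconds: float, fps: float,
                temporal_merge: int = 2, block: int = 32) -> int:
    """Estimate visual tokens for a clip sampled at `fps` for `seconds`."""
    frames = round(seconds * fps)
    per_frame = (width * height) / (block * block)
    # Frames are merged in groups of `temporal_merge` before tokenization.
    groups = math.ceil(frames / temporal_merge)
    return round(groups * per_frame)


for name, (w, h) in {"720p": (1280, 720), "1080p": (1920, 1080), "4K": (3840, 2160)}.items():
    print(f"{name}: {clip_tokens(w, h, seconds=10, fps=2):,} tokens")
```

Passing `temporal_merge=1` gives the total before temporal compression.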

### Recommendation

Downscale videos to 1080p or even 720p before sending. Compared to 4K input, this will:
- Use about a quarter of the tokens at 1080p (about a ninth at 720p)
- Reduce KV cache pressure
- Likely maintain similar accuracy for kill detection (not yet measured)

### Implementation Ideas

- Add optional `--resize` flag to client script
- Use ffmpeg to downscale before base64 encoding
- Consider 720p for even more savings (~900 tokens/frame: 1280×720 = 921,600 pixels / 1024)
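One possible shape for the first two ideas is sketched below. It assumes `--resize` takes a target height in pixels and that `ffmpeg` is on `PATH`; the function names, the positional `video` argument, and the output file naming are all hypothetical, since the client script is not shown here. The `scale=-2:HEIGHT` filter keeps the aspect ratio and rounds the width to an even number.

```python
import argparse
import base64
import subprocess
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Send a video clip to the model")
    parser.add_argument("video", type=Path, help="input video file")
    # Assumption: --resize takes a target height such as 1080 or 720.
    parser.add_argument("--resize", type=int, default=None, metavar="HEIGHT",
                        help="downscale to this height before encoding")
    return parser


def build_downscale_cmd(src, dst, height: int) -> list[str]:
    """Build the ffmpeg command line; -y overwrites an existing output file."""
    return ["ffmpeg", "-y", "-i", str(src), "-vf", f"scale=-2:{height}", str(dst)]


def encode_video(src, resize=None) -> str:
    """Optionally downscale with ffmpeg, then return the file as base64 text."""
    path = Path(src)
    if resize is not None:
        dst = path.with_name(f"{path.stem}_{resize}p.mp4")
        subprocess.run(build_downscale_cmd(path, dst, resize), check=True)
        path = dst
    return base64.b64encode(path.read_bytes()).decode("ascii")


# Example usage (requires ffmpeg and an existing clip.mp4):
#   args = build_parser().parse_args(["clip.mp4", "--resize", "720"])
#   payload = encode_video(args.video, resize=args.resize)
```

Note that this filter would also upscale a source smaller than the target height, so a real implementation may want to check the input resolution first.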

### Sources

- [Qwen3-VL GitHub](https://github.com/QwenLM/Qwen3-VL)
- [Qwen2-VL Token Discussion](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/discussions/47)
- [Qwen2-VL Blog](https://qwenlm.github.io/blog/qwen2-vl/)
