Problem
A fixed 16K max_tokens setting wastes money on simple docs and may fail on ultra-dense docs.
Solution
A quick pre-check estimates token needs and sets max_tokens dynamically (8K-32K range; the tiers sketched under "How" currently stop at 24K).
How
def _estimate_token_requirements(self, image_data: bytes) -> int:
    """Quick density check before full analysis."""
    # Calculate: text_ratio * cjk_multiplier * column_count
    density = self._quick_density_score(image_data)
    if density < 0.3:
        return 8000   # 70% of docs
    if density < 0.6:
        return 12000  # 20% of docs
    if density < 0.8:
        return 16000  # 8% of docs
    return 24000      # 2% ultra-dense (like this Chinese manuscript)
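The same tiers could also be kept as data, which makes it easier to tune them or add a 32K tier later. This is a minimal standalone sketch, not the project's code: `max_tokens_for_density` and the two tuples are hypothetical names, and the thresholds and budgets are copied from the snippet above.

```python
from bisect import bisect_right

# Upper density bounds for each tier, and the matching max_tokens budgets.
# Values mirror the if-chain above; the names are illustrative only.
DENSITY_BOUNDS = (0.3, 0.6, 0.8)
TOKEN_BUDGETS = (8000, 12000, 16000, 24000)


def max_tokens_for_density(density: float) -> int:
    """Map a density score to a max_tokens budget.

    bisect_right counts how many bounds are <= density, so a score exactly
    on a bound falls into the higher tier, matching the `<` comparisons.
    """
    return TOKEN_BUDGETS[bisect_right(DENSITY_BOUNDS, density)]
```

For example, `max_tokens_for_density(0.45)` picks the 12000 budget, and any score of 0.8 or above picks 24000.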
Key metrics:
- Text pixel ratio (binary threshold)
- CJK detection (edge patterns)
- Multi-column layout (whitespace gaps)
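One rough way the three metrics might be combined is sketched below. Everything here is an assumption for illustration: the input is an already-decoded rectangular grid of 0-255 grayscale values rather than raw `image_data` bytes, the "CJK detection" is only a crude stroke-transition proxy (not real script detection), and the 128 threshold, 0.25 cutoff, 1.5 multiplier, and clamp to 1.0 are made-up placeholders that would need tuning on real pages.

```python
from typing import Sequence

Gray = Sequence[Sequence[int]]  # rows of 0-255 grayscale pixels (assumed pre-decoded)


def text_pixel_ratio(img: Gray, threshold: int = 128) -> float:
    """Fraction of pixels darker than the binary threshold."""
    total = sum(len(row) for row in img)
    if total == 0:
        return 0.0
    dark = sum(1 for row in img for px in row if px < threshold)
    return dark / total


def count_columns(img: Gray, threshold: int = 128, min_gap: int = 3) -> int:
    """Count ink runs separated by at least min_gap fully blank pixel columns."""
    if not img or not img[0]:
        return 0
    width = len(img[0])
    has_ink = [any(row[x] < threshold for row in img) for x in range(width)]
    columns, gap = 0, min_gap  # start "in a gap" so the first ink run counts
    for ink in has_ink:
        if ink:
            if gap >= min_gap:
                columns += 1
            gap = 0
        else:
            gap += 1
    return columns


def transition_rate(img: Gray, threshold: int = 128) -> float:
    """Dark/light flips between horizontal neighbours, per pixel (crude edge proxy)."""
    total = sum(len(row) for row in img)
    if total == 0:
        return 0.0
    flips = sum(
        (a < threshold) != (b < threshold)
        for row in img
        for a, b in zip(row, row[1:])
    )
    return flips / total


def quick_density_score_sketch(img: Gray) -> float:
    """text_ratio * cjk_multiplier * column_count, clamped to 1.0 (assumed)."""
    ratio = text_pixel_ratio(img)
    cjk_multiplier = 1.5 if transition_rate(img) > 0.25 else 1.0  # placeholder values
    return min(1.0, ratio * cjk_multiplier * count_columns(img))
```

A blank page scores 0.0 because both the text ratio and the column count are zero. On real scans the whitespace-gap check would likely need noise tolerance, since a single stray dark pixel currently counts as ink.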
Impact
- Estimated 15-25% cost reduction (most docs need fewer tokens than the fixed 16K)
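As a sanity check on the tier shares quoted in the snippet (70/20/8/2%), the average max_tokens cap can be worked out directly. The sketch below only computes the average cap. A cap is an upper bound rather than the number of tokens actually consumed, so this is not a derivation of the 15-25% figure, which remains the issue's own estimate. `TIER_SHARES` and `expected_cap` are hypothetical names.

```python
# (share of documents, max_tokens budget) per tier, taken from the snippet above.
TIER_SHARES = [(0.70, 8000), (0.20, 12000), (0.08, 16000), (0.02, 24000)]
FIXED_CAP = 16000  # the current fixed setting


def expected_cap(tiers) -> float:
    """Average max_tokens cap across documents, weighted by tier share."""
    return sum(share * cap for share, cap in tiers)


average_cap = expected_cap(TIER_SHARES)        # 5600 + 2400 + 1280 + 480
cap_reduction = 1 - average_cap / FIXED_CAP    # relative drop versus 16K
print(f"average cap: {average_cap:.0f}, reduction: {cap_reduction:.0%}")
```

Under these shares the average cap comes to about 9,760 tokens, roughly 39% below the fixed 16K.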