One-line summary
Implement intelligent context window management to include relevant message history, scenario parameters, and character consistency rules within Gemini 2.5 Flash token limits (1M input, 8K output).
Problem / Opportunity
No response
Proposal
No response
Acceptance Criteria (AC)
- `ContextWindowManager` service calculates the token count for messages using the Gemini token counting API
- Gemini 2.5 Flash token limits:
  - Input: 1,000,000 tokens (1M context window)
  - Output: 8,192 tokens max
  - Cost: $0.075 per 1M input tokens, $0.30 per 1M output tokens
- Context strategy: system instruction + full conversation history (no sliding window needed with the 1M limit)
- Smart context optimization for long conversations (>10K tokens):
  - Summarize messages older than the most recent 100 using Gemini
  - Keep the most recent 100 messages in full detail
  - System instruction + character traits always included
- Character consistency injection: adds a character traits summary every 50 messages (reduced frequency thanks to the large context window)
- `/api/ai/build-context` endpoint returns: system_instruction, messages[], token_count, optimization_applied
- Token count validation before the Gemini API call (reject if input > 1M tokens or expected output > 8K tokens)
- Context caching: Redis cache for repeated conversation_id queries (TTL 5 minutes)
- Gemini Caching API for system instructions (reduces cost for repeated contexts)
- Metrics: Average token usage, optimization rate, cache hit rate, Gemini API cost per conversation
- Unit tests with >85% coverage
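The context-building criteria above can be sketched as a single function. This is an illustrative outline, not the actual implementation: all names (`build_context`, `count_tokens`, the constants) are assumptions, and `count_tokens` is a rough chars/4 placeholder where production code would call the Gemini token counting API.

```python
# Hypothetical sketch of the ContextWindowManager behavior described in the AC.
# Names and the chars/4 token estimate are illustrative placeholders only.

INPUT_LIMIT = 1_000_000   # Gemini 2.5 Flash input window (1M tokens)
RECENT_KEEP = 100         # keep the most recent 100 messages in full detail
TRAIT_EVERY = 50          # re-inject character traits every 50 messages

def count_tokens(text: str) -> int:
    # Placeholder: ~4 chars per token. Production would use the Gemini
    # token counting API instead of this heuristic.
    return max(1, len(text) // 4)

def build_context(system_instruction: str, traits: str, messages: list[str]) -> dict:
    optimization_applied = len(messages) > RECENT_KEEP
    recent = messages[-RECENT_KEEP:]
    context = [system_instruction, traits]  # always included
    if optimization_applied:
        # Older messages would be summarized via Gemini; stubbed here.
        context.append(f"[summary of {len(messages) - RECENT_KEEP} older messages]")
    for i, msg in enumerate(recent, start=1):
        context.append(msg)
        if i % TRAIT_EVERY == 0:
            context.append(traits)  # character consistency injection
    token_count = sum(count_tokens(part) for part in context)
    if token_count > INPUT_LIMIT:
        # Validation before the Gemini API call, per the AC
        raise ValueError("context exceeds 1M input token limit")
    return {
        "system_instruction": system_instruction,
        "messages": context[1:],
        "token_count": token_count,
        "optimization_applied": optimization_applied,
    }
```

The returned dict mirrors the `/api/ai/build-context` response shape listed above (system_instruction, messages[], token_count, optimization_applied).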
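The 5-minute context cache keyed by conversation_id could follow a get-or-evict pattern like the one below. Production would use Redis (e.g. `SETEX` with a 300-second TTL and `GET`); this in-process stand-in, with hypothetical names, only illustrates the same TTL semantics without a server.

```python
import json
import time

TTL_SECONDS = 300  # 5-minute TTL, as in the AC

class ContextCache:
    """In-process sketch of the Redis context cache keyed by conversation_id."""

    def __init__(self) -> None:
        self._store: dict[str, tuple[float, str]] = {}

    def set(self, conversation_id: str, context: dict) -> None:
        # Redis equivalent: SETEX conversation_id 300 <json payload>
        self._store[conversation_id] = (
            time.monotonic() + TTL_SECONDS,
            json.dumps(context),
        )

    def get(self, conversation_id: str):
        entry = self._store.get(conversation_id)
        if entry is None:
            return None
        expires_at, payload = entry
        if time.monotonic() > expires_at:
            del self._store[conversation_id]  # expired: evict and miss
            return None
        return json.loads(payload)
```

Cache hits and misses from this layer would feed the cache-hit-rate metric listed in the AC.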
Related references
No response
Related issues / blockers
No response