
[FEAT] Story 2.2 Conversation Context Window Manager #19

@yeomin4242

Description


One-line description

Implement intelligent context window management to include relevant message history, scenario parameters, and character consistency rules within Gemini 2.5 Flash token limits (1M input, 8K output).

Problem / Opportunity

No response

Proposal

No response

Acceptance Criteria (AC)

  • ContextWindowManager service calculates token count for messages using Gemini token counting API
  • Gemini 2.5 Flash token limits:
    • Input: 1,000,000 tokens (1M context window)
    • Output: 8,192 tokens max
    • Cost: $0.075 per 1M input tokens, $0.30 per 1M output tokens
  • Context strategy: System instruction + Full conversation history (no sliding window needed with 1M limit)
  • Smart context optimization for long conversations (>10K tokens):
    • Summarize messages beyond the most recent 100 using Gemini
    • Keep recent 100 messages in full detail
    • System instruction + character traits always included
  • Character consistency injection: Adds character traits summary every 50 messages (reduced frequency due to large context)
  • /api/ai/build-context endpoint returns: system_instruction, messages[], token_count, optimization_applied
  • Token count validation before the Gemini API call (reject if input > 1M tokens or expected output > 8K tokens)
  • Context caching: Redis cache for repeated conversation_id queries (TTL 5 minutes)
  • Gemini Caching API for system instructions (reduces cost for repeated contexts)
  • Metrics: Average token usage, optimization rate, cache hit rate, Gemini API cost per conversation
  • Unit tests >85% coverage
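A minimal sketch of the optimization rule described above (keep the most recent 100 messages verbatim, collapse older ones into a summary, re-inject character traits every 50 messages, and validate against the 1M input limit). All names here are hypothetical, and the 4-chars-per-token heuristic is a stand-in for the real Gemini token counting API:

```python
from dataclasses import dataclass

INPUT_LIMIT = 1_000_000   # Gemini 2.5 Flash input token limit
RECENT_WINDOW = 100       # messages kept in full detail
TRAIT_INTERVAL = 50       # inject character traits every N messages

def estimate_tokens(text: str) -> int:
    # Placeholder heuristic; the real service would call Gemini's
    # token counting API instead.
    return max(1, len(text) // 4)

@dataclass
class BuiltContext:
    system_instruction: str
    messages: list
    token_count: int
    optimization_applied: bool

def build_context(system_instruction: str, character_traits: str,
                  history: list) -> BuiltContext:
    """Collapse messages older than RECENT_WINDOW into a single summary
    placeholder (the real service would summarize them with Gemini)."""
    optimization_applied = len(history) > RECENT_WINDOW
    if optimization_applied:
        older, recent = history[:-RECENT_WINDOW], history[-RECENT_WINDOW:]
        messages = [f"[summary of {len(older)} earlier messages]"] + recent
    else:
        messages = list(history)
    # Character consistency injection every TRAIT_INTERVAL messages.
    if history and len(history) % TRAIT_INTERVAL == 0:
        messages.append(f"[character traits reminder] {character_traits}")
    token_count = estimate_tokens(system_instruction) + sum(
        estimate_tokens(m) for m in messages)
    if token_count > INPUT_LIMIT:
        raise ValueError(f"context exceeds input limit: {token_count}")
    return BuiltContext(system_instruction, messages, token_count,
                        optimization_applied)
```

This mirrors the `/api/ai/build-context` response shape (`system_instruction`, `messages[]`, `token_count`, `optimization_applied`); the endpoint would serialize `BuiltContext` directly.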
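The context caching criterion (Redis, 5-minute TTL, keyed by `conversation_id`) can be sketched with an in-process stand-in; the real service would use Redis `SETEX`, and the class and method names here are illustrative:

```python
import time

class ContextCache:
    """In-memory stand-in for the Redis context cache (TTL 5 minutes)."""

    def __init__(self, ttl_seconds: float = 300):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, conversation_id: str):
        entry = self._store.get(conversation_id)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired entry: evict and report a miss.
            del self._store[conversation_id]
            return None
        return value

    def set(self, conversation_id: str, context) -> None:
        self._store[conversation_id] = (context, time.monotonic() + self.ttl)
```

Cache hits here cover repeated `conversation_id` queries only; caching of the system instruction itself would go through the separate Gemini Caching API mentioned above.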
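For the cost-per-conversation metric, the pricing figures above reduce to simple arithmetic; a sketch (function name hypothetical):

```python
INPUT_PRICE_PER_M = 0.075   # USD per 1M input tokens (Gemini 2.5 Flash)
OUTPUT_PRICE_PER_M = 0.30   # USD per 1M output tokens

def conversation_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated Gemini API cost in USD for one conversation."""
    return (input_tokens / 1_000_000 * INPUT_PRICE_PER_M
            + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M)
```

For example, a conversation using 200K input tokens and 4K output tokens costs roughly $0.015 + $0.0012, illustrating why output tokens dominate cost at equal volume despite the small 8K output cap.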

Related references

No response

Related issues / blockers

No response
