Conversation

@luuquangvu
Collaborator

  • Update all functions to use orjson for better performance and reduce token usage.
  • Update the LMDB store to more efficiently manage reusable sessions.
  • Update the logic to skip the system instruction when reusing a session to save tokens and speed up model response time.
  • Update project dependencies.

They are no longer needed since the underlying library issue has been resolved.
…probabilities, and token details; adjust response handling accordingly.
…tput_text` validator, rename `created` to `created_at`, and update response handling accordingly.
…roved streaming of response items. Refactor image generation handling for consistency and add compatibility with output content.
…t` and ensure consistent initialization in image output handling.
…anagement

Add dedicated router for /images endpoint and refactor image handling logic for better modularity. Enhance temporary image management with secure naming, token verification, and cleanup functionality.
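A minimal sketch of what "secure naming, token verification, and cleanup" for temporary images could look like, using only the standard library. The function names, directory layout, and in-memory token store are illustrative assumptions, not the PR's actual implementation.

```python
import hmac
import secrets
import tempfile
from pathlib import Path

# Hypothetical temporary-image store: unguessable file names, a per-file
# access token checked in constant time, and explicit cleanup.
_IMAGE_DIR = Path(tempfile.gettempdir()) / "tmp_images"
_tokens: dict[str, str] = {}  # file name -> access token

def store_image(data: bytes) -> tuple[str, str]:
    """Write image bytes under a random name and return (name, token)."""
    _IMAGE_DIR.mkdir(parents=True, exist_ok=True)
    name = secrets.token_urlsafe(16) + ".png"
    token = secrets.token_urlsafe(32)
    (_IMAGE_DIR / name).write_bytes(data)
    _tokens[name] = token
    return name, token

def fetch_image(name: str, token: str):
    """Return the image bytes only if the token matches, else None."""
    expected = _tokens.get(name)
    if expected is None or not hmac.compare_digest(expected, token):
        return None
    return (_IMAGE_DIR / name).read_bytes()

def cleanup_image(name: str) -> None:
    """Remove the image file and forget its token."""
    _tokens.pop(name, None)
    (_IMAGE_DIR / name).unlink(missing_ok=True)
```

Random names prevent URL guessing, while the token check keeps one client from fetching another client's image even if a name leaks.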
…y` for tools, tool_choice, and streaming settings
…nd update response handling for consistency
- Moved utility functions like `strip_code_fence`, `extract_tool_calls`, and `iter_stream_segments` to a centralized helper module.
- Removed unused and redundant private methods from `chat.py`, including `_strip_code_fence`, `_strip_tagged_blocks`, and `_strip_system_hints`.
- Updated imports and references across modules for consistency.
- Simplified tool call and streaming logic by replacing inline implementations with shared helper functions.
- Replaced unused model placeholder in `config.yaml` with an empty list.
- Added JSON parsing validators for `model_header` and `models` to enhance flexibility and error handling.
- Improved validation to filter out incomplete model configurations.
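A validator that accepts either a JSON string or an already-parsed list, then filters out incomplete entries, might look like the sketch below. The helper name and the required-field set are assumptions (the PR uses orjson; stdlib `json` stands in here).

```python
import json

def parse_models(value):
    """Accept a JSON string or a list of model configs; drop incomplete
    entries. Hypothetical sketch, not the repo's exact validator."""
    if isinstance(value, str):
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            return []
    if not isinstance(value, list):
        return []
    required = {"name", "model"}  # assumed required fields
    return [m for m in value if isinstance(m, dict) and required <= m.keys()]
```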
…N support

- Replaced prefix-based parsing with a root key approach.
- Added JSON parsing to handle list-based model configurations.
- Improved handling of errors and cleanup of environment variables.
…to Python literals

- Added `ast.literal_eval` as a fallback for parsing environment variables when JSON decoding fails.
- Improved error handling and logging for invalid configurations.
- Ensured proper cleanup of environment variables post-parsing.
- Adjusted `TOOL_CALL_RE` regex pattern for better accuracy.
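The literal-eval fallback with post-parse cleanup described above can be sketched as follows. The helper name is hypothetical, and stdlib `json` stands in for orjson:

```python
import ast
import json
import os

def parse_env_value(name: str):
    """Parse an env var as JSON, fall back to a Python literal, and
    remove the variable afterwards. Illustrative helper name."""
    raw = os.environ.pop(name, None)  # cleanup post-parsing
    if raw is None:
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    try:
        # Handles Python-style literals such as "{'a': 1}" or "('x',)"
        # that are not valid JSON.
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw  # keep the raw string when nothing parses
```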
…nvironment variables; enhance error logging in config validation
…tring or list structure for enhanced flexibility in automated environments
…s found in either the raw or cleaned history.
…ystem instruction when reusing a session to save tokens.
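The idea of skipping the system instruction on reuse can be sketched in a few lines. This is an assumed shape, not the PR's actual code:

```python
def build_request_messages(messages, session_reused: bool):
    """When a cached session already carries the system instruction,
    resend only the conversation turns. Hypothetical sketch."""
    if session_reused:
        return [m for m in messages if m.get("role") != "system"]
    return list(messages)
```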
Contributor

Copilot AI left a comment

Pull request overview

Refactors request/response handling and LMDB session reuse to reduce token usage and improve runtime performance, primarily by switching to orjson and reusing Gemini sessions more aggressively.

Changes:

  • Replace stdlib json usage with orjson across helpers, config parsing, and chat/response flows; set FastAPI default response to ORJSONResponse.
  • Enhance LMDB hashing/sanitization logic to better support session reuse and consistent conversation lookup.
  • Add session-reuse optimizations to skip re-sending heavy system/tool instructions and improve message splitting behavior.
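A json-to-orjson switch is typically a drop-in swap; one way to sketch it, with a stdlib fallback for environments where orjson is not installed, is below. The wrapper names are illustrative, not the PR's API:

```python
# Thin wrappers in the spirit of the PR's json -> orjson switch.
# orjson returns bytes from dumps, so the stdlib fallback mirrors that.
try:
    import orjson

    def json_dumps(obj) -> bytes:
        return orjson.dumps(obj)

    def json_loads(data):
        return orjson.loads(data)
except ImportError:
    import json

    def json_dumps(obj) -> bytes:
        return json.dumps(obj, separators=(",", ":")).encode("utf-8")

    def json_loads(data):
        if isinstance(data, (bytes, bytearray)):
            data = data.decode("utf-8")
        return json.loads(data)
```

Keeping a single wrapper module means the rest of the codebase never cares which serializer is active.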

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 7 comments.

Summary per file:

  • uv.lock — Bumps dependency lockfile entries (FastAPI, Uvicorn, orjson, ruff, etc.).
  • pyproject.toml — Updates dependency constraints and adds orjson.
  • app/utils/helper.py — Moves tool-call JSON parsing to orjson; changes tool call ID generation.
  • app/utils/config.py — Switches env/config JSON parsing to orjson.
  • app/services/lmdb.py — Refactors message/conversation hashing and assistant-message sanitization for reuse consistency.
  • app/services/client.py — Uses orjson for tool-call argument normalization; minor formatting changes.
  • app/server/chat.py — Adds session reuse optimizations, orjson structured parsing, TTL logic, and revised request splitting.
  • app/models/models.py — Adds tool_call_id and centralizes developer/system role normalization.
  • app/main.py — Sets FastAPI default_response_class=ORJSONResponse.

Comment on lines +229 to +230
# Generate a deterministic ID based on name, arguments, and index to avoid collisions
seed = f"{name}:{arguments}:{index}".encode("utf-8")

Copilot AI Jan 25, 2026

extract_tool_calls() now generates deterministic tool call IDs using only name, canonicalized arguments, and the per-block index. Because index restarts at 0 for each fenced block, identical tool calls in different blocks can produce the same call_id, which violates the expectation that ToolCall.id values are unique within a message/conversation and can break tool_call_id mapping for tool responses. Include a globally unique component (e.g., the current len(tool_calls) at append-time, a monotonically increasing counter across the whole text, or incorporate the match start offset) into the hash seed to guarantee uniqueness across all extracted calls.

Suggested change
# Generate a deterministic ID based on name, arguments, and index to avoid collisions
seed = f"{name}:{arguments}:{index}".encode("utf-8")
# Generate a deterministic ID based on name, arguments, per-block index, and a
# globally increasing index (current tool_calls length) to avoid collisions
global_index = len(tool_calls)
seed = f"{name}:{arguments}:{index}:{global_index}".encode("utf-8")

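The reviewer's suggestion can be isolated into a small helper: fold a globally increasing index (the current length of the accumulated `tool_calls` list) into the hash seed so identical calls in different fenced blocks still get distinct IDs. This sketch assumes a `call_`-prefixed SHA-256 scheme, which is an illustration rather than the repo's exact format:

```python
import hashlib

def make_tool_call_id(name: str, arguments: str, tool_calls: list) -> str:
    """Deterministic ID that stays unique across fenced blocks by
    including the global append position in the seed."""
    global_index = len(tool_calls)  # globally increasing at append time
    seed = f"{name}:{arguments}:{global_index}".encode("utf-8")
    return "call_" + hashlib.sha256(seed).hexdigest()[:24]
```

Determinism is preserved (the same name, arguments, and position always hash the same), while two identical calls appended one after the other can no longer collide.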
Comment on lines +45 to +47
if text_parts is not None:
text_content = "".join(text_parts).replace("\r\n", "\n").strip()
core_data["content"] = text_content if text_content else None

Copilot AI Jan 25, 2026

In _hash_message(), when message.content is a list of text items you concatenate them with "".join(text_parts). This makes the hash non-injective: e.g., ["ab","c"] and ["a","bc"] hash identically, which can cause hash collisions and incorrect session reuse / conversation lookup. Preserve boundaries by joining with an unambiguous separator (e.g., "\n") or by hashing a structured list representation (including item order/lengths) instead of raw concatenation.

Copilot uses AI. Check for mistakes.
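The non-injectivity is easy to demonstrate, along with the suggested fix of hashing a structured representation that preserves item boundaries. Function names here are illustrative:

```python
import hashlib
import json

def hash_text_parts_bad(parts):
    """Non-injective: plain concatenation loses item boundaries."""
    return hashlib.sha256("".join(parts).encode("utf-8")).hexdigest()

def hash_text_parts_good(parts):
    """Injective for lists of strings: a JSON encoding of the list keeps
    boundaries, so ["ab", "c"] and ["a", "bc"] hash differently."""
    return hashlib.sha256(json.dumps(list(parts)).encode("utf-8")).hexdigest()
```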