Refactor: Optimize performance by reducing token usage and speeding up model response time. #89
base: main
Conversation
luuquangvu commented on Jan 24, 2026
- Update all functions to use orjson for better performance and reduce token usage.
- Update the LMDB store to more efficiently manage reusable sessions.
- Update the logic to skip the system instruction when reusing a session to save tokens and speed up model response time.
- Update project dependencies.
They are no longer needed since the underlying library issue has been resolved.
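For readers unfamiliar with `orjson`: it is close to a drop-in replacement for the stdlib `json` module, the main caveat being that `dumps` returns `bytes`. A minimal sketch (illustrative only, not code from this PR):

```python
import orjson

payload = {"role": "user", "content": "ping"}

# orjson.dumps returns bytes rather than str; decode only where a str is required.
raw: bytes = orjson.dumps(payload)
assert orjson.loads(raw) == payload  # loads accepts bytes or str
```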
…to better handle heavy tasks
… client status checks
…probabilities, and token details; adjust response handling accordingly.
…tput_text` validator, rename `created` to `created_at`, and update response handling accordingly.
…roved streaming of response items. Refactor image generation handling for consistency and add compatibility with output content.
…t` and ensure consistent initialization in image output handling.
…anagement
Add dedicated router for /images endpoint and refactor image handling logic for better modularity. Enhance temporary image management with secure naming, token verification, and cleanup functionality.
…l and refactor variable handling
…y` for tools, tool_choice, and streaming settings
…nd update response handling for consistency
…mat for compatibility
- Moved utility functions like `strip_code_fence`, `extract_tool_calls`, and `iter_stream_segments` to a centralized helper module.
- Removed unused and redundant private methods from `chat.py`, including `_strip_code_fence`, `_strip_tagged_blocks`, and `_strip_system_hints`.
- Updated imports and references across modules for consistency.
- Simplified tool call and streaming logic by replacing inline implementations with shared helper functions.
- Replaced unused model placeholder in `config.yaml` with an empty list.
- Added JSON parsing validators for `model_header` and `models` to enhance flexibility and error handling.
- Improved validation to filter out incomplete model configurations.
…N support
- Replaced prefix-based parsing with a root key approach.
- Added JSON parsing to handle list-based model configurations.
- Improved handling of errors and cleanup of environment variables.
…to Python literals
- Added `ast.literal_eval` as a fallback for parsing environment variables when JSON decoding fails.
- Improved error handling and logging for invalid configurations.
- Ensured proper cleanup of environment variables post-parsing.
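That JSON-first, literal-eval-fallback pattern looks roughly like the following minimal sketch (the helper name and the list-only handling are assumptions for illustration, not the PR's actual code):

```python
import ast
import os

import orjson


def parse_env_list(name: str) -> list | None:
    # Hypothetical helper: try JSON first, then fall back to a Python literal.
    raw = os.environ.get(name)
    if raw is None:
        return None
    try:
        return orjson.loads(raw)
    except orjson.JSONDecodeError:
        pass
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return None
    return value if isinstance(value, list) else None
```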
- Adjusted `TOOL_CALL_RE` regex pattern for better accuracy.
…nvironment variable setup
…nvironment variables; enhance error logging in config validation
…tring or list structure for enhanced flexibility in automated environments
…s found in either the raw or cleaned history.
… for better Gemini compatibility.
…eeds METADATA_TTL_MINUTES.
…tion from being saved
…ystem instruction when reusing a session to save tokens.
Pull request overview
Refactors request/response handling and LMDB session reuse to reduce token usage and improve runtime performance, primarily by switching to orjson and reusing Gemini sessions more aggressively.
Changes:
- Replace stdlib `json` usage with `orjson` across helpers, config parsing, and chat/response flows; set the FastAPI default response class to `ORJSONResponse` (see the sketch after this list).
- Enhance LMDB hashing/sanitization logic to better support session reuse and consistent conversation lookup.
- Add session-reuse optimizations to skip re-sending heavy system/tool instructions and improve message splitting behavior.
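For context, the `default_response_class` wiring mentioned above is a one-line FastAPI setting; a minimal sketch follows (the route is hypothetical, not from this PR, and `orjson` must be installed):

```python
from fastapi import FastAPI
from fastapi.responses import ORJSONResponse

# Every route now serializes its return value with orjson unless it
# explicitly returns a different Response type.
app = FastAPI(default_response_class=ORJSONResponse)


@app.get("/health")
async def health() -> dict[str, str]:
    return {"status": "ok"}
```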
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 7 comments.
Summary per file:
| File | Description |
|---|---|
| uv.lock | Bumps dependency lockfile entries (FastAPI, Uvicorn, orjson, ruff, etc.). |
| pyproject.toml | Updates dependency constraints and adds orjson. |
| app/utils/helper.py | Moves tool-call JSON parsing to orjson; changes tool call ID generation. |
| app/utils/config.py | Switches env/config JSON parsing to orjson. |
| app/services/lmdb.py | Refactors message/conversation hashing and assistant-message sanitization for reuse consistency. |
| app/services/client.py | Uses orjson for tool-call argument normalization; minor formatting changes. |
| app/server/chat.py | Adds session reuse optimizations, orjson structured parsing, TTL logic, and revised request splitting. |
| app/models/models.py | Adds tool_call_id and centralizes developer→system role normalization. |
| app/main.py | Sets FastAPI default_response_class=ORJSONResponse. |
```python
# Generate a deterministic ID based on name, arguments, and index to avoid collisions
seed = f"{name}:{arguments}:{index}".encode("utf-8")
```
Copilot AI commented on Jan 25, 2026
extract_tool_calls() now generates deterministic tool call IDs using only name, canonicalized arguments, and the per-block index. Because index restarts at 0 for each fenced block, identical tool calls in different blocks can produce the same call_id, which violates the expectation that ToolCall.id values are unique within a message/conversation and can break tool_call_id mapping for tool responses. Include a globally unique component (e.g., the current len(tool_calls) at append-time, a monotonically increasing counter across the whole text, or incorporate the match start offset) into the hash seed to guarantee uniqueness across all extracted calls.
Suggested change:

```diff
-# Generate a deterministic ID based on name, arguments, and index to avoid collisions
-seed = f"{name}:{arguments}:{index}".encode("utf-8")
+# Generate a deterministic ID based on name, arguments, per-block index, and a
+# globally increasing index (current tool_calls length) to avoid collisions
+global_index = len(tool_calls)
+seed = f"{name}:{arguments}:{index}:{global_index}".encode("utf-8")
```
```python
if text_parts is not None:
    text_content = "".join(text_parts).replace("\r\n", "\n").strip()
    core_data["content"] = text_content if text_content else None
```
Copilot AI commented on Jan 25, 2026
In _hash_message(), when message.content is a list of text items you concatenate them with "".join(text_parts). This makes the hash non-injective: e.g., ["ab","c"] and ["a","bc"] hash identically, which can cause hash collisions and incorrect session reuse / conversation lookup. Preserve boundaries by joining with an unambiguous separator (e.g., "\n") or by hashing a structured list representation (including item order/lengths) instead of raw concatenation.
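One boundary-preserving fix is to hash a serialized form of the list itself rather than the concatenation; a minimal sketch (the function name and normalization are assumptions for illustration, not the PR's `_hash_message` code):

```python
import hashlib

import orjson


def hash_text_parts(text_parts: list[str]) -> str:
    # Serializing the list keeps item boundaries in the byte stream, so
    # ["ab", "c"] and ["a", "bc"] hash differently, unlike "".join(...).
    normalized = [part.replace("\r\n", "\n") for part in text_parts]
    return hashlib.sha256(orjson.dumps(normalized)).hexdigest()
```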