Changes from all commits
62 commits
a3dfe70
Remove the unused auto-refresh functionality and related imports.
luuquangvu Nov 21, 2025
3a692ab
Enhance error handling in client initialization and message sending
luuquangvu Nov 22, 2025
d57e367
Refactor link handling to extract file paths and simplify Google sear…
luuquangvu Nov 22, 2025
ccd55f9
Fix regex pattern for Google search link matching
luuquangvu Nov 22, 2025
37632b3
Fix regex patterns for Markdown escaping, code fence and Google searc…
luuquangvu Nov 22, 2025
b11cfcc
Increase timeout value in configuration files from 60 to 120 seconds …
luuquangvu Nov 22, 2025
f0bff2d
Merge branch 'Nativu5:main' into main
luuquangvu Nov 24, 2025
5b4eaca
Merge branch 'Nativu5:main' into main
luuquangvu Nov 26, 2025
b36a682
Merge branch 'Nativu5:main' into main
luuquangvu Nov 29, 2025
f00ebfc
Fix Image generation
luuquangvu Dec 2, 2025
d911c33
Refactor tool handling to support standard and image generation tools…
luuquangvu Dec 2, 2025
a8241ad
Fix: use "ascii" decoding for base64-encoded image data consistency
luuquangvu Dec 2, 2025
5d55780
Merge branch 'Nativu5:main' into main
luuquangvu Dec 3, 2025
fd2723d
Fix: replace `running` with `_running` for internal client status checks
luuquangvu Dec 3, 2025
8ee6cc0
Refactor: replace direct `_running` access with `running()` method in…
luuquangvu Dec 3, 2025
0be8aef
Merge remote-tracking branch 'upstream/main'
luuquangvu Dec 3, 2025
453700e
Extend models with new fields for annotations, reasoning, audio, log …
luuquangvu Dec 3, 2025
9260f8b
Extend models with new fields (annotations, error), add `normalize_ou…
luuquangvu Dec 3, 2025
d6a8e6b
Extend response models to support tool choices, image output, and imp…
luuquangvu Dec 4, 2025
16435a2
Set default `text` value to an empty string for `ResponseOutputConten…
luuquangvu Dec 4, 2025
fc99c2d
feat: Add /images endpoint with dedicated router and improved image m…
luuquangvu Dec 4, 2025
2844176
feat: Add token-based verification for image access
luuquangvu Dec 4, 2025
4509c14
Refactor: rename image store directory to `ai_generated_images` for c…
luuquangvu Dec 4, 2025
75e2f61
fix: Update create_response to use FastAPI Request object for base_ur…
luuquangvu Dec 4, 2025
bde6d0d
fix: Correct attribute access in request_data handling within `chat.p…
luuquangvu Dec 4, 2025
601451a
fix: Save generated images to persistent storage
luuquangvu Dec 4, 2025
893eb6d
fix: Remove unused `output_image` type from `ResponseOutputContent` a…
luuquangvu Dec 4, 2025
80462b5
fix: Update image URL generation in chat response to use Markdown for…
luuquangvu Dec 4, 2025
af91c4f
Merge branch 'Nativu5:main' into main
luuquangvu Dec 4, 2025
f088b5f
Merge branch 'Nativu5:main' into main
luuquangvu Dec 6, 2025
8d49a72
fix: Enhance error handling for full-size image saving and add fallba…
luuquangvu Dec 8, 2025
d37eae0
fix: Use filename as image ID to ensure consistency in generated imag…
luuquangvu Dec 9, 2025
b9f776d
fix: Enhance tempfile saving by adding custom headers, content-type h…
luuquangvu Dec 16, 2025
4b5fe07
feat: Add support for custom Gemini models and model loading strategies
luuquangvu Dec 30, 2025
5cb29e8
feat: Improve Gemini model environment variable parsing and nested fi…
luuquangvu Dec 30, 2025
f25f16d
refactor: Consolidate utility functions and clean up unused code
luuquangvu Dec 31, 2025
a1bc8e2
fix: Handle None input in `estimate_tokens` and return 0 for empty text
luuquangvu Dec 31, 2025
a7e15d9
refactor: Simplify model configuration and add JSON parsing validators
luuquangvu Dec 31, 2025
61c5f3b
refactor: Simplify Gemini model environment variable parsing with JSO…
luuquangvu Dec 31, 2025
efd056c
fix: Enhance Gemini model environment variable parsing with fallback …
luuquangvu Dec 31, 2025
476b9dd
fix: Improve regex patterns in helper module
luuquangvu Dec 31, 2025
35c1e99
docs: Update README files to include custom model configuration and e…
luuquangvu Jan 13, 2026
9b81621
fix: Remove unused headers from HTTP client in helper module
luuquangvu Jan 13, 2026
32a48dc
fix: Update README and README.zh to clarify model configuration via e…
luuquangvu Jan 15, 2026
0c00b08
Update README and README.zh to clarify model configuration via JSON s…
luuquangvu Jan 15, 2026
e2233f4
Merge branch 'Nativu5:main' into main
luuquangvu Jan 22, 2026
b599d99
Refactor: compress JSON content to save tokens and streamline sending…
luuquangvu Jan 23, 2026
186b844
Refactor: Modify the LMDB store to fix issues where no conversation i…
luuquangvu Jan 23, 2026
6dd1fec
Refactor: Modify the LMDB store to fix issues where no conversation i…
luuquangvu Jan 24, 2026
20ed245
Refactor: Update all functions to use orjson for better performance
luuquangvu Jan 24, 2026
f67fe63
Update project dependencies
luuquangvu Jan 24, 2026
889f2d2
Fix IDE warnings
luuquangvu Jan 24, 2026
66b6202
Incorrect IDE warnings
luuquangvu Jan 24, 2026
3297f53
Refactor: Modify the LMDB store to fix issues where no conversation i…
luuquangvu Jan 24, 2026
5399b26
Refactor: Centralized the mapping of the 'developer' role to 'system'…
luuquangvu Jan 24, 2026
de01c78
Refactor: Modify the LMDB store to fix issues where no conversation i…
luuquangvu Jan 24, 2026
1964147
Refactor: Modify the LMDB store to fix issues where no conversation i…
luuquangvu Jan 24, 2026
8c5c749
Refactor: Modify the LMDB store to fix issues where no conversation i…
luuquangvu Jan 24, 2026
ce67d66
Refactor: Avoid reusing an existing chat session if its idle time exc…
luuquangvu Jan 24, 2026
3d32d12
Refactor: Update the LMDB store to resolve issues preventing conversa…
luuquangvu Jan 24, 2026
2eb9f05
Refactor: Update the _prepare_messages_for_model helper to omit the s…
luuquangvu Jan 24, 2026
ade61d6
Refactor: Modify the logic to convert a large prompt into a temporary…
luuquangvu Jan 26, 2026
2 changes: 2 additions & 0 deletions app/main.py
@@ -2,6 +2,7 @@
from contextlib import asynccontextmanager

from fastapi import FastAPI
from fastapi.responses import ORJSONResponse
from loguru import logger

from .server.chat import router as chat_router
@@ -92,6 +93,7 @@ def create_app() -> FastAPI:
description="OpenAI-compatible API for Gemini Web",
version="1.0.0",
lifespan=lifespan,
default_response_class=ORJSONResponse,
)

add_cors_middleware(app)
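Note on the `main.py` change: setting `default_response_class=ORJSONResponse` makes every route serialize its return value through `orjson` instead of the standard `json` module, without touching route code. A minimal sketch of the behavior being relied on (the standalone app and the `/ping` route are illustrative only, not part of this PR):

```python
from fastapi import FastAPI
from fastapi.responses import ORJSONResponse

# Hypothetical standalone app, only to illustrate the default_response_class switch.
app = FastAPI(default_response_class=ORJSONResponse)

@app.get("/ping")
async def ping() -> dict:
    # The returned dict is serialized by orjson (faster, bytes-based) rather than
    # the standard-library json module; route signatures do not change.
    return {"status": "ok"}
```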
8 changes: 8 additions & 0 deletions app/models/models.py
@@ -24,11 +24,19 @@ class Message(BaseModel):
content: Union[str, List[ContentItem], None] = None
name: Optional[str] = None
tool_calls: Optional[List["ToolCall"]] = None
tool_call_id: Optional[str] = None
refusal: Optional[str] = None
reasoning_content: Optional[str] = None
audio: Optional[Dict[str, Any]] = None
annotations: List[Dict[str, Any]] = Field(default_factory=list)

@model_validator(mode="after")
def normalize_role(self) -> "Message":
"""Normalize 'developer' role to 'system' for Gemini compatibility."""
if self.role == "developer":
self.role = "system"
return self


class Choice(BaseModel):
"""Choice model"""
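The new `normalize_role` validator means any `Message` constructed with the OpenAI `developer` role is transparently rewritten to `system` before it reaches the rest of the pipeline, which is why the per-call-site mappings are removed from `chat.py` below. A quick illustration of the expected behavior (assuming the `Message` model above is importable from `app.models.models`):

```python
# Assumes the Message model shown in this diff; the import path mirrors the changed file.
from app.models.models import Message

msg = Message(role="developer", content="Always answer in JSON.")
assert msg.role == "system"  # rewritten by the normalize_role model_validator(mode="after") hook
```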
187 changes: 105 additions & 82 deletions app/server/chat.py
@@ -1,6 +1,6 @@
import base64
import json
import re
import tempfile
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
@@ -57,6 +57,7 @@
# Maximum characters Gemini Web can accept in a single request (configurable)
MAX_CHARS_PER_REQUEST = int(g_config.gemini.max_chars_per_request * 0.9)
CONTINUATION_HINT = "\n(More messages to come, please reply with just 'ok.')"
METADATA_TTL_MINUTES = 15

router = APIRouter()

@@ -95,7 +96,7 @@ def _build_structured_requirement(
schema_name = json_schema.get("name") or "response"
strict = json_schema.get("strict", True)

pretty_schema = json.dumps(schema, ensure_ascii=False, indent=2, sort_keys=True)
pretty_schema = orjson.dumps(schema, option=orjson.OPT_SORT_KEYS).decode("utf-8")
instruction_parts = [
"You must respond with a single valid JSON document that conforms to the schema shown below.",
"Do not include explanations, comments, or any text before or after the JSON.",
@@ -135,7 +136,7 @@ def _build_tool_prompt(
description = function.description or "No description provided."
lines.append(f"Tool `{function.name}`: {description}")
if function.parameters:
schema_text = json.dumps(function.parameters, ensure_ascii=False, indent=2)
schema_text = orjson.dumps(function.parameters).decode("utf-8")
lines.append("Arguments JSON schema:")
lines.append(schema_text)
else:
@@ -266,31 +267,35 @@ def _prepare_messages_for_model(
tools: list[Tool] | None,
tool_choice: str | ToolChoiceFunction | None,
extra_instructions: list[str] | None = None,
inject_system_defaults: bool = True,
) -> list[Message]:
"""Return a copy of messages enriched with tool instructions when needed."""
prepared = [msg.model_copy(deep=True) for msg in source_messages]

instructions: list[str] = []
if tools:
tool_prompt = _build_tool_prompt(tools, tool_choice)
if tool_prompt:
instructions.append(tool_prompt)

if extra_instructions:
instructions.extend(instr for instr in extra_instructions if instr)
logger.debug(
f"Applied {len(extra_instructions)} extra instructions for tool/structured output."
)
if inject_system_defaults:
if tools:
tool_prompt = _build_tool_prompt(tools, tool_choice)
if tool_prompt:
instructions.append(tool_prompt)

if extra_instructions:
instructions.extend(instr for instr in extra_instructions if instr)
logger.debug(
f"Applied {len(extra_instructions)} extra instructions for tool/structured output."
)

if not _conversation_has_code_hint(prepared):
instructions.append(CODE_BLOCK_HINT)
logger.debug("Injected default code block hint for Gemini conversation.")
if not _conversation_has_code_hint(prepared):
instructions.append(CODE_BLOCK_HINT)
logger.debug("Injected default code block hint for Gemini conversation.")

if not instructions:
# Still need to ensure XML hint for the last user message if tools are present
if tools and tool_choice != "none":
_append_xml_hint_to_last_user_message(prepared)
return prepared

combined_instructions = "\n\n".join(instructions)

if prepared and prepared[0].role == "system" and isinstance(prepared[0].content, str):
existing = prepared[0].content or ""
separator = "\n\n" if existing else ""
@@ -318,8 +323,6 @@ def _response_items_to_messages(
normalized_input: list[ResponseInputItem] = []
for item in items:
role = item.role
if role == "developer":
role = "system"

content = item.content
normalized_contents: list[ResponseInputContent] = []
@@ -371,9 +374,7 @@ def _response_items_to_messages(
ResponseInputItem(type="message", role=item.role, content=normalized_contents or [])
)

logger.debug(
f"Normalized Responses input: {len(normalized_input)} message items (developer roles mapped to system)."
)
logger.debug(f"Normalized Responses input: {len(normalized_input)} message items.")
return messages, normalized_input


@@ -393,8 +394,6 @@ def _instructions_to_messages(
continue

role = item.role
if role == "developer":
role = "system"

content = item.content
if isinstance(content, str):
@@ -532,8 +531,14 @@ async def create_chat_completion(
)

if session:
# Optimization: When reusing a session, we don't need to resend the heavy tool definitions
# or structured output instructions as they are already in the Gemini session history.
messages_to_send = _prepare_messages_for_model(
remaining_messages, request.tools, request.tool_choice, extra_instructions
remaining_messages,
request.tools,
request.tool_choice,
extra_instructions,
inject_system_defaults=False,
)
if not messages_to_send:
raise HTTPException(
@@ -624,8 +629,8 @@ async def create_chat_completion(
detail="LLM returned an empty response while JSON schema output was requested.",
)
try:
structured_payload = json.loads(cleaned_visible)
except json.JSONDecodeError as exc:
structured_payload = orjson.loads(cleaned_visible)
except orjson.JSONDecodeError as exc:
logger.warning(
f"Failed to decode JSON for structured response (schema={structured_requirement.schema_name}): "
f"{cleaned_visible}"
@@ -635,7 +640,7 @@ async def create_chat_completion(
detail="LLM returned invalid JSON for the requested response_format.",
) from exc

canonical_output = json.dumps(structured_payload, ensure_ascii=False)
canonical_output = orjson.dumps(structured_payload).decode("utf-8")
visible_output = canonical_output
storage_output = canonical_output

@@ -644,17 +649,20 @@ async def create_chat_completion(

# After formatting, persist the conversation to LMDB
try:
last_message = Message(
current_assistant_message = Message(
role="assistant",
content=storage_output or None,
tool_calls=tool_calls or None,
)
cleaned_history = db.sanitize_assistant_messages(request.messages)
# Sanitize the entire history including the new message to ensure consistency
full_history = [*request.messages, current_assistant_message]
cleaned_history = db.sanitize_assistant_messages(full_history)

conv = ConversationInStore(
model=model.model_name,
client_id=client.id,
metadata=session.metadata,
messages=[*cleaned_history, last_message],
messages=cleaned_history,
)
key = db.store(conv)
logger.debug(f"Conversation saved to LMDB with key: {key}")
@@ -782,9 +790,10 @@ async def _build_payload(
if reuse_session:
messages_to_send = _prepare_messages_for_model(
remaining_messages,
tools=None,
tool_choice=None,
extra_instructions=extra_instructions or None,
tools=request_data.tools, # Keep for XML hint logic
tool_choice=request_data.tool_choice,
extra_instructions=None, # Already in session history
inject_system_defaults=False,
)
if not messages_to_send:
raise HTTPException(
@@ -864,8 +873,8 @@ async def _build_payload(
detail="LLM returned an empty response while JSON schema output was requested.",
)
try:
structured_payload = json.loads(cleaned_visible)
except json.JSONDecodeError as exc:
structured_payload = orjson.loads(cleaned_visible)
except orjson.JSONDecodeError as exc:
logger.warning(
f"Failed to decode JSON for structured response (schema={structured_requirement.schema_name}): "
f"{cleaned_visible}"
@@ -875,7 +884,7 @@ async def _build_payload(
detail="LLM returned invalid JSON for the requested response_format.",
) from exc

canonical_output = json.dumps(structured_payload, ensure_ascii=False)
canonical_output = orjson.dumps(structured_payload).decode("utf-8")
assistant_text = canonical_output
storage_output = canonical_output
logger.debug(
@@ -996,17 +1005,19 @@ async def _build_payload(
)

try:
last_message = Message(
current_assistant_message = Message(
role="assistant",
content=storage_output or None,
tool_calls=detected_tool_calls or None,
)
cleaned_history = db.sanitize_assistant_messages(messages)
full_history = [*messages, current_assistant_message]
cleaned_history = db.sanitize_assistant_messages(full_history)

conv = ConversationInStore(
model=model.model_name,
client_id=client.id,
metadata=session.metadata,
messages=[*cleaned_history, last_message],
messages=cleaned_history,
)
key = db.store(conv)
logger.debug(f"Conversation saved to LMDB with key: {key}")
@@ -1050,19 +1061,35 @@ async def _find_reusable_session(

# Start with the full history and iteratively trim from the end.
search_end = len(messages)

while search_end >= 2:
search_history = messages[:search_end]

# Only try to match if the last stored message would be assistant/system.
if search_history[-1].role in {"assistant", "system"}:
# Only try to match if the last stored message would be assistant/system/tool before querying LMDB.
if search_history[-1].role in {"assistant", "system", "tool"}:
try:
if conv := db.find(model.model_name, search_history):
client = await pool.acquire(conv.client_id)
session = client.start_chat(metadata=conv.metadata, model=model)
remain = messages[search_end:]
return session, client, remain
# Check if metadata is too old
now = datetime.now()
updated_at = conv.updated_at or conv.created_at or now
age_minutes = (now - updated_at).total_seconds() / 60

if age_minutes <= METADATA_TTL_MINUTES:
client = await pool.acquire(conv.client_id)
session = client.start_chat(metadata=conv.metadata, model=model)
remain = messages[search_end:]
logger.debug(
f"Match found at prefix length {search_end}. Client: {conv.client_id}"
)
return session, client, remain
else:
logger.debug(
f"Matched conversation is too old ({age_minutes:.1f}m), skipping reuse."
)
except Exception as e:
logger.warning(f"Error checking LMDB for reusable session: {e}")
logger.warning(
f"Error checking LMDB for reusable session at length {search_end}: {e}"
)
break

# Trim one message and try again.
@@ -1072,52 +1099,48 @@


async def _send_with_split(session: ChatSession, text: str, files: list[Path | str] | None = None):
"""Send text to Gemini, automatically splitting into multiple batches if it is
longer than ``MAX_CHARS_PER_REQUEST``.

Every intermediate batch (that is **not** the last one) is suffixed with a hint
telling Gemini that more content will come, and it should simply reply with
"ok". The final batch carries any file uploads and the real user prompt so
that Gemini can produce the actual answer.
"""
Send text to Gemini. If text is longer than ``MAX_CHARS_PER_REQUEST``,
it is converted into a temporary text file attachment to avoid splitting issues.
"""
if len(text) <= MAX_CHARS_PER_REQUEST:
# No need to split - a single request is fine.
try:
return await session.send_message(text, files=files)
except Exception as e:
logger.exception(f"Error sending message to Gemini: {e}")
raise
hint_len = len(CONTINUATION_HINT)
chunk_size = MAX_CHARS_PER_REQUEST - hint_len

chunks: list[str] = []
pos = 0
total = len(text)
while pos < total:
end = min(pos + chunk_size, total)
chunk = text[pos:end]
pos = end

# If this is NOT the last chunk, add the continuation hint.
if end < total:
chunk += CONTINUATION_HINT
chunks.append(chunk)

# Fire off all but the last chunk, discarding the interim "ok" replies.
for chk in chunks[:-1]:

logger.info(
f"Message length ({len(text)}) exceeds limit ({MAX_CHARS_PER_REQUEST}). Converting text to file attachment."
)

# Create a temporary directory to hold the message.txt file
# This ensures the filename is exactly 'message.txt' as expected by the instruction.
with tempfile.TemporaryDirectory() as tmpdirname:
temp_file_path = Path(tmpdirname) / "message.txt"
temp_file_path.write_text(text, encoding="utf-8")

try:
await session.send_message(chk)
# Prepare the files list
final_files = list(files) if files else []
final_files.append(temp_file_path)

instruction = (
"The user's input exceeds the character limit and is provided in the attached file `message.txt`.\n\n"
"**System Instruction:**\n"
"1. Read the content of `message.txt`.\n"
"2. Treat that content as the **primary** user prompt for this turn.\n"
"3. Execute the instructions or answer the questions found *inside* that file immediately.\n"
)

logger.debug(f"Sending prompt as temporary file: {temp_file_path}")

return await session.send_message(instruction, files=final_files)

except Exception as e:
logger.exception(f"Error sending chunk to Gemini: {e}")
logger.exception(f"Error sending large text as file to Gemini: {e}")
raise

# The last chunk carries the files (if any) and we return its response.
try:
return await session.send_message(chunks[-1], files=files)
except Exception as e:
logger.exception(f"Error sending final chunk to Gemini: {e}")
raise


def _create_streaming_response(
model_output: str,
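The reworked `_send_with_split` no longer splits oversized prompts into continuation chunks; it writes the whole prompt to a temporary `message.txt`, attaches it, and sends a short instruction telling the model to treat the file contents as the prompt. A simplified, self-contained sketch of that flow (the `send_message(text, files=...)` call mirrors the diff above; the limit value and function name here are illustrative):

```python
import tempfile
from pathlib import Path

MAX_CHARS_PER_REQUEST = 900_000  # illustrative; the real limit is derived from config

async def send_with_file_fallback(session, text: str, files: list[Path | str] | None = None):
    """Send text directly when small enough; otherwise attach it as message.txt."""
    if len(text) <= MAX_CHARS_PER_REQUEST:
        return await session.send_message(text, files=files)

    with tempfile.TemporaryDirectory() as tmpdir:
        # The fixed filename matters: the instruction below refers to `message.txt`.
        prompt_path = Path(tmpdir) / "message.txt"
        prompt_path.write_text(text, encoding="utf-8")

        instruction = (
            "The user's input exceeds the character limit and is provided in the "
            "attached file `message.txt`. Treat its content as the prompt for this turn."
        )
        return await session.send_message(instruction, files=[*(files or []), prompt_path])
```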