fix: Code Review 全量修復 (28 issues, 2026-03-25) by jlin53882 · Pull Request #43 · jlin53882/Minecraft-translate

jlin53882 · 2026-03-25T14:33:10Z

Code Review 全量修復 PR

修復內容

本 PR 修復 CODE_REVIEW_REPORT_20260325_BY_PATH.md 中記錄的所有 28 個問題。

CRITICAL（需立即合併）

Issue	檔案	問題
#1	`ftbquests_lmtranslator.py`	`unshield_text()` 傳入 `ShieldedText` 而非 `.shields`
#1b	`patch_md_lmtranslator.py`	同上
#2	`lm_api_client.py`	API Key 直接暴露在 URL（安全漏洞）
#3	`lm_translator_main.py`	Dict 作為 System Prompt 傳入 API

HIGH

Issue	檔案	問題
#4	`_task_runner.py`	`session.finish()` 在 exception path 未呼叫
#5	`lm_service.py` / `extract_service.py`	錯誤路徑未呼叫 `flush()`
#6	`cache_manager.py`	`clear_dirty()` 在寫入前執行
#7/#7b	`ftbquests_lmtranslator.py` / `kubejs_tooltip_lmtranslator.py`	雙重 JSON 讀取效能問題
#8	`ftbquests_lmtranslator.py`	Callbacks 在迴圈內定義
#9	`lm_config_rules.py`	API Key 驗證過寬
#10	`lm_translator_shared_recording.py`	CSV injection 漏洞
#11	`md_lmtranslator.py`	Re-shielding 邏輯錯誤
#12	`lm_response_parser.py`	Greedy regex 匹配無效 JSON

MEDIUM + LOW

所有 MEDIUM (#13-#23) 及 LOW (#24-#27) 問題均已修復。

驗證

所有修改檔案已通過 python -m py_compile 語法檢查
修改前皆有建立 .bak 備份

- 在 clean_kubejs_from_raw_impl 中，pending_en 寫入前新增 reverse_index dedup - 若某英文文字（value）已出現在 final/zh_tw.json（不同 key），則跳過不寫入 pending - 避免同一英文原文因不同 key 而重複翻譯

- 新增 ColorCharError dataclass，記錄非法顏色字元錯誤 - 實作 COLOR_PATTERN 正則：& 後只能接 a-v（不含 w）、0-9、空格、\、# - check_color_chars()：檢查單一字串中的非法顏色字元 - check_json_file()：讀取 JSON 並遞迴檢查所有字串值 - check_directory()：遞迴檢查目錄下所有 .json 檔 - 遵循現有 checkers Generator yield 模式

…ipelines Commit 1: feat(rich-text-shield) - add core module - New module: translation_tool/plugins/shared/rich_text_shield.py - ShieldPiece / ShieldedText dataclasses - shield_text(): 抽出7種不應翻譯的格式片段（彩色碼/物品ID/URL/逸出\&/圖片/事件JSON/翻頁） - unshield_text(): 還原所有佔位符 - add_escape_quotes(): JSON逸出修補（移植自FTBQL） - Updated shared/__init__.py 導出新模組 Commit 2: fix(kubejs-lm) - integrate shield/unshield into LM translation pipeline - collect_items_from_mapping(): shield_text() 寫入 _shielded，skip_reason 非None時保留原文 - on_translated_item(): unshield_text() 還原翻譯後文字 Commit 3: fix(kubejs-clean) - integrate shield into s2t pipeline (Phase 2) - kubejs_translator_clean.py: _shielded_convert() helper，保護 safe_convert_text_fn 呼叫點：deep_merge_3way_flat_impl() 和 client_scripts 處理 - 避免 OpenCC s2t 轉換時破壞 KubeJS 格式標記

- md_extract_qa: shield_text before writing pending JSON - md_inject_qa: unshield_text before writing back to MD - Item dataclass: add _shields field for shield restoration - _shield_item helper for clean shield/unshield roundtrip

prevent ShieldedText object from leaking into json.dumps()

- Previously session.finish() was only called in the success path - Now it's in the finally block to ensure it always executes - This prevents session leaks when exceptions occur

- 移除 if save_path.exists() 檢查，改用 try-except 捕捉 FileNotFoundError - 避免 TOCTOU (Time-of-Check to Time-of-Use) 問題：檢查與讀取之間檔案可能被刪除

- lm_service.py: added flush() after session.set_error() in exception handler - extract_service.py: added flush() in both run_lang_extraction_service and run_book_extraction_service exception handlers - This ensures buffered logs are flushed even when exceptions occur

- 新增 _counter_lock (threading.Lock) 保護模組層級計數器 - _next_color_placeholder / _next_item_placeholder / _next_escaped_placeholder 的讀取-遞增-寫入操作以鎖保護，避免 race condition

- RE_LANG_SEG 只定義一次（在第 61 行） - 移除第 87 行的重複定義，保留單一宣告

…s shielded_src.shields The unshield_text() function expects shields (list[ShieldPiece]), not the whole ShieldedText object. Fixed line 78 to pass shielded_src.shields. Issue: #1b

The cache hit path was missing unshield_text() call while the cache miss path properly called unshield_text(). Added shield handling to cache hit path to unify both code paths. Issue: #18

Added explicit check for empty shard files (0 bytes) in load_shard_file(). Previously empty files would cause JSONDecodeError and be logged as generic failure. Now they are specifically detected and logged as empty shard warnings. Issue: #19

… add_to_cache Added add_to_cache_batch() function that takes multiple entries and acquires the lock once for all entries, rather than acquiring/releasing for each entry. This reduces lock contention when many cache entries are added rapidly. Issue: #23

#13 - except Exception: pass 改為 logger.warning 記錄錯誤 - ftbquests_lmtranslator.py: 7處 - kubejs_tooltip_lmtranslator.py: 6處 - md_lmtranslator.py: 2處 - lm_translator_shared_loop.py: 5處 - cache_loader.py: 1處 - cache_search.py: 4處 #14 - cache_manager.py 初始化鎖定 - reload_translation_cache() now holds lock during reset+reload #15 - cache_shards.py TOCTOU Race 防護 - _save_entries_to_active_shards() 加入檔案鎖保護讀取-修改-寫入循環 #16 - cache_shards.py fsync 已存在（確認無需修改） #17 - md_lmtranslator.py skip_reason 處理 - 新增 skip_skipped 計數器 - skip_reason 非 None 的項目直接視為 cache hit

…t lock - lm_api_client.py: move API key from URL query to Authorization Bearer header (security) - lm_translator_main.py: convert dict system prompt to string before API call - cache_manager.py: add cache_lock to initialize_translation_cache (race condition fix)

- Issue #7: ftbquests_lmtranslator.py - Cache JSON mappings to avoid duplicate reads - Issue #7b: kubejs_tooltip_lmtranslator.py - Same fix for duplicate JSON reads - Issue #8: ftbquests_lmtranslator.py - Move callbacks outside loop (factory functions) - Issue #11: md_lmtranslator.py - Fix re-shielding logic (remove invalid else branch) - Issue #12: lm_response_parser.py - Use non-greedy regex to avoid matching invalid JSON # Conflicts: # translation_tool/plugins/ftbquests/ftbquests_lmtranslator.py # translation_tool/plugins/kubejs/kubejs_tooltip_lmtranslator.py # translation_tool/plugins/md/md_lmtranslator.py

chatgpt-codex-connector · 2026-03-25T14:33:16Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

greptile-apps · 2026-03-25T14:39:24Z

Greptile Summary

This PR claims to fix 28 issues identified in a code review report, spanning critical security fixes, logic corrections, and a broad shield/unshield rich-text protection layer integrated into the KubeJS, FTBQuests, and Markdown translation pipelines. While many lower-priority fixes (exception logging, finally block cleanup, CSV injection protection, system prompt dict→string defaults) are correctly implemented, several of the critical and high-priority fixes remain broken or introduce new regressions.

Issues from previous review threads that remain unresolved:

lm_api_client.py — Authorization: Bearer {api_key} still breaks all Gemini API calls; the correct header is x-goog-api-key
lm_response_parser.py — Non-greedy regex \{[\s\S]*?\} still fails on nested JSON (stops at the first })
cache_shards.py — import msvcrt (Windows-only) added in four places with no cross-platform fallback, breaking the entire cache on Linux/macOS
ftbquests_lmtranslator.py — Shield/unshield in on_translated_item is still a silent no-op: shield_text(src_text) is called after translation, generating fresh placeholders that were never present in the text sent to the LLM

New issues introduced by this PR:

lm_config_rules.py — Regex [_-a-zA-Z0-9] creates an ASCII range _-a (95–97), unintentionally allowing backtick and silently excluding dash despite the error message claiming dash is valid
kubejs_translator_clean.py — New dedup block builds reverse_index from zh_tw.json values (Traditional Chinese text) but looks up English source values against it; the comparison almost never matches, making the dedup dead code
rich_text_shield.py — Global module-level counters produce non-deterministic placeholder IDs across runs; if processing order changes, source_text cache keys in md_lmtranslator.py will differ between runs causing unnecessary cache misses
md_lmtranslator.py — The else branch in on_translated_item calls shield_text() on already-shielded text, producing empty shields and a silent no-op

Confidence Score: 1/5

Not safe to merge — four critical bugs from the previous review remain unresolved, including a broken Gemini auth header that silently fails all API calls and a Windows-only import that crashes the entire cache system on Linux/macOS.
Three of the originally flagged CRITICAL issues (wrong auth header, non-greedy regex, msvcrt cross-platform) are still broken and will cause immediate runtime failures. The FTBQuests shield no-op also remains. This PR additionally introduces two new P1 logic bugs (regex character range and dedup semantic mismatch). While the lower-priority improvements are genuine progress, the unresolved regressions make the PR unsafe to merge.
lm_api_client.py (broken auth), lm_response_parser.py (broken JSON regex), cache_shards.py (Windows-only import), ftbquests_lmtranslator.py (shield no-op), lm_config_rules.py (regex range bug), kubejs_translator_clean.py (dedup dead code)

Important Files Changed

Filename	Overview
translation_tool/core/lm_api_client.py	Attempted fix for API key in URL changes to `Authorization: Bearer {api_key}`, but the Gemini REST API requires `x-goog-api-key` header for static keys — `Bearer` is for OAuth2 access tokens. All API calls will return HTTP 401. This was flagged in a previous review thread and remains unresolved.
translation_tool/core/lm_response_parser.py	Changed greedy regex `\{[\s\S]\}` to non-greedy `\{[\s\S]?\}` — this breaks all nested JSON objects (stops at the first `}` rather than finding the outermost one). Flagged in previous review thread and remains unresolved.
translation_tool/utils/cache_shards.py	`import msvcrt` (Windows-only) added in four places: unused in `_write_json_atomic`, and used for file locking in `_rotate_shard_if_needed` and `_save_entries_to_active_shards`. All four will raise `ModuleNotFoundError` on Linux/macOS. Flagged in previous review thread and remains unresolved.
translation_tool/plugins/ftbquests/ftbquests_lmtranslator.py	`on_translated_item` calls `shield_text(src_text)` AFTER translation, generating fresh placeholders that were never injected into the text sent to the LLM — `unshield_text` is always a no-op. Flagged in a previous thread and remains unresolved. Other improvements (exception logging, cache recording) are correct.
translation_tool/core/lm_config_rules.py	New API key validation adds length and character checks, but the regex `[_-a-zA-Z0-9]` has a character range bug: `_-a` creates an ASCII range 95–97, unintentionally allowing backtick and silently excluding the dash character that the error message claims is valid.
translation_tool/core/kubejs_translator_clean.py	New `_shielded_convert` helper and dedup logic added. The `_shielded_convert` function is correct. However, the new reverse-index dedup block builds its index from `zh_tw.json` values (Traditional Chinese text) but then looks up English values from `pending_en`, so the check almost never fires and the dedup is effectively dead code.
translation_tool/plugins/shared/rich_text_shield.py	New module implementing shield/unshield for KubeJS rich text. Pattern matching and placeholder substitution logic is sound. Main concern: global module-level counters make placeholder IDs non-deterministic across runs if processing order changes, which can cause cache key mismatches for MD blocks.
translation_tool/plugins/md/md_lmtranslator.py	Shield/unshield integration added. `on_translated_item` primary path (using `_shielded`) is correct. The `else` branch that calls `shield_text(src_text)` on already-shielded text is a no-op and should be removed. Cache key stability depends on consistent processing order due to global counters in `rich_text_shield.py`.
translation_tool/plugins/kubejs/kubejs_tooltip_lmtranslator.py	Shield integration in `collect_items_from_mapping` and `on_translated_item` follows the correct pattern: shield at collection time, store `_shielded` in item dict, unshield in both cache-hit and cache-miss callbacks. Correctly uses `source_text` (original) for cache key.
translation_tool/utils/cache_manager.py	`initialize_translation_cache` now takes the lock before checking `initialized`, fixing a double-initialization race. `reload_translation_cache` refactored to inline the load logic; since `reset_runtime_state` mutates the existing state object in-place (confirmed by reading cache_store.py), the lock acquired on the old state remains valid throughout. New `add_to_cache_batch` helper reduces lock contention correctly.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Source Text\n(e.g. 'Hello &a World')"] --> B["md_extract_qa.py\n_shield_item()"]
    B --> |"shield_text() → $C0$\nstores _shields in JSON"| C["Pending JSON\ntext: 'Hello $C0$ World'\n_shields: [$C0$→&a]"]
    C --> D["md_lmtranslator.py\nhash_to_src[hash] = it.text"]
    D --> E["shield_text(already-shielded)\n⚠️ No patterns match\nshields = []  ← REDUNDANT"]
    E --> F["LLM receives\n'Hello $C0$ World'"]
    F --> G["LLM returns\n'你好 $C0$ 世界'"]
    G --> H{"_shielded.shields\nempty?"}
    H --> |"Yes (always)\nelse branch"| I["shield_text(src_text)\n⚠️ No-op — fresh\nplaceholders not in dst"]
    H --> |"No (never reached)"| J["unshield_text(dst, shields)"]
    I --> K["hash_to_dst[h] = '你好 $C0$ 世界'"]
    K --> L["md_inject_qa.py\napply_item_to_md_lines()"]
    L --> M["unshield_text(item.text, item._shields)\n✅ $C0$ → &a\n= '你好 &a 世界'"]

    style E fill:#ffe0b2,stroke:#e65100
    style I fill:#ffe0b2,stroke:#e65100
    style H fill:#ffe0b2,stroke:#e65100

Comments Outside Diff (4)

translation_tool/core/lm_config_rules.py, line 303 (link)

Regex character range allows backtick and silently excludes dash

The character class [_-a-zA-Z0-9] contains the expression _-a, which creates an ASCII range from _ (95) to a (97). This unintentionally includes the backtick character ` (96), which should be invalid in a Gemini API key.

Additionally, despite the error message explicitly stating that dashes (-) are allowed ("僅允許 'AIza' 開頭後接英文字母、數字、 dash(-) 或 underscore(_)"), the current regex does not actually allow the dash character (ASCII 45 is outside all the ranges). Keys containing - will be incorrectly rejected.

The same bug appears at line 332 in validate_api_keys_from_ui.
translation_tool/core/lm_config_rules.py, line 332 (link)

Same regex bug as line 303 — validate_api_keys_from_ui

The same [_-a-zA-Z0-9] character class issue is repeated here (backtick unintentionally allowed; dash silently rejected despite documentation claiming it is valid).
translation_tool/plugins/shared/rich_text_shield.py, line 1150-1185 (link)

Global counters produce non-deterministic placeholders, potentially breaking cache key stability across runs

_counter_color, _counter_item, and _counter_escaped are module-level globals that never reset between shield_text() calls (except explicitly via _reset_counters()). In a single translation session that processes N files, the placeholder IDs ( $C0$ , $C1$ , ...) depend on the order in which texts are processed.

In md_lmtranslator.py, the cache key is content_hash|source_text where source_text is the extract-step-shielded text (e.g. "Hello $C0$ World"). On the next run, if any other file is processed before this one, the global counter will have advanced, and the same original text produces "Hello $C5$ World" instead — a different source_text, so the cache lookup misses even though the text is identical.

The safest fix is to generate placeholder IDs from a local counter per shield_text() call rather than from global state:
```
def shield_text(text: str) -> ShieldedText:
    local_color = 0
    local_item = 0
    local_escaped = 0
    # use local_* instead of incrementing globals
```
Alternatively, derive placeholder IDs deterministically from the original text (e.g., hash-based). This would make placeholders stable regardless of processing order.
translation_tool/plugins/md/md_lmtranslator.py, line 956-961 (link)

else branch in on_translated_item is always a no-op and potentially misleading

When execution reaches the else branch, _shielded is either None or has an empty shields list. In both cases, src_text here is the already-shielded text produced by md_extract_qa.py (e.g. "Hello $C0$ World"). Calling shield_text(src_text) on this already-shielded text will not match any shield pattern (since $C0$ does not look like &a, &#RRGGBB, or #namespace:item), so shielded_src.shields will always be empty, and unshield_text(dst, []) is a no-op.

Rather than silently doing nothing with misleading log-on-exception coverage, this branch should be removed entirely. The correct unshielding for _shields from the extract step happens downstream in apply_item_to_md_lines via item._shields.

_{Reviews (2): Last reviewed commit: "fix: 修正 API key regex range operator + 測..." | Re-trigger Greptile}

greptile-apps · 2026-03-25T14:39:28Z

translation_tool/core/lm_response_parser.py


-    matches = re.findall(r"\{[\s\S]*\}", text)
+    # ✅ Issue #12 修復：使用 non-greedy regex 避免匹配無效的多重 JSON
+    matches = re.findall(r"\{[\s\S]*?\}", text)


Non-greedy regex breaks nested JSON objects

The switch from greedy \{[\s\S]*\} to non-greedy \{[\s\S]*?\} introduces a regression for the fallback path. The non-greedy quantifier stops at the first } it finds, which for any nested JSON object will produce an incomplete, invalid fragment.

For example, given the text:

Here is my translation: {"key1": "value", "nested": {"k": "v"}} done

Greedy regex matches: {"key1": "value", "nested": {"k": "v"}} ✅ (valid JSON)

Non-greedy regex matches: {"key1": "value", "nested": {"k": "v"} ❌ (stops at inner }, missing outer })

Since json.loads then fails on that partial match, and re.findall has already consumed the position, no valid JSON is returned and the function throws RuntimeError("JSON 解析失敗").

LLM responses that embed translation dictionaries (which this project relies on) commonly use nested objects. The correct fix for the original issue is to keep the greedy match but additionally try the largest valid candidate first:

Suggested change

matches = re.findall(r"\{[\s\S]*?\}", text)

matches = re.findall(r"\{[\s\S]*\}", text)

greptile-apps · 2026-03-25T14:39:29Z

translation_tool/utils/cache_shards.py

+    使用 fsync 確保資料寫入磁碟，避免作業系統緩衝區未 flush
+    就執行 os.replace() 導致資料遺失。
    """
+    import msvcrt


Windows-only msvcrt module used without cross-platform fallback

msvcrt is a Windows-only standard library module. Importing it on Linux or macOS raises ModuleNotFoundError: No module named 'msvcrt' at runtime, making the entire cache_shards.py module — and by extension the whole translation cache — unusable on non-Windows systems.

import msvcrt appears in four places in this file:

_write_json_atomic (line 25) — unused in that function (the actual call is os.fsync(), which is POSIX)

_rotate_shard_if_needed (line 90 inside try, line 113 inside finally)

_save_entries_to_active_shards (line 134)

For true cross-platform file locking the standard approach is to use fcntl.flock on POSIX and msvcrt.locking on Windows, with a runtime check:

import sys if sys.platform == "win32": import msvcrt def _lock_fd(fd): msvcrt.locking(fd, msvcrt.LK_LOCK, 1) def _unlock_fd(fd): msvcrt.locking(fd, msvcrt.LK_UNLCK, 1) else: import fcntl def _lock_fd(fd): fcntl.flock(fd, fcntl.LOCK_EX) def _unlock_fd(fd): fcntl.flock(fd, fcntl.LOCK_UN)

Additionally, the import msvcrt on line 25 (_write_json_atomic) serves no purpose since that function only calls os.fsync() and should be removed regardless.

greptile-apps · 2026-03-25T14:39:30Z

translation_tool/plugins/ftbquests/ftbquests_lmtranslator.py

+                try:
+                    shielded_src = shield_text(src_text)
+                    t = unshield_text(t, shielded_src.shields)
+                except Exception as e:
+                    log_info(f"[FTB-LM] unshield 失敗: {e}")


Shield/unshield is a no-op — FTB items are never shielded before translation

The on_translated_item callback calls shield_text(src_text) after the LLM has already translated t. This freshly-created ShieldedText contains brand-new globally-counted placeholders (e.g. $C42$ , $P7$ ) that were never injected into the source text sent to the model, so they cannot exist in the translated result t. The subsequent unshield_text(t, shielded_src.shields) therefore always returns t unchanged — the entire shield operation is a silent no-op.

Compare this to the correct implementation in kubejs_tooltip_lmtranslator.py, where shielding happens during item collection in collect_items_from_mapping and the ShieldedText object is stored in the item dict under _shielded:

# kubejs — correct pattern: shielded = shield_text(v) item["text"] = shielded.clean # shielded text sent to LM item["_shielded"] = shielded # stored for later unshield # in on_translated_item: shielded = it.get("_shielded") if shielded and shielded.shields: final_text = unshield_text(t, shielded.shields)

To properly protect FTB source text, shielding needs to be applied in the item-collection helper (around line 96–110 in ftbquests_lmtranslator.py) so that item["text"] carries the cleaned text and item["_shielded"] carries the shield list — matching the KubeJS approach.

greptile-apps · 2026-03-25T14:39:31Z

translation_tool/core/lm_api_client.py

+    headers = {
+        "Content-Type": "application/json",
+        "Authorization": f"Bearer {api_key}",
+    }


Wrong authentication header breaks all Gemini API calls

The Google Gemini REST API requires plain API keys to be passed via the x-goog-api-key header, as shown in the official API reference. The Authorization: Bearer pattern is the OAuth2 flow for short-lived access tokens, not static API keys — these are entirely different credential types.

Sending a static API key in an Authorization: Bearer header will produce HTTP 401 Unauthorized on every request, silently breaking all translation calls.

The previous URL query-string approach was functionally correct. The improvement here should be using x-goog-api-key as the header name (without any Bearer prefix), per the Gemini documentation.

- lm_config_rules.py: 將 [_-a-zA-Z0-9] 改為 [a-zA-Z0-9_-]，避免 - 被解讀為 range operator - test_lm_config_rules.py: 將假 key 改為 40 字合規格式 - test_pipeline_services_error_handling.py: 改為檢查 calls[-2] 預期 set_error

greptile-apps · 2026-03-25T14:53:49Z

translation_tool/core/kubejs_translator_clean.py

+                        if is_filled_text_impl(v):
+                            reverse_index.setdefault(v, []).append(k)
+
+                    # 過濾 pending_en：跳過那些「英文文字已存在於 final」的 key
+                    pending_en = {
+                        k: v
+                        for k, v in pending_en.items()
+                        if not (
+                            is_filled_text_impl(v)
+                            and v in reverse_index
+                            and k != reverse_index[v][0]
+                        )
+                    }
+            # ── 雙軌去重 end ───────────────────────────────────────────────
+
            if pending_en:
                dst_en = pending_root_p / rel_group / "en_us.json"
                write_json_fn(dst_en, pending_en)


Dedup reverse index compares English values against Traditional Chinese values — always a no-op

final_tw_lookup is loaded from zh_tw.json files. In this project's translation pipeline, zh_tw.json files contain Traditional Chinese text as values (the translation output). The reverse_index is therefore a {TW_value → [keys]} mapping.

However, the dedup filter then checks v in reverse_index where v is sourced from pending_en — an English text. Since English text will almost never equal a Traditional Chinese translation, the condition v in reverse_index will virtually never be True. The entire dedup block is effectively dead code.

The comment describes the intent as "若某英文文字（value）已出現在 final/zh_tw.json" — if that intent is to skip English source texts that already have an existing translation, the index needs to be built from en_us.json files (or from the keys of zh_tw.json mapped to their corresponding source English text):

# Build reverse index: {en_text → key} from already-translated pairs reverse_index: dict[str, str] = {} for en_file in final_root_p.rglob("en_us.json"): en_data = read_json_dict_fn(en_file) if en_data: for k, en_v in en_data.items(): if is_filled_text_impl(en_v): reverse_index.setdefault(en_v, []).append(k)

Or alternatively, if the intent is simply to deduplicate by English value within pending_en itself, no lookup against final_tw_lookup is needed at all.

問題： - reverse_index[v][0] 取字典第一個 key，依賴 rglob 迭代順序 - rglob 在不同環境/執行之間迭代順序可能不同 - 導致相同英文文字的去重判斷結果不穩定修復： - reverse_index 類型從 dict[str, list[str]] 改為 dict[str, str]（直接存確定的 key） - 選擇策略： 1. 優先取「已翻譯的 key」（zh_tw 值與 key 名不同，表示有真正翻譯） 2. 同優先級則取字母序最小者（確定性 tiebreaker） 3. 無已翻譯時，取字母序最小者 - 消除 [0] 的非確定依賴

問題：雙軌去重的 filter 條件 k != reverse_index[v] 中： - k 來自 raw/pending 的 key（英文原文 key） - reverse_index[v] 來自 final/zh_tw 的 key（同樣是英文原文 key）兩者來自不同命名空間，直接比對 key 幾乎不會成立，導致去重邏輯形同 no-op。修復：移除 k != reverse_index[v] 判斷，直接以 v in reverse_index 來決定是否跳過。若同一個翻譯結果 v 已出現在 final（即 v in reverse_index），就視為已處理，直接跳過不送 pending。 Note: reverse_index 的非確定性問題（同一英文文字對應多個 key 時 rglob 迭代順序不穩定）在 f3a5814 中已透過 rev_candidates + sorted tiebreaker 修復，本次僅處理 cross-namespace 比對失效問題。

…dict prompt - lm_response_parser.py: 改用 brace-counting parser 取代 non-greedy regex，正確處理巢狀 JSON - lm_translator_main.py: System Prompt dict 支援 content/text key 萃取 - 新增測試：test_cache_manager, test_cache_store, test_ftbquests_unshield_logic, test_kubejs_translator_clean, test_lm_api_client, test_lm_response_parser, test_lm_translator_main_prompts

…ty 順序 - tests/test_lm_translator_main.py: 移除 test_patchouli_prompt_dict_converted_to_string（module-level 常數無法被 mock，修復後與 test_lm_translator_main_prompts.py 重疊） - cache_manager.py: clear_dirty() 移至 _save_entries_to_active_shards() 成功後執行，確保 crash 時 dirty flag 正確

- Issue #7: ftbquests + kubejs 雙重 JSON 讀取 → src_mapping_cache 緩存 - Issue #8: ftbquests callbacks 改為工廠函式 make_on_translated_item/batch_flushed/progress - Issue #7b: kubejs 同樣加入 src_mapping_cache - kubejs_translator_clean.py: reverse_index 確定性 + cross-namespace 去重 - 確認 unshield_text 傳入 .shields 而非整個 ShieldedText

…ld.shields） - ftbquests_lmtranslator.py: src_mapping_cache 緩存 JSON mapping，callback 工廠函式 - kubejs_tooltip_lmtranslator.py: src_mapping_cache 緩存，確認 unshield.shields - 驗證：1161 tests passed

jlin53882 added 20 commits March 23, 2026 14:10

style: apply ruff format

0b07775

feat(shared): extend rich text shield to FTB and MD translators

5678eba

fix(md): integrate rich text shield in markdown translator

3f52edd

fix(md): strip _shielded before JSON serialization in dry-run preview

a264928

prevent ShieldedText object from leaking into json.dumps()

fix #4: session.finish() moved to finally block in _task_runner.py

0961404

- Previously session.finish() was only called in the success path - Now it's in the finally block to ensure it always executes - This prevents session leaks when exceptions occur

fix(cache_shards): 修復 Read-after-delete Race Condition (Issue #22)

6139ee4

- 移除 if save_path.exists() 檢查，改用 try-except 捕捉 FileNotFoundError - 避免 TOCTOU (Time-of-Check to Time-of-Use) 問題：檢查與讀取之間檔案可能被刪除

fix(rich_text_shield): 計數器加入執行緒安全鎖 (Issue #26)

7aa69bf

- 新增 _counter_lock (threading.Lock) 保護模組層級計數器 - _next_color_placeholder / _next_item_placeholder / _next_escaped_placeholder 的讀取-遞增-寫入操作以鎖保護，避免 race condition

fix(md_inject_qa): 移除重複的 RE_LANG_SEG 定義 (Issue #27)

1bc6db4

- RE_LANG_SEG 只定義一次（在第 61 行） - 移除第 87 行的重複定義，保留單一宣告

fix/#1b CRITICAL: unshield_text() in patch_md_lmtranslator.py require…

5590411

…s shielded_src.shields The unshield_text() function expects shields (list[ShieldPiece]), not the whole ShieldedText object. Fixed line 78 to pass shielded_src.shields. Issue: #1b

fix/#18 MEDIUM: unify unshield_text path for cache hit/miss in kubejs

75c8af2

The cache hit path was missing unshield_text() call while the cache miss path properly called unshield_text(). Added shield handling to cache hit path to unify both code paths. Issue: #18

greptile-apps bot reviewed Mar 25, 2026

View reviewed changes

jlin53882 added 5 commits March 25, 2026 22:58

jlin53882 force-pushed the main branch from c9836ca to e87226d Compare March 27, 2026 16:01

jlin53882 force-pushed the fix/all-code-review-issues branch from 7a8351a to c2a1a60 Compare March 27, 2026 16:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Code Review 全量修復 (28 issues, 2026-03-25)#43

fix: Code Review 全量修復 (28 issues, 2026-03-25)#43
jlin53882 wants to merge 27 commits intomainfrom
fix/all-code-review-issues

jlin53882 commented Mar 25, 2026

Uh oh!

chatgpt-codex-connector bot commented Mar 25, 2026

Uh oh!

greptile-apps bot commented Mar 25, 2026 •

edited

Loading

Comments Outside Diff (4)

Uh oh!

greptile-apps bot Mar 25, 2026

Uh oh!

greptile-apps bot Mar 25, 2026

Uh oh!

greptile-apps bot Mar 25, 2026

Uh oh!

greptile-apps bot Mar 25, 2026

Uh oh!

greptile-apps bot Mar 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	matches = re.findall(r"\{[\s\S]*?\}", text)
	matches = re.findall(r"\{[\s\S]*\}", text)

Conversation

jlin53882 commented Mar 25, 2026

Code Review 全量修復 PR

修復內容

CRITICAL（需立即合併）

HIGH

MEDIUM + LOW

驗證

Uh oh!

chatgpt-codex-connector bot commented Mar 25, 2026

Uh oh!

greptile-apps bot commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 1/5

Important Files Changed

Flowchart

Comments Outside Diff (4)

Uh oh!

greptile-apps bot Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps bot commented Mar 25, 2026 •

edited

Loading