feat(copilot): MCTS dynamic strategy engine with reward model and rollout simulation #13

Open
ldemon2333 wants to merge 1 commit into AnnaSuSu:main from ldemon2333:main

ldemon2333 commented Apr 7, 2026

Summary

Add an MCTS (Monte Carlo Tree Search) dynamic strategy engine to the interview copilot, enabling real-time strategy optimization during mock interviews.

Changes

New Modules (backend/copilot/)

  • mcts_config.py: MCTSConfig, MCTSNode, StrategyRecommendation data structures
  • reward_model.py: RewardModel with cosine similarity scoring, R(S) = W1·Match_JD + W2·Safe - W3·Risk
  • simulation_engine.py: 3-level degradation rollout simulator (LLM → lightweight LLM → pure reward)
  • mcts_engine.py: Full MCTS 4-step engine (Select/Expand/Simulate/Backprop) with PUCT selection
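
For readers unfamiliar with the reward formula, here is a minimal sketch of R(S) = W1·Match_JD + W2·Safe - W3·Risk. It is not the PR's RewardModel: the PR states it uses numpy, while this sketch sticks to the standard library so it runs anywhere. The reference vectors (`jd_vec`, `safe_vec`, `risk_vec`), the way Safe and Risk are derived from cosine similarity, and the default weights are all assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def reward(
    strategy_vec: Sequence[float],
    jd_vec: Sequence[float],    # assumed: embedding of the job description
    safe_vec: Sequence[float],  # assumed: embedding of the candidate's strengths
    risk_vec: Sequence[float],  # assumed: embedding of the candidate's weak points
    w1: float = 0.5,            # hypothetical weights, not the PR's defaults
    w2: float = 0.3,
    w3: float = 0.2,
) -> float:
    """R(S) = W1*Match_JD + W2*Safe - W3*Risk, each term a cosine similarity."""
    match_jd = cosine_similarity(strategy_vec, jd_vec)
    safe = cosine_similarity(strategy_vec, safe_vec)
    risk = cosine_similarity(strategy_vec, risk_vec)
    return w1 * match_jd + w2 * safe - w3 * risk
```

A strategy that aligns with the job description and the candidate's strengths while staying orthogonal to the weak points gets the highest score under these assumptions.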

Modified Files

  • config.py: 11 new mcts_* settings (feature-flagged, disabled by default)
  • llm_provider.py: get_mcts_rollout_llm() for simulation
  • main.py: Integration into copilot WebSocket session as async background task

Frontend

  • frontend/src/hooks/useCopilotStream.js — Add strategy_recommendation case to WebSocket message switch, ensuring MCTS search results are forwarded to the UI via onUpdate callback (without this the backend pushes the message but the frontend silently drops it)

Bug Fixes

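  • ASR startup fix: the NLS SDK's start() returns None rather than a truthy value, so startup now uses try/except
  • MCTS cleanup on WebSocket disconnect: the finally block now calls mcts.stop(), preventing the search Task from writing to a closed WebSocket
  • No redundant MCTS search after a candidate answer: search is triggered only when HR speaks
  • Variable naming in _try_merge_static: _ renamed to _matched_node_id (the value is actually used, so it should not be a throwaway name)
  • Expansion depth uses a config value: new max_expansion_depth replaces the hard-coded 3
  • Embedding calls no longer block the event loop: the synchronous API is wrapped in asyncio.to_thread()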

Docs and Tests

  • docs/mcts-strategy.md — User-facing feature documentation
  • docs/SUMMARY.md — Updated index
  • tests/test_mcts_engine.py — 39 unit tests covering all modules

Key Design Decisions

  • Feature-flagged: MCTS_ENABLED=false by default, zero impact when disabled
  • PUCT variant: AlphaGo-style selection with LLM confidence as prior, c_puct=1.4
  • Pure numpy: No heavy ML dependencies, less than 10ms per reward evaluation
  • Graceful degradation: Falls back to reward-only evaluation if LLM rollout fails
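
To make the PUCT bullet concrete, here is a minimal sketch of AlphaGo-style selection using the common form Q + c_puct · P · sqrt(N_parent) / (1 + N_child). The `Node` class and its fields are hypothetical stand-ins, not the PR's MCTSNode; only c_puct=1.4 and the idea of using LLM confidence as the prior come from the description above.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; `prior` plays the role of the LLM confidence."""
    prior: float
    visits: int = 0
    value_sum: float = 0.0
    children: list[Node] = field(default_factory=list)

    @property
    def q(self) -> float:
        # Mean backed-up reward; unvisited nodes default to 0.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.4) -> float:
    """Exploitation term Q plus a prior-weighted exploration bonus."""
    bonus = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.q + bonus


def select_child(parent: Node, c_puct: float = 1.4) -> Node:
    """Pick the child with the highest PUCT score."""
    return max(parent.children, key=lambda c: puct_score(parent, c, c_puct))
```

With this form, a child with a high prior and no visits can outrank a well-explored child with a decent mean reward, which is what lets the prior steer early iterations.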

Owner

AnnaSuSu left a comment

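The overall design is good: the game-theoretic modeling is clear, the module split is clean, the feature flag is non-invasive, and the degradation strategy has been thought through. A few issues need fixing first:


Bugs (must fix)

1. _get_weak_points always returns an empty list

mcts_engine.py:506 reads weak_points from prep_state.get("profile", {}), but prep_result has no "profile" key; the candidate profile is not in the prep state. Either read from fit_report.get("gaps", []) instead, or pass the profile into prep_state in _init_mcts_engine.

2. The MCTS engine is not cleaned up when the WebSocket disconnects

The finally block in main.py only cleans up ASR and never calls mcts.stop(). After a disconnect the search Task keeps running and then tries to ws.send_json() on the closed WebSocket. Add: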

finally:
    if session and session.get("asr"):
        session["asr"].shutdown()
    if session and session.get("mcts_engine"):
        await session["mcts_engine"].stop()
    _copilot_sessions.pop(session_id, None)
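
For context on what such a stop() has to do, here is a minimal sketch of cancelling a background search task and waiting for it to unwind, so nothing can write to the socket afterwards. `SearchRunner` and `demo` are hypothetical stand-ins; the PR's engine may manage its task differently.

```python
import asyncio
from typing import Optional


class SearchRunner:
    """Hypothetical stand-in for the engine's background-task handling."""

    def __init__(self) -> None:
        self._task: Optional[asyncio.Task] = None

    def start(self, coro) -> None:
        # Must be called from inside a running event loop.
        self._task = asyncio.create_task(coro)

    async def stop(self) -> None:
        task, self._task = self._task, None
        if task is None or task.done():
            return
        task.cancel()
        try:
            await task  # wait until the cancellation has actually landed
        except asyncio.CancelledError:
            pass


async def demo() -> list:
    sent = []

    async def search() -> None:
        await asyncio.sleep(10)  # pretend to run MCTS iterations
        sent.append("strategy_recommendation")  # pretend to push over the socket

    runner = SearchRunner()
    runner.start(search())
    await asyncio.sleep(0)  # let the task start running
    await runner.stop()
    return sent  # empty: the push never happened
```

Awaiting the cancelled task is the important part: returning right after cancel() would leave a window in which the task is still alive.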

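3. The MCTS search triggered after a candidate answer is questionable

In main.py, on_candidate_response is followed by another create_task(_run_mcts_and_push), but at that point the root node is still the HR question from the previous round. The candidate has already answered, so searching candidate strategies on the old root is pointless. Suggestions:

  • Remove the MCTS trigger after the candidate answers
  • Or use the candidate's answer as the new root and search to predict HR's next follow-up question

Please confirm the design intent here.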


Suggested improvements

4. _try_merge_static uses _ as a variable name but actually reads it

_, static_intent, score = self.navigator.match_utterance(...)
static_node = self.navigator.get_node(_)

By convention _ is a throwaway; here it is used as the node_id, so consider renaming it.

5. Hard-coded expansion depth

In _run_iteration, leaf.depth < 3 is hard-coded; the config has rollout_depth, but it is not used. Use the config value, or add a separate max_expansion_depth.
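
A minimal sketch of the second option, with the limit read from config rather than written inline. `ExpansionConfig` and `should_expand` are hypothetical names, and the defaults are assumptions (3 simply mirrors the old hard-coded value); the PR's MCTSConfig may be shaped differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpansionConfig:
    """Hypothetical subset of the MCTS settings."""
    rollout_depth: int = 3        # assumed default; governs simulation length
    max_expansion_depth: int = 3  # mirrors the previously hard-coded limit


def should_expand(leaf_depth: int, cfg: ExpansionConfig) -> bool:
    """Expand a leaf only while it is shallower than the configured limit."""
    return leaf_depth < cfg.max_expansion_depth
```

Keeping the two depths as separate fields lets the tree depth and the rollout length be tuned independently.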

6. Synchronous get_text_embedding() call blocks the event loop

_expand and on_hr_utterance call embed.get_text_embedding() directly, which blocks if an API-based embedding is in use. Consider wrapping it in asyncio.to_thread().
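
A minimal sketch of that wrapping, assuming Python 3.9+ for asyncio.to_thread(). The `get_text_embedding` below is a blocking stand-in that just sleeps, not the real embedding client; `embed_async` and `demo` are hypothetical names.

```python
import asyncio
import time


def get_text_embedding(text: str) -> list:
    """Blocking stand-in for a synchronous embedding API call."""
    time.sleep(0.05)  # simulate network latency
    return [float(len(text)), 1.0]


async def embed_async(text: str) -> list:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(get_text_embedding, text)


async def demo():
    ticks = 0

    async def heartbeat() -> None:
        # Stands in for other work on the loop (ASR events, WebSocket traffic).
        nonlocal ticks
        while True:
            await asyncio.sleep(0.005)
            ticks += 1

    hb = asyncio.create_task(heartbeat())
    vec = await embed_async("hello")
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return vec, ticks
```

If get_text_embedding were called directly inside the coroutine, the heartbeat would not tick at all while the call was in flight.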


Once 1-3 are fixed I'll take another look; the rest do not block the merge.

Author

ldemon2333 commented Apr 8, 2026 via email

Owner

AnnaSuSu commented Apr 8, 2026

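One addendum: I retract item 1. That was a mistake in my review.

I just re-traced how prep_result is built. The dict returned at copilot_prep.py:207 does contain "profile" (line 211), which comes from memory.get_profile(user_id) and does carry weak_points (memory.py has maintained that field all along). You were right that your test did not reproduce it: _get_weak_points gets its data correctly, so nothing needs to change here.

I said "prep_result has no profile key" from memory without tracing it to the source. Sorry about that.

One more reminder: your PR adds the strategy_recommendation message type on the backend, but the switch in frontend/src/hooks/useCopilotStream.js has no matching case, so the frontend silently drops the message and the panel never receives the MCTS search results. Include the frontend change before merging; otherwise the pipeline is broken end to end.

Push the remaining fixes together and I'll go through it again.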

…lout simulation

- Add MCTSConfig, MCTSNode, StrategyRecommendation data structures
- Add RewardModel with cosine similarity scoring (R = W1·Match + W2·Safe - W3·Risk)
- Add SimulationEngine with 3-level degradation rollout
- Add MCTSEngine with PUCT selection, LLM expansion, backpropagation
- Integrate MCTS into copilot WebSocket session (feature-flagged, off by default)
- Add 11 mcts_* settings to config and rollout LLM provider
- Add user-facing docs and 39 unit tests