
Add online trade-outcome ML model and integrate into AdaptiveSelector (AI decisioning)#4

Open
EJMM17 wants to merge 1 commit into main from
codex/improve-and-fix-bugs-in-autolearning-llm

Conversation

@EJMM17 EJMM17 commented Feb 16, 2026

Motivation

  • Provide an explicit supervised ML layer that estimates per-(regime, strategy) win probability so decisions are not driven by RL memory alone.
  • Improve decision quality and explainability by blending a learned win-prob into signal confidence and AI reasoning traces.
  • Keep the system robust and persistent by sanitizing inputs, supporting export/import, and failing gracefully when a model update raises an error.

Description

  • Added a new online logistic-style model OnlineTradeOutcomeModel in darwin_agent/ml/outcome_model.py with robust feature sanitization, weighted online updates, and export() / from_dict() persistence (a rough sketch of the model's shape follows this list).
  • Integrated per-(regime, strategy) outcome models into AdaptiveSelector (darwin_agent/ml/selector.py) so that decide() queries ml_win_prob, blends it into signal.confidence, includes it in the AI reason text, and stores model metadata in the pending trade.
  • On trade resolution, report_result() now updates the corresponding outcome model with a sign label and a magnitude-weighted sample weight; the selector also persists/restores outcome models via export_for_dna() / import_from_dna().
  • Small related improvements to the brain and README: darwin_agent/ml/brain.py gained explainability and lightweight pattern-memory hooks used in action explanations, and README.md now mentions the new online trade-outcome ML capability.
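
The following is a rough, hypothetical sketch of what an online, logistic-style outcome model with these responsibilities could look like. Only the names OnlineTradeOutcomeModel, predict_proba(), update(), export(), and from_dict() come from this PR; the feature layout, learning rate, and sanitization rules below are illustrative assumptions, not the actual implementation in darwin_agent/ml/outcome_model.py.

import math

class OnlineTradeOutcomeModel:
    def __init__(self, n_features: int = 8, lr: float = 0.05):
        self.lr = lr
        self.weights = [0.0] * n_features
        self.bias = 0.0

    @staticmethod
    def _sanitize(features):
        # Replace non-numeric, NaN, and infinite inputs and clip extremes
        # so one bad candle cannot blow up the weights.
        out = []
        for x in features:
            try:
                v = float(x)
            except (TypeError, ValueError):
                v = 0.0
            if math.isnan(v) or math.isinf(v):
                v = 0.0
            out.append(max(-10.0, min(10.0, v)))
        return out

    def predict_proba(self, features) -> float:
        x = self._sanitize(features)[: len(self.weights)]
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

    def update(self, features, label: int, sample_weight: float = 1.0) -> None:
        # One weighted SGD step on the logistic loss.
        x = self._sanitize(features)[: len(self.weights)]
        err = (label - self.predict_proba(x)) * sample_weight
        self.bias += self.lr * err
        self.weights = [w + self.lr * err * xi for w, xi in zip(self.weights, x)]

    def export(self) -> dict:
        return {"weights": list(self.weights), "bias": self.bias, "lr": self.lr}

    @classmethod
    def from_dict(cls, data: dict) -> "OnlineTradeOutcomeModel":
        model = cls(n_features=len(data["weights"]), lr=data.get("lr", 0.05))
        model.weights = list(data["weights"])
        model.bias = data["bias"]
        return model

The magnitude-weighted sample weight applied at trade resolution is visible in the Codex review snippet further down (min(2.0, max(0.25, abs(pnl_pct) + 0.25))).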

Testing

  • Compiled the package with python -m compileall darwin_agent, which completed successfully.
  • Ran a selector smoke flow on synthetic candles that exercised decide(), report_result(), export_for_dna(), and import_from_dna(); no runtime errors were observed (the decision printed and the models loaded).
  • Ran a direct smoke test of the outcome model that called predict_proba(), applied several update() calls, and verified that the probability moved and that export()/from_dict() preserved state (see the sketch after this list).
  • All checks above passed locally: no exceptions were raised during the smoke flows and the compile step succeeded.
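
For reference, the outcome-model smoke test described above could be reproduced with a few lines along these lines. This is written against the sketch class shown after the Description list; the real test presumably imports darwin_agent.ml.outcome_model.OnlineTradeOutcomeModel and uses its actual feature format.

model = OnlineTradeOutcomeModel(n_features=3)
features = [0.4, -1.2, 0.7]

p_before = model.predict_proba(features)
for _ in range(25):
    model.update(features, label=1, sample_weight=1.0)
p_after = model.predict_proba(features)
assert p_after > p_before  # repeated wins should push the probability up

restored = OnlineTradeOutcomeModel.from_dict(model.export())
assert abs(restored.predict_proba(features) - p_after) < 1e-9  # round trip preserves state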

Codex Task

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc10cafff9

Comment on lines +161 to +164
model = self._get_outcome_model(regime, strat)
label = 1 if pnl_pct > 0 else 0
weight = min(2.0, max(0.25, abs(pnl_pct) + 0.25))
model.update(pending_state, label, sample_weight=weight)

P1: Gate outcome-model training on executed trades only

This path updates OnlineTradeOutcomeModel for any non-unknown strategy, but report_result(0, 0) is also called when RiskManager rejects a signal and no order is placed (darwin_agent/core/agent_v2.py lines 341 and 391). In that common rejection flow, the code still assigns label = 0 and trains the model as if a real losing trade occurred, so operational constraints (max positions, daily limits, etc.) get baked into ml_win_prob as false negatives and can systematically skew future confidence blending.
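
One possible shape of the fix, sketched here as a standalone helper rather than the PR's actual code: gate the update on an execution flag so risk-manager rejections never become label-0 samples. The executed flag and the function signature are hypothetical; in the PR this logic lives inside AdaptiveSelector.report_result().

def maybe_update_outcome_model(model, pending_state, pnl_pct: float, executed: bool) -> None:
    if not executed:
        # Rejections from operational limits (max positions, daily limits, ...)
        # are not real trade outcomes and should not train the model.
        return
    label = 1 if pnl_pct > 0 else 0
    weight = min(2.0, max(0.25, abs(pnl_pct) + 0.25))
    model.update(pending_state, label, sample_weight=weight)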

next_state=next_state, done=done,
metadata={"regime": regime}))
self._update(state, action, reward, next_state, done, regime)
self._remember_pattern(state, action, reward, regime)

P2: Skip pattern-memory updates for non-filled decisions

The new retrieval memory is updated on every learn() call, including report_result(0, 0) invocations used for rejected (unfilled) trades. Because reward shaping adds a positive baseline (calculate_reward), these rejected decisions can be recorded as wins in _remember_pattern, which then feeds back into _pattern_bias and reinforces actions that were never actually executed, corrupting the memory signal used during action selection.
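
Analogously, a hypothetical guard (not code from the PR) could keep unfilled decisions out of the retrieval memory so _pattern_bias only learns from trades that actually executed; the (wins, total) bookkeeping shown is purely illustrative.

def maybe_remember_pattern(memory: dict, key: tuple, reward: float, filled: bool) -> None:
    if not filled:
        return  # skip rejected / unfilled decisions entirely
    wins, total = memory.get(key, (0, 0))
    memory[key] = (wins + (1 if reward > 0 else 0), total + 1)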
