
Add online trade-outcome ML model and integrate into AdaptiveSelector (AI decisioning)#4

Open
EJMM17 wants to merge 1 commit into main from
codex/improve-and-fix-bugs-in-autolearning-llm

Conversation

@EJMM17 EJMM17 commented Feb 16, 2026

Motivation

  • Provide an explicit supervised ML layer that estimates per-(regime, strategy) win probability so decisions are not driven by RL memory alone.
  • Improve decision quality and explainability by blending a learned win-prob into signal confidence and AI reasoning traces.
  • Keep the system robust and persistent by sanitizing inputs, supporting export/import, and failing gracefully when a model update raises an error.

Description

  • Added a new online logistic-style model OnlineTradeOutcomeModel in darwin_agent/ml/outcome_model.py with robust feature sanitization, weighted online updates, and export() / from_dict() persistence (a rough sketch of the model's shape follows this list).
  • Integrated per-(regime, strategy) outcome models into AdaptiveSelector (darwin_agent/ml/selector.py) so that decide() queries ml_win_prob, blends it into signal.confidence, includes it in the AI reason text, and stores model metadata in the pending trade.
  • On trade resolution, report_result() now updates the corresponding outcome model with a sign label and a magnitude-weighted sample weight; the selector also persists/restores outcome models via export_for_dna() / import_from_dna().
  • Small related improvements to the brain and README: darwin_agent/ml/brain.py gained explainability and lightweight pattern-memory hooks used in action explanations, and README.md now mentions the new online trade-outcome ML capability.
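
The following is a rough, hypothetical sketch of what an online, logistic-style outcome model with these responsibilities could look like. Only the names OnlineTradeOutcomeModel, predict_proba(), update(), export(), and from_dict() come from this PR; the feature layout, learning rate, and sanitization rules below are illustrative assumptions, not the actual implementation in darwin_agent/ml/outcome_model.py.

import math

class OnlineTradeOutcomeModel:
    def __init__(self, n_features: int = 8, lr: float = 0.05):
        self.lr = lr
        self.weights = [0.0] * n_features
        self.bias = 0.0

    @staticmethod
    def _sanitize(features):
        # Replace non-numeric, NaN, and infinite inputs and clip extremes
        # so one bad candle cannot blow up the weights.
        out = []
        for x in features:
            try:
                v = float(x)
            except (TypeError, ValueError):
                v = 0.0
            if math.isnan(v) or math.isinf(v):
                v = 0.0
            out.append(max(-10.0, min(10.0, v)))
        return out

    def predict_proba(self, features) -> float:
        x = self._sanitize(features)[: len(self.weights)]
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

    def update(self, features, label: int, sample_weight: float = 1.0) -> None:
        # One weighted SGD step on the logistic loss.
        x = self._sanitize(features)[: len(self.weights)]
        err = (label - self.predict_proba(x)) * sample_weight
        self.bias += self.lr * err
        self.weights = [w + self.lr * err * xi for w, xi in zip(self.weights, x)]

    def export(self) -> dict:
        return {"weights": list(self.weights), "bias": self.bias, "lr": self.lr}

    @classmethod
    def from_dict(cls, data: dict) -> "OnlineTradeOutcomeModel":
        model = cls(n_features=len(data["weights"]), lr=data.get("lr", 0.05))
        model.weights = list(data["weights"])
        model.bias = data["bias"]
        return model

The magnitude-weighted sample weight applied at trade resolution is visible in the Codex review snippet further down (min(2.0, max(0.25, abs(pnl_pct) + 0.25))).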

Testing

  • Compiled the package with python -m compileall darwin_agent, which completed successfully.
  • Ran a selector smoke flow on synthetic candles that exercised decide(), report_result(), export_for_dna(), and import_from_dna(); no runtime errors were observed (the decision printed and the models loaded).
  • Ran a direct smoke test of the outcome model that called predict_proba(), applied several update() calls, and verified that the probability moved and that export()/from_dict() preserved state (see the sketch after this list).
  • All checks above passed locally: no exceptions were raised during the smoke flows and the compile step succeeded.
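
For reference, the outcome-model smoke test described above could be reproduced with a few lines along these lines. This is written against the sketch class shown after the Description list; the real test presumably imports darwin_agent.ml.outcome_model.OnlineTradeOutcomeModel and uses its actual feature format.

model = OnlineTradeOutcomeModel(n_features=3)
features = [0.4, -1.2, 0.7]

p_before = model.predict_proba(features)
for _ in range(25):
    model.update(features, label=1, sample_weight=1.0)
p_after = model.predict_proba(features)
assert p_after > p_before  # repeated wins should push the probability up

restored = OnlineTradeOutcomeModel.from_dict(model.export())
assert abs(restored.predict_proba(features) - p_after) < 1e-9  # round trip preserves state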

Codex Task

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc10cafff9

Comment on lines +161 to +164
model = self._get_outcome_model(regime, strat)
label = 1 if pnl_pct > 0 else 0
weight = min(2.0, max(0.25, abs(pnl_pct) + 0.25))
model.update(pending_state, label, sample_weight=weight)

P1: Gate outcome-model training on executed trades only

This path updates OnlineTradeOutcomeModel for any non-unknown strategy, but report_result(0, 0) is also called when RiskManager rejects a signal and no order is placed (darwin_agent/core/agent_v2.py lines 341 and 391). In that common rejection flow, the code still assigns label = 0 and trains the model as if a real losing trade occurred, so operational constraints (max positions, daily limits, etc.) get baked into ml_win_prob as false negatives and can systematically skew future confidence blending.
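
One possible shape of the fix, sketched here as a standalone helper rather than the PR's actual code: gate the update on an execution flag so risk-manager rejections never become label-0 samples. The executed flag and the function signature are hypothetical; in the PR this logic lives inside AdaptiveSelector.report_result().

def maybe_update_outcome_model(model, pending_state, pnl_pct: float, executed: bool) -> None:
    if not executed:
        # Rejections from operational limits (max positions, daily limits, ...)
        # are not real trade outcomes and should not train the model.
        return
    label = 1 if pnl_pct > 0 else 0
    weight = min(2.0, max(0.25, abs(pnl_pct) + 0.25))
    model.update(pending_state, label, sample_weight=weight)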

next_state=next_state, done=done,
metadata={"regime": regime}))
self._update(state, action, reward, next_state, done, regime)
self._remember_pattern(state, action, reward, regime)

P2: Skip pattern-memory updates for non-filled decisions

The new retrieval memory is updated on every learn() call, including report_result(0, 0) invocations used for rejected (unfilled) trades. Because reward shaping adds a positive baseline (calculate_reward), these rejected decisions can be recorded as wins in _remember_pattern, which then feeds back into _pattern_bias and reinforces actions that were never actually executed, corrupting the memory signal used during action selection.
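
Analogously, a hypothetical guard (not code from the PR) could keep unfilled decisions out of the retrieval memory so _pattern_bias only learns from trades that actually executed; the (wins, total) bookkeeping shown is purely illustrative.

def maybe_remember_pattern(memory: dict, key: tuple, reward: float, filled: bool) -> None:
    if not filled:
        return  # skip rejected / unfilled decisions entirely
    wins, total = memory.get(key, (0, 0))
    memory[key] = (wins + (1 if reward > 0 else 0), total + 1)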
