Add MKTD reader and offline backtest mode to ctrader #70
Conversation
…raction (#26)
* Fix trade_execution_listener error handling for None events and robust float extraction
  - Guard against trade_event_from_dict returning None in _run_stdin and _run_alpaca
  - Extract raw_qty/raw_price before float() to avoid float(None) crash in the Alpaca handler
  - Remove unused TradeEvent import
* Address PR review: paper credentials, position seeding, robust qty/price extraction
  - Use ALP_KEY_ID/ALP_SECRET_KEY for paper mode (was always using PROD keys)
  - Seed PositionLots from Alpaca positions on startup so closes after restart are correctly matched to opening lots
  - Extract raw_qty/raw_price before dict construction so None values don't get coerced to 0.0 silently
  - Guard process_event with a None check on trade_event_from_dict
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ries Add --union flag to export_data_daily.py that uses min(starts)/max(ends) instead of intersection, letting the tradable mask handle symbols with shorter histories. Also adds graceful FileNotFoundError handling and per-symbol tradability diagnostics. New export_crypto15_daily.sh exports 15 crypto symbols (BTC, ETH, SOL, LTC, AVAX, DOGE, LINK, ADA, UNI, AAVE, ALGO, DOT, SHIB, XRP, MATIC) to train (1375 days) and val (184 days) MKTD binaries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
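The union-vs-intersection range logic can be sketched as follows (the helper name is hypothetical; the actual export_data_daily.py implementation may differ):

```python
from datetime import date

def combined_range(symbol_ranges, union=False):
    """Combine per-symbol (start, end) date ranges.

    Intersection keeps only dates every symbol covers; union spans
    min(start)..max(end) and leaves short-history symbols to be
    handled by the tradable mask.
    """
    starts = [s for s, _ in symbol_ranges.values()]
    ends = [e for _, e in symbol_ranges.values()]
    return (min(starts), max(ends)) if union else (max(starts), min(ends))

ranges = {
    "BTCUSD": (date(2020, 1, 1), date(2025, 6, 1)),
    "SHIBUSD": (date(2021, 5, 1), date(2025, 6, 1)),  # shorter history
}
```

With `--union`, SHIBUSD's late listing no longer truncates the whole dataset to 2021; the mask marks it untradable before its start date instead.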
Export YELP,EBAY,TRIP,MTCH,KIND,ANGI,Z,EXPE,BKNG,NWSA daily data (train: up to 2025-06-01, val: 2025-06-01 onward) for daily RL training with shorts enabled. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Export AAPL,MSFT,NVDA,GOOG,AMZN,META,TSLA,PLTR,NET,NFLX,AMD,ADBE,CRM,PYPL,INTC to MKTD binary format for pufferlib_market daily RL training with 2x leverage and shorting experiments. Train split ends 2025-06-01, val starts there. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Shell script to run the full autoresearch RL experiment suite on stocks15 daily data with correct stock annualization (252 periods/year), 90-day episodes, and 10bps fee rate. Includes data existence check. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sweep script tests mixed23 data (15 stocks + 8 crypto) using periods_per_year=252 (stock-appropriate) instead of 365. Tests 1x and 2x leverage across multiple seeds, and compares against existing autoresearch_mixed23_daily checkpoints trained with ppy=365. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Scans preaugstrategies, best configs, chronos2 hourly params, LoRA sweep results, and finetuned model directories to produce a prioritized table of all 79 trading symbols showing MAE%, best preaug strategy, LoRA status, and retrain priority. Outputs both a terminal table and CSV. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sweeps 12 configs: trade_penalty {0.0, 0.05, 0.10, 0.15} x weight_decay
{0.01, 0.05, 0.10} with seeds 42/314 for the core wd=0.05 axis. Each
config trains for 300s then evaluates on market sim at 60d/90d/120d/180d
with 8bps fill and 0.001 fee. Uses monkey-patched early exit disable
and simulate_daily_policy for deterministic eval.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Trains 10 new seeds for trade_penalty=0.15 on crypto5 daily data, then evaluates all 15 seeds (5 original + 10 new) on 120d market sim. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sweeps trade_penalty × leverage × seeds on shortable10 daily data with shorts enabled and 6.25% borrow APR. Includes long-only baselines for comparison and market sim evaluation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sweeps leverage (1x, 2x) × trade_penalty (0.05-0.20) × seeds (42, 314) on stocks15 daily data with periods_per_year=252 and 6.25% margin interest. Includes market sim evaluation for top results. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Re-evaluates all crypto5 daily checkpoints with periods_per_year=365. Also trains+evals crypto5 at 1x/2x/3x leverage to confirm 1x is optimal. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New flags: --wandb-project, --wandb-entity, --wandb-mode. Logs loss, return, sortino, lr, and steps/sec when enabled. Fully optional: no behavior change when the flags are not set. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Trains tp=0.15 s314 on crypto5 daily for 60min with eval every 5min. Tracks val return/sortino/maxDD curve to find optimal training budget. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
8 tests covering: determinism, fee calculation, leverage budget, short borrow cost, annualization, no-trade policy, early exit patch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Created download_expanded_stocks.sh to download 52 liquid stocks across diverse sectors (tech, biotech, consumer, mining, international ADRs, etc.)
- Fixed fetch_external_data.py to handle MultiIndex columns from newer yfinance
- Removed dead identity-mapping rename dict from _standardize_df
- Downloads convert yfinance format to trainingdata/train/ schema (timestamp,open,high,low,close,volume,trade_count,vwap,symbol)
- All 52 symbols verified to load via export_data_daily.load_price_data()
- Training data count: 637 -> 689 CSVs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sweeps LoRA rank across 5 symbols (BTCUSD, ETHUSD, SOLUSD, AAPL, NVDA) to find optimal expressiveness/generalization tradeoff. Uses best preaug per symbol from preaugstrategies/best/hourly/ with fallback to baseline. Key design: loads base pipeline once per symbol and data once per symbol to avoid redundant GPU model loading and CSV parsing across rank configs. Outputs: per-run JSON in hyperparams/lora_rank_sweep/, summary CSV, and a printed comparison table. Smoke tested: SOLUSD r=8 -> val MAE% 3.30%, test MAE% 4.72% (223s). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ensemble top3_logit_avg ties on Sortino (4.18) but trails on return (+69.6% vs +73.2%). Key finding: ensembling reduces max drawdown (16.7% vs 25.6%) but doesn't beat the best single model overall. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… on drawdown
tp0.15_s314 (single): Sortino 4.18, +73.2%, MaxDD 25.6%
top3_logit_avg (ensemble): Sortino 4.18, +69.6%, MaxDD 16.7%
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- supervisor/ (deployment configs)
- **/signal_cache/, **/gemini_cache_eval/, **/live_state.json (runtime state)
- Remove tracked binance_worksteal/live_state.json from index
- env_real.py and binance_spot_daily/ already covered
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Project migrated to chronos2; toto/kronos packages are no longer used. Deletes 177 files (45k lines): package dirs, root scripts, hyperparams, test files, docs, model wrappers, and eval artifacts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Export 40-stock daily dataset (stocks15 + 25 new liquid stocks) for expanded RL training. Raise MKTD symbol limit from 32 to 64 across C env header, Python reader, and all export scripts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Triton JIT kernel replacing Chronos2LayerNorm (T5-style RMS norm: no mean subtraction, no bias, FP32 variance). Includes autograd backward for training compatibility, nn.Module drop-in (TritonRMSNorm), and 17 tests covering FP32/BF16, multiple shapes, dtype/shape preservation, zero input, and gradient flow. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
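For reference, the math the kernel implements can be sketched in NumPy (a simplified stand-in for the Triton version; the eps default is an assumption):

```python
import numpy as np

def rms_norm_reference(x, weight, eps=1e-6):
    """T5-style RMS norm: no mean subtraction, no bias, and the
    variance accumulated in FP32 regardless of input dtype — the
    property the kernel preserves for BF16/FP16 inputs."""
    x32 = x.astype(np.float32)
    var = np.mean(x32 * x32, axis=-1, keepdims=True)  # FP32 variance
    y = x32 / np.sqrt(var + eps)                      # scale by 1/RMS
    return (weight * y).astype(x.dtype)               # learned scale, cast back
```

Note the absence of the mean term that ordinary LayerNorm subtracts; that is what makes the kernel cheaper than a full LayerNorm.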
Implements a drop-in replacement for Chronos2 TimeSelfAttention that dispatches to Triton kernels (RoPE, unscaled attention, RMS LayerNorm) on CUDA and falls back to pure PyTorch on CPU. Shared fallback ops are extracted to _fallbacks.py to avoid duplication across modules. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Shares data loading, caches results by mtime, supports ProcessPoolExecutor for parallel checkpoint evaluation. Cleaner than comprehensive_marketsim_eval. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fuses the rearrange + sinh + unscale operations after the output ResidualBlock into a single Triton kernel pass, avoiding intermediate memory reads/writes. Includes FusedOutputHead module with PyTorch fallback and 20 tests verifying MAE equivalence < 1e-5. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nits D+F)
- src/runpod_client.py: add RTX 6000 Ada Generation alias (~$0.79/hr, CC 8.9, 48GB)
- pufferlib_market/gpu_pool_rl.py: A40 ($0.69/hr) + RTX6000 Ada rates and aliases; update DEFAULT_POOL_LIMITS to prefer A40×2 for cost-efficient training
- pufferlib_market/kernels/fused_mlp.py: add CC 8.9 (Ada Lovelace) + CC 8.6 (A40/Ampere) autotune configs; comment cost comparisons
- pufferlib_market/autoresearch_rl.py: add --a40-mode flag (128 envs, bf16, cuda-graph, 250s budget); add A40 experiment preset configs in STOCK_EXPERIMENTS
- pufferlib_market/kernels/fused_obs_encode.py: new Triton kernel fusing (obs-mean)/std + linear + ReLU into one pass (eliminates 2 intermediate allocations); CC-aware autotune for H100/CC9, A100+A40/CC8, V100/CC7; PyTorch fallback
- pufferlib_market/bench_fused_mlp.py: benchmark script for fused MLP kernel
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eliminate per-update GPU allocations in the PPO training loop and per-step host-device allocations in the evaluate_fast.py inference loop.
- Pre-allocate _advantages (CPU), _last_gae (CPU), _b_obs/_b_act/_b_logprob/_b_returns/_b_values/_b_advantages (GPU), and _clip_eps_t/_vf_coef_t/_ent_coef_t (GPU scalars) before the training loop; reuse via .zero_()/.copy_()/.fill_() each update (~22% SPS improvement)
- Pre-allocate a static _obs_cuda GPU tensor in evaluate_fast.py; replace per-step torch.from_numpy(...).to(device) with _obs_cuda.copy_()
- Add --profile-steps N flag: wraps the first N PPO updates in torch.profiler, exports a Chrome trace to profile_output/trace_update_NNNN.json, prints a top-20 CUDA memory allocations table
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
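The pre-allocate-and-reuse pattern can be sketched like this (illustrative names; the real buffers live inside the training loop and sit on the GPU):

```python
import numpy as np
import torch

class ReusableBuffers:
    """Allocate tensors once, then refresh them in place each update
    with .copy_()/.zero_()/.fill_() so no new memory is requested."""

    def __init__(self, batch_size, obs_dim, device="cpu"):
        self.b_obs = torch.empty(batch_size, obs_dim, device=device)
        self.advantages = torch.ones(batch_size)          # stale values to clear
        self.clip_eps_t = torch.tensor(0.0, device=device)

    def begin_update(self, obs_np, clip_eps):
        self.b_obs.copy_(torch.from_numpy(obs_np))  # reuse storage, no alloc
        self.advantages.zero_()                     # clear in place
        self.clip_eps_t.fill_(clip_eps)             # refresh GPU scalar
```

Keeping the clip/coefficient values as pre-allocated tensors avoids re-creating a GPU scalar from a Python float on every minibatch.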
At each decision point, simulate K parallel future trajectories per
candidate action and pick the action with the highest mean discounted
return. Uses the fast C env (n_actions*K envs in one vec_init call).
New: pufferlib_market/inference_tts.py
- best_action_tts(): core TTS search, K=1 fast-paths to greedy argmax
- get_signal_tts(): PPOTrader wrapper returning (TradingSignal, stats)
- benchmark_tts(): measures ms/decision for a list of K values
- CLI: --tts-k, --horizon, --dry-run, --benchmark modes
Modified: pufferlib_market/inference.py
- PPOTrader.get_signal() gains tts_k parameter; tts_k>1 delegates to
get_signal_tts() for TTS search
- PPOTrader._decode_action() extracted from get_signal() to avoid
duplicating action-decode logic in inference_tts.py
New: tests/test_inference_tts.py — 9 unit tests covering K=1 fast-path,
K>1 rollout search, stats keys, TradingSignal decoding, greedy parity,
ValueError guard, short-data early termination, deterministic_after_first.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
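A minimal single-env sketch of the TTS idea (the env_step_fn interface and the random continuation policy are simplifications; the real code batches n_actions*K envs through the C vec env and continues rollouts with the trained policy):

```python
import numpy as np

def best_action_tts(env_step_fn, state, n_actions, K=8, horizon=5,
                    gamma=0.99, rng=None):
    """For each candidate first action, roll out K futures and pick
    the action with the highest mean discounted return."""
    rng = rng or np.random.default_rng(0)
    means = []
    for a in range(n_actions):
        returns = []
        for _ in range(K):
            s, total, disc, act = state, 0.0, 1.0, a
            for _ in range(horizon):
                s, r = env_step_fn(s, act, rng)
                total += disc * r
                disc *= gamma
                act = int(rng.integers(n_actions))  # random continuation
            returns.append(total)
        means.append(float(np.mean(returns)))
    return int(np.argmax(means))
```

With K=1 and horizon=1 this degenerates to a one-step lookahead, which is why the real implementation fast-paths K=1 to greedy argmax.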
Adds inference_server_daily.py: provisions a RunPod A40 pod, rsyncs code+checkpoint+data, runs inference with K-rollout TTS, pulls decisions JSON, then always terminates the pod in a finally block. ~$0.17 per daily decision. Supports --dry-run without RUNPOD_API_KEY. Also adds --inference-only mode to bootstrap_runpod_rl.py (activated via INFERENCE_ONLY=1 env var): skips R2 data download, wandb, and exit handler so the pod can be used purely for inference. 28 unit tests cover parse_args, dry-run, GPU fallback, manifest writing, and bootstrap inference-only mode. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
C trading env and all eval scripts now read features_per_sym from the MKTD binary header field (byte 16). Files with features_per_sym=0 (v1/v2) fall back to the FEATURES_PER_SYM=16 compile-time constant.
- trading_env.h: add features_per_sym to MarketData struct; update obs layout docs
- trading_env.c: read features_per_sym on load; stride computations use the runtime value
- binding.c: propagate features_per_sym through binding init
- evaluate.py / evaluate_fast.py / evaluate_multiperiod.py / evaluate_tail.py: read features_per_sym from the header; obs_size = S*F + 5 + S
- train.py: read features_per_sym; pass to TransformerTradingPolicy
- hourly_replay.py: minor cleanup
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
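The header read plus fallback can be sketched as follows (the uint32 little-endian encoding at offset 16 is an assumption; the authoritative layout lives in trading_env.h):

```python
import struct

FEATURES_PER_SYM_DEFAULT = 16  # compile-time constant used for v1/v2 files

def read_features_per_sym(header: bytes) -> int:
    """Read features_per_sym at byte offset 16 of the MKTD header.
    A value of 0 marks a pre-v3 file, so fall back to the default."""
    (fps,) = struct.unpack_from("<I", header, 16)
    return fps if fps else FEATURES_PER_SYM_DEFAULT

def obs_size(n_symbols: int, features_per_sym: int) -> int:
    # S*F per-symbol features + 5 portfolio scalars + S position slots
    return n_symbols * features_per_sym + 5 + n_symbols
```

The obs_size formula matches the dims quoted elsewhere in the thread, e.g. 23 symbols at 16 features gives 23*16 + 5 + 23 = 396.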
combo_best_daily: val_ret=+1.81, sortino=3.76 (best return)
robust_champion: val_ret=+1.80, sortino=4.67 (best risk-adjusted)
combo_best_hourly: val_ret=+0.38, sortino=2.57
h2048 variants all overfit (negative val return)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests checkpoints across multiple symbols, decision lags, and time windows with fill_buffer, max_hold, and per-symbol discrete fills. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… text) Keep origin/main's improved --h100-mode help text while preserving the --a40-mode flag added in the A40 GPU config commit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
evaluate_sliding.py, evaluate_ttt.py, inference_tts.py were still computing obs_size = S*16 + 5 + S (hardcoded 16). Update to read features_per_sym from the MKTD header (v3 files have 20 features). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Soft sigmoid fills (temp=5e-4) leak gradient info across the decision_lag barrier, causing Sortino=180 in training but Sortino=-5 on the realistic marketsimulator.
- validation_use_binary_fills now defaults to True
- fill_temperature now defaults to 0.01 (20x increase reduces gradient leakage)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents all active trading bots, their configs, marketsim results, evaluation commands, and the deployment blocker for pufferlib models. Updates CLAUDE.md with production workflow and marketsim realism rules. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ates Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
stocks12 slip_focus (7 trials, 90s each):
- Best: slip_15bps robust=-15.9, 5% neg rate
- slip_10bps robust=-35.1 (training non-deterministic, original +21 still best)
- stock_robust_champion failed with 0 steps (GPU compile error)
stocks20 ent_focus (9 trials, 90s each):
- Best: stock_robust_champion robust=-1.61, 0% neg rate
- ent_05 robust=-20.24 (variance vs original +19.19)
Key finding: RL training variance is high across re-runs (memory note confirmed). The verified checkpoints from the deep sweep remain the deployment candidates:
- stocks12 slip_10bps best.pt: +21.04 robust, 0% neg rate
- stocks20 ent_05 best.pt: +19.19 robust, 0% neg rate
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Based on local A100 scale-up results (1200s/trial, stocks12):
- combo_best_daily: val_return=+1.81, Sortino=3.76 (best return)
- robust_champion: val_return=+1.80, Sortino=4.67 (best risk-adjusted)
Both configs use h1024 + obs_norm + cosine LR + anneal_lr + trade_pen=0.05 + slip_5bps. h2048 completely overfits (negative val_return) even with 1200s training.
Add these combos plus variations (slip_10bps, tp_03, wd_01 variants) to H100_STOCK_EXPERIMENTS so they run first on the H100.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Update scaling_sweep_stocks{12,15,20}_deep.csv with full 35-trial results.
Previous commit only had 16-18 trials (partial run).
Final results (90s/trial, holdout_robust_score):
stocks12 (35 trials):
1. stock_slip_10bps: +21.04, 0% neg rate
2. stock_wd_05: -0.21, 0% neg rate
3. stock_reward_scale_20: -1.69, 0% neg rate
stocks15 (34 trials):
1. random_mut_2124: -20.32, 20% neg rate (no positive configs)
2. stock_h512_reg: -23.24, 10% neg rate
stocks20 (34 trials):
1. stock_ent_05: +19.19, 0% neg rate
2. stock_ent_08: -24.79, 25% neg rate
stocks12 with slip_10bps and stocks20 with ent_05 are the two deployment candidates.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
argparse returns str but TrainingConfig expects Path, causing TypeError on checkpoint_dir = checkpoint_root / run_name. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Auto-detect obs_size, num_actions, hidden_size, obs_norm, and symbol count from checkpoint state_dict instead of hardcoding 4-symbol layout. Fixes size mismatch crash when loading mixed23 ent_anneal checkpoint (obs_dim=396, action_dim=47 vs old hardcoded 73/9). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
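The shape-driven detection can be sketched as follows (the layer names 'encoder.weight' and 'actor.weight' are illustrative, not the actual checkpoint keys):

```python
import numpy as np

def infer_dims_from_state_dict(state_dict):
    """Infer the network layout from weight shapes rather than
    hardcoding a 4-symbol configuration."""
    enc = state_dict["encoder.weight"]  # shape (hidden_size, obs_size)
    act = state_dict["actor.weight"]    # shape (num_actions, hidden_size)
    return {
        "obs_size": enc.shape[1],
        "hidden_size": enc.shape[0],
        "num_actions": act.shape[0],
    }
```

Reading dims from the checkpoint itself means any symbol count loads without code changes, which is exactly what the mixed23 checkpoint (obs_dim=396, action_dim=47) needed.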
…-spot RL model running as primary signal, Gemini as fallback. Marketsim: +28.3% return, Sort=8.04, 58% WR at 5bps slippage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- market_sim.c: ported from csim/market_sim.c with multi-symbol portfolio extension (per-symbol fees, margin interest, max hold, max positions)
- binance_rest.c: Binance REST API stubs (correct interfaces for a libcurl implementation)
- policy_infer.c: libtorch inference stubs (correct interfaces)
- trade_loop.c: hourly trading cycle with bar fetch, policy inference, order execution
- main.c: CLI entry point with full config parsing
- 49 tests passing, valgrind-clean (0 leaks, 0 errors)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When --rl-checkpoint is set and Gemini fails (403, rate limit, etc.), the bot now converts RL signals directly into allocation plans instead of skipping execution entirely. Allocation sizing uses softmax probabilities from the RL logits, clamped to 15-50%. When the RL signal is FLAT, it correctly does nothing. Hybrid mode is unchanged while Gemini is working. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
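A sketch of the fallback sizing under stated assumptions (the 15-50% clamp range comes from the description above; the exact logits-to-allocation mapping is a guess):

```python
import numpy as np

def allocation_from_logits(logits, lo=0.15, hi=0.50):
    """Map RL action logits to an allocation fraction: softmax gives
    the chosen action's confidence, clamped into [lo, hi] of equity."""
    z = logits - np.max(logits)          # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum()
    return min(max(float(probs.max()), lo), hi)
```

The clamp keeps a low-confidence signal from sizing below a useful position and a saturated one from going all-in.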
…s slippage A40 sweep champion from 50 trials ($0.53 total cost). Config: obs_norm, wd=0.05, slip=8bps, tp=0.005, ent_anneal. RL-only fallback when Gemini unavailable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mamba2 kernels require CUDA; this adds a lazy-init pure-torch SSM fallback for CPU inference/testing. Also adds 15 tests covering forward, backward, decode_actions, softcap, and device dispatch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parses pufferlib MKTD v2 binary data files (header, symbol names, features, OHLCV prices, tradable mask) and runs a buy-and-hold baseline backtest reporting sortino, max drawdown, total return, and per-symbol breakdown. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
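The buy-and-hold baseline metrics can be sketched as follows (simplified: equal weight across symbols, no fees, no tradable-mask handling):

```python
import numpy as np

def buy_and_hold_stats(prices):
    """Equal-weight buy-and-hold over a (T, S) close-price array:
    total return and max drawdown of the portfolio equity curve."""
    rel = prices / prices[0]            # per-symbol growth since t=0
    equity = rel.mean(axis=1)           # equal-weight portfolio curve
    total_return = float(equity[-1] - 1.0)
    peak = np.maximum.accumulate(equity)
    max_dd = float(np.max(1.0 - equity / peak))
    return total_return, max_dd
```

This is the sanity baseline any policy backtest should be compared against before trusting its Sortino or return numbers.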
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0f8e82d65f
int n_eq = T + 1;
double sortino = compute_sortino(equity_curve, n_eq);
Use the dataset's bar frequency when reporting Sortino
When --backtest is run on daily MKTD files (for example the mixed40_daily_val.bin dataset called out in this commit), this path still calls compute_sortino, which annualizes with ANNUALIZE_FACTOR 8760 in ctrader/market_sim.h:7. That constant is correct for hourly bars but overstates daily Sortino by about sqrt(24), so the new backtest mode reports materially wrong risk metrics and makes daily/hourly runs incomparable.
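A frequency-aware Sortino along the lines the review suggests (simplified: zero target return, downside-only deviation):

```python
import numpy as np

def sortino(returns, periods_per_year):
    """Sortino annualized by the dataset's bar frequency: 8760 for
    hourly bars, 252 (or 365 for crypto) for daily. A fixed hourly
    factor overstates daily Sortino by about sqrt(24) ≈ 4.9x."""
    r = np.asarray(returns, dtype=float)
    downside = np.minimum(r, 0.0)
    dd = np.sqrt(np.mean(downside ** 2))   # downside deviation
    if dd == 0:
        return float("inf")
    return float(r.mean() / dd * np.sqrt(periods_per_year))
```

Passing the bar frequency through from the MKTD header (or a CLI flag) would make daily and hourly backtests directly comparable.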
if (hdr->version >= 2 && ptr + mask_bytes <= end) {
    data->tradable = (unsigned char *)ptr;
    ptr += mask_bytes;
} else if (ptr + mask_bytes <= end) {
    data->tradable = (unsigned char *)ptr;
Reject v2 MKTD files that are missing the tradable mask
This loader treats the tradable mask as optional even when version >= 2, but the existing MKTD readers in pufferlib_market/src/trading_env.c:82-89 and pufferlib_market/hourly_replay.py:119-131 reject v2 files without that section. If a v2 export is truncated right after the price array, ctrader --backtest will now silently accept the incomplete file and compute results from corrupted input instead of surfacing the data error.
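The stricter check the review asks for, sketched in Python (it mirrors the behaviour described for trading_env.c; the function name and arguments are illustrative):

```python
def validate_tradable_section(version, bytes_remaining, mask_bytes):
    """A v2+ MKTD file without a complete tradable mask is rejected
    instead of silently treated as mask-less, so truncated exports
    surface as data errors rather than corrupt backtest results."""
    has_mask = bytes_remaining >= mask_bytes
    if version >= 2 and not has_mask:
        raise ValueError("v2 MKTD file truncated: tradable mask missing")
    return has_mask
```

Only v1 files legitimately lack the section, so the optional path should be gated on version < 2.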
- C++ policy_infer.cpp with TorchScript model loading via libtorch
- Conditional build: uses policy_infer.cpp when TORCH_DIR is set, falls back to policy_infer.c (C stub) otherwise
- download_libtorch Make target for CPU libtorch
- Combined with PR #70: vendor/cJSON.c + mktd_reader.c in C_SRCS
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- mktd_reader.{h,c} to parse pufferlib MKTD v2 binary data files (header, symbol names, features, OHLCV prices, tradable mask)
- --backtest and --data-file CLI flags to ctrader for offline backtesting on MKTD data

Test plan
- ./ctrader --backtest --data-file ../pufferlib_market/data/mixed40_daily_val.bin loads 40 symbols, 92 timesteps, prints results
- Builds clean with -Wall -Wextra -Wpedantic

Generated with Claude Code