This repository was archived by the owner on Apr 9, 2026. It is now read-only.

feat: multi-platform compound learning support #1

Open
wooo-jin wants to merge 6 commits into main from feat/multi-platform

Conversation

@wooo-jin
Owner

Summary

  • Compound learning support on four platforms: Codex, Gemini, OpenCode, and Copilot
  • Output parser and evidence counter implemented for each platform
  • ESM compatibility fix, shell injection defense, and unhandled promise rejection handling

Commits (6)

  • 0666604 feat: multi-platform support (Codex, Gemini, OpenCode, Copilot)
  • dbe2e08 feat: full compound learning on all 4 platforms (Claude parity)
  • 70cf1bc docs: honest multi-platform documentation
  • 28170c2 fix(critical): require() → await import() for ESM evidence counters
  • cf327a6 fix(critical): async/await for evidence counters + ESM require fix
  • f697181 fix: unhandled promise rejection + shell injection sanitization

Test plan

  • Verify that compound extraction works on each platform
  • Verify that evidence counters load correctly in an ESM environment
  • Verify sanitization of shell injection inputs

🤖 Generated with Claude Code

wooo-jin and others added 6 commits March 24, 2026 16:58

0666604 feat: multi-platform support (Codex, Gemini, OpenCode, Copilot)

Platform adapter architecture + 4 platform implementations:

- Codex CLI: AGENTS.md sync + ~/.codex/hooks.json (SessionStart, UserPromptSubmit)
- Gemini CLI: GEMINI.md + .gemini/settings.json hooks (BeforeTool, AfterTool, PreCompress, SessionStart) + commands/compound.toml
- OpenCode: OPENCODE.md + opencode.json instructions + .opencode/plugins/tenetx.ts scaffold (25+ hook events, npm-native)
- Copilot CLI: .github/copilot-instructions.md + .github/hooks/tenetx.json (preToolUse deny/allow) + .github/agents/compound.agent.md + CLAUDE.md native compatibility

Usage:
  tenetx init codex|gemini|opencode|copilot
  tenetx sync codex|gemini|opencode|copilot

Core adapter (platform/adapter.ts):
  - generateSolutionInstructions(): verified+ solutions → markdown
  - syncToInstructionFile(): platform-specific instruction file with markers
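
The marker-based sync described for syncToInstructionFile() could look roughly like the sketch below. The marker strings and the function name `syncManagedBlock` are assumptions made for illustration; the PR does not show the markers tenetx actually writes. The idea is that only the region between the markers is regenerated, so hand-written content in the instruction file survives repeated syncs.

```typescript
// Hypothetical marker strings; the real ones used by tenetx are not shown in this PR.
const START_MARKER = '<!-- tenetx:start -->';
const END_MARKER = '<!-- tenetx:end -->';

// Replace the managed block if it exists, otherwise append it to the end of the file.
function syncManagedBlock(existing: string, generated: string): string {
  const block = `${START_MARKER}\n${generated.trim()}\n${END_MARKER}`;
  const startIdx = existing.indexOf(START_MARKER);
  const endIdx = existing.indexOf(END_MARKER);

  if (startIdx !== -1 && endIdx > startIdx) {
    // Keep everything outside the markers untouched.
    return existing.slice(0, startIdx) + block + existing.slice(endIdx + END_MARKER.length);
  }

  // No managed block yet: append one after the user's own content.
  const base = existing.replace(/\s+$/, '');
  return (base.length > 0 ? base + '\n\n' : '') + block + '\n';
}
```

Running the function twice with the same generated text returns the same file, which is the property a repeated `tenetx sync` would need.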

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dbe2e08 feat: full compound learning on all 4 platforms (Claude parity)

Rewrote all platform adapters from stubs to full implementations:

Gemini CLI (highest parity):
  6 hook scripts: BeforeTool (danger+reflection), AfterTool (negative+write+hints),
  BeforeAgent (solution injection+prompt learning), SessionStart (extraction+lifecycle),
  PreCompress (compound hint), SessionEnd (workflow completion)

Copilot CLI:
  4 hook scripts: preToolUse (danger+reflection with deny/allow), postToolUse
  (negative+write tracking), userPromptSubmitted (prompt learning),
  sessionStart (extraction+lifecycle). Agent: compound.agent.md

OpenCode:
  Full TS plugin with real logic: tool.execute.before (danger+reflection),
  tool.execute.after (negative+write), session.idle (sync),
  session.compacting (context injection). updateEvidence() included.

Codex CLI (3-hook maximum):
  SessionStart (extraction+lifecycle+sync), UserPromptSubmit (prompt learning),
  Stop (workflow completion+resync). Limited by platform.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

70cf1bc docs: honest multi-platform documentation

Mark non-Claude platforms as experimental with clear feature parity table.
Note that current implementation uses generated shell scripts calling tenetx engine,
and native platform packages are planned for v2.2.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

28170c2 fix(critical): require() → await import() for ESM evidence counters

Evidence counters (injected, reflected, negative, sessions) were silently
failing in the ESM runtime because require() is not defined when package.json
sets "type": "module". Every try/catch block swallowed the resulting error, so
tests passed but evidence never incremented, leaving the entire lifecycle system inert.

Fixed: pre-tool-use.ts updateSolutionEvidence() and post-tool-use.ts
updateNegativeEvidence() now use async/await import().
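
As a minimal sketch of the pattern behind this fix, the function below loads a Node module with `await import()` instead of `require()` and rethrows unexpected errors rather than swallowing them. The name `bumpEvidence`, the JSON file layout, and the `Evidence` type are assumptions for illustration; the actual updateSolutionEvidence() and updateNegativeEvidence() bodies are not shown in this PR.

```typescript
type EvidenceField = 'injected' | 'reflected' | 'negative' | 'sessions';
type Evidence = Record<EvidenceField, number>;

// Hypothetical counter update: increments one field in a JSON file.
async function bumpEvidence(file: string, field: EvidenceField): Promise<Evidence> {
  // Dynamic import() works under "type": "module", where require() is undefined.
  const fs = await import('node:fs/promises');
  const evidence: Evidence = { injected: 0, reflected: 0, negative: 0, sessions: 0 };

  try {
    Object.assign(evidence, JSON.parse(await fs.readFile(file, 'utf8')));
  } catch (err) {
    // A missing file is the only expected failure; anything else is rethrown
    // so that a broken counter cannot fail silently.
    if ((err as { code?: string }).code !== 'ENOENT') throw err;
  }

  evidence[field] += 1;
  await fs.writeFile(file, JSON.stringify(evidence), 'utf8');
  return evidence;
}
```

Narrowing the catch to the expected error code is one way to avoid the situation described above, where a blanket try/catch hid the ReferenceError and tests kept passing.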

Also fixed: CHANGELOG footer links for v2.0.0, v1.7.0, v1.6.3.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cf327a6 fix(critical): async/await for evidence counters + ESM require fix

- checkCompoundReflection → async, await updateSolutionEvidence calls
- Call site uses .catch() for non-blocking promise handling
- solution-injector: await on updateSolutionEvidence in injection loop
- search.ts: require('node:os') → import * as os from 'node:os'

Without these fixes, evidence counters (reflected, sessions, injected)
would fire-and-forget their promises, never actually writing to files.
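
The two call styles mentioned in this commit can be sketched as follows: the injection loop awaits each counter update, while the hook call site stays non-blocking and routes failures through `.catch()`. The in-memory `Counters` map, the `injectSolutions` and `onToolUse` names, and the signatures are assumptions for illustration; the real functions write to files.

```typescript
type Counters = Map<string, number>;

// Hypothetical stand-in for updateSolutionEvidence(); the real one performs file I/O.
async function updateSolutionEvidence(store: Counters, name: string): Promise<void> {
  await Promise.resolve(); // simulates asynchronous I/O
  if (name === '') throw new Error('empty solution name');
  store.set(name, (store.get(name) ?? 0) + 1);
}

// Injection loop: each update is awaited, so all writes finish before the function resolves.
async function injectSolutions(store: Counters, names: string[]): Promise<void> {
  for (const name of names) {
    await updateSolutionEvidence(store, name);
  }
}

// Call site: does not block the hook, but a rejection is handled instead of going unobserved.
function onToolUse(store: Counters, names: string[], onError: (err: unknown) => void): void {
  injectSolutions(store, names).catch(onError);
}
```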

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

f697181 fix: unhandled promise rejection + shell injection sanitization

- post-tool-use: updateNegativeEvidence() now has .catch() to prevent
  unhandled promise rejection in Node.js
- pack/search: packName sanitized before shell interpolation in git commit
  (removes non-alphanumeric chars to prevent command injection)
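
A minimal sketch of the sanitization described above, assuming it amounts to stripping every non-alphanumeric character. The helper names and the commit message text are hypothetical; the PR states only that packName is sanitized before being interpolated into a git commit command.

```typescript
// Hypothetical helper: keep ASCII letters and digits only.
function sanitizePackName(packName: string): string {
  return packName.replace(/[^A-Za-z0-9]/g, '');
}

// Build git arguments from the sanitized name; the message format is an assumption.
function buildCommitArgs(packName: string): string[] {
  const safe = sanitizePackName(packName);
  if (safe.length === 0) {
    throw new Error('pack name is empty after sanitization');
  }
  return ['commit', '-m', `pack: update ${safe}`];
}
```

Returning an argument array rather than a command string also leaves room to run git without a shell (for example through `execFile` from `node:child_process`), which removes shell interpretation of the message altogether.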

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wooo-jin added a commit that referenced this pull request Apr 9, 2026
The Round 3 v2 fixture exposed 4 hard-positive ranking failures where
hyphenated solution tags (api-key, code-review, red-green-refactor) only
intersected query tokens that contained literal hyphens. extractTags
strips hyphens during query extraction (api-key → [api, key]), so
compound tags only matched via the half-weight partialMatches fallback,
losing to competitors with single-word direct hits.

Fix: two pure helpers in solution-format.ts that recover compound forms
from BOTH sides of the matching pipeline:

  expandCompoundTags(tags)   — solution-side. 'api-key' → [api-key, api, key]
  expandQueryBigrams(tags)   — query-side. ['api','keys'] → +[api-key,
                                apikey, api-keys, apikeys]
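
The two helpers could be sketched as below, reproducing the examples given above. The plural rule (drop one trailing "s" from words longer than three characters) and the ASCII-only filter are assumptions; the actual implementations in solution-format.ts are not shown here.

```typescript
const ASCII_WORD = /^[a-z0-9]+$/;

// Solution side: keep the compound tag and add its hyphen-separated parts.
function expandCompoundTags(tags: string[]): string[] {
  const out = new Set<string>();
  for (const tag of tags) {
    out.add(tag);
    if (tag.indexOf('-') !== -1) {
      for (const part of tag.split('-')) {
        if (part.length > 0) out.add(part);
      }
    }
  }
  return Array.from(out);
}

// Naive plural stem (assumption): "keys" -> "key", but "class" and "api" stay as-is.
function singular(word: string): string {
  return word.length > 3 && /[^s]s$/.test(word) ? word.slice(0, -1) : word;
}

// Query side: for each adjacent ASCII pair, add hyphenated and joined compound forms.
function expandQueryBigrams(tags: string[]): string[] {
  const out = new Set<string>(tags);
  for (let i = 0; i + 1 < tags.length; i++) {
    const left = tags[i];
    const right = tags[i + 1];
    // Non-ASCII tokens (for example Korean) are passed through without bigram expansion.
    if (!ASCII_WORD.test(left) || !ASCII_WORD.test(right)) continue;
    for (const form of [singular(right), right]) {
      out.add(`${left}-${form}`);
      out.add(`${left}${form}`);
    }
  }
  return Array.from(out);
}
```

Both functions use a Set, so repeated tags and identical singular and plural forms are deduplicated while insertion order is preserved.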

Both helpers are layered into calculateRelevance via a new optional
solutionTagsExpanded field on CalculateRelevanceOptions. The expanded
sets are used for intersection/partialMatches, but the Jaccard union
denominator still uses the RAW solution tags so score normalization
stays semantically stable (no asymmetric inflation).
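
The asymmetry between numerator and denominator can be illustrated with the sketch below. It assumes a plain Jaccard score, takes the union over the query tags and the raw solution tags, and leaves out the half-weight partialMatches term; the function name and signature are hypothetical and do not reflect the real calculateRelevance.

```typescript
// Intersection uses the expanded solution tags; the union uses the RAW solution tags.
function jaccardWithExpansion(
  queryTags: string[],
  solutionTags: string[],
  solutionTagsExpanded: string[] = solutionTags,
): number {
  const expanded = new Set(solutionTagsExpanded);
  let intersection = 0;
  new Set(queryTags).forEach((tag) => {
    if (expanded.has(tag)) intersection += 1;
  });

  // The denominator ignores the expansion, so adding derived tags cannot shrink the score.
  const union = new Set(queryTags.concat(solutionTags)).size;
  return union === 0 ? 0 : intersection / union;
}
```

For the query tags `['api', 'key']` against a solution tagged `['api-key']`, the unexpanded score is 0, while passing `['api-key', 'api', 'key']` as the expanded set gives 2/3. Because the intersection is counted over distinct query tags, the score stays at or below 1.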

Empirical impact (fixture v2, 53+16+14 queries):
| metric                | baseline | R4-T1 | delta  |
|-----------------------|----------|-------|--------|
| recallAt5             | 1.000    | 1.000 | =      |
| positive mrrAt5       | 0.959    | 0.981 | +0.022 |
| paraphrase mrrAt5     | 1.000    | 1.000 | =      |
| aggregate mrrAt5      | 0.969    | 0.986 | +0.017 |
| negativeAnyResultRate | 0.357    | 0.357 | =      |
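
For readers unfamiliar with the mrrAt5 rows, the sketch below shows the standard mean reciprocal rank at a cutoff of 5. It assumes one expected solution per query; the names are hypothetical and the repository's own evaluator may compute the metric differently.

```typescript
interface RankedCase {
  ranked: string[];  // solution names in ranked order, best first
  expected: string;  // the solution that should be retrieved
}

// 1/rank if the expected solution appears in the top k, otherwise 0.
function reciprocalRankAtK(ranked: string[], expected: string, k: number = 5): number {
  const idx = ranked.slice(0, k).indexOf(expected);
  return idx === -1 ? 0 : 1 / (idx + 1);
}

// Mean of the per-query reciprocal ranks.
function mrrAtK(cases: RankedCase[], k: number = 5): number {
  if (cases.length === 0) return 0;
  let sum = 0;
  for (const c of cases) {
    sum += reciprocalRankAtK(c.ranked, c.expected, k);
  }
  return sum / cases.length;
}
```

Under this definition, moving a hard positive from rank 2 to rank 1 raises that query's contribution from 0.5 to 1, which is the kind of shift a ranking-quality fix would produce without changing recallAt5.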

2 of 4 v2 hard positives flipped to @1:
  - "managing api keys and credentials safely" → starter-secret-management
  - "red green refactor cycle for new features" → starter-tdd-red-green-refactor

The other 2 (avoiding hardcoded credentials, writing unit tests for a
function with side effects) are query-side English semantics — R4-T2
or R4-T3 territory, not R4-T1's scope.

negativeAnyResultRate unchanged: R4-T1 is a ranking-quality fix, not
an FP filter. R4-T2/R4-T3 will attack that surface.

Hook path (rankCandidates) and MCP path (searchSolutions) are updated
identically — both wire expandQueryBigrams BEFORE normalizeTerms and
expandCompoundTags per solution. ROUND3_BASELINE updated to v3 with
inline history block + per-case mechanism notes.

Test additions:
- 15 unit tests for the two helpers (Korean preservation, dedup,
  Unicode safety, plural stem edge cases, ASCII filter)
- 1 integration regression guard asserting the 2 specific R4-T1 hard
  positives reach @1 (not just aggregate mrrAt5 — silent flip-back
  protection)
- New evaluateQuery() export for per-query test assertions

Reviewed by code-reviewer (2 MED + 2 LOW initially → MED #2 fixed in
this PR, MED #1 deferred to R4-T2 per reviewer guidance, LOWs skipped)
and security-reviewer (0 findings, APPROVE).

All 1274 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>