Pull request overview
Adds first-party Google Workspace OAuth credential management to HybridClaw’s CLI/runtime secrets, and expands the eval/memory subsystem with a native LOCOMO benchmark harness plus new internal memory documentation.
Changes:
- Introduce `src/auth/google-workspace-auth.ts` implementing a PKCE OAuth flow with token persistence in the encrypted runtime secret store, and wire it into `hybridclaw auth login/status/logout google-workspace`.
- Add a native LOCOMO eval suite (`/eval locomo ...`) including dataset setup/download, QA + retrieval modes, scoring, managed-run orchestration, and comprehensive tests.
- Refactor/extend semantic memory writing (new `MemoryService.storeSemanticMemory()` entry point) and add internal docs for memory layering/limits.
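
To make the PKCE part of the first change concrete, here is a minimal sketch of how an S256 verifier/challenge pair is derived per RFC 7636. The names `createPkcePair` and `deriveCodeChallenge` are hypothetical and are not taken from `src/auth/google-workspace-auth.ts`; the module in this PR may structure this differently.

```typescript
import { createHash, randomBytes } from 'node:crypto';

// Hypothetical shape for illustration only.
interface PkcePair {
  verifier: string;
  challenge: string;
  method: 'S256';
}

// S256 challenge as defined in RFC 7636: BASE64URL(SHA256(ASCII(verifier))).
function deriveCodeChallenge(verifier: string): string {
  return createHash('sha256').update(verifier, 'ascii').digest('base64url');
}

// 32 random bytes encode to a 43-character base64url string,
// which is the minimum verifier length the RFC allows.
function createPkcePair(): PkcePair {
  const verifier = randomBytes(32).toString('base64url');
  return { verifier, challenge: deriveCodeChallenge(verifier), method: 'S256' };
}
```

The challenge goes into the authorization URL, while the verifier stays local (in this PR, presumably alongside the pending auth state in the secret store) and is sent only with the token exchange.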
Reviewed changes
Copilot reviewed 28 out of 30 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/tui-slash-menu.test.ts | Updates TUI slash menu expectations to include /eval locomo. |
| tests/memory-service.test.ts | Adds unit coverage asserting default embeddings are plain number[]. |
| tests/locomo-native.test.ts | Adds extensive LOCOMO native harness tests (setup, run, retrieval, scoring, concurrency). |
| tests/google-workspace-auth.test.ts | Adds unit coverage for Google Workspace OAuth secret storage, PKCE, exchange, refresh, and state validation. |
| tests/gateway-service.eval-command.test.ts | Updates gateway eval help expectations to include LOCOMO managed commands. |
| tests/eval-command.test.ts | Adds managed LOCOMO eval command tests and helper fixtures. |
| tests/command-registry.test.ts | Ensures /eval locomo is registered and canonically mapped. |
| tests/cli.test.ts | Adds CLI routing tests for auth login/status/logout google-workspace. |
| src/memory/memory-service.ts | Changes hashed embedding provider to return a plain array and adds storeSemanticMemory() wrapper. |
| src/evals/locomo-types.ts | Introduces shared LOCOMO constants/types (marker filename, dataset filename, aggregates). |
| src/evals/locomo-official-scoring.ts | Adds an official LoCoMo scoring port (stemming-based F1 + category handling). |
| src/evals/locomo-native.ts | Implements native LOCOMO CLI harness (setup/download, QA/retrieval modes, summaries/progress). |
| src/evals/eval-command.ts | Wires LOCOMO into managed eval suite framework (setup/run/status/results rendering + internal launcher). |
| src/command-registry.ts | Adds slash command catalog entries for LOCOMO eval commands. |
| src/cli/help.ts | Updates help text for eval and auth to include LOCOMO and Google Workspace OAuth usage. |
| src/cli/auth-command.ts | Adds google-workspace as a first-class auth provider with login/status/logout handling. |
| src/cli.ts | Adds internal entrypoint __eval-locomo-native to run the LOCOMO harness. |
| src/auth/google-workspace-auth.ts | New Google Workspace OAuth module using runtime secrets for client/pending/token storage and refresh. |
| skills/google-workspace/SKILL.md | Updates skill guidance to prefer built-in Google Workspace OAuth flow for API access. |
| package.json | Adds stemmer dependency for LOCOMO official scoring. |
| package-lock.json | Locks stemmer and includes updated lockfile changes. |
| docs/static/docs.js | Adds “Memory” to internals docs navigation. |
| docs/development/reference/configuration.md | Documents Google Workspace OAuth state living in ~/.hybridclaw/credentials.json. |
| docs/development/reference/commands.md | Documents /eval locomo ... and auth ... google-workspace usage. |
| docs/development/internals/session-routing.md | Shifts sidebar position to accommodate new Memory internals doc. |
| docs/development/internals/README.md | Links to the new Memory internals documentation. |
| docs/development/internals/memory.md | New internal documentation describing memory layers, prompt injection, and default limits. |
| docs/development/getting-started/authentication.md | Adds Google Workspace OAuth flow to onboarding/auth docs. |
| docs/development/agents.md | Adds Memory internals link to docs landing page. |
| .gitignore | Ignores *.tmp* temporary files. |
Comment on lines +320 to +346

    const datasetPath = getDatasetPath(options.installDir);
    if (!fs.existsSync(datasetPath)) {
      console.log(`Downloading dataset from ${LOCOMO_DATASET_URL}`);
      const response = await fetchWithTimeout(
        LOCOMO_DATASET_URL,
        undefined,
        LOCOMO_DATASET_DOWNLOAD_TIMEOUT_MS,
        'LOCOMO dataset download',
      );
      if (!response.ok) {
        throw new Error(
          `Failed to download LOCOMO dataset: HTTP ${response.status}`,
        );
      }
      const rawBuffer = Buffer.from(await response.arrayBuffer());
      verifyDownloadedDataset(rawBuffer);
      const raw = rawBuffer.toString('utf-8');
      if (!raw.trim().startsWith('[')) {
        throw new Error('Downloaded LOCOMO dataset is not valid JSON.');
      }
      fs.writeFileSync(datasetPath, rawBuffer);
    } else {
      console.log(`Dataset already present at ${datasetPath}`);
    }

    const sampleCount = loadSamples(datasetPath).length;
    fs.writeFileSync(getMarkerPath(options.installDir), 'ok\n', 'utf-8');
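
The excerpt above reports "not valid JSON" based only on whether the payload starts with `[`. As a hedged illustration, a stricter check could parse the payload before writing it to disk. `parseDatasetArray` is a hypothetical helper, not part of this PR, and the non-empty requirement is an assumption about what a usable dataset looks like.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical helper: parse the downloaded bytes and confirm the top-level
// value is a JSON array, instead of inspecting only the first character.
function parseDatasetArray(raw: Buffer): unknown[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.toString('utf-8'));
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    throw new Error(`Downloaded LOCOMO dataset is not valid JSON: ${reason}`);
  }
  // Assumption: an empty array is treated as a failed download.
  if (!Array.isArray(parsed) || parsed.length === 0) {
    throw new Error('Downloaded LOCOMO dataset must be a non-empty JSON array.');
  }
  return parsed;
}
```

Calling such a helper before `fs.writeFileSync(datasetPath, rawBuffer)` would keep a truncated or malformed download from being persisted, at the cost of parsing the file twice if `loadSamples` parses it again.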
Comment on lines +548 to +560

    export async function exchangeGoogleWorkspaceAuthCode(
      codeOrUrl: string,
    ): Promise<ExchangeGoogleWorkspaceAuthCodeResult> {
      const clientSecret = requireStoredClientSecret();
      const pending = requireStoredPendingAuth();
      const existingToken = readStoredToken();
      const { code, state } = extractCodeAndState(codeOrUrl);
      if (state && state !== pending.state) {
        throw new GoogleWorkspaceAuthError(
          'google_workspace_state_mismatch',
          'Google Workspace authorization response state mismatch. Run `hybridclaw auth login google-workspace --auth-url` again.',
        );
      }
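
In the excerpt above, the comparison runs only when a `state` value was extracted, so input without a `state` skips the check. Whether that is intended (for example, to accept a bare authorization code) is not stated here. The sketch below separates the three outcomes so a caller can decide explicitly; `checkOAuthState` and `StateCheck` are hypothetical names, not code from this PR.

```typescript
import { Buffer } from 'node:buffer';
import { timingSafeEqual } from 'node:crypto';

type StateCheck = 'ok' | 'missing' | 'mismatch';

// Hypothetical helper: classify the returned OAuth state against the stored one.
function checkOAuthState(
  received: string | null | undefined,
  expected: string,
): StateCheck {
  if (!received) return 'missing';
  const a = Buffer.from(received, 'utf-8');
  const b = Buffer.from(expected, 'utf-8');
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  if (a.length !== b.length || !timingSafeEqual(a, b)) return 'mismatch';
  return 'ok';
}
```

A caller could then reject `'mismatch'` unconditionally and treat `'missing'` as an error whenever the input was a full redirect URL rather than a pasted code.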
Comment on lines +667 to +691

    storeSemanticMemory(params: {
      sessionId: string;
      role: string;
      source?: string | null;
      scope?: string | null;
      metadata?: Record<string, unknown> | string | null;
      content: string;
      confidence?: number;
      embedding?: number[] | null;
      sourceMessageId?: number | null;
    }): number {
      const content = params.content.trim();
      if (!content) {
        throw new Error('Cannot store empty semantic memory content.');
      }

      return this.backend.storeSemanticMemory({
        ...params,
        content,
        embedding:
          params.embedding === undefined
            ? this.embeddingProvider.embed(content)
            : params.embedding,
      });
    }
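
The method above distinguishes an omitted `embedding` (compute one from the trimmed content) from an explicit `null` (store without an embedding). A small standalone sketch of that rule follows; `resolveEmbedding` and the toy `embed` function in the usage note are illustrative stand-ins, not the PR's `embeddingProvider`.

```typescript
type Embedding = number[] | null;

// Mirrors the ternary in the excerpt: only `undefined` triggers the provider.
function resolveEmbedding(
  content: string,
  supplied: Embedding | undefined,
  embed: (text: string) => number[],
): Embedding {
  return supplied === undefined ? embed(content) : supplied;
}
```

With a toy provider such as `(text) => [text.length]`, omitting the argument yields a computed vector, passing `null` yields `null`, and passing an array returns that same array untouched.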
Comment on lines +268 to +291

    {
      id: 'locomo',
      title: 'LOCOMO',
      summary:
        'Native HybridClaw LoCoMo QA benchmark over the official long-conversation dataset.',
      aliases: ['lo-co-mo', 'locomo-memory'],
      prereqs: [
        'Node.js 22',
        'network access during `setup` to download `locomo10.json`',
      ],
      starter: [
        '/eval locomo setup',
        '/eval locomo run --budget 4000 --max-questions 20',
        '/eval locomo run --mode retrieval --budget 4000 --max-questions 20',
      ],
      notes: [
        'The default `qa` mode generates LoCoMo answers through HybridClaw’s local OpenAI-compatible gateway and scores the model outputs directly.',
        '`--mode retrieval` skips model generation, ingests each conversation into an isolated native memory session, and scores evidence hit-rate from recalled semantic memories.',
        'The `qa` prompt shape follows the upstream `evaluate_gpts` flow: truncated conversation context plus a short-answer QA prompt for each LoCoMo question.',
        '`--num-samples` limits conversation records. Use `--max-questions` for quick smoke runs over a small number of LoCoMo questions.',
        'By default, LOCOMO creates one fresh template-seeded agent per conversation sample. Use `--current-agent` to reuse the current agent workspace.',
        'Prompt/profile eval flags flow through `HYBRIDCLAW_EVAL_MODEL`, so agent/workspace mode and prompt ablations affect the benchmarked run.',
      ],
    },
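
The notes above say `--mode retrieval` scores an evidence hit-rate from recalled semantic memories, but the exact definition used by the harness is not shown in this excerpt. One plausible reading, sketched here purely as an assumption, counts a question as a hit when at least one annotated evidence id appears among the recalled ids. `RetrievalResult`, `evidenceHitRate`, and the example ids are hypothetical.

```typescript
// Hypothetical record shape for one scored question.
interface RetrievalResult {
  goldEvidenceIds: string[]; // evidence ids annotated on the question
  recalledIds: string[];     // ids attached to the recalled memories
}

// Fraction of questions with at least one gold evidence id recalled.
// Assumption: questions without annotated evidence are excluded from the denominator.
function evidenceHitRate(results: RetrievalResult[]): number {
  const scored = results.filter((r) => r.goldEvidenceIds.length > 0);
  if (scored.length === 0) return 0;
  const hits = scored.filter((r) => {
    const recalled = new Set(r.recalledIds);
    return r.goldEvidenceIds.some((id) => recalled.has(id));
  }).length;
  return hits / scored.length;
}
```

Other definitions are equally reasonable, such as requiring all evidence ids to be recalled or averaging per-question recall, so the harness source should be checked before comparing numbers.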
What changed
- Wires `hybridclaw auth login/status/logout google-workspace` into the CLI, including a Hermes-style stepwise PKCE flow
Why
HybridClaw already had browser-login reuse for Google properties, but it did not have a built-in OAuth flow for API-style Google Workspace access. The missing piece was reusable auth/session plumbing that fits the existing secret-store model.
Impact
Notes
Validation