[codex] Add Google Workspace OAuth auth flow #280

Open
furukama wants to merge 8 commits into main from codex/google-workspace-oauth


@furukama
Contributor

What changed

  • add a first-party Google Workspace OAuth module that uses HybridClaw's encrypted runtime secret store instead of ad hoc token files
  • wire hybridclaw auth login/status/logout google-workspace into the CLI, including a Hermes-style stepwise PKCE flow
  • document the new auth flow and update the bundled Google Workspace skill guidance
  • add focused unit coverage for token storage, PKCE setup, auth-code exchange, state validation, and refresh behavior
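
The PKCE setup mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from `src/auth/google-workspace-auth.ts`: the `PkcePair` shape and the names `createPkcePair` and `buildAuthUrl` are hypothetical, and only the Google authorization endpoint and the standard OAuth/PKCE query parameters are taken as given.

```typescript
import { createHash, randomBytes } from 'node:crypto';

// Hypothetical shape for illustration; the PR's pending-auth record is not shown.
interface PkcePair {
  codeVerifier: string;
  codeChallenge: string;
  state: string;
}

// RFC 7636: the verifier must be 43-128 characters from the unreserved set.
// 32 random bytes encoded as unpadded base64url give exactly 43 characters.
function createPkcePair(): PkcePair {
  const codeVerifier = randomBytes(32).toString('base64url');
  // S256 method: challenge = BASE64URL(SHA-256(verifier)).
  const codeChallenge = createHash('sha256')
    .update(codeVerifier)
    .digest('base64url');
  // Separate random value used to tie the redirect back to this login attempt.
  const state = randomBytes(16).toString('base64url');
  return { codeVerifier, codeChallenge, state };
}

// Builds the URL the user opens in a browser. Only the challenge and state are
// sent; the verifier stays local until the code exchange.
function buildAuthUrl(
  clientId: string,
  redirectUri: string,
  scopes: string[],
  pkce: PkcePair,
): string {
  const url = new URL('https://accounts.google.com/o/oauth2/v2/auth');
  url.searchParams.set('response_type', 'code');
  url.searchParams.set('client_id', clientId);
  url.searchParams.set('redirect_uri', redirectUri);
  url.searchParams.set('scope', scopes.join(' '));
  url.searchParams.set('code_challenge', pkce.codeChallenge);
  url.searchParams.set('code_challenge_method', 'S256');
  url.searchParams.set('state', pkce.state);
  url.searchParams.set('access_type', 'offline'); // ask for a refresh token
  return url.toString();
}
```

In a stepwise flow like the one described, the verifier and state would presumably be persisted between the "create auth URL" step and the "exchange code" step, which is where the encrypted secret store comes in.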

Why

HybridClaw already had browser-login reuse for Google properties, but it did not have a built-in OAuth flow for API-style Google Workspace access. The missing piece was reusable auth/session plumbing that fits the existing secret-store model.

Impact

  • users can store a Google OAuth desktop client JSON, create an auth URL, exchange a pasted redirect URL or code, inspect status, and clear the stored session with first-party commands
  • Google Workspace OAuth state now lives in ~/.hybridclaw/credentials.json alongside other encrypted runtime secrets
  • the current built-in scope bundle covers Gmail, Calendar, Drive, Docs, Sheets, and Contacts

Notes

  • this PR adds auth/session plumbing only; it does not yet add first-party Google Calendar/Docs/Drive tool execution in the runtime
  • container-mode consumers still need a follow-up bridge before they can use these stored credentials directly

Validation

  • npm run lint
  • /Users/bkoehler/src/hybridclaw/node_modules/.bin/vitest run --configLoader runner --config vitest.unit.config.ts tests/google-workspace-auth.test.ts tests/cli.test.ts

@furukama furukama marked this pull request as ready for review April 11, 2026 11:53
Copilot AI review requested due to automatic review settings April 11, 2026 11:53
Contributor

Copilot AI left a comment

Pull request overview

Adds first-party Google Workspace OAuth credential management to HybridClaw’s CLI/runtime secrets, and expands the eval/memory subsystem with a native LOCOMO benchmark harness plus new internal memory documentation.

Changes:

  • Introduce src/auth/google-workspace-auth.ts implementing a PKCE OAuth flow with token persistence in the encrypted runtime secret store, and wire it into hybridclaw auth login/status/logout google-workspace.
  • Add a native LOCOMO eval suite (/eval locomo ...) including dataset setup/download, QA + retrieval modes, scoring, managed-run orchestration, and comprehensive tests.
  • Refactor/extend semantic memory writing (new MemoryService.storeSemanticMemory() entry point) and add internal docs for memory layering/limits.

Reviewed changes

Copilot reviewed 28 out of 30 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/tui-slash-menu.test.ts | Updates TUI slash menu expectations to include /eval locomo. |
| tests/memory-service.test.ts | Adds unit coverage asserting default embeddings are plain number[]. |
| tests/locomo-native.test.ts | Adds extensive LOCOMO native harness tests (setup, run, retrieval, scoring, concurrency). |
| tests/google-workspace-auth.test.ts | Adds unit coverage for Google Workspace OAuth secret storage, PKCE, exchange, refresh, and state validation. |
| tests/gateway-service.eval-command.test.ts | Updates gateway eval help expectations to include LOCOMO managed commands. |
| tests/eval-command.test.ts | Adds managed LOCOMO eval command tests and helper fixtures. |
| tests/command-registry.test.ts | Ensures /eval locomo is registered and canonically mapped. |
| tests/cli.test.ts | Adds CLI routing tests for auth login/status/logout google-workspace. |
| src/memory/memory-service.ts | Changes hashed embedding provider to return a plain array and adds storeSemanticMemory() wrapper. |
| src/evals/locomo-types.ts | Introduces shared LOCOMO constants/types (marker filename, dataset filename, aggregates). |
| src/evals/locomo-official-scoring.ts | Adds an official LoCoMo scoring port (stemming-based F1 + category handling). |
| src/evals/locomo-native.ts | Implements native LOCOMO CLI harness (setup/download, QA/retrieval modes, summaries/progress). |
| src/evals/eval-command.ts | Wires LOCOMO into managed eval suite framework (setup/run/status/results rendering + internal launcher). |
| src/command-registry.ts | Adds slash command catalog entries for LOCOMO eval commands. |
| src/cli/help.ts | Updates help text for eval and auth to include LOCOMO and Google Workspace OAuth usage. |
| src/cli/auth-command.ts | Adds google-workspace as a first-class auth provider with login/status/logout handling. |
| src/cli.ts | Adds internal entrypoint __eval-locomo-native to run the LOCOMO harness. |
| src/auth/google-workspace-auth.ts | New Google Workspace OAuth module using runtime secrets for client/pending/token storage and refresh. |
| skills/google-workspace/SKILL.md | Updates skill guidance to prefer built-in Google Workspace OAuth flow for API access. |
| package.json | Adds stemmer dependency for LOCOMO official scoring. |
| package-lock.json | Locks stemmer and includes updated lockfile changes. |
| docs/static/docs.js | Adds “Memory” to internals docs navigation. |
| docs/development/reference/configuration.md | Documents Google Workspace OAuth state living in ~/.hybridclaw/credentials.json. |
| docs/development/reference/commands.md | Documents /eval locomo ... and auth ... google-workspace usage. |
| docs/development/internals/session-routing.md | Shifts sidebar position to accommodate new Memory internals doc. |
| docs/development/internals/README.md | Links to the new Memory internals documentation. |
| docs/development/internals/memory.md | New internal documentation describing memory layers, prompt injection, and default limits. |
| docs/development/getting-started/authentication.md | Adds Google Workspace OAuth flow to onboarding/auth docs. |
| docs/development/agents.md | Adds Memory internals link to docs landing page. |
| .gitignore | Ignores *.tmp* temporary files. |
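
The file summary describes `src/evals/locomo-official-scoring.ts` as a stemming-based F1 port. As a rough illustration of the underlying metric, here is a simplified token-level F1 in TypeScript. It is a sketch under stated assumptions: it skips stemming and category handling entirely, and the names `tokenize` and `tokenF1` are hypothetical rather than taken from the PR.

```typescript
// Lowercase, replace punctuation with spaces, and split on whitespace.
// Assumption: the real port also stems tokens; this sketch does not.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ')
    .split(/\s+/)
    .filter(Boolean);
}

// Token-level F1 between a predicted answer and a reference answer.
// Overlap is counted as a multiset intersection, so repeated tokens only
// match as many times as they occur in the reference.
function tokenF1(prediction: string, reference: string): number {
  const predicted = tokenize(prediction);
  const expected = tokenize(reference);
  if (predicted.length === 0 || expected.length === 0) return 0;

  const remaining = new Map<string, number>();
  for (const token of expected) {
    remaining.set(token, (remaining.get(token) ?? 0) + 1);
  }

  let overlap = 0;
  for (const token of predicted) {
    const left = remaining.get(token) ?? 0;
    if (left > 0) {
      overlap += 1;
      remaining.set(token, left - 1);
    }
  }
  if (overlap === 0) return 0;

  const precision = overlap / predicted.length;
  const recall = overlap / expected.length;
  return (2 * precision * recall) / (precision + recall);
}
```

For example, a prediction of `in Paris` against the reference `Paris` has precision 1/2 and recall 1, so the score is 2/3.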

Comment on lines +320 to +346
const datasetPath = getDatasetPath(options.installDir);
if (!fs.existsSync(datasetPath)) {
  console.log(`Downloading dataset from ${LOCOMO_DATASET_URL}`);
  const response = await fetchWithTimeout(
    LOCOMO_DATASET_URL,
    undefined,
    LOCOMO_DATASET_DOWNLOAD_TIMEOUT_MS,
    'LOCOMO dataset download',
  );
  if (!response.ok) {
    throw new Error(
      `Failed to download LOCOMO dataset: HTTP ${response.status}`,
    );
  }
  const rawBuffer = Buffer.from(await response.arrayBuffer());
  verifyDownloadedDataset(rawBuffer);
  const raw = rawBuffer.toString('utf-8');
  if (!raw.trim().startsWith('[')) {
    throw new Error('Downloaded LOCOMO dataset is not valid JSON.');
  }
  fs.writeFileSync(datasetPath, rawBuffer);
} else {
  console.log(`Dataset already present at ${datasetPath}`);
}

const sampleCount = loadSamples(datasetPath).length;
fs.writeFileSync(getMarkerPath(options.installDir), 'ok\n', 'utf-8');
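
The excerpt above judges the payload by its first non-whitespace character, which accepts truncated or malformed content that merely starts with `[`. One possible stricter variant, offered only as a sketch, parses the text and checks for an array before anything is written. `parseDatasetArray` is a hypothetical helper name and is not part of the PR.

```typescript
// Hypothetical helper: parse the downloaded text and require a JSON array.
// Throws with a message that distinguishes "not JSON" from "wrong shape".
function parseDatasetArray(raw: string): unknown[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error('Downloaded LOCOMO dataset is not valid JSON.');
  }
  if (!Array.isArray(parsed)) {
    throw new Error('Downloaded LOCOMO dataset is not a JSON array.');
  }
  return parsed;
}
```

Whether this is worth the extra parse depends on what `verifyDownloadedDataset` and `loadSamples` already guarantee, which the excerpt does not show.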
Comment on lines +548 to +560
export async function exchangeGoogleWorkspaceAuthCode(
  codeOrUrl: string,
): Promise<ExchangeGoogleWorkspaceAuthCodeResult> {
  const clientSecret = requireStoredClientSecret();
  const pending = requireStoredPendingAuth();
  const existingToken = readStoredToken();
  const { code, state } = extractCodeAndState(codeOrUrl);
  if (state && state !== pending.state) {
    throw new GoogleWorkspaceAuthError(
      'google_workspace_state_mismatch',
      'Google Workspace authorization response state mismatch. Run `hybridclaw auth login google-workspace --auth-url` again.',
    );
  }
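
The body of `extractCodeAndState` is not part of this excerpt. A minimal sketch of how such a helper might accept either a pasted redirect URL or a bare code is shown below. The name `extractCodeAndStateSketch`, the `CodeAndState` shape, and the error messages are assumptions made for illustration.

```typescript
// Hypothetical result shape; state is null when only a bare code was pasted.
interface CodeAndState {
  code: string;
  state: string | null;
}

function extractCodeAndStateSketch(codeOrUrl: string): CodeAndState {
  const input = codeOrUrl.trim();
  if (!input) {
    throw new Error('Expected an authorization code or redirect URL.');
  }

  // Treat anything that looks like an http(s) URL as the full redirect.
  if (/^https?:\/\//i.test(input)) {
    const params = new URL(input).searchParams;
    const error = params.get('error');
    if (error) {
      throw new Error(`Authorization failed: ${error}`);
    }
    const code = params.get('code');
    if (!code) {
      throw new Error('Redirect URL does not contain a code parameter.');
    }
    // searchParams.get() already percent-decodes the values.
    return { code, state: params.get('state') };
  }

  // Otherwise assume the user pasted the code itself; no state is available.
  return { code: input, state: null };
}
```

Under this sketch a bare code carries no state, so a guard of the form `state && state !== pending.state` would compare states only when a full redirect URL is pasted.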
Comment on lines +667 to +691
  storeSemanticMemory(params: {
    sessionId: string;
    role: string;
    source?: string | null;
    scope?: string | null;
    metadata?: Record<string, unknown> | string | null;
    content: string;
    confidence?: number;
    embedding?: number[] | null;
    sourceMessageId?: number | null;
  }): number {
    const content = params.content.trim();
    if (!content) {
      throw new Error('Cannot store empty semantic memory content.');
    }

    return this.backend.storeSemanticMemory({
      ...params,
      content,
      embedding:
        params.embedding === undefined
          ? this.embeddingProvider.embed(content)
          : params.embedding,
    });
  }
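
The `embedding` handling above distinguishes an omitted value from an explicit `null`. The standalone sketch below isolates that rule so the three cases are easy to see. `resolveEmbedding` and the `EmbeddingProvider` interface are hypothetical stand-ins, not the service's actual types.

```typescript
// Hypothetical stand-in for the service's embedding provider.
interface EmbeddingProvider {
  embed(text: string): number[];
}

// undefined -> compute a default embedding from the content
// null      -> caller explicitly wants no embedding stored
// number[]  -> caller-supplied embedding is passed through unchanged
function resolveEmbedding(
  content: string,
  embedding: number[] | null | undefined,
  provider: EmbeddingProvider,
): number[] | null {
  return embedding === undefined ? provider.embed(content) : embedding;
}
```

Using `=== undefined` rather than a truthiness or `??` check is what keeps `null` meaningful: with `??`, an explicit `null` would also trigger the default embedding.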
Comment on lines +268 to +291
  {
    id: 'locomo',
    title: 'LOCOMO',
    summary:
      'Native HybridClaw LoCoMo QA benchmark over the official long-conversation dataset.',
    aliases: ['lo-co-mo', 'locomo-memory'],
    prereqs: [
      'Node.js 22',
      'network access during `setup` to download `locomo10.json`',
    ],
    starter: [
      '/eval locomo setup',
      '/eval locomo run --budget 4000 --max-questions 20',
      '/eval locomo run --mode retrieval --budget 4000 --max-questions 20',
    ],
    notes: [
      'The default `qa` mode generates LoCoMo answers through HybridClaw’s local OpenAI-compatible gateway and scores the model outputs directly.',
      '`--mode retrieval` skips model generation, ingests each conversation into an isolated native memory session, and scores evidence hit-rate from recalled semantic memories.',
      'The `qa` prompt shape follows the upstream `evaluate_gpts` flow: truncated conversation context plus a short-answer QA prompt for each LoCoMo question.',
      '`--num-samples` limits conversation records. Use `--max-questions` for quick smoke runs over a small number of LoCoMo questions.',
      'By default, LOCOMO creates one fresh template-seeded agent per conversation sample. Use `--current-agent` to reuse the current agent workspace.',
      'Prompt/profile eval flags flow through `HYBRIDCLAW_EVAL_MODEL`, so agent/workspace mode and prompt ablations affect the benchmarked run.',
    ],
  },

2 participants