branch v0.6.7 cycle #104

Merged
wey-gu merged 13 commits into main from dev_067
Mar 9, 2026

Conversation


@wey-gu wey-gu commented Mar 9, 2026

  • polish skills, the Claude Code plugin, and the OpenClaw plugin
  • introduce cursor plugin
  • introduce gemini cli plugin
  • wip: nmem-bench

Summary by CodeRabbit

  • New Features

    • Added Cursor Plugin with marketplace support, MCP-driven capabilities, and memory skills.
    • Added Gemini CLI Extension with persistent context, memory commands, and agent skills.
    • Added nowledge-mem-bench for evaluating memory systems with LoCoMo and LongMemEval support.
    • Added Codex Prompts integration with comprehensive agent workflows.
    • Introduced minimum recall score filtering for memory retrieval.
  • Documentation

    • Expanded memory lifecycle guidance across all integrations.
    • Updated integration READMEs and release notes.
    • Clarified handoff vs thread semantics.
  • Bug Fixes

    • Fixed remote-mode operations in Claude Code and Gemini extensions.
    • Improved auto-recall search strategy.

wey-gu and others added 13 commits March 5, 2026 13:30
6-phase pipeline exercising the full nowledge-mem stack via nmem CLI:
  1. Ingest (conversations → threads)
  2. Process (distill → KG extraction → Knowledge Agent)
  3. Search (nmem m search per question)
  4. Answer (LLM from retrieved context)
  5. Evaluate (Token-F1 + LLM-as-judge)
  6. Report (JSON + Markdown with per-type breakdowns)

Supports LoCoMo (ACL 2024, 1986 QA pairs) and LongMemEval datasets.
Resumable via checkpoint, fast/deep mode comparison, retrieval metrics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
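The checkpoint-resumable behavior described above can be sketched as a phase loop that skips work already recorded in a JSON checkpoint. This is an illustrative pattern only; `PHASES`, `run_pipeline`, and the checkpoint shape are assumptions, not the nmem-bench API:

```python
import json
from pathlib import Path

# Illustrative phase names mirroring the 6-phase pipeline above.
PHASES = ["ingest", "process", "search", "answer", "evaluate", "report"]

def run_pipeline(checkpoint_path: Path, run_phase) -> list[str]:
    """Run each phase once, persisting completion so a rerun resumes."""
    state = (
        json.loads(checkpoint_path.read_text())
        if checkpoint_path.exists()
        else {"done": []}
    )
    executed = []
    for phase in PHASES:
        if phase in state["done"]:
            continue  # already completed in a previous run
        run_phase(phase)
        executed.append(phase)
        state["done"].append(phase)
        # Checkpoint after every phase so a crash loses at most one phase.
        checkpoint_path.write_text(json.dumps(state))
    return executed
```

Running it a second time against the same checkpoint file executes nothing, which is the resume behavior the commit describes.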
…nclaw)

event.prompt contains the entire conversation history (system prompt +
all turns), producing irrelevant auto-recall results as conversations
grow. Now extracts the latest user message from event.messages with
fallback to truncated prompt (500 chars).

Added recallMinScore config (0-100, default 0) to filter low-relevance
recalled memories. Bumped to 0.6.9.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Critical:
- Fix broken LongMemEval pipeline: store thread IDs during ingestion,
  call distill_threads for LongMemEval (was creating threads but never
  distilling → search would find zero memories)
- Fix normalize_answer to match LoCoMo reference exactly: punctuation
  removal before article removal (order affects word boundaries)
- Fix adversarial scoring: reduce to reference's 2 phrases (was 8,
  inflating category 5 scores vs published results)
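Why the normalization order matters can be shown with a SQuAD-style normalizer sketch (illustrative only, not the benchmark's exact code). Removing articles before punctuation sees different word boundaries than the reference order:

```python
import re
import string

def remove_articles(text: str) -> str:
    return re.sub(r"\b(a|an|the)\b", " ", text)

def remove_punc(text: str) -> str:
    return "".join(ch for ch in text if ch not in string.punctuation)

def squeeze(text: str) -> str:
    return " ".join(text.split())

s = "the-end"
# Reference order (punctuation first): hyphen removal fuses the tokens,
# so "theend" no longer matches the article regex at a word boundary.
punc_first = squeeze(remove_articles(remove_punc(s)))      # "theend"
# Wrong order (articles first): "-" is still a word boundary, so "the"
# is stripped before punctuation removal.
articles_first = squeeze(remove_punc(remove_articles(s)))  # "end"
```

The two orders yield different normalized strings, and therefore different Token-F1 overlaps.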

Significant:
- Add real concurrency: answer.py and evaluate.py now use
  asyncio.gather with semaphore instead of sequential loops
- Make content truncation configurable (search.py max_content_len,
  default raised from 500 to 2000 chars)
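The concurrency change follows a standard pattern: bound asyncio.gather with a semaphore. A generic sketch (the function names and default here are assumptions, not the repository's actual code):

```python
import asyncio

async def gather_bounded(items, worker, max_concurrency: int = 8):
    """Run worker(item) for every item, at most max_concurrency at a time."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(item):
        async with sem:  # blocks while max_concurrency tasks are in flight
            return await worker(item)

    # gather preserves input order regardless of completion order.
    return await asyncio.gather(*(bounded(i) for i in items))
```

Compared with a sequential loop, total wall time approaches the slowest batch of `max_concurrency` calls rather than the sum of all calls.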

Minor:
- LongMemEval _find_item: O(1) dict lookup instead of O(n) scan
- Checkpoint forward-compat: filter unknown fields on deserialization
- Remove dead SimpleTokenizer class from f1.py
- Rename distill_locomo → distill_threads (works for any benchmark)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Canonical reference for the ## User / ## Assistant / ## System
conversation format with optional YAML frontmatter. Used by
POST /threads/import, nmem CLI, and browser extension.
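As an illustration of that format, a toy parser might split a file on the role headings after stripping optional frontmatter. The `SAMPLE` text and `parse_thread` helper are assumptions for illustration, not the importer's implementation:

```python
import re

SAMPLE = """\
---
title: Example thread
---
## User
How do I save this session?
## Assistant
Run the save command.
"""

def parse_thread(text: str) -> list[dict]:
    """Parse '## Role' sections into role/content message dicts."""
    # Strip optional YAML frontmatter delimited by '---' lines.
    if text.startswith("---"):
        _, _, text = text.partition("---\n")
        _, _, text = text.partition("---\n")
    messages = []
    for block in re.split(r"^## ", text, flags=re.M):
        if not block.strip():
            continue
        role, _, content = block.partition("\n")
        messages.append({"role": role.strip().lower(), "content": content.strip()})
    return messages
```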

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Package the Gemini CLI extension for release and tighten the community integrations around routed recall, real thread-save semantics, and honest handoff behavior so each surface matches what its runtime can actually do.

Made-with: Cursor
Introduce a Cursor-native plugin surface with bundled MCP config, rules, and skills so Cursor users get plugin-first setup, routed recall, and honest save-handoff behavior without pretending a live thread importer exists.

Made-with: Cursor
Explain that Claude Code and Gemini remote thread saves still read local session files via nmem before uploading normalized messages to Mem.

Made-with: Cursor
Critical fixes found during deep audit:

- Fix CLI flag: `--extraction-level` → `--level` (matches actual nmem CLI)
- Fix extraction level choices: `comprehensive` → `guided` (valid: swift|guided)
- Add checkpoint validation on resume: reject benchmark mismatch
- Add min_wait to wait_for_processing: prevent false-stabilize when
  background processing hasn't started yet
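A min_wait guard of that kind can be sketched generically: poll until the observed value stabilizes, but never declare it stable before a minimum elapsed time. The signature and defaults below are illustrative, not the benchmark's actual API:

```python
import time

def wait_for_processing(poll, min_wait=2.0, timeout=60.0, interval=0.5):
    """Poll until poll() stabilizes, but not before min_wait seconds.

    Without min_wait, an unchanged reading taken before background
    processing has even started would be mistaken for completion.
    """
    start = time.monotonic()
    last = poll()
    while time.monotonic() - start < timeout:
        time.sleep(interval)
        current = poll()
        elapsed = time.monotonic() - start
        if current == last and elapsed >= min_wait:
            return current  # stable and past the minimum wait
        last = current
    return last  # timed out; return the last observation
```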

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prepare Gemini, Cursor, and Claude integration packages for release with explicit version bumps, clearer marketplace metadata, Cursor repository submission wiring, and sharper release documentation.

Made-with: Cursor
Add a packaged logo, richer manifest metadata, and more precise install wording so the Cursor plugin matches the documented marketplace submission flow without claiming an unsupported local folder-loader.

Made-with: Cursor
Tighten the Cursor marketplace metadata and release guidance around stable reviewer-facing values, while allowing both supported marketplace source path forms in validation.

Made-with: Cursor
Tighten the retrieval and distillation policy across Cursor, Gemini, Claude Code, and shared skills so agents know when to read working memory, search threads, and update existing memories instead of duplicating them.

Made-with: Cursor
@coderabbitai

coderabbitai bot commented Mar 9, 2026

📝 Walkthrough

Walkthrough

This pull request introduces three major new integration frameworks for Nowledge Mem: a Cursor plugin package with MCP connectivity and lifecycle rules, a Gemini CLI extension with persistent context and automated hooks, and a comprehensive Python benchmarking tool with dataset loaders, evaluation metrics, and a multi-phase pipeline. Additionally, it updates existing integrations (Claude Code, Codex, OpenClaw, NPX Skills) with clarified semantics, new configuration options, and refined documentation. The repository README is restructured to emphasize standalone integrations.

Changes

Cohort / File(s) Summary
Cursor Plugin Integration
.cursor-plugin/*, nowledge-mem-cursor-plugin/*
New Cursor IDE plugin package with marketplace manifest, plugin configuration, MCP server setup, lifecycle rules for Working Memory/search/distill/save-handoff, skill documentation, validation script, changelog, release guide, and README. Validates plugin structure, manifest alignment, MCP config, and marketplace registration.
Gemini CLI Extension
nowledge-mem-gemini-cli-extension/*
New Gemini CLI integration with persistent context file (GEMINI.md), six command prompts (read-working-memory, search-memory, distill-memory, save-thread, save-handoff, status), session lifecycle hooks (session-start.mjs, session-end.mjs), extension manifest, package configuration, release workflow, validation script, changelog, skill documentation, and release notes.
Benchmark Tool Framework
nowledge-mem-bench/src/*, nowledge-mem-bench/pyproject.toml, nowledge-mem-bench/README.md
New Python benchmarking framework with dataset loaders (LoCoMo, LongMemEval), unified data types (UnifiedMessage, UnifiedSession, UnifiedQuestion), multi-phase pipeline (ingest→search→answer→evaluate→report), checkpoint/resumability system, evaluation modules (F1 scoring, LLM judge, retrieval metrics), nmem CLI client wrapper, and CLI interface with report generation.
Codex Integration Updates
nowledge-mem-codex-prompts/*
New AGENTS.md documenting Codex workflows, new search_memory.md prompt file, updated installer and prompt files (read_working_memory, save_session, distill) with refined CLI commands and simplified workflows emphasizing JSON output and project scoping.
Claude Code Plugin Updates
nowledge-mem-claude-code-plugin/*
Version bump to 0.7.1 with clarified remote-mode Stop hook semantics (local transcript capture via nmem), updated README describing thread-aware memory searches, enhanced skill documentation (distill-memory, save-thread, search-memory), and changelog documenting behavior changes and documentation tightening.
OpenClaw Plugin Updates
nowledge-mem-openclaw-plugin/*
New recallMinScore configuration option (0-100, default 0) to filter auto-recalled memories by relevance threshold; tiered query strategy using event.prompt with context messages for short queries; updated config parsing, validation, UI hints, manifest, and tooling to support score-based filtering and progressive thread inspection.
NPX Skills Updates
nowledge-mem-npx-skills/*
Renamed save-thread to save-handoff throughout, updated read-working-memory to use nmem CLI with fallback, simplified search-memory to a four-step retrieval routing workflow, condensed distill-memory guidance, and updated changelog and README to reflect terminology changes and cross-agent consistency.
Core Documentation & Configuration
README.md, mcp.json, nowledge-mem-alma-plugin/alma-skill-nowledge-mem.md
README restructured from "Nowledge Communities" to "Integrations" with compact table view; mcp.json augmented with X-NMEM-API-Key header; alma plugin thread retrieval guidance expanded with nmem command alternatives and pagination notes.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI as nmem-bench CLI
    participant Benchmark as Benchmark Loader
    participant NmemClient as NmemClient Wrapper
    participant Nmem as nmem Tool
    participant LLM as LLM (GPT/Gemini)
    participant DB as Memory DB

    User->>CLI: run benchmark (locomo/longmemeval)
    CLI->>Benchmark: load dataset & questions
    Benchmark-->>CLI: UnifiedQuestion list
    
    CLI->>NmemClient: ingest_locomo/longmemeval
    NmemClient->>Nmem: thread_create (from sessions)
    Nmem->>DB: create threads
    DB-->>Nmem: thread_ids
    Nmem-->>NmemClient: ThreadInfo
    NmemClient-->>CLI: checkpoint updated
    
    CLI->>NmemClient: search_questions
    loop for each question
        NmemClient->>Nmem: memory_search(query)
        Nmem->>DB: search + rank
        DB-->>Nmem: search results
        Nmem-->>NmemClient: SearchResult list
        NmemClient-->>CLI: checkpoint[search_results, latency]
    end
    
    CLI->>LLM: answer_questions
    loop for each question
        LLM->>LLM: acompletion(context+question)
        LLM-->>CLI: answer + latency
    end
    
    CLI->>CLI: evaluate_f1 & evaluate_llm_judge
    loop for each question
        alt F1 eval
            CLI->>CLI: category-specific F1 score
        else LLM judge
            CLI->>LLM: judge_answer(question, ground_truth, hypothesis)
            LLM-->>CLI: score, label, explanation
        end
    end
    
    CLI->>CLI: generate_report (aggregate statistics)
    CLI->>User: report JSON + markdown

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • PR #36: Modifies Claude Code plugin (plugin.json, README, skills) to migrate tooling to nmem CLI, directly aligning with this PR's plugin and skill documentation updates.
  • PR #79: Updates alma-skill-nowledge-mem.md thread retrieval and search behavior, mirroring the thread retrieval guidance changes in this PR.
  • PR #44: Modifies nowledge-mem-npx-skills (README, CHANGELOG, skill SKILL.md files including save-thread→save-handoff rename), directly related to NPX Skills cohort in this PR.

Poem

🐰 Whiskers twitch with glee—
Three new integrations bloom,
Cursor, Gemini, benchmarks all,
Plus memories resume.
Handoffs, workflows, scores so bright—
The warren's tools take flight!



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 17

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
nowledge-mem-openclaw-plugin/src/hooks/recall.js (1)

185-194: ⚠️ Potential issue | 🟡 Minor

Guard against undefined score before multiplication.

If r.score is undefined or null, line 187 will produce NaN% in the output. While the filter on line 182 checks (r.score ?? 0), results that pass (when minScore is 0) may still have undefined scores.

Proposed fix
 const lines = filtered.map((r) => {
   const title = r.title || "(untitled)";
-  const score = `${(r.score * 100).toFixed(0)}%`;
+  const score = `${((r.score ?? 0) * 100).toFixed(0)}%`;
   const labels =
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-openclaw-plugin/src/hooks/recall.js` around lines 185 - 194, The
mapping that builds lines can produce "NaN%" because r.score may be undefined;
update the score formatting in the lines mapping (the arrow function mapping
filtered to lines) to coerce/guard r.score (e.g., use (r.score ?? 0) or
Number(r.score) || 0) before multiplying and calling toFixed so it always
produces a numeric percentage string; keep the rest of the assembled string
(title, labels, matchHint, snippet using escapeForPrompt) unchanged.
🟡 Minor comments (12)
nowledge-mem-claude-code-plugin/skills/save-thread/SKILL.md-3-3 (1)

3-3: ⚠️ Potential issue | 🟡 Minor

Scope the explicit-request rule to manual skill use.

This reads like auto-save is unsupported, but the repo also documents a SessionEnd hook that runs nmem t save --from claude-code automatically. Please narrow this to “when invoking this skill manually” so the docs do not contradict each other.

Based on learnings: Add SessionEnd hook to .claude/settings.json that automatically runs nmem t save --from claude-code on session exit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-claude-code-plugin/skills/save-thread/SKILL.md` at line 3,
Update the description in SKILL.md to clarify the explicit-request rule applies
only when invoking the save-thread skill manually (e.g., "Save the real Claude
Code session messages only when the user explicitly requests it via this skill")
and add a note instructing users to use the SessionEnd hook for automatic saves;
also add guidance to include the SessionEnd hook in .claude/settings.json that
runs the exact command nmem t save --from claude-code on session exit so
automatic import is handled separately from the manual-skill behavior.
nowledge-mem-gemini-cli-extension/skills/read-working-memory/SKILL.md-1-21 (1)

1-21: ⚠️ Potential issue | 🟡 Minor

Documentation inconsistency with AI summary.

The AI summary states this file includes "a fallback to a legacy local file for older setups," but this fallback mechanism is not documented anywhere in the actual file content. This discrepancy suggests either missing documentation or an inaccurate summary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-gemini-cli-extension/skills/read-working-memory/SKILL.md` around
lines 1 - 21, The AI summary mentions a "fallback to a legacy local file for
older setups" that isn't documented in SKILL.md; add a short "Fallback for older
setups" section under Usage Pattern that documents the fallback behavior for the
nmem --json wm read command (what file/source is used as the legacy fallback,
when it's used, and how to opt out or prefer the new source), and update any
example invocation or note in the file to clarify that nmem --json wm read will
consult the legacy local file when the primary source is unavailable so the
documentation matches the AI summary.
nowledge-mem-gemini-cli-extension/skills/distill-memory/SKILL.md-20-20 (1)

20-20: ⚠️ Potential issue | 🟡 Minor

Add --json flag to the update command for consistency.

The nmem m update command on line 21 is missing the --json flag. All nmem CLI commands should use --json for consistent machine-readable output parsing. Change it to:

- Use `nmem --json m add` when the insight is genuinely new.
- If an existing memory already captures the same decision, workflow, or preference and the new information refines it, use `nmem --json m update <id> ...` instead of creating a duplicate.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-gemini-cli-extension/skills/distill-memory/SKILL.md` at line 20,
The SKILL.md text uses `nmem m update` without the required `--json` flag;
update the documentation so both examples use machine-readable output
consistently by changing the `nmem m update` invocation to `nmem --json m update
<id> ...` (and ensure the adjacent example `nmem --json m add` remains
unchanged), so references to the CLI commands (`nmem --json m add`, `nmem --json
m update`) are consistent and include the `--json` flag.
nowledge-mem-gemini-cli-extension/RELEASING.md-25-42 (1)

25-42: ⚠️ Potential issue | 🟡 Minor

Fix the working-directory examples.

These commands only work if the reader starts one directory above the repo. From the usual clone root, community/nowledge-mem-gemini-cli-extension does not exist, so the first validation/package step fails immediately.

📝 Suggested doc fix
-cd community/nowledge-mem-gemini-cli-extension
+cd nowledge-mem-gemini-cli-extension
 npm run validate
-cd community/nowledge-mem-gemini-cli-extension
+cd nowledge-mem-gemini-cli-extension
 npm run package:release
-cd community/nowledge-mem-gemini-cli-extension
+cd nowledge-mem-gemini-cli-extension
 npm run verify:release
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-gemini-cli-extension/RELEASING.md` around lines 25 - 42, The
working-directory examples assume the reader is one directory above the repo;
update the commands so they work from the repository root: replace each "cd
community/nowledge-mem-gemini-cli-extension && npm run <script>" with either a
repo-root-safe change-directory (e.g., prepend ./ if needed) or use npm's
--prefix form like "npm --prefix ./community/nowledge-mem-gemini-cli-extension
run validate" (and similarly for package:release and verify:release) so the
examples run correctly from the usual clone root.
nowledge-mem-gemini-cli-extension/GEMINI.md-100-110 (1)

100-110: ⚠️ Potential issue | 🟡 Minor

Show how to update memory content when refining a decision.

Line 109 shows nmem m update <id> -t "Updated title" as the refinement example. To actually persist the refined insight (not just the title), add the -c flag: nmem m update <id> -c "Refined content here" -t "Updated title". This makes the "update instead of duplicate" guidance actionable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-gemini-cli-extension/GEMINI.md` around lines 100 - 110, The
example for refining an existing memory only shows updating the title; update
the example and guidance to persist refined content by including the content
flag when calling the update command (use the nmem m update command and include
the -c flag along with -t), e.g. show using nmem m update <id> -c "Refined
content here" -t "Updated title" so readers know to pass the new content to
actually overwrite the memory rather than only changing its title.
nowledge-mem-claude-code-plugin/skills/distill-memory/SKILL.md-41-45 (1)

41-45: ⚠️ Potential issue | 🟡 Minor

The update example only changes the title, not the memory content.

Line 44 shows nmem m update <id> -t "Updated title", but this only updates the title. The preceding text says "the new information refines it," which requires updating the actual memory body. Add the -c flag to show the proper form: nmem m update <id> -t "Updated title" -c "Updated content".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-claude-code-plugin/skills/distill-memory/SKILL.md` around lines
41 - 45, Update the example command showing how to refine an existing memory so
it updates the memory body as well as the title: change the shown CLI usage in
the SKILL.md example (the `nmem m update <id> -t "Updated title"` snippet) to
include the content flag (e.g., add `-c "Updated content"`) so the example
demonstrates updating the memory content, not just the title.
nowledge-mem-bench/src/nmem_bench/__main__.py-3-5 (1)

3-5: ⚠️ Potential issue | 🟡 Minor

Guard the CLI bootstrap behind if __name__ == "__main__"

Line 5 executes the CLI on import. While __main__.py is typically executed only via python -m, explicit imports of this module will trigger the CLI immediately and may exit the process.

Proposed fix
 from nmem_bench.cli import main
 
-main()
+if __name__ == "__main__":
+    main()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/__main__.py` around lines 3 - 5, The module
currently invokes main() on import causing the CLI to run; modify __main__.py so
it only calls nmem_bench.cli.main() when executed as a script by adding a guard:
wrap the existing main() invocation in if __name__ == "__main__": main()
(referencing the imported main from nmem_bench.cli) to prevent the CLI from
running on import.
nowledge-mem-bench/src/nmem_bench/pipeline/search.py-22-22 (1)

22-22: ⚠️ Potential issue | 🟡 Minor

Use explicit None union for optional callable.

PEP 484 prohibits implicit Optional. The type hint should explicitly declare None as a valid type.

Proposed fix
+from collections.abc import Callable
+
 def search_questions(
     questions: list[UnifiedQuestion],
     client: NmemClient,
     checkpoint: RunCheckpoint,
     search_mode: str = "normal",
     top_k: int = 10,
     max_content_len: int = 2000,
-    on_progress: callable = None,
+    on_progress: Callable[[int, int, str], None] | None = None,
 ) -> None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/pipeline/search.py` at line 22, The
parameter/attribute annotation "on_progress: callable = None" is using an
implicit Optional; change it to an explicit union type (e.g., on_progress:
Optional[Callable[..., Any]] = None or on_progress: Callable[..., Any] | None =
None) and add the necessary typing imports (Optional, Callable, Any) or use PEP
604 union syntax depending on project Python target; update the annotation
wherever "on_progress" is declared in search.py to reference those typing
symbols.
nowledge-mem-bench/src/nmem_bench/pipeline/evaluate.py-101-108 (1)

101-108: ⚠️ Potential issue | 🟡 Minor

Progress counter increments for skipped items may mislead.

When qstate.llm_judge_score >= 0 or not qstate.answer, the counter judged is incremented but no task is created. This means progress reflects "processed" rather than "actually judged" items, which could be confusing. Consider separating counters or clarifying the semantics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/pipeline/evaluate.py` around lines 101 -
108, The loop that inspects each q (using checkpoint.get_question and qstate)
increments the judged counter even when items are skipped
(qstate.llm_judge_score >= 0 or not qstate.answer), which mislabels progress;
change this by introducing a separate skipped/processed counter (e.g.,
processed_count or skipped_count) and only increment judged when you actually
create a judging task, or alternatively rename judged to processed and add a
distinct judged counter that increments only when you enqueue work (refer to the
loop over batch, qstate.llm_judge_score, and the point where tasks are created)
so progress reflects true “judged” vs “skipped.”
nowledge-mem-bench/src/nmem_bench/cli.py-172-176 (1)

172-176: ⚠️ Potential issue | 🟡 Minor

Remove extraneous f prefix from strings without placeholders.

These f-strings have no interpolation and should be regular strings.

♻️ Suggested fix
-        console.print(f"\n[bold]LoCoMo Dataset[/bold]")
+        console.print("\n[bold]LoCoMo Dataset[/bold]")
         console.print(f"  Conversations: {s['conversations']}")
         console.print(f"  Total sessions: {s['total_sessions']}")
         console.print(f"  Total questions: {s['total_questions']}")
-        console.print(f"  By category:")
+        console.print("  By category:")

 # ...

-        console.print(f"\n[bold]LongMemEval Dataset[/bold]")
+        console.print("\n[bold]LongMemEval Dataset[/bold]")
         console.print(f"  Total questions: {s['total_questions']}")
-        console.print(f"  By type:")
+        console.print("  By type:")

 # ...

-    console.print(f"\n[bold green]═══ Benchmark Results ═══[/bold green]")
+    console.print("\n[bold green]═══ Benchmark Results ═══[/bold green]")

Also applies to: 184-186, 197-197

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/cli.py` around lines 172 - 176, Change the
unnecessary f-strings used for plain literals to regular strings in the CLI
prints: replace calls like console.print(f"\n[bold]LoCoMo Dataset[/bold]"),
console.print(f"  By category:"), and the other console.print uses at the
mentioned nearby locations (the console.print calls around the
"conversations/total_sessions/total_questions" and the later static label
prints) so they are plain strings without the leading f prefix; leave any
console.print calls that actually interpolate values (e.g., ones using
{s['conversations']} etc.) as f-strings.
nowledge-mem-bench/src/nmem_bench/evaluation/retrieval.py-62-69 (1)

62-69: ⚠️ Potential issue | 🟡 Minor

IDCG calculation may not accurately reflect ideal relevance.

The ideal DCG assumes min(len(evidence_texts), k) relevant items exist, but this conflates the number of evidence texts with the number of retrievable relevant documents. If multiple evidence texts can match the same document (or some evidence texts never appear in the corpus), the IDCG may be inflated or deflated.

For binary relevance with substring matching, consider using min(sum(relevances), k) or documenting that evidence texts are expected to map 1:1 to distinct retrievable documents.

💡 Alternative approach
     # Ideal DCG (all relevant items first)
-    n_relevant = min(len(evidence_texts), k)
+    # Use actual relevant count from top-K as upper bound for ideal
+    n_relevant = min(int(sum(relevances)), k) if relevances else 0
     idcg = sum(1.0 / math.log2(i + 2) for i in range(n_relevant))

Or document the assumption that each evidence text corresponds to exactly one retrievable document.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/evaluation/retrieval.py` around lines 62 -
69, The IDCG calculation uses n_relevant = min(len(evidence_texts), k) which can
miscount ideal relevant items; change it to derive the actual number of relevant
items from the computed relevances (e.g., n_relevant = min(int(sum(relevances)),
k)) or deduplicate/match evidence_texts to distinct retrievable documents before
computing idcg so IDCG reflects the actual number of relevant retrievable
documents; update the idcg line that references idcg, relevances,
evidence_texts, and k accordingly.
nowledge-mem-bench/src/nmem_bench/evaluation/f1.py-74-82 (1)

74-82: ⚠️ Potential issue | 🟡 Minor

Handle empty predictions list to avoid ValueError.

If prediction is empty or only whitespace, predictions will be [''] after split, but if prediction is a single comma, the list could be ['', '']. More critically, if somehow predictions becomes empty, max() will raise ValueError.

Consider adding a guard:

🛡️ Suggested fix
 def f1_score_multi(prediction: str, ground_truth: str) -> float:
     """F1 for multi-answer questions (comma-separated ground truth)."""
     predictions = [p.strip() for p in prediction.split(",")]
     ground_truths = [g.strip() for g in ground_truth.split(",")]

+    if not predictions or not any(predictions):
+        return 0.0
+
     scores = []
     for gt in ground_truths:
-        best = max(f1_score_single(pred, gt) for pred in predictions)
+        best = max((f1_score_single(pred, gt) for pred in predictions), default=0.0)
         scores.append(best)

     return sum(scores) / len(scores) if scores else 0.0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/evaluation/f1.py` around lines 74 - 82, The
loop calling max(f1_score_single(pred, gt) for pred in predictions) can raise
ValueError if predictions is empty; sanitize the predictions list first (e.g.,
build predictions = [p.strip() for p in prediction.split(",") if p.strip()]) and
then either use max(..., default=0.0) on a concrete list or short-circuit so
that when predictions is empty you assign best = 0.0 for each gt; update the
code around the variables prediction/predictions and the max call (which uses
f1_score_single) to use this guarded approach.
🧹 Nitpick comments (28)
nowledge-mem-bench/src/nmem_bench/__init__.py (1)

3-3: Consider sourcing __version__ from package metadata instead of duplicating it.

The version is declared in pyproject.toml (line 3) and hardcoded in __init__.py (line 3). Prefer reading from a single source of truth to prevent drift. Use importlib.metadata.version("nmem-bench") or similar to dynamically retrieve the version at runtime.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/__init__.py` at line 3, Replace the
hardcoded __version__ string in __init__.py with a runtime lookup from package
metadata: import importlib.metadata and call
importlib.metadata.version("nmem-bench") (with fallback handling if the
distribution is not found) and assign that value to __version__; update the
top-level symbol __version__ and ensure any import errors are caught so package
import still works in editable/development installs.
nowledge-mem-claude-code-plugin/skills/save-thread/SKILL.md (1)

47-47: Clarify that “Thread” means the recorded session transcript.

Thread = real session messages is still a bit broad here and can be read as the entire history. A more explicit phrase like “recorded Claude Code session transcript” would make the distinction from Memory clearer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-claude-code-plugin/skills/save-thread/SKILL.md` at line 47,
Update the explanatory line in SKILL.md to explicitly define "Thread" as the
recorded Claude Code session transcript; replace or expand the phrase "Thread =
real session messages" with wording like "Thread = recorded Claude Code session
transcript (verbatim session messages)" to clearly distinguish it from "Memory"
(distilled insights).
nowledge-mem-gemini-cli-extension/skills/read-working-memory/SKILL.md (1)

8-8: Document the fallback command for nmem CLI.

The documentation only mentions nmem --json wm read without specifying a fallback if nmem is not installed or not in PATH. Based on learnings, the memory backend should document a fallback to uvx --from nmem-cli nmem.

Consider adding guidance like:

Use `nmem --json wm read` (or `uvx --from nmem-cli nmem --json wm read` if `nmem` is not in PATH) for the user's current priorities, unresolved flags, and recent context.

Based on learnings: Memory backend uses nmem CLI, with fallback to uvx --from nmem-cli nmem.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-gemini-cli-extension/skills/read-working-memory/SKILL.md` at
line 8, Update the SKILL.md entry in read-working-memory to document the CLI
fallback: keep the existing primary command text referencing "nmem --json wm
read" and add a fallback clause that shows the alternative invocation "uvx
--from nmem-cli nmem --json wm read" to be used when nmem is not installed or
not in PATH; ensure the sentence replaces the current lone command so it reads
like a single guidance line (e.g., "Use 'nmem --json wm read' (or 'uvx --from
nmem-cli nmem --json wm read' if nmem is not in PATH)...") and preserve mention
of the outputs (current priorities, unresolved flags, recent context).
nowledge-mem-alma-plugin/alma-skill-nowledge-mem.md (2)

19-19: Clarify pagination decision criteria.

The instruction to "fetch more only if the current page is not enough" is subjective and doesn't reference the pagination metadata that the API returns. According to learnings, nowledge_mem_thread_show returns hasMore, totalMessages, and returnedMessages metadata to guide pagination decisions. Consider being more explicit about using these fields.

📝 Suggested clarification
-5. When a memory has a `sourceThreadId`, use `nowledge_mem_thread_show` or `nmem --json t show` progressively: start with the first page and fetch more only if the current page is not enough.
+5. When a memory has a `sourceThreadId`, use `nowledge_mem_thread_show` or `nmem --json t show` progressively: start with the first page and check the `hasMore` metadata field to determine if additional pages are needed to answer the query.

Based on learnings: Thread pagination: nowledge_mem_thread_show must accept offset parameter and return totalMessages, hasMore, returnedMessages metadata

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-alma-plugin/alma-skill-nowledge-mem.md` at line 19, The
pagination instruction is ambiguous—update the logic that calls
nowledge_mem_thread_show (or the CLI nmem --json t show flow) to use the API
pagination metadata: pass an offset parameter on the initial call, inspect
returned totalMessages, returnedMessages and hasMore, and only request the next
page(s) when hasMore is true or returnedMessages < totalMessages; repeat
incrementing offset by returnedMessages until hasMore is false or you have
enough messages. Reference the nowledge_mem_thread_show call site (or the CLI
invocation wrapper) and ensure offset, hasMore, totalMessages, and
returnedMessages are read and used to drive additional fetches.
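The fetch loop the prompt describes can be sketched as follows; `fetch_page` is a hypothetical wrapper around `nowledge_mem_thread_show` / `nmem --json t show`, and only the `offset`, `hasMore`, `totalMessages`, and `returnedMessages` field names come from the finding:

```python
def fetch_messages(fetch_page) -> list[dict]:
    """Drive pagination off the API metadata rather than guesswork.

    `fetch_page(offset)` must return a dict shaped like
    {"messages": [...], "returnedMessages": int, "totalMessages": int, "hasMore": bool}.
    """
    messages: list[dict] = []
    offset = 0
    while True:
        page = fetch_page(offset)
        messages.extend(page["messages"])
        offset += page["returnedMessages"]
        # Stop as soon as the server reports the thread is exhausted
        # (the empty-page guard also prevents an accidental infinite loop).
        if not page["hasMore"] or page["returnedMessages"] == 0:
            break
    return messages
```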

48-48: Consider adding nuance for multi-page scenarios.

The guidance to avoid dumping huge threads is good, but it's binary (one page vs. huge dump). Consider acknowledging that fetching additional pages may sometimes be appropriate when the query requires broader context, while still emphasizing the default of starting with minimal retrieval.

📝 Optional enhancement
-- Avoid dumping a huge thread when one page of messages is enough to answer.
+- Avoid dumping a huge thread when one page of messages is enough to answer. Fetch additional pages only when the query explicitly requires broader conversation context.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-alma-plugin/alma-skill-nowledge-mem.md` at line 48, Update the
guidance sentence "Avoid dumping a huge thread when one page of messages is
enough to answer." to acknowledge multi-page needs by adding nuance: keep the
default recommendation to fetch one page, but note that in cases requiring
broader context the system should iteratively fetch additional pages (e.g.,
"Start with one page of messages; if the query requires broader context, fetch
additional pages incrementally until the answer is supported"), and include a
brief heuristic for when to continue (e.g., missing references, ambiguous
timeline, or follow-up questions) so readers know when to expand retrieval.
nowledge-mem-gemini-cli-extension/skills/distill-memory/SKILL.md (1)

18-22: Add guidance on searching for existing memories to avoid duplicates.

The section advises checking whether "an existing memory already captures the same decision" (line 21) but doesn't explain HOW to search for existing memories. Consider adding:

  1. A cross-reference to the search-memory skill (mentioned in the AI summary as part of this extension)
  2. A practical example showing how to search before adding
  3. Concrete command examples with all required parameters (titles, unit-type, labels, capture path mentioned in the AI summary)
📝 Suggested enhancement to add search guidance
 ## Add vs Update
 
+Before adding a new memory, search existing memories to avoid duplicates:
+- Use the `search-memory` skill or run `nmem m search "<keywords>"` to find related memories
+- Review results to determine if your insight refines existing knowledge or is genuinely new
+
 - Use `nmem --json m add` when the insight is genuinely new.
+  Example: `nmem m add --title "API rate limit pattern" --unit-type decision --labels "api,performance" --capture-path "project/feature-x" --content "..."`
 - If an existing memory already captures the same decision, workflow, or preference and the new information refines it, use `nmem m update <id> ...` instead of creating a duplicate.
+  Example: `nmem m update abc123 --content "Updated rationale: ..."`
 
 Prefer atomic, standalone memories with strong titles and structured meaning. Focus on what was learned or decided, not routine chatter.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-gemini-cli-extension/skills/distill-memory/SKILL.md` around
lines 18 - 22, Update the "Add vs Update" section to show HOW to search for
existing memories before adding: reference the search-memory skill
(search-memory) and include a short practical example that runs a search
command, then conditionally adds or updates; show concrete commands such as
using the search-memory invocation followed by a full-form example of nmem
--json m add with required fields (title, unit-type, labels, capture-path) and
an example of nmem m update <id> when a match is found so the reader can follow
"search -> add or update" workflow.
nowledge-mem-bench/src/nmem_bench/pipeline/process.py (1)

13-18: Use a typed callback signature in the public API.

callable here is the built-in predicate, not a callback type, and = None makes the parameter implicitly optional. Spell this as Callable[[int, int, str], None] | None so the function signature matches what callers are expected to pass.

✍️ Suggested typing cleanup
+from collections.abc import Callable
 import logging
 
@@
 def distill_threads(
     client: NmemClient,
     checkpoint: RunCheckpoint,
     extraction_level: str = "guided",
-    on_progress: callable = None,
+    on_progress: Callable[[int, int, str], None] | None = None,
 ) -> None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/pipeline/process.py` around lines 13 - 18,
The public function distill_threads uses the builtin predicate `callable` for
the on_progress parameter and sets `= None` implicitly; change the parameter
type to a proper callback signature `Callable[[int, int, str], None] | None` and
import `Callable` (or `from typing import Callable` / `Optional` if targeting
older Python) so the function signature explicitly documents the expected
callback shape; update the type annotation on `on_progress` in distill_threads
and add the necessary typing import near the top of the module.
nowledge-mem-codex-prompts/AGENTS.md (1)

3-5: Include uvx fallback guidance in the introductory section.

The prompt assumes nmem is already on PATH. In fresh environments (devcontainers, remote shells), Codex should know the preferred fallback. Other plugin documentation (alma-plugin, openclaw-plugin) and README files already establish this pattern.

Suggested addition
 You have access to the user's Nowledge Mem through the `nmem` CLI.
+If `nmem` is unavailable, use `uvx --from nmem-cli nmem` instead.
 
 Use the installed prompts when convenient, but compose direct `nmem` commands whenever that is clearer or more efficient.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-codex-prompts/AGENTS.md` around lines 3 - 5, Update the
introductory section of AGENTS.md (the lines describing the nmem CLI and
installed prompts) to include a concise fallback note recommending the `uvx`
wrapper when `nmem` is not on PATH (e.g., in fresh devcontainers or remote
shells); mirror the pattern and wording used in other docs like alma-plugin and
openclaw-plugin README, show the preferred `uvx` usage as the fallback for Codex
invocations, and keep the guidance short and prescriptive so readers know to run
`uvx nmem ...` when `nmem` is unavailable on PATH.
nowledge-mem-bench/README.md (1)

16-23: Add language specifier to fenced code block.

The fenced code block lacks a language specifier. Use text for ASCII diagrams to satisfy linters and improve rendering consistency.

📝 Proposed fix
-```
+```text
 Conversations ──► nmem t create ──► Threads
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/README.md` around lines 16 - 23, The fenced ASCII diagram
block (the block containing "Conversations ──► nmem t create ──► Threads" etc.)
is missing a language specifier; update the opening triple-backtick to include
"text" (i.e., ```text) so the code fence reads as a text block to satisfy
linters and ensure consistent rendering for the ASCII diagram.
nowledge-mem-cursor-plugin/README.md (1)

9-12: Consider adding response contract examples for tools.

The README lists MCP-backed tools (read_working_memory, memory_search, etc.) but doesn't document response contract examples. Based on learnings, README.md should document user-facing behavior and response contract examples for all tools to help users understand expected outputs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-cursor-plugin/README.md` around lines 9 - 12, Add explicit
response contract examples for each MCP-backed tool listed (read_working_memory,
memory_search, thread_search, thread_fetch_messages, memory_add, memory_update)
and for the four skills (read-working-memory, search-memory, distill-memory,
save-handoff) to README.md: for each tool/skill include a short description, the
shape of the returned object (fields, types, optional vs required), example JSON
responses showing typical success and error cases, and a note about any
timing/formatting rules used by the Cursor rules (working memory timing,
proactive recall, retrieval routing, add-vs-update behavior) so users can see
expected outputs and how to handle them.
nowledge-mem-npx-skills/skills/distill-memory/SKILL.md (1)

22-23: Consider documenting the --unit-type parameter.

Based on learnings, memories should be classified using memory types (learning, decision, fact, procedure, event, preference, plan, context) via the --unit-type parameter. Adding this to the usage guidance would help users properly categorize their memories.

📝 Suggested addition
 - Use `nmem --json m add` when the insight is genuinely new.
+  - Consider adding `--unit-type` to classify the memory (e.g., `decision`, `procedure`, `learning`, `preference`).
 - If an existing memory already captures the same decision, workflow, or preference and the new information refines it, use `nmem m update <id> ...` instead of creating a duplicate.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-npx-skills/skills/distill-memory/SKILL.md` around lines 22 - 23,
Update the usage guidance near the existing bullets referencing "nmem --json m
add" and "nmem m update <id>" to document the --unit-type parameter and its
accepted values; explicitly list the memory types (learning, decision, fact,
procedure, event, preference, plan, context), show that --unit-type should be
provided when adding/updating memories to classify them, and indicate that
setting --unit-type to the most appropriate category avoids duplicates and
improves retrieval (mentioning the same parameter in both the "nmem --json m
add" and "nmem m update <id>" contexts).
nowledge-mem-npx-skills/skills/save-handoff/SKILL.md (1)

51-56: Add language specifier to fenced code block.

The response format example lacks a language specifier. Use text for plain text output examples.

📝 Proposed fix
-```
+```text
 ✓ Handoff saved
 Title: {title}
 Summary: {content}
 Thread ID: {thread_id}
 ```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-npx-skills/skills/save-handoff/SKILL.md` around lines 51 - 56,
Update the fenced code block example in SKILL.md for the save-handoff skill to
include a language specifier; change the triple-backtick block that shows the
handoff output (the block containing "✓ Handoff saved\nTitle: {title}\nSummary:
{content}\nThread ID: {thread_id}") to use ```text so the example is marked as
plain text.
nowledge-mem-gemini-cli-extension/scripts/validate-extension.mjs (1)

95-100: Clarify error handling for JSON-specific files.

The SyntaxError check at line 96 will only apply to hooks/hooks.json since that's the only file being parsed with JSON.parse in this loop. For other files (TOML, Markdown), readFile won't throw a SyntaxError. Consider scoping this check more explicitly or adding a comment for clarity.

♻️ Suggested improvement
     } catch (error) {
-      if (error instanceof SyntaxError) {
+      // SyntaxError only applies to JSON.parse in hooks.json validation above
+      if (error instanceof SyntaxError && relPath.endsWith('.json')) {
         fail(`${relPath} must contain valid JSON`);
       }
       fail(`missing required file: ${relPath}`);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-gemini-cli-extension/scripts/validate-extension.mjs` around
lines 95 - 100, The catch block currently treats any caught SyntaxError as if it
came from JSON parsing but only hooks/hooks.json is JSON-parsed; update the
error handling in validate-extension.mjs so the SyntaxError branch is only
applied when relPath === 'hooks/hooks.json' (or otherwise when you know
JSON.parse was used), and keep the generic fail(`missing required file:
${relPath}`) for other file errors; reference the relPath variable and the
SyntaxError check and ensure process uses fail(...) accordingly.
nowledge-mem-gemini-cli-extension/scripts/package-extension.mjs (1)

88-93: Consider adding existence check before copy.

If a file listed in filesToShip doesn't exist (e.g., CHANGELOG.md missing on first release), the error message from cp won't clearly indicate which file is missing. A pre-flight check could improve debugging.

Optional: Add existence validation
+import { access } from 'node:fs/promises';
+
+// Before the copy loop in main():
+for (const relPath of filesToShip) {
+  await access(path.join(extensionRoot, relPath)).catch(() => {
+    console.error(`ERROR: Required file missing: ${relPath}`);
+    process.exit(1);
+  });
+}
+
 for (const relPath of filesToShip) {
   await cp(path.join(extensionRoot, relPath), path.join(stageDir, relPath), {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-gemini-cli-extension/scripts/package-extension.mjs` around lines
88 - 93, Before calling cp for each path in filesToShip, validate the source
exists and fail with a clear message; iterate filesToShip and for each compute
const src = path.join(extensionRoot, relPath) and use fs.promises.stat or
fs.existsSync to check existence, throwing or logging a descriptive error
(including relPath and src) if missing, then only call cp for confirmed-existing
files so the failing case points to the missing file rather than a generic cp
error.
nowledge-mem-bench/src/nmem_bench/pipeline/search.py (2)

60-62: Use logging.exception to capture traceback.

When logging errors from caught exceptions, logging.exception automatically includes the traceback, which aids debugging failed searches.

Proposed fix
         except Exception as e:
             qstate.error = f"Search failed: {e}"
-            logger.error("Search failed for %s: %s", question.question_id, e)
+            logger.exception("Search failed for %s", question.question_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/pipeline/search.py` around lines 60 - 62,
The except block sets qstate.error and logs with logger.error but doesn't
include the traceback; update the except block in search.py (the handler that
sets qstate.error = f"Search failed: {e}") to call logger.exception(...) instead
of logger.error so the full traceback is recorded (preserve the contextual
message including question.question_id and the exception string). Keep the
qstate.error assignment but replace the logger call with
logger.exception("Search failed for %s", question.question_id) so
diagnostics include the stack trace (logger.exception records the active
exception automatically, so e need not be interpolated).

32-32: Rename unused loop variable.

The idx variable is not used within the loop body.

Proposed fix
-    for idx, question in enumerate(questions):
+    for _idx, question in enumerate(questions):

Or simply remove enumeration if the index isn't needed:

-    for idx, question in enumerate(questions):
+    for question in questions:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/pipeline/search.py` at line 32, The loop in
search.py currently uses an unused index variable `idx` in the line `for idx,
question in enumerate(questions):`; remove the unused variable by changing the
loop to `for question in questions:` (or, if you prefer to keep enumeration for
clarity, rename `idx` to `_`) so that the unused binding is eliminated while
leaving the loop body (which uses `question`) unchanged.
nowledge-mem-bench/src/nmem_bench/pipeline/answer.py (4)

86-88: Use logging.exception to preserve traceback.

When catching exceptions, logger.error loses the stack trace. Use logger.exception to include the traceback for debugging.

♻️ Suggested fix
         except Exception as e:
             qstate.error = f"Answer failed: {e}"
-            logger.error("Answer generation failed for %s: %s", q.question_id, e)
+            logger.exception("Answer generation failed for %s", q.question_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/pipeline/answer.py` around lines 86 - 88,
The except block sets qstate.error but uses logger.error which omits the
traceback; change the logging call in the except of the answer generation (where
qstate.error is set and logger.error is called with q.question_id) to use
logger.exception so the stack trace is preserved (keep the existing
message/context and q.question_id while switching to logger.exception).

69-77: Remove redundant inner import; fix forward reference.

The QuestionState type hint on line 71 is a forward reference string, but the import on line 77 is inside the function body. Since the annotation is quoted, the inner import isn't needed for type checking. Move the import to the top or remove the redundant inner import.

♻️ Suggested fix
+from nmem_bench.pipeline.checkpoint import RunCheckpoint, QuestionState
-from nmem_bench.pipeline.checkpoint import RunCheckpoint

 # ...

 async def _answer_one(
     q: UnifiedQuestion,
-    qstate: "QuestionState",
+    qstate: QuestionState,
     context: str,
     model: str,
     sem: asyncio.Semaphore,
 ) -> None:
     """Answer a single question with semaphore-bounded concurrency."""
-    from nmem_bench.pipeline.checkpoint import QuestionState
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/pipeline/answer.py` around lines 69 - 77,
The inner runtime import of QuestionState inside the _answer_one function is
redundant because the type hint uses a forward-reference string; remove the
in-function import statement (from nmem_bench.pipeline.checkpoint import
QuestionState) or, if you need the class at runtime, move that import to the
module top — update the signature to rely on the quoted "QuestionState" only and
ensure no other runtime references require the local import.

131-137: Calling asyncio.run() in a loop is inefficient.

Each iteration creates and tears down a new event loop. Consider running all batches within a single asyncio.run() call or using asyncio.get_event_loop().run_until_complete() if an existing loop is expected.

💡 Alternative approach
+    async def _run_all():
+        for i in range(0, len(questions), batch_size):
+            batch = questions[i : i + batch_size]
+            await _process_batch(batch)
+            if answered % 50 == 0:
+                logger.info("[%d/%d] Answered", answered, total)
+
     # Process in batches for checkpointing
     batch_size = concurrency * 4
-    for i in range(0, len(questions), batch_size):
-        batch = questions[i : i + batch_size]
-        asyncio.run(_process_batch(batch))
-        if answered % 50 == 0:
-            logger.info("[%d/%d] Answered", answered, total)
+    asyncio.run(_run_all())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/pipeline/answer.py` around lines 131 - 137,
The loop currently calls asyncio.run() per batch (batch_size, _process_batch)
which repeatedly creates/destroys event loops; refactor to run a single event
loop and await all batches there: create an async wrapper (e.g., async def
_process_all_batches(questions, concurrency)) that iterates or schedules calls
to _process_batch for each batch and updates answered/ logging, then call
asyncio.run(_process_all_batches(...)) once; ensure you reference and use the
existing _process_batch function, batch_size calculation, and the
answered/logging logic inside the single-run async wrapper so no per-iteration
asyncio.run() calls remain.
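A self-contained illustration of the single-event-loop pattern; `_process_batch` here is a stand-in that doubles integers, not the real answer worker:

```python
import asyncio

async def _double(n: int) -> int:
    await asyncio.sleep(0)  # yield control, as real I/O would
    return n * 2

async def _process_batch(batch: list[int]) -> list[int]:
    # Stand-in batch worker: run the items in a batch concurrently.
    return list(await asyncio.gather(*(_double(n) for n in batch)))

def run_batches(items: list[int], batch_size: int) -> list[int]:
    # One asyncio.run() for the whole job, instead of one per batch.
    async def _run_all() -> list[int]:
        results: list[int] = []
        for i in range(0, len(items), batch_size):
            results.extend(await _process_batch(items[i : i + batch_size]))
        return results

    return asyncio.run(_run_all())
```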

91-96: Use explicit | None for optional callable parameter.

PEP 484 prohibits implicit Optional. The type hint should explicitly indicate nullability.

♻️ Suggested fix
 def answer_questions(
     questions: list[UnifiedQuestion],
     checkpoint: RunCheckpoint,
     model: str = "gpt-4o-mini",
     concurrency: int = 5,
-    on_progress: callable = None,
+    on_progress: Callable[..., None] | None = None,
 ) -> None:

(`Callable` imported from `collections.abc`; a bare `callable | None` union fails when the annotation is evaluated eagerly, since `callable` is a builtin function, not a type.)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/pipeline/answer.py` around lines 91 - 96,
The on_progress parameter in answer_questions is currently typed as a bare
callable; change it to an explicit nullable type (e.g., on_progress: Callable |
None = None) to satisfy PEP 484; update the top-of-file imports to include
Callable (or from collections.abc import Callable for modern Python) and, if you
want to be precise about the callable signature, use Callable[..., Any] | None
and import Any as well so the function signature reads something like
on_progress: Callable[..., Any] | None = None.
nowledge-mem-bench/src/nmem_bench/benchmarks/longmemeval.py (1)

64-68: Specify explicit encoding when opening JSON file.

For cross-platform consistency, use encoding="utf-8".

♻️ Suggested fix
-        with open(data_file) as f:
+        with open(data_file, encoding="utf-8") as f:
             self._items = json.load(f)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/benchmarks/longmemeval.py` around lines 64
- 68, Open the JSON file with an explicit UTF-8 encoding to ensure
cross-platform consistency: update the file-opening call that reads data_file
(the with open(...) block used to populate self._items in longmemeval.py) to
pass encoding="utf-8" so json.load reads a consistent byte encoding before
building self._by_id and logging via logger.info.
nowledge-mem-bench/src/nmem_bench/pipeline/evaluate.py (2)

93-94: Use logging.exception to preserve traceback.

♻️ Suggested fix
             except Exception as e:
-                logger.error("LLM judge failed for %s: %s", q.question_id, e)
+                logger.exception("LLM judge failed for %s", q.question_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/pipeline/evaluate.py` around lines 93 - 94,
The except block currently logs the error with logger.error and drops the
traceback; change the handler in evaluate.py (the except Exception as e block
where logger is used) to call logger.exception("LLM judge failed for %s",
q.question_id) so the full traceback is preserved (keep q.question_id in the
message and remove explicit e from the formatted args since logger.exception
will log the exception info).

18-22: Use explicit | None for optional callable parameters.

Same issue as in answer.py — PEP 484 prohibits implicit Optional.

♻️ Suggested fix
 def evaluate_f1(
     questions: list[UnifiedQuestion],
     checkpoint: RunCheckpoint,
-    on_progress: callable = None,
+    on_progress: Callable[..., None] | None = None,
 ) -> None:

 # ...

 def evaluate_llm_judge(
     questions: list[UnifiedQuestion],
     checkpoint: RunCheckpoint,
     judge_model: str = "gpt-4o-mini",
-    on_progress: callable = None,
+    on_progress: Callable[..., None] | None = None,
 ) -> None:

(`Callable` imported from `collections.abc`.)

Also applies to: 64-69

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/pipeline/evaluate.py` around lines 18 - 22,
The function signature for evaluate_f1 uses an optional callable typed as
"callable = None", which is implicit Optional and violates PEP 484; change the
type annotation to "on_progress: Callable[..., Any] | None" (or a more specific
Callable signature) and import/qualify Callable/Any from typing (or
typing_extensions) to make the optional explicit; apply the same change to the
other optional callable parameter(s) referenced around the 64-69 block (match
their function names) so all optional callables use the explicit "… | None"
form.
nowledge-mem-bench/src/nmem_bench/cli.py (1)

118-120: Specify explicit encoding when opening files.

File opens should use encoding="utf-8" for cross-platform consistency.

♻️ Suggested fix
     for p in report_paths:
-        with open(p) as f:
+        with open(p, encoding="utf-8") as f:
             reports.append(json.load(f))

 # ...

-    with open(report_path) as f:
+    with open(report_path, encoding="utf-8") as f:
         report = json.load(f)

Also applies to: 193-194

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/cli.py` around lines 118 - 120, The file
opens use bare open(...) calls which lack explicit encoding; update the
file-reading in the loop that appends JSON reports (the for p in report_paths
loop where open(p) is used to load into reports) to call open(p,
encoding="utf-8"), and similarly update the other occurrence around the block
referenced at lines 193-194 to use encoding="utf-8" when opening files so reads
are consistent across platforms.
nowledge-mem-cursor-plugin/scripts/validate-plugin.mjs (1)

14-16: Consider adding error handling for JSON parsing.

If plugin.json, .mcp.json, or marketplace.json contain invalid JSON, the script will throw an unhelpful parse error. Consider wrapping the parse in a try-catch to provide a clearer error message indicating which file failed.

💡 Suggested improvement
 async function readJson(filePath) {
-  return JSON.parse(await readFile(filePath, 'utf8'));
+  try {
+    return JSON.parse(await readFile(filePath, 'utf8'));
+  } catch (e) {
+    fail(`Failed to parse JSON from ${filePath}: ${e.message}`);
+  }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-cursor-plugin/scripts/validate-plugin.mjs` around lines 14 - 16,
The readJson function currently calls JSON.parse directly which will surface an
unhelpful parse error; update the async function readJson(filePath) to wrap the
await readFile(... ) + JSON.parse(...) in a try-catch, and on failure throw a
new Error that includes the filePath (e.g., "Failed to parse JSON in
<filePath>") and attach or include the original error message (or use the
original error as the cause) so callers know which file failed (apply this
change to the readJson function used for plugin.json, .mcp.json, and
marketplace.json).
nowledge-mem-bench/src/nmem_bench/pipeline/ingest.py (3)

26-32: Use explicit | None for optional callable parameters.

♻️ Suggested fix
 def ingest_locomo(
     benchmark: LoComoBenchmark,
     client: NmemClient,
     checkpoint: RunCheckpoint,
     sample_ids: list[str] | None = None,
-    on_progress: callable = None,
+    on_progress: Callable[..., None] | None = None,
 ) -> None:

 # ...

 def ingest_longmemeval(
     benchmark: LongMemEvalBenchmark,
     client: NmemClient,
     checkpoint: RunCheckpoint,
     question_ids: list[str] | None = None,
-    on_progress: callable = None,
+    on_progress: Callable[..., None] | None = None,
 ) -> None:

(`Callable` imported from `collections.abc`.)

Also applies to: 82-88

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/pipeline/ingest.py` around lines 26 - 32,
The optional on_progress parameter in ingest_locomo (and the other function at
lines 82-88) is typed as plain callable; update its annotation to an explicit
optional type such as Callable[..., Any] | None (or Optional[Callable[...,
Any]]) and add the necessary imports (from typing import Callable, Any) so the
signature becomes on_progress: Callable[..., Any] | None = None; keep the
default None and adjust both function signatures consistently.

112-139: Consider periodic checkpoint saves in ingest_longmemeval for resilience.

Unlike ingest_locomo which saves after each sample, ingest_longmemeval only saves after all sessions are processed (line 139). If the script crashes during a long ingestion, progress is lost. Consider saving periodically.

💡 Suggested improvement
             if on_progress:
                 on_progress(idx + 1, total, question.question_id)
+
+            # Periodic checkpoint for resilience
+            if (idx + 1) % 10 == 0:
+                conv_state.thread_ids = thread_ids
+                checkpoint.save()

     conv_state.thread_ids = thread_ids
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/pipeline/ingest.py` around lines 112 - 139,
The ingestion loop currently only calls checkpoint.save() after all sessions,
risking lost progress; modify the ingest_longmemeval flow so that after a
successful session ingest (where you append to thread_ids and add to
ingested_sessions) you update conv_state.thread_ids (and optionally
conv_state.phase = "ingested" or an intermediate phase) and call
checkpoint.save() periodically (e.g., after each session or every N sessions).
Locate the block that handles client.thread_create_from_file, the variables
thread_ids and ingested_sessions, and the final checkpoint.save() and insert the
periodic save there; ensure on_progress remains unchanged and that exceptions
still skip saving for failed sessions.

71-72: Use logging.exception to preserve traceback on failures.

Both exception handlers log with logger.error, losing stack trace information that would help debug thread creation failures.

♻️ Suggested fix
```diff
                 except Exception as e:
-                    logger.error("Failed to create thread for %s: %s", session.session_id, e)
+                    logger.exception("Failed to create thread for %s", session.session_id)

 # and similarly:

                 except Exception as e:
-                    logger.error("Failed to ingest session %s: %s", session.session_id, e)
+                    logger.exception("Failed to ingest session %s", session.session_id)
```

Also applies to: 131-132

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/pipeline/ingest.py` around lines 71 - 72,
Replace the existing logger.error calls in the thread-creation except handlers
with logger.exception so the traceback is preserved; specifically, in the except
block that currently does logger.error("Failed to create thread for %s: %s",
session.session_id, e) (and the analogous handler later around the other thread
creation), change to logger.exception("Failed to create thread for %s",
session.session_id) (or call logger.exception with the same message and omit
manually interpolating the exception) so the stack trace is logged
automatically.

Comment on lines +7 to +8
"APP": "<MCP Client App Name here>",
"X-NMEM-API-Key": "nmem_..."

⚠️ Potential issue | 🟠 Major

Don’t put API-key slots in a tracked MCP config.

Even with a placeholder value, this nudges users to paste a live nmem_... token into a committed file. Keep the checked-in example header-free and document local-only auth setup separately.

🔐 Suggested change
```diff
       "headers": {
-        "APP": "<MCP Client App Name here>",
-        "X-NMEM-API-Key": "nmem_..."
+        "APP": "<MCP Client App Name here>"
       }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcp.json` around lines 7 - 8, Remove the "X-NMEM-API-Key" placeholder from
the committed mcp.json and avoid suggesting an API header value in the example;
keep the example config minimal and header-free by retaining only non-sensitive
keys like "APP" (with a neutral placeholder) and remove any "nmem_..."
placeholder token. Update documentation or a separate local-only README to
explain how contributors should provide their personal API key (e.g., via
environment variables or an untracked local mcp.local.json) and reference the
"X-NMEM-API-Key" header name there instead of embedding it in the tracked file.

Comment on lines +23 to +24
# Default path relative to the monorepo root
_DEFAULT_DATA_PATH = Path(__file__).resolve().parents[5] / "3pp" / "locomo" / "data" / "locomo10.json"

⚠️ Potential issue | 🔴 Critical

The default LoCoMo path resolves outside the repo.

From this file's location, parents[5] points one directory above the repository root. LoComoBenchmark() is constructed without a data_path in the orchestrator, so the default LoCoMo flow will fail to find 3pp/locomo/data/locomo10.json.

Proposed fix
```diff
-_DEFAULT_DATA_PATH = Path(__file__).resolve().parents[5] / "3pp" / "locomo" / "data" / "locomo10.json"
+_DEFAULT_DATA_PATH = Path(__file__).resolve().parents[4] / "3pp" / "locomo" / "data" / "locomo10.json"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/benchmarks/locomo.py` around lines 23 - 24,
_DEFAULT_DATA_PATH currently uses Path(__file__).resolve().parents[5] which
resolves one level above the repo root and causes LoComoBenchmark() (constructed
without data_path) to miss 3pp/locomo/data/locomo10.json; change the parent
index to the correct level (e.g. parents[4]) so the path points inside the repo,
update the _DEFAULT_DATA_PATH definition accordingly, and add a simple existence
check (raise a clear error or log) when LoComoBenchmark() falls back to this
default to aid debugging.
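As a quick illustration of the `parents` arithmetic (assuming the module sits at `<repo>/nowledge-mem-bench/src/nmem_bench/benchmarks/locomo.py`; the `/repo` prefix below is a hypothetical checkout location):

```python
from pathlib import PurePosixPath

# Hypothetical absolute location of locomo.py, mirroring the repo layout
f = PurePosixPath("/repo/nowledge-mem-bench/src/nmem_bench/benchmarks/locomo.py")

# parents[0] is the containing directory; each higher index walks one level up
assert f.parents[3] == PurePosixPath("/repo/nowledge-mem-bench")
assert f.parents[4] == PurePosixPath("/repo")  # repository root
assert f.parents[5] == PurePosixPath("/")      # one level above the root
```

So `parents[5]` escapes the checkout entirely, while `parents[4]` lands on the repository root where `3pp/` lives.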

Comment on lines +125 to +129
except Exception as exc:
logger.warning("LLM judge failed: %s", exc)
return {"score": 0, "label": "error", "explanation": f"Judge error: {exc}"}

return {"score": 0, "label": "error", "explanation": "No valid JSON in judge response"}

⚠️ Potential issue | 🟠 Major

Don't collapse judge failures into "incorrect".

QuestionState.llm_judge_score already reserves -1 for "not evaluated". Returning 0 here makes API failures indistinguishable from a wrong answer and silently drags the benchmark down.

🐛 Proposed fix
```diff
     except Exception as exc:
         logger.warning("LLM judge failed: %s", exc)
-        return {"score": 0, "label": "error", "explanation": f"Judge error: {exc}"}
+        return {"score": -1, "label": "error", "explanation": f"Judge error: {exc}"}

-    return {"score": 0, "label": "error", "explanation": "No valid JSON in judge response"}
+    return {"score": -1, "label": "error", "explanation": "No valid JSON in judge response"}
```
🧰 Tools
🪛 Ruff (0.15.4)

[warning] 125-125: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/evaluation/llm_judge.py` around lines 125 -
129, The code currently maps judge failures to a score of 0 which conflates
API/errors with incorrect judgments; update the exception handling and the
fallback case in the function that produces the judge response (the block using
logger.warning("LLM judge failed: %s", exc) and the final return) to return
{"score": -1, "label": "error", "explanation": f"Judge error: {exc}"} on
exceptions and {"score": -1, "label": "error", "explanation": "No valid JSON in
judge response"} when no valid JSON is returned so that
QuestionState.llm_judge_score retains -1 for “not evaluated”/error cases rather
than treating them as incorrect.

Comment on lines +64 to +66
def __init__(self, nmem_path: str | None = None, api_url: str | None = None):
self._nmem = nmem_path or self._find_nmem()
self._api_url = api_url

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail

echo "Repo references to the supported uvx flow:"
rg -n --fixed-strings 'uvx --from nmem-cli nmem'

echo
echo "NmemClient discovery and execution logic:"
sed -n '64,110p' nowledge-mem-bench/src/nmem_bench/nmem/client.py
```

Repository: nowledge-co/community

Length of output: 3070


🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail

echo "=== Full _run() method implementation ==="
sed -n '89,130p' nowledge-mem-bench/src/nmem_bench/nmem/client.py

echo
echo "=== Check for shlex or command parsing ==="
rg -n 'shlex|split.*cmd|command.*split' nowledge-mem-bench/src/nmem_bench/nmem/client.py

echo
echo "=== Check NmemClient docstring and type hints ==="
sed -n '48,75p' nowledge-mem-bench/src/nmem_bench/nmem/client.py
```

Repository: nowledge-co/community

Length of output: 1389


Accept multi-token launchers in _nmem and _find_nmem().

The documented fallback uvx --from nmem-cli nmem is a multi-token launcher, but _find_nmem() searches only for a single nmem executable and _run() treats self._nmem as a single command token. This blocks users from using the recommended fallback: passing nmem_path="uvx --from nmem-cli nmem" will fail because subprocess will seek an executable with that literal name.

Either modify _find_nmem() to detect and fall back to uvx --from nmem-cli nmem if nmem is not found, or split self._nmem into tokens when constructing the command (using shlex.split() for safety).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/nmem/client.py` around lines 64 - 66, The
__init__ and _find_nmem/_run flow treats self._nmem as a single token so
multi-token launchers like "uvx --from nmem-cli nmem" fail; change the
implementation to accept multi-token launchers by splitting the _nmem value into
tokens with shlex.split() (either when assigning self._nmem in __init__ or
immediately before building subprocess args in _run), and ensure _find_nmem()
either detects/returns the full fallback launcher string or returns None so the
__init__ fallback is used; update references to self._nmem in _run to extend the
command list with the tokenized launcher instead of passing it as a single
string.
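A minimal sketch of the shlex-based tokenization (the `build_cmd` helper name is illustrative, not from the client module):

```python
import shlex


def build_cmd(launcher: str, args: list[str]) -> list[str]:
    """Split a possibly multi-token launcher into argv entries
    before appending the subcommand arguments."""
    return [*shlex.split(launcher), *args]


# A plain executable stays a single argv token
assert build_cmd("nmem", ["--json", "m", "stats"]) == [
    "nmem", "--json", "m", "stats"
]

# The documented fallback launcher is split into separate argv entries,
# so subprocess resolves "uvx" instead of a literal "uvx --from ..." name
assert build_cmd("uvx --from nmem-cli nmem", ["--json", "m", "stats"]) == [
    "uvx", "--from", "nmem-cli", "nmem", "--json", "m", "stats"
]
```

`shlex.split()` also respects quoting, so a launcher path containing spaces can be passed as `'"/opt/my tools/nmem"'` without breaking the tokenization.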

Comment on lines +125 to +129
try:
return json.loads(result.stdout)
except json.JSONDecodeError:
logger.warning("Non-JSON output from nmem: %.200s", result.stdout)
return {"raw_output": result.stdout}

⚠️ Potential issue | 🟠 Major

Fail fast when --json output is malformed.

Most callers default missing keys to empty or zero values. Returning {"raw_output": ...} here can silently turn a CLI protocol break into blank thread IDs, empty search results, or zero stats instead of surfacing the failure.

🐛 Proposed fix
```diff
         try:
             return json.loads(result.stdout)
         except json.JSONDecodeError:
             logger.warning("Non-JSON output from nmem: %.200s", result.stdout)
-            return {"raw_output": result.stdout}
+            raise RuntimeError(
+                f"nmem returned non-JSON output for {' '.join(args)}"
+            )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-bench/src/nmem_bench/nmem/client.py` around lines 125 - 129, The
code currently swallows JSON parse failures in the json.loads(result.stdout)
call and returns {"raw_output": result.stdout}, which can hide protocol breaks;
instead, change the except json.JSONDecodeError block (the handler around
json.loads and the logger.warning call) to fail fast by raising a descriptive
exception (e.g., RuntimeError or a new NmemProtocolError) that includes the
original JSON error and the raw result.stdout (use "raise ... from e" to
preserve the original exception), so callers will see a hard failure rather than
silently receiving {"raw_output": ...}.
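A chained-raise variant of the fix, as the agent prompt suggests (preserving the original decode error via `raise ... from`; the `parse_cli_json` name is illustrative):

```python
import json
import logging

logger = logging.getLogger("nmem_bench")


def parse_cli_json(stdout: str, args: list[str]) -> dict:
    """Parse `nmem --json` output, failing fast on protocol breaks."""
    try:
        return json.loads(stdout)
    except json.JSONDecodeError as exc:
        logger.warning("Non-JSON output from nmem: %.200s", stdout)
        # `from exc` keeps the original decode error in the traceback chain
        raise RuntimeError(
            f"nmem returned non-JSON output for {' '.join(args)}"
        ) from exc
```

Callers now see a hard failure with the offending command in the message, and `exc.__cause__` still carries the parse position for debugging.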

Comment on lines +9 to +11
4. Use the shell tool to run `nmem --json m add` for each selected memory.
5. Use strong titles, a fitting `--unit-type`, and 0-3 labels only when they improve retrieval.
6. Set `-s gemini-cli` so the capture path stays auditable.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

What is the documented safe way to pass arbitrary multiline text to nmem m add? Does the CLI support stdin, heredocs, or file-based input for memory content?

💡 Result:

The Nowledge Mem CLI docs only document nmem m add as taking the memory content as a single positional CLI argument (i.e., nmem m add "…"), and the listed options are things like --title, --importance, --label, --unit-type, --event-start/end, and --when; no --file, --stdin, or similar input mode is documented for m add. [1]

By contrast, the same CLI docs do explicitly show file-based input for other commands (e.g., nmem t create … -f notes.md), which strongly suggests that if m add supported file/stdin, it would be documented similarly—but it isn’t. [2]

Practical “safe” way (shell-level; not an nmem feature)

To pass arbitrary multiline text, wrap it into one argument via your shell, e.g.:

```bash
# from a file
nmem m add "$(cat notes.md)"

# from a heredoc
nmem m add "$(cat <<'EOF'
line 1
line 2
EOF
)"
```

So: stdin/heredoc/file input is not documented as supported by nmem m add itself; the documented interface is inline argument text only. [1]

Sources: [1] Nowledge Mem CLI docs (nmem m add section/options), [2] Nowledge Mem CLI docs (nmem t create “From a file” example).


Explain how to safely pass multiline memory text to nmem m add.

The prompt currently lacks guidance on safe input handling. Since nmem m add takes memory content as a single positional argument and does not support dedicated --file or --stdin options, the agent must use shell-level techniques (command substitution or temporary files) to safely pass arbitrary text without breaking on quotes or newlines. Add explicit instruction with an example.

🛠️ Suggested prompt update
```diff
-4. Use the shell tool to run `nmem --json m add` for each selected memory.
+4. Use the shell tool to run `nmem --json m add` for each selected memory, passing the title and body as safe shell arguments (e.g., `nmem m add "$(cat <<'EOF'
+…memory text…
+EOF
+)" --title "…"` or by writing to a temp file and using command substitution).
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-gemini-cli-extension/commands/nowledge/distill-memory.toml`
around lines 9 - 11, The prompt lacks guidance for safely passing multiline
memory text to the nmem --json m add command; update the instructions to tell
the agent to write the memory body to a temporary file using a shell here-doc
with a single-quoted delimiter (to preserve newlines and prevent variable
expansion), then pass the file contents to nmem --json m add via command
substitution or input redirection (e.g., using $(cat "$tmp") or $(<"$tmp")),
include a trap or explicit rm to delete the temp file afterward, and show a
short example invocation that also includes -s gemini-cli and --unit-type to
keep the capture path auditable.
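The quoting behavior both suggestions rely on can be checked shell-side (here `printf` stands in for the actual `nmem --json m add` call):

```shell
# A single-quoted heredoc delimiter ('EOF') preserves newlines, quotes,
# and dollar signs exactly (no expansion), and "$body" stays one shell word,
# which is what a single positional argument requires.
body="$(cat <<'EOF'
line 1 with "quotes"
line 2 with $not_expanded
EOF
)"
printf '%s\n' "$body"
```

Note that command substitution strips the trailing newline, which is harmless for memory text; an unquoted `EOF` delimiter would instead expand `$not_expanded` and break the round-trip.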

Comment on lines +26 to +31
spawnSync('nmem', args, {
encoding: 'utf8',
timeout: 20000,
});

process.stdout.write(JSON.stringify({ suppressOutput: true }));

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```bash
find . -name "session-end.mjs" -type f
```

Repository: nowledge-co/community

Length of output: 122


🏁 Script executed:

```bash
find . -path "*/hooks/*" -name "*.mjs" -type f
```

Repository: nowledge-co/community

Length of output: 182


🏁 Script executed:

```bash
cat -n ./nowledge-mem-gemini-cli-extension/hooks/session-end.mjs
```

Repository: nowledge-co/community

Length of output: 1079


Don't swallow failed auto-saves.

This hook ignores the spawnSync result completely—no checks for error or status. If nmem is missing, times out, or returns non-zero, the session save is silently dropped with no indication to the caller. Keep best-effort behavior, but surface a stderr warning before returning suppressOutput: true.

Suggested hardening
```diff
-spawnSync('nmem', args, {
+const result = spawnSync('nmem', args, {
   encoding: 'utf8',
   timeout: 20000,
 });
+
+if (result.error || result.status !== 0) {
+  const reason =
+    result.error?.message ||
+    result.stderr?.trim() ||
+    `nmem exited with status ${result.status}`;
+  process.stderr.write(`[nowledge-mem] SessionEnd auto-save skipped: ${reason}\n`);
+}

 process.stdout.write(JSON.stringify({ suppressOutput: true }));
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-gemini-cli-extension/hooks/session-end.mjs` around lines 26 -
31, The current spawnSync('nmem', args, { encoding: 'utf8', timeout: 20000 })
call ignores its return value; capture its result (e.g., const res =
spawnSync(...)) and check res.error, res.status !== 0, or res.signal, and if any
indicate failure write a warning including res.stderr (or res.error.message) to
stderr (process.stderr.write or console.error) before emitting
process.stdout.write(JSON.stringify({ suppressOutput: true })); keep the same
best-effort behavior but surface the error details so failed auto-saves are not
silently swallowed.

Comment on lines +24 to +33
You can also install `nmem` standalone:

```bash
# Option 1: pip
pip install nmem-cli

# Option 2: uvx
curl -LsSf https://astral.sh/uv/install.sh | sh
uvx --from nmem-cli nmem --version
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail

echo "# README install section"
sed -n '18,40p' nowledge-mem-gemini-cli-extension/README.md

echo
echo "# Gemini extension runtime invocations"
rg -n -C2 '\bnmem\b|spawnSync\(' nowledge-mem-gemini-cli-extension -g '!**/dist/**'
```

Repository: nowledge-co/community

Length of output: 33119


🏁 Script executed:

```bash
#!/bin/bash

# Check if hooks have fallback logic for uvx
echo "=== Checking session-start.mjs for fallback logic ==="
sed -n '14,30p' nowledge-mem-gemini-cli-extension/hooks/session-start.mjs

echo
echo "=== Checking session-end.mjs for fallback logic ==="
sed -n '20,35p' nowledge-mem-gemini-cli-extension/hooks/session-end.mjs

echo
echo "=== Search for any error handling or uvx fallback in hooks ==="
rg -n 'try|catch|fallback|uvx' nowledge-mem-gemini-cli-extension/hooks/ || echo "(no matches found)"

echo
echo "=== Check if there's a helper/wrapper for nmem invocation ==="
fd -t f '\.(mjs|js|ts)$' nowledge-mem-gemini-cli-extension -x grep -l 'nmem\|uvx' {} \;
```

Repository: nowledge-co/community

Length of output: 1418


🏁 Script executed:

```bash
#!/bin/bash

# Check if README mentions fallback behavior
echo "=== Search README for fallback or uvx fallback mentions ==="
rg -i 'fallback|uvx.*fallback' nowledge-mem-gemini-cli-extension/README.md || echo "(no fallback mentions)"

echo
echo "=== Check if there's a package.json or setup script that manages nmem availability ==="
cat nowledge-mem-gemini-cli-extension/package.json 2>/dev/null | head -50

echo
echo "=== Search for any npm scripts related to nmem setup ==="
rg -n 'nmem|uvx' nowledge-mem-gemini-cli-extension/package.json 2>/dev/null || echo "(no matches)"

echo
echo "=== Check if hooks are wrapped or imported through another module ==="
head -20 nowledge-mem-gemini-cli-extension/hooks/session-start.mjs
head -20 nowledge-mem-gemini-cli-extension/hooks/session-end.mjs
```

Repository: nowledge-co/community

Length of output: 2088


Option 2 (uvx) in the install section should not be presented as an equivalent persistent installation method.

Lines 30–32 show uvx --from nmem-cli nmem --version as an installation option, but this is a transient execution only. The extension's lifecycle hooks (session-start.mjs, session-end.mjs) and commands invoke nmem directly via spawnSync('nmem', ...) and expect the binary to be available on PATH. A plain uvx invocation does not satisfy this requirement—users following this path will encounter failures when hooks attempt to access Working Memory at session start, save the session at session end, or run commands like /nowledge:status.

Suggested fix
```diff
-# Option 2: uvx
+# Option 2: uvx (temporary, not suitable for extension use)
 curl -LsSf https://astral.sh/uv/install.sh | sh
-uvx --from nmem-cli nmem --version
+# For this extension, use Option 1 or the desktop app CLI installer instead.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-gemini-cli-extension/README.md` around lines 24 - 33, The
README's "Option 2: uvx" presents a transient execution as an equivalent install
but the extension expects a persistent nmem binary on PATH (see
session-start.mjs, session-end.mjs and their spawnSync('nmem', ...) calls and
commands like /nowledge:status); update the README to remove or clearly mark the
uvx example as a transient execution only and add explicit instructions advising
users to use a real install (e.g., pip install nmem-cli or adding nmem to PATH)
so hooks and commands that call nmem directly will work reliably.

Comment on lines +17 to +21
## Install

```bash
gemini extensions install github.com/nowledge-co/community --ref nowledge-mem-gemini-cli-extension-v0.1.1
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Does gemini extensions install github.com/<owner>/<repo> --ref <tag> install from the repository root only, or can it discover gemini-extension.json from a subdirectory in a monorepo?

💡 Result:

gemini extensions install github.com/<owner>/<repo> --ref <tag> installs from the repository root (it effectively clones/copies the repo into ~/.gemini/extensions/<repo> and then validates that gemini-extension.json exists in that installed root directory). It does not scan a monorepo to discover a gemini-extension.json in a subdirectory. [1], [2]

Workarounds:

  • Clone the repo yourself and install from the specific subdirectory via a local path (gemini extensions install <path-to-subdir>). [1]
  • Split the extension into its own repo / release artifact that has gemini-extension.json at the root. [1], [2]

Sources
[1] Gemini CLI docs – Extension reference (install + “manifest must be in its root directory”) (geminicli.com)
[2] google-gemini/gemini-cli Issue #5990 – install command acceptance criteria (clone repo; verify gemini-extension.json) (github.com)

Citations:


Fix the install command to target the correct root.

gemini extensions install github.com/nowledge-co/community --ref ... points Gemini CLI at the repository root, but the extension manifest lives in nowledge-mem-gemini-cli-extension/. Gemini's install command looks for gemini-extension.json only at the repo root—it does not discover manifests in subdirectories—so users following these notes will encounter a failure.

Either:

  • Point the install command at a release asset that has the extension at its root, or
  • Use gemini extensions link <local-path>/nowledge-mem-gemini-cli-extension until a standalone install target exists.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nowledge-mem-gemini-cli-extension/release-notes/0.1.1.md` around lines 17 -
21, The install command in the release notes currently points Gemini at the repo
root using the string "gemini extensions install
github.com/nowledge-co/community --ref nowledge-mem-gemini-cli-extension-v0.1.1"
which fails because the manifest lives in the nowledge-mem-gemini-cli-extension
subdirectory; update the release note to either (A) instruct users to install
via a release asset that places the extension (and gemini-extension.json) at the
archive root, or (B) instruct users to use the local-link workflow by running
"gemini extensions link" pointing at the nowledge-mem-gemini-cli-extension
directory until a standalone install target exists, and remove or replace the
current repo-root install line in the file.

Comment on lines +34 to +47
Add to your tool's MCP settings:

```json
{
  "mcpServers": {
    "nowledge-mem": {
      "url": "http://localhost:14242/mcp",
      "type": "streamableHttp"
    }
  }
}
```

##### Access Your Knowledge

- **⌨️ Global Launcher (⌘⇧K)** | Paste memories anywhere without leaving your workflow
- **🔍 In-App Search** | Deep exploration with semantic search, keywords, and graph navigation
- **🤝 Agent Access** | Let AI agents autonomously search and save memories during tasks

##### Advanced Features

- **🕸️ Knowledge Graph** | Auto-extract entities and relationships for semantic connections
- **🎯 Topic Clustering** | Discover natural groupings and expertise areas via graph algorithms
- **📊 Visual Exploration** | Interactive graph view to explore connections and patterns

### Open Knowledge Layer

[![GitHub](https://img.shields.io/badge/GitHub-OpenKL-181717?style=flat&logo=github&logoColor=white)](https://github.com/nowledge-co/OpenKL)

An **open-source unified protocol** enabling AI agents to access and interact with personal knowledge. **CLI-ready** and file-based for seamless integration with tools like Claude Code. Built for broader knowledge management: massive document collections, entire codebases, and external knowledge corpus with automatic chunking.
See [mcp.json](mcp.json) for the reference config.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail

echo "README direct-MCP snippet:"
sed -n '32,48p' README.md

echo
echo "Reference mcp.json:"
fd '^mcp\.json$' -x sed -n '1,160p' {}
```

Repository: nowledge-co/community

Length of output: 594


Inline README example is missing required authentication headers.

The nowledge-mem server configuration in the README (lines 34–47) omits the headers object present in mcp.json. Users copying this snippet will be missing the required APP and X-NMEM-API-Key fields, causing MCP setup to fail.

Add the headers field to the inline example:

```json
{
  "mcpServers": {
    "nowledge-mem": {
      "url": "http://localhost:14242/mcp",
      "type": "streamableHttp",
      "headers": {
        "APP": "<MCP Client App Name here>",
        "X-NMEM-API-Key": "nmem_..."
      }
    }
  }
}
```

Alternatively, remove the JSON snippet and direct readers to copy from mcp.json directly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 34 - 47, Update the inline JSON example under
mcpServers/nowledge-mem in README.md to include the required headers object so
users have the APP and X-NMEM-API-Key fields (i.e., add headers with "APP":
"<MCP Client App Name here>" and "X-NMEM-API-Key": "nmem_..."), or remove the
snippet and add a sentence pointing readers to copy the full configuration from
mcp.json; edit the README entry that defines mcpServers -> nowledge-mem to
reflect this change.

@wey-gu wey-gu merged commit f6497ce into main Mar 9, 2026
2 checks passed
@wey-gu wey-gu deleted the dev_067 branch March 9, 2026 16:37