
Fix embedding edge cases: dynamic batch sizing and token limit fallback #7

Merged
drewburchfield merged 2 commits into master from fix/chunking-token-limits
Apr 3, 2026
Conversation

@drewburchfield (Owner) commented Apr 3, 2026

Closes NAS-989
Closes NAS-993

Summary

  • Replace hardcoded batch_size=60 with dynamic calculation based on chunk density (3 chars/token, 28k token budget)
  • Add fallback: if whole-note embed fails on token limit, auto-retry with chunking
  • Don't retry deterministic token limit errors in _call_api_with_retry
  • Extract shared _is_token_limit_error() helper for consistent detection across call sites
  • Update stale docs (TOOLS.md limitations, vector_store.py comment)
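The dynamic batch sizing described above can be sketched roughly as follows. This is a minimal illustration built only from the numbers in the PR description (3 chars/token estimate, 28k token budget, old hardcoded batch of 60); the function name and exact formula are assumptions, not the actual src/embedder.py code:

```python
# Hypothetical sketch only -- not the actual src/embedder.py implementation.
# Constants come from the PR description: ~3 chars/token estimate and a
# 28k token budget (safely under Voyage's 32k hard limit).
CHARS_PER_TOKEN = 3
TOKEN_BUDGET = 28_000

def compute_batch_size(chunks: list[str], max_batch: int = 60) -> int:
    """Pick a batch size so the batch's estimated token count fits the budget."""
    if not chunks:
        return 1
    avg_chars = sum(len(c) for c in chunks) / len(chunks)
    est_tokens_per_chunk = max(1.0, avg_chars / CHARS_PER_TOKEN)
    fits_in_budget = int(TOKEN_BUDGET // est_tokens_per_chunk)
    # Never exceed the old ceiling, never drop below one chunk per batch.
    return max(1, min(max_batch, fits_in_budget))
```

Dense chunks (many characters each) shrink the batch; short chunks keep it at the old ceiling.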

Root cause

Two edge cases in embed_with_chunks():

  1. Sending 60 chunks to contextualized_embed pushes against Voyage's 32k token limit when content is dense
  2. Borderline notes estimated under 30k tokens but actually over 32k had no fallback to chunking
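The second edge case suggests a catch-and-retry shape roughly like the sketch below. Only `_is_token_limit_error` and `embed_with_chunks` are names from this PR; `embed_note`, `embed_whole`, and the stand-in bodies (including the size thresholds) are hypothetical stubs for illustration:

```python
# Hedged sketch of the fallback path -- not the actual src/embedder.py code.
class EmbeddingError(Exception):
    pass

def _is_token_limit_error(err: Exception) -> bool:
    # Shared helper: detect deterministic token-limit failures by message.
    msg = str(err).lower()
    return "token" in msg and "limit" in msg

def embed_whole(note: str) -> list[list[float]]:
    # Stand-in for a single whole-note embedding request.
    if len(note) > 96_000:  # rough stand-in for ~32k tokens at 3 chars/token
        raise EmbeddingError("input exceeds model token limit")
    return [[0.0]]

def embed_with_chunks(note: str) -> list[list[float]]:
    # Stand-in for the chunked embedding path.
    return [[0.0]] * (len(note) // 30_000 + 1)

def embed_note(note: str) -> list[list[float]]:
    try:
        return embed_whole(note)            # fast path: one request
    except EmbeddingError as e:
        if _is_token_limit_error(e):
            return embed_with_chunks(note)  # fallback: chunk and retry
        raise                               # other errors propagate as before
```

The key point is that only deterministic token-limit failures trigger the chunking fallback; anything else still surfaces to the caller.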

Test plan

  • All security tests pass locally
  • Docker image builds and imports work
  • Server initializes successfully
  • Quality gate review: 1 important finding (inconsistent token detection) remediated with shared helper
  • Braintrust consultation (Claude + Gemini): aligned on dynamic batch sizing and catch-and-retry approach


- Replace hardcoded batch_size=60 with dynamic calculation based on
  chunk density (3 chars/token estimate, 28k token budget)
- Add fallback: if whole-note embed fails on token limit, retry with chunking
- Don't retry deterministic token limit errors in _call_api_with_retry
- Extract shared _is_token_limit_error() helper for consistent detection
- Update stale docs: TOOLS.md limitations, vector_store.py comment

Closes NAS-989, NAS-993

@devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.


If a chunk batch exceeds the token limit, halve the batch size and
retry the same position. Keeps halving until it fits or batch_size
reaches 1. This handles dense content where the 3 chars/token
estimate is still too generous.
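The halving loop that commit describes might look roughly like this. It is a sketch with hypothetical names, not the actual src/embedder.py code; the guard here checks the actual batch length so a short tail batch is not retried unchanged:

```python
# Sketch only -- the real loop lives in src/embedder.py.
class EmbeddingError(Exception):
    pass

def _is_token_limit_error(err: Exception) -> bool:
    return "token limit" in str(err).lower()

def embed_all(chunks, batch_size, embed_batch):
    """Embed chunks in batches; on a token-limit error, halve the batch
    size and retry the same position until it fits or reaches one chunk."""
    results, i = [], 0
    while i < len(chunks):
        chunk_batch = chunks[i : i + batch_size]
        try:
            results.extend(embed_batch(chunk_batch))
            i += len(chunk_batch)                     # advance only on success
        except EmbeddingError as e:
            if _is_token_limit_error(e) and len(chunk_batch) > 1:
                batch_size = max(1, batch_size // 2)  # retry the same position
            else:
                raise  # single chunk still too big, or a different error
```

Progress is only made on success, so a failed batch is retried from the same index with a smaller size rather than skipped.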
@drewburchfield drewburchfield merged commit c0202e5 into master Apr 3, 2026
4 checks passed
@drewburchfield drewburchfield deleted the fix/chunking-token-limits branch April 3, 2026 18:06

@devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 5 additional findings in Devin Review.

        logger.debug(f"Embedded chunks {i + 1}-{i + len(chunk_batch)} of {len(chunks)}")
    except EmbeddingError as e:
        if _is_token_limit_error(e) and batch_size > 1:

🟡 Batch retry checks batch_size > 1 instead of len(chunk_batch) > 1, causing wasteful identical API calls

When the remaining chunks at position i are fewer than batch_size, reducing batch_size does not change the actual chunk_batch sent to the API. The loop at src/embedder.py:229 checks batch_size > 1 to decide whether to retry with a smaller batch, but the actual batch sent is chunks[i : i + batch_size] (src/embedder.py:207), which is bounded by the remaining chunks. This means the code can make multiple identical failing API calls (each with a rate-limit sleep at src/embedder.py:210) before batch_size shrinks below the actual remaining chunk count.

Example scenario: 5 remaining chunks with batch_size=42

  • batch_size=42 → chunk_batch = 5 chunks → API fails (token limit)
  • batch_size=21 → chunk_batch = 5 chunks → same request, fails again
  • batch_size=10 → chunk_batch = 5 chunks → same request, fails again
  • batch_size=5 → chunk_batch = 5 chunks → same request, fails again
  • batch_size=2 → chunk_batch = 2 chunks → different request, may succeed

That's 4 wasted API calls with rate-limit delays.

Suggested change

    - if _is_token_limit_error(e) and batch_size > 1:
    + if _is_token_limit_error(e) and len(chunk_batch) > 1:
