fix: prevent E2BIG on kernel spawn + add install_packages tool for missing dependencies#50
Open
hazelian0619 wants to merge 2 commits intoaristoteleo:mainfrom
Open
Conversation
Collaborator
|
It would be best to submit these two issues as separate PRs for easier maintenance and testing.
|
Collaborator
|
@hazelian0619 We can't merge your PR right now because you've kept a lot of the old code from when you cloned the repository and reverted changes made to the main branch. This constitutes a rollback-style PR, which is very risky. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two related issues that caused bioinformatics cases (e.g. PBMC4k) to fail before any code could run:
`[Errno 7] Argument list too long` (E2BIG) on kernel spawn — `PANTHEON_CONTEXT` was included in the kernel spawn environment, inflating `execve()` argument size. On macOS the combined args + env is capped at ~1 MB (`ARG_MAX`). During a long agent session, `context_variables` accumulates and `PANTHEON_CONTEXT` easily exceeds this limit, preventing the kernel from starting at all.
`ModuleNotFoundError` for domain-specific packages — The notebook kernel had no mechanism to install packages such as `scanpy` or `anndata` before executing bioinformatics cells, causing immediate import failures.
Changes
No other files modified.
Fix 1 — E2BIG (`jupyter_kernel.py`)
`PANTHEON_CONTEXT` is already re-injected into the kernel after startup via `_context_prefix_code()`, which base64-encodes it and executes it as a silent cell. Keeping it in the spawn environment is therefore redundant and dangerous.
Measured impact (50 KB context simulation):
The kernel still receives the full context — only the delivery path changes (code injection instead of env var).
Fix 2 — missing packages (`integrated_notebook.py`)
New `install_packages` tool on `IntegratedNotebookToolSet`:
Behaviour:
Agents call this once at the top of a bioinformatics workflow before executing domain-specific cells.
Test results (integration-tested locally, Python 3.12)
Fix 1 — E2BIG:
Fix 2 — install_packages:
Non-goals
🤖 Generated with Claude Code