Implement configurable Karl context compression strategy (heuristic + LLM summarizer) #160
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 061d84d55f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```python
stream = self._llm_core.stream_chat(
    model_name=self._model_name,
    messages=prompt_messages,
    options={"num_predict": self._summary_max_tokens},
```
Preserve persona context options in Karl summary calls
The Karl summarization request is sent with options={"num_predict": ...} only, so persona-specific options like num_ctx are dropped for this call even though context compression is triggered specifically in near-limit scenarios. In setups where personas run with a larger num_ctx than the backend default, summarization can fail or truncate before compression happens, causing KarlSummarizationError (and a hard failure when no fallback is configured). Pass through relevant persona options (at least num_ctx) when building the summary request options.
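One way to address this is to start from the persona's option dict and override only the summary token budget, so settings such as `num_ctx` carry over. The sketch below is illustrative: the helper name and the shape of the persona options are assumptions, not the PR's actual code.

```python
def build_summary_options(persona_options: dict, summary_max_tokens: int) -> dict:
    """Merge persona-level LLM options into the Karl summary request.

    Copies the persona options (e.g. {"num_ctx": 16384, ...}) so settings
    like num_ctx survive, then overrides only the generation budget.
    Hypothetical helper; names are not from the PR itself.
    """
    options = dict(persona_options)  # shallow copy; do not mutate the persona's dict
    options["num_predict"] = summary_max_tokens
    return options
```

The summarizer would then call `stream_chat(..., options=build_summary_options(...))` instead of constructing `{"num_predict": ...}` from scratch.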
```python
def _append_log_entry(
    self, summarized_count: int, summary_length: int, model_name: str
) -> None:
    Path(self._log_dir).mkdir(parents=True, exist_ok=True)
```
Guard Karl log writes from breaking successful summaries
After a summary is successfully produced, _append_log_entry performs filesystem writes without error handling; if log_dir cannot be created or appended to (e.g., read-only path, permission issues, disk full), the turn raises and fails despite having a valid summary. Because this logging is non-critical telemetry, these I/O failures should be caught and downgraded (e.g., warning) so context compression still completes.
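A minimal sketch of the suggested guard, assuming the entry line is already formatted by the caller (the function name and signature here are illustrative, not the PR's actual `_append_log_entry`):

```python
import logging
from datetime import date
from pathlib import Path

logger = logging.getLogger(__name__)


def append_log_entry_safe(log_dir: str, entry: str) -> None:
    """Append one Karl telemetry line, never raising on I/O failure.

    Illustrative sketch: the real _append_log_entry builds the entry from
    summarized_count / summary_length / model_name before writing.
    """
    try:
        path = Path(log_dir)
        path.mkdir(parents=True, exist_ok=True)
        log_file = path / f"karl_{date.today().isoformat()}.log"
        with log_file.open("a", encoding="utf-8") as fh:
            fh.write(entry + "\n")
    except OSError as exc:
        # Non-critical telemetry: downgrade to a warning so the
        # successfully produced summary is still returned.
        logger.warning("Karl log write failed: %s", exc)
```

With this shape, a read-only or missing `log_dir` produces a warning instead of failing the turn after a valid summary was produced.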
Motivation
- Extend the existing `karl_prepare_quick_and_dirty` trimming and allow an LLM-based summarization strategy while keeping the original heuristic available.
- Make the behavior explicitly configurable via `config.yaml`, with no hidden defaults or silent fallbacks.

Description
- Added a `context_management` section to `config.yaml` with `strategy: "heuristic" | "karl"` and a required `karl` sub-section containing `model`, `summary_max_tokens`, `keep_last_messages`, and `log_dir`.
- Added `src/core/context_summarizer.py` introducing `KarlSummarizer` (a pure service) and `KarlSummarizationError`; the summarizer accepts an injected `llm_core` and raw config and implements `summarize(messages: list[dict]) -> list[dict]`, returning one system summary message plus the kept tail without mutating the original list.
- Wired the strategy into `WebUI` and `TerminalUI`: `heuristic` preserves the existing trimming, while `karl` invokes `KarlSummarizer`. LLM failures raise `KarlSummarizationError` and fall back to the heuristic only when `fallback_strategy: heuristic` is explicitly configured.
- Added a per-day log file `logs/karl_YYYY-MM-DD.log` containing the timestamp, number of messages summarized, summary character length, and model used, without altering the global logging configuration.

Testing
- Added `tests/test_context_summarizer.py` (validates reduction, tail preservation, non-mutation, model/options, and log creation) and updated `tests/test_web_ui.py` / `tests/test_terminal_ui.py` to include `context_management` and to assert strategy switching and that Karl is not instantiated under `heuristic`.
- Ran `pytest -q tests/test_context_summarizer.py tests/test_web_ui.py tests/test_terminal_ui.py`; all tests passed (22 passed).

Codex Task
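Based on the description, the `context_management` section of `config.yaml` might look like the following sketch. The concrete model name and numeric values are illustrative assumptions, not values from the PR:

```yaml
context_management:
  strategy: "karl"                 # "heuristic" | "karl"
  # fallback_strategy: heuristic   # opt-in; without it, LLM failures raise
  karl:
    model: "llama3.1:8b"           # illustrative model name
    summary_max_tokens: 256
    keep_last_messages: 4
    log_dir: "logs"
```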