
feat: add RLM (Recursive Language Model) module for REPL-based context exploration #88

Open
JamesHWade wants to merge 2 commits into main from feature/rlm-module

Conversation

@JamesHWade
Owner

Summary

This PR implements the RLM (Recursive Language Model) module for dsprrr, bringing DSPy 3.1.2's RLM capability to R. RLM transforms context from "input" to "environment", enabling LLMs to programmatically explore large contexts through a REPL interface.

Key features:

  • rlm_module() factory function for creating RLM modules
  • SUBMIT() termination mechanism for returning final answers
  • peek() and search() tools for context exploration
  • Support for recursive LLM calls via sub_lm parameter
  • Custom tools injection for domain-specific operations
  • Fallback extraction when max_iterations reached
  • Full integration with module() factory via type = "rlm"

Key Innovation: Instead of llm(prompt, context=huge_document), RLM stores context as R variables that the LLM can peek, slice, search, and recursively query.
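As a rough illustration of that workflow, here is a minimal, hypothetical usage sketch. Only rlm_module(), sub_lm, max_iterations, max_llm_calls, and the forward(list(...), .llm = ...) calling convention are taken from this PR; the signature string and the model/document variable names are illustrative, and the real constructor arguments may differ.

library(dsprrr)

# Hypothetical construction; only the parameter names sub_lm, tools,
# max_iterations, and max_llm_calls are confirmed by this PR.
rlm <- rlm_module(
  "context, question -> answer",  # illustrative DSPy-style signature
  sub_lm = small_model,           # model used for recursive sub-queries
  max_iterations = 10,
  max_llm_calls = 25
)

# The large document becomes an R variable in the REPL environment rather
# than prompt text; the LLM then uses peek()/search() on it and eventually
# calls SUBMIT() with its final answer.
result <- rlm$forward(
  list(context = huge_document, question = "What changed between releases?"),
  .llm = main_model
)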

Files Changed

  • R/module-rlm.R - Main RLMModule R6 class and rlm_module() factory
  • R/rlm-tools.R - RLM prelude generator with SUBMIT, peek, search tools
  • R/module.R - Updated to support type = "rlm"
  • tests/testthat/test-module-rlm.R - 76 comprehensive unit tests
  • _pkgdown.yml - Added documentation entries

Test plan

  • All 76 RLM-specific tests pass
  • All 2673 package tests pass
  • R CMD check passes (0 errors, 0 warnings)
  • pkgdown site builds successfully
  • Documentation complete with examples

🤖 Generated with Claude Code

…t exploration

RLM transforms context from "input" to "environment", enabling LLMs to
programmatically explore large contexts through a REPL interface rather
than embedding them in prompts.

Key features:
- rlm_module() factory function for creating RLM modules
- SUBMIT() termination mechanism for returning final answers
- peek() and search() tools for context exploration
- Support for recursive LLM calls via sub_lm parameter
- Custom tools injection for domain-specific operations
- Fallback extraction when max_iterations reached
- Full integration with module() factory via type = "rlm"

Based on DSPy 3.1.2's RLM module (PR #9193).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fe94d53a1a


Comment on lines +729 to +731
query <- request$query
context_slice <- request$context


P2: Handle rlm_query_batch requests in process_rlm_query

The prelude’s rlm_query_batch() returns an rlm_query_request with queries/slices and batch = TRUE (see R/rlm-tools.R around lines 120–133), but process_rlm_query() only reads request$query and request$context (the lines quoted above). For batch requests those fields are NULL, so the prompt becomes NULL and sub_lm$chat(NULL) will error or produce garbage, meaning rlm_query_batch() is unusable despite being advertised in the prompt/docs. You likely need to branch on request$batch and iterate or combine queries into a single prompt, and enforce the call limit accordingly.
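One possible shape for that branch, as a sketch only: the queries/slices/batch field names come from the comment above, sub_lm$chat() is the call named in this review, and the prompt assembly plus the commented-out call-limit check are illustrative.

process_rlm_query <- function(request, sub_lm) {
  build_prompt <- function(q, ctx) paste0(q, "\n\nContext:\n", ctx)

  if (isTRUE(request$batch)) {
    # One sub-LM call per query/slice pair from rlm_query_batch().
    # A max_llm_calls check would also need to run inside this loop.
    return(Map(
      function(q, ctx) sub_lm$chat(build_prompt(q, ctx)),
      request$queries,
      request$slices
    ))
  }

  # Existing single-query path
  query <- request$query
  context_slice <- request$context
  sub_lm$chat(build_prompt(query, context_slice))
}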


Critical fixes:
- Fix silent failure in fallback extraction (log warning, return structured error)
- Fix silent swallowing of recursive query errors (return structured result)
- Implement rlm_query_batch handling in process_rlm_query()
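For the two "silent" items above, a structured result might look roughly like this sketch; the ok/answer/error field names and the helper name are illustrative, not the PR's actual schema.

# Illustrative only: wrap the recursive sub-LM call so failures surface
# as a structured value instead of being swallowed.
safe_sub_query <- function(sub_lm, prompt) {
  tryCatch(
    list(ok = TRUE, answer = sub_lm$chat(prompt), error = NULL),
    error = function(e) {
      warning("Recursive query failed: ", conditionMessage(e), call. = FALSE)
      list(ok = FALSE, answer = NULL, error = conditionMessage(e))
    }
  )
}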

Important fixes:
- Add validation for runner result structure
- Add NULL checks in format_execution_output()
- Fix unused context variable (context was built but inputs were passed instead)
- Add bounds validation for max_iterations (>= 1) and max_llm_calls (>= 0)
- Validate all tools are functions in rlm_module()
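The bounds and tools checks could look something like the following sketch inside rlm_module(); the base-R stop() calls and messages are illustrative, and the PR's actual conditions may differ.

# Illustrative validation; actual error classes/messages may differ.
if (!is.numeric(max_iterations) || max_iterations < 1) {
  stop("`max_iterations` must be a number >= 1.", call. = FALSE)
}
if (!is.numeric(max_llm_calls) || max_llm_calls < 0) {
  stop("`max_llm_calls` must be a number >= 0.", call. = FALSE)
}
if (length(tools) > 0 && !all(vapply(tools, is.function, logical(1)))) {
  stop("Every element of `tools` must be a function.", call. = FALSE)
}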

Improvements:
- Use extract_rlm_final() helper instead of inline duplication
- Use is_rlm_final() and is_rlm_query_request() helper functions
- Use setNames() for cleaner output building
- Fix awkward line break in duration calculation
- Add warning when max_iterations exhausted (previously only logged in verbose)
- Update documentation: clarify tools parameter, fix "Placeholder" wording

New tests:
- Test tools validation (non-function values)
- Test max_iterations bounds validation
- Test max_llm_calls bounds validation
- Test fallback warning message
- Test LLM response validation (missing/invalid code)
- Test rlm_query_batch marker generation
- Test rlm_query_batch validation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude

claude bot commented Jan 24, 2026

Code Review

I found 1 issue that needs attention:

Missing cache handling for stateful mock LLM

File: tests/testthat/test-module-rlm.R (lines 1555-1578)

The create_mock_rlm_llm helper creates a stateful mock that maintains call_count and returns different responses based on call order. According to CLAUDE.md lines 372-375, tests using stateful mocks must disable caching:

IMPORTANT: dsprrr automatically caches LLM responses to speed up development. This can cause test failures when tests expect different responses across multiple calls with the same prompt.

Disable caching when:

  1. Tests use stateful mock LLMs - Mock returns different values based on call count

Problem: All tests using this mock lack cache handling (no .cache = FALSE or local_reset_cache()). If .dsprrr_cache/ contains entries from previous runs, the mock's call counter will be bypassed, causing tests to fail.

Affected tests include:

  • "RLMModule supports multiple iterations" (expects 2 different responses)
  • "RLMModule respects max_iterations" (expects 3 different responses)
  • "RLMModule handles code execution errors" (expects error then success)

Recommended fix: Add .cache = FALSE to all forward() calls using this mock:

result <- rlm$forward(
  list(question = "test"),
  .llm = mock_llm,
  .cache = FALSE  # Disable cache for stateful mock
)

Or add local_reset_cache() at the start of each test using this helper.
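A sketch of that alternative, assuming local_reset_cache() follows the usual local_* test-helper convention and that the mock and module are set up as in the existing tests:

test_that("RLMModule supports multiple iterations", {
  local_reset_cache()                # clear cached responses so the stateful mock is hit
  mock_llm <- create_mock_rlm_llm()  # returns a different response on each call
  # rlm is the module under test, constructed as in the existing tests

  result <- rlm$forward(
    list(question = "test"),
    .llm = mock_llm                  # no .cache = FALSE needed once the cache is reset
  )

  # ...assertions unchanged from the existing test...
})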

See CLAUDE.md example (lines 416-448) for the complete pattern.

