Work/rlm c6s approval UI #4
base: master
Conversation
Story of Collaboration: User requested a dedicated workspace for the agent to live in, noticing that the agent was overwriting the core codebase (specifically 'src/app/page.tsx') during demos. We decided to pivot from a 'Full Repo' mode to a 'Strict Sandbox' mode to ensure safety and focus.
Decisions Made:
- Created ADR 003 to formalize Sandbox Enforcement.
- Implemented SANDBOX_ROOT pointing to './workspace/'.
- Modified both Frontend File API and Backend ToolRuntime to hardcode this root.
- Integrated directory auto-creation in the API to ensure the sandbox exists.
Challenges:
- Had to restore 'src/app/page.tsx' after an autonomous overwrite incident.
- Balanced the need for 'Self-Improvement' vs 'Safety'; opted for Safety as the default for now.
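A minimal sketch of what sandbox enforcement around SANDBOX_ROOT could look like, assuming the root is resolved relative to the process working directory. The helper name `resolveInSandbox` is hypothetical and is not taken from the PR; the actual File API and ToolRuntime code may differ.

```typescript
import * as path from "node:path";

// Assumption: the sandbox lives in ./workspace relative to the process cwd.
const SANDBOX_ROOT = path.resolve(process.cwd(), "workspace");

/**
 * Hypothetical helper: resolve a caller-supplied path inside the sandbox.
 * Throws if the resolved path would land outside the given root.
 */
function resolveInSandbox(requested: string, root: string = SANDBOX_ROOT): string {
  const resolved = path.resolve(root, requested);
  const relative = path.relative(root, resolved);
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative);
  if (escapes) {
    throw new Error(`Path escapes sandbox: ${requested}`);
  }
  return resolved;
}
```

With a check like this, a request for '../src/app/page.tsx' is rejected before any read or write happens, which is the failure mode described in the overwrite incident.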
Story of Collaboration: To close the feedback loop for the user, we needed a way to visualize the HTML/CSS created by the agent. We implemented a Live Preview panel.
Decisions Made:
- Created a specialized 'Sandbox File Server' API route to serve files from './workspace'.
- Implemented a 3-pane layout (Tree | Editor | Preview).
- Added refresh controls to the Preview panel.
Challenges:
- Next.js 16 Route Handlers required awaiting 'params'.
- Ensured strict path traversal security in the file server.
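The 'params' challenge can be illustrated with a small stand-alone sketch. It assumes a catch-all route whose segments arrive under a `path` key; the `RouteContext` type and the placeholder response body are illustrative assumptions, not the PR's actual file server.

```typescript
// Hypothetical context shape: `params` arrives as a Promise and must be awaited.
type RouteContext = { params: Promise<{ path: string[] }> };

// In a real route file this function would be exported as `GET`.
async function GET(_request: Request, context: RouteContext): Promise<Response> {
  // Await params before reading any segment.
  const { path: segments } = await context.params;

  // Reject traversal attempts before touching the file system.
  const unsafe = segments.some(
    (segment) => segment === ".." || segment.includes("\0") || segment.includes("\\"),
  );
  if (unsafe) {
    return new Response("Forbidden", { status: 403 });
  }

  // Placeholder body: a real handler would read the file from ./workspace here.
  return new Response(`Would serve: ${segments.join("/")}`, { status: 200 });
}
```

Segment-level checks like these are only a first line of defence; resolving the final path against the sandbox root is still advisable.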
Story of Collaboration: User reported that refreshing the page lost chat history and the agent was unaware of manual file edits. We implemented a robust persistence and sync layer.
Decisions Made:
- Updated 'GET /context' API to return full text for hydration.
- Updated 'POST /files' API to inject 'doc_chunk' entries into SQLite, ensuring the agent sees manual edits immediately.
- Enhanced 'ContextStore.listEntries' to support comma-separated type filters for efficient querying.
Challenges:
- Discovered that the Context List API was stripping the 'text' field, breaking frontend hydration.
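One way a comma-separated type filter could be turned into a parameterised query fragment is sketched below. The helper `buildTypeFilter` and the `type` column name are assumptions for illustration; the real 'ContextStore.listEntries' signature is not shown in this PR page.

```typescript
// Hypothetical helper: turn "doc_chunk,file_diff" into a parameterised SQL fragment.
function buildTypeFilter(typeParam: string | undefined): { clause: string; params: string[] } {
  const types = (typeParam ?? "")
    .split(",")
    .map((t) => t.trim())
    .filter((t) => t.length > 0);

  // No filter requested: contribute nothing to the WHERE clause.
  if (types.length === 0) {
    return { clause: "", params: [] };
  }

  // One placeholder per type keeps user input out of the SQL string itself.
  const placeholders = types.map(() => "?").join(", ");
  return { clause: `AND type IN (${placeholders})`, params: types };
}
```

Using placeholders rather than string interpolation means the filter values can come straight from a query string without being spliced into the SQL text.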
…y, sync precision)
…ld, lint, and tests)
- Refactored ROOT_SYSTEM_PROMPT to prioritize reading TODO.md FIRST.
- Forbade redundant discovery tools (context_stats, search_terms) in ACT phase.
- Added engine resumption notice when TODO.md exists.
- Verified fix with repro script (agent picks up existing checklist immediately).
- Refined gating logic in RLMEngine to allow 'skipGating' flag.
- Propagated 'skipGating' from UI 'Approve & Execute' button to engine loop.
- Hardened ROOT_SYSTEM_PROMPT with clear 'PLANNING' and 'EXECUTION' modes.
- Mandated aggressive momentum and TODO.md sync in the system prompt.
- Fixed ToolRuntime.write_file to return diff stats for accurate gating.
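As a rough illustration of how a 'skipGating' flag and diff stats might combine, consider the sketch below. The `DiffStats` shape and the rule "pause only when something changed" are assumptions; the actual RLMEngine gating conditions are not visible here.

```typescript
// Hypothetical shape of the stats returned by a write operation.
interface DiffStats {
  path: string;
  linesAdded: number;
  linesRemoved: number;
}

interface GateOptions {
  // Set when the user clicked "Approve & Execute" in the UI.
  skipGating: boolean;
}

// Assumed rule: pause for approval only when a write changed something
// and the user has not already approved execution.
function shouldPauseForApproval(stats: DiffStats, options: GateOptions): boolean {
  if (options.skipGating) {
    return false;
  }
  return stats.linesAdded + stats.linesRemoved > 0;
}
```

Returning real diff stats from the write path matters here because a gate keyed on "a write happened" would also pause on no-op rewrites.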
…e enforcement
- Implemented Dynamic System Prompting (DSP) to swap instructions based on project state (PLANNING vs EXECUTION).
- Enforced strict project vault isolation via Cryptographic Context Scoping.
- Added mid-project pivot detection and forced mode switching.
- Purged historical Todo App bias from context store.
- Fixed gating logic to only pause on new planning assets.
- Add embeddedPlan field to Message type for inline plan/spec display
- Create PlanCard component with Monaco editor, annotations, planning mode
- Integrate PlanCard with MessageBubble for in-chat viewing
- Add success/warning button variants and Card padding support
- Added 'tasks' table to SQLite for project-locked state management.
- Implemented 'task_initialize', 'task_get_next', and 'task_complete' native tools.
- Refactored RLMEngine to inject the active mission from DB into system prompts.
- Added bidirectional sync between Task Registry and workspace/TODO.md.
- Enforced strict type safety for database row results (removed implicit any).
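The three task tools can be pictured with an in-memory stand-in. This is only a sketch: the class name, the `Task` fields, and the "replace on initialize" behaviour are assumptions, and the PR stores tasks in SQLite rather than in memory.

```typescript
type TaskStatus = "pending" | "done";

// Hypothetical task shape; the real 'tasks' table columns may differ.
interface Task {
  id: number;
  projectId: string;
  title: string;
  status: TaskStatus;
}

class InMemoryTaskRegistry {
  private tasks: Task[] = [];
  private nextId = 1;

  // task_initialize: replace the project's task list with a fresh checklist.
  initialize(projectId: string, titles: string[]): Task[] {
    this.tasks = this.tasks.filter((t) => t.projectId !== projectId);
    const created = titles.map((title): Task => ({
      id: this.nextId++,
      projectId,
      title,
      status: "pending",
    }));
    this.tasks.push(...created);
    return created;
  }

  // task_get_next: first pending task for the project, in insertion order.
  getNext(projectId: string): Task | undefined {
    return this.tasks.find((t) => t.projectId === projectId && t.status === "pending");
  }

  // task_complete: mark a task done; returns false if it is not in this project.
  complete(projectId: string, taskId: number): boolean {
    const task = this.tasks.find((t) => t.projectId === projectId && t.id === taskId);
    if (!task) {
      return false;
    }
    task.status = "done";
    return true;
  }
}
```

Scoping every operation by `projectId` is what keeps one project's checklist from leaking into another's.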
- Added GET/PATCH /api/sessions/[sessionId]/tasks for hard state management.
- Hardened OpenRouterClient to prevent choice[0] undefined errors.
- Propagated skipGating flag through Chat API handlers.
- Refactored ApiClient to include task registry operations.
- Refactored RLMEngine to intercept read_file('TODO.md') and return SQLite state.
- Blocked manual writes to TODO.md to enforce Task Registry as the only source of truth.
- Updated system prompts to mandate task_* tools and forbid direct file edits.
- Added get_all_tasks tool for UI sync.
- Verified fix with automated test script.
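A minimal sketch of the TODO.md interception described above, assuming tasks are rendered as a Markdown checklist. The function names, the `TaskRow` shape, and the checklist format are hypothetical; the PR's RLMEngine may render the registry differently.

```typescript
// Hypothetical row shape for a task read from the registry.
interface TaskRow {
  title: string;
  done: boolean;
}

// Render registry state as a Markdown checklist (assumed format).
function renderTodo(tasks: TaskRow[]): string {
  const lines = tasks.map((t) => `- [${t.done ? "x" : " "}] ${t.title}`);
  return ["# TODO", "", ...lines].join("\n");
}

// Treat "TODO.md", "./TODO.md" and case variants as the same virtual file.
function isTodoPath(filePath: string): boolean {
  return filePath.replace(/\\/g, "/").replace(/^\.\//, "").toLowerCase() === "todo.md";
}

// Reads of TODO.md come from the registry; everything else falls through to disk.
function readFile(
  filePath: string,
  tasks: TaskRow[],
  readFromDisk: (p: string) => string,
): string {
  return isTodoPath(filePath) ? renderTodo(tasks) : readFromDisk(filePath);
}

// Writes to TODO.md are refused so the registry stays the only source of truth.
function assertWritable(filePath: string): void {
  if (isTodoPath(filePath)) {
    throw new Error("TODO.md is read-only; use the task_* tools instead");
  }
}
```

Keeping the path check in one helper means the read interception and the write block cannot drift apart on what counts as TODO.md.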
- Created /api/sandbox catch-all route for workspace asset resolution.
- Implemented whitelisted run_shell tool with output capture.
- Refactored ProjectHub UI to consume the Task Registry API directly.
- Hardened preview environment with security headers and projectId scoping.
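A whitelisted shell tool with output capture might be sketched as follows. The allowlist contents, the timeout, and the function names are assumptions; the sketch uses Node's `execFile`, which does not spawn a shell, and it does not validate path arguments against the sandbox.

```typescript
import { execFile } from "node:child_process";

// Hypothetical allowlist; the PR's actual list is not shown.
const ALLOWED_COMMANDS = new Set(["ls", "cat", "echo"]);

// Split a command line and reject anything outside the allowlist.
function parseCommand(commandLine: string): { command: string; args: string[] } {
  const parts = commandLine.trim().split(/\s+/).filter((p) => p.length > 0);
  const command = parts[0];
  if (command === undefined) {
    throw new Error("Empty command");
  }
  if (!ALLOWED_COMMANDS.has(command)) {
    throw new Error(`Command not allowed: ${command}`);
  }
  const args = parts.slice(1);
  // Defensive check: no shell is involved, but refuse metacharacters anyway.
  if (args.some((arg) => /[;&|`$<>]/.test(arg))) {
    throw new Error("Shell metacharacters are not allowed");
  }
  return { command, args };
}

interface ShellResult {
  stdout: string;
  stderr: string;
  exitCode: number;
}

// Run an allowlisted command inside `cwd` and capture its output.
function runShell(commandLine: string, cwd: string): Promise<ShellResult> {
  const { command, args } = parseCommand(commandLine);
  return new Promise((resolve) => {
    execFile(command, args, { cwd, timeout: 10_000, maxBuffer: 1024 * 1024 }, (error, stdout, stderr) => {
      const exitCode = error ? (typeof error.code === "number" ? error.code : 1) : 0;
      resolve({ stdout, stderr, exitCode });
    });
  });
}
```

Passing arguments as an array to `execFile` avoids shell interpretation of the command string, which narrows the injection surface compared with building a single string for a shell.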
qodo-free-for-open-source-projects bot commented Jan 26, 2026 • edited by qodo-code-review bot
CI Feedback 🧐 (Feedback updated until commit fc972e6)
A test triggered by this PR failed. Here is an AI-generated analysis of the failure:
@openhands-agent review this for prompting and tooling logic
@OpenHands review this for prompting and tooling logic
Summary of Changes
Hello @zenchantlive, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly advances the RLM Chat System by transforming it into a more robust and controllable agentic coding environment. It introduces comprehensive multi-project support, enforces a strict sandbox for all agent operations, and integrates a full-featured code editor and file explorer directly into the user interface. These changes are designed to improve agent reliability, provide users with greater transparency and control over the development process, and ensure system stability through hardened architectural patterns.
I'm on it! zenchantlive can track my progress at all-hands.dev |
Code Review
This pull request introduces a significant set of features, transforming the application into a more robust and capable agentic development environment. The introduction of a dedicated workspace, project management, a task registry, and an approval gate UI are major steps forward. The code quality is generally high, with good use of modern React patterns and a clear focus on security through sandboxing. I've identified a critical bug in the tool dispatch logic and a high-risk database migration strategy that need attention. Additionally, there are a few opportunities to improve consistency and encapsulation.
    case 'read_file':
      return toolRuntime.read_file(
        sessionId,
        args.path as string
      );

    case 'write_file':
      return toolRuntime.write_file(
        sessionId,
        args.path as string,
        args.content as string
      );
There are duplicate switch cases for read_file and write_file in the dispatchToolCall method. The first set of cases (lines 601-631) contains special logic for TODO.md, while this second set are generic implementations. Due to the nature of a switch statement, this block of code is unreachable and should be removed to avoid confusion and future maintenance issues.
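One way to avoid unreachable duplicate cases is a lookup table with a single handler per tool name, sketched below. This is an alternative structure rather than the PR's dispatchToolCall; the handler bodies are placeholders and the string results are invented for illustration.

```typescript
type ToolArgs = Record<string, unknown>;
type ToolHandler = (sessionId: string, args: ToolArgs) => Promise<string>;

// One entry per tool name, so special-case logic has exactly one home.
const toolHandlers = new Map<string, ToolHandler>([
  [
    "read_file",
    async (_sessionId, args) => {
      const filePath = String(args["path"]);
      // Placeholder: the TODO.md special case and the generic read share one handler.
      return filePath === "TODO.md" ? "(task registry state)" : `(contents of ${filePath})`;
    },
  ],
  [
    "write_file",
    async (_sessionId, args) => {
      const filePath = String(args["path"]);
      if (filePath === "TODO.md") {
        throw new Error("TODO.md is managed by the task registry");
      }
      return `(wrote ${String(args["content"]).length} chars to ${filePath})`;
    },
  ],
]);

// Dispatch by name; unknown tools fail loudly instead of falling through.
async function dispatchToolCall(name: string, sessionId: string, args: ToolArgs): Promise<string> {
  const handler = toolHandlers.get(name);
  if (!handler) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return handler(sessionId, args);
}
```

Whether a table or a cleaned-up switch is preferable is a style choice; the point is that each tool name maps to one implementation.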
    );

    -- Data is not preserved per user request
    DROP TABLE sessions;
This migration is destructive as it uses DROP TABLE (also on lines 54 and 83) which will result in the loss of all existing data in the sessions, context_entries, and run_traces tables. While the comment "Data is not preserved per user request" suggests this might be intentional for a development environment, this is a high-risk approach.
For any environment with valuable data, a non-destructive migration using ALTER TABLE to add the project_id column and then backfilling the data would be required. Please confirm that this data loss is acceptable for all target environments before merging.
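The non-destructive alternative can be sketched as a small generator of SQL statements. The table names and the project_id column come from the review comment; the backfill value 'default' and the helper name are assumptions, and a real migration would also need to handle a column that already exists.

```typescript
// Tables named in the review comment.
const MIGRATED_TABLES = ["sessions", "context_entries", "run_traces"] as const;

// Hypothetical backfill value for rows created before projects existed.
const DEFAULT_PROJECT_ID = "default";

// Build ALTER TABLE + backfill statements instead of DROP TABLE.
function buildProjectIdMigration(tables: readonly string[], defaultProjectId: string): string[] {
  // Escape single quotes so the literal is safe inside the SQL string.
  const escaped = defaultProjectId.replace(/'/g, "''");
  return tables.flatMap((table) => [
    `ALTER TABLE ${table} ADD COLUMN project_id TEXT;`,
    `UPDATE ${table} SET project_id = '${escaped}' WHERE project_id IS NULL;`,
  ]);
}

const migrationStatements = buildProjectIdMigration(MIGRATED_TABLES, DEFAULT_PROJECT_ID);
```

Running the resulting statements inside a single transaction would keep the three tables consistent if any step fails.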
    // Note: Using raw DB for a quick specialized sync without broad refactor
    const db = this.contextStore.rawDb;
The syncTaskRegistryToDisk method directly accesses the raw database connection via this.contextStore.rawDb. This breaks the encapsulation of the ContextStore, which should ideally be the sole manager of database interactions.
To maintain a clean separation of concerns, consider adding a dedicated method to ContextStore, such as getTaskSummaries(projectId: string), that performs this query and returns the structured data. This would keep all SQL logic within the db layer.
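A sketch of the suggested `getTaskSummaries(projectId)` method is shown below. The `SqlDatabase` interface, the `TaskSummary` fields, and the column names in the query are assumptions made so the example is self-contained; the real ContextStore wraps an actual SQLite driver.

```typescript
// Hypothetical minimal database interface, standing in for the real SQLite handle.
interface SqlDatabase {
  all(sql: string, params: unknown[]): Array<Record<string, unknown>>;
}

// Assumed summary shape; the real 'tasks' table may expose other columns.
interface TaskSummary {
  id: number;
  title: string;
  status: string;
}

class ContextStore {
  private readonly db: SqlDatabase;

  constructor(db: SqlDatabase) {
    this.db = db;
  }

  // All SQL stays inside the store; callers receive typed data only.
  getTaskSummaries(projectId: string): TaskSummary[] {
    const rows = this.db.all(
      "SELECT id, title, status FROM tasks WHERE project_id = ? ORDER BY id",
      [projectId],
    );
    return rows.map((row) => ({
      id: Number(row["id"]),
      title: String(row["title"]),
      status: String(row["status"]),
    }));
  }
}
```

With a method like this, syncTaskRegistryToDisk could call the store instead of reaching for the raw connection, and the rawDb accessor could eventually be removed.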
    // Store a DIFF summary instead of full content to prevent DB bloat
    const diffSummary = `USER EDITED FILE: ${path}\n` +
      `- Change: ${content.length - oldContent.length > 0 ? '+' : ''}${content.length - oldContent.length} chars`;

    await contextStore.addEntry({
      sessionId,
      projectId: session.projectId,
      type: 'system_meta',
      source: 'user_edit',
      text: diffSummary,
      meta: {
        path,
        operation: 'user_edit',
        charLen: content.length,
        tokenEst: Math.ceil(content.length / 4),
        hash: crypto.createHash('sha256').update(content).digest('hex').slice(0, 16)
      }
    });
When a user edits a file via the UI, this POST handler logs a system_meta entry with a simple character count difference. However, when the agent edits a file via the write_file tool, a much more informative unified diff is generated and stored as a file_diff entry. This creates an inconsistency in how file changes are audited.
For a more consistent and powerful audit trail, I recommend refactoring this block to use the DiffService (similar to how tool-runtime.ts does it) to generate a proper unified diff for user edits. The entry type should also be file_diff to align with agent-based edits.
Code Review: PR #4 - Work/rlm c6s approval-ui
Summary
This PR introduces significant tooling enhancements including a shell execution capability (
🔴 Critical Issues
Issue: Command Injection Vulnerability in
🐢 Turtlez Pull Request
Type: [feat | fix | chore | docs | refactor]
Scope: [ui | backend | engine | infra]
Story of Collaboration
Decisions Made
Challenges
Verification Results
- bun run lint passes
- tsc --noEmit passes
This PR acts as a page in our project notebook.