Add a repo-idiomatic RAG-Critic generation pipeline with a critic, a planner, and a bounded corrective action loop over the existing retrieval/generation abstractions. The test suite locks helper parsing, multi-action execution, and service-layer persistence so the new pipeline behaves like the rest of the framework.
Constraint: Must fit the existing BaseGenerationPipeline + retrieval composition model
Constraint: No new dependencies or config-system changes
Rejected: Full paper-faithful multi-agent reproduction | too broad for the current pipeline abstraction and issue scope
Rejected: A single-pass critic-only wrapper | would miss the paper's planner/executor correction loop
Confidence: medium
Scope-risk: moderate
Reversibility: clean
Directive: Keep planner/critic outputs JSON-based unless you also harden parser fallbacks and tests together
Tested: uv run pytest -q tests/autorag_research/pipelines/generation/test_rag_critic_pipeline.py
Tested: make check
Tested: uv build
Tested: make test-only
Not-tested: Live LLM compliance with the structured prompt contracts
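The bounded critic → planner → executor loop can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline's code: `Critique`, `refine_answer`, and the callable signatures are hypothetical, and the real pipeline wires these roles to LLM calls and retrieval rather than plain functions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Critique:
    # Hypothetical critic verdict: accept the answer or recommend corrective actions.
    acceptable: bool
    recommended_actions: list[str] = field(default_factory=list)


Critic = Callable[[str, str], Critique]             # (query, answer) -> critique
Planner = Callable[[str, str, Critique], list[str]]  # (query, answer, critique) -> plan
Executor = Callable[[str, str, str], str]           # (action, query, answer) -> new answer


def refine_answer(
    query: str,
    answer: str,
    critic: Critic,
    planner: Planner,
    executor: Executor,
    max_rounds: int = 3,
) -> tuple[str, list[Critique]]:
    """Critic -> planner -> executor loop, bounded by max_rounds."""
    trace: list[Critique] = []
    for _ in range(max_rounds):
        critique = critic(query, answer)
        trace.append(critique)  # keep every critique for audit traces
        if critique.acceptable:
            break
        plan = planner(query, answer, critique)
        if not plan:
            break  # nothing left to try; stop early instead of spinning
        for action in plan:  # a plan may contain several actions
            answer = executor(action, query, answer)
    return answer, trace
```

The `max_rounds` cap, together with stopping on an empty plan, is what keeps the loop bounded even when the critic never accepts an answer.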
I ran the
Findings
Real Result
Recommendation: REQUEST CHANGES
The review surfaced two user-facing gaps in the new generation pipeline:
config-driven discovery could not find RAG-Critic because the repo lacked
its generation YAML, and planner/decomposition JSON parsing crashed when an
LLM returned a valid top-level array instead of an object. This adds the
repo-standard config entry, hardens both callers to accept list-or-dict
payloads, and locks the behavior with focused regressions.
Constraint: Pipeline discovery is driven only by YAML files under configs/pipelines/generation
Constraint: LLM JSON responses may legitimately be top-level arrays
Rejected: Restrict _parse_json_payload to objects only | would reject valid planner/decomposition outputs already tolerated by the parser
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep rag_critic.yaml aligned with RAGCriticPipelineConfig whenever config fields change
Tested: uv run pytest tests/autorag_research/cli/test_utils.py tests/autorag_research/pipelines/generation/test_rag_critic_pipeline.py -q
Tested: uv run python snippet verifying discover_pipelines('generation') includes rag_critic and planner array payload no longer crashes
Tested: make check
Tested: uv build
Tested: make clean-docker && make docker-up && make docker-wait && make test-only
Not-tested: Manual CLI invocation of a generation run using the new rag_critic config
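The list-or-dict hardening can be illustrated with a small normalizer. This is a sketch: `payload_items` and the `"actions"` key are hypothetical names, and the real callers of `_parse_json_payload` may be structured differently. It shows why a top-level array crashed (calling `.get()` on a list) and how both shapes can map to one list.

```python
import json
from typing import Any


def payload_items(payload: Any, key: str) -> list[Any]:
    """Return the item list whether the LLM replied {key: [...]} or a bare [...]."""
    if isinstance(payload, list):
        # A valid top-level JSON array; calling payload.get(key) here is what crashed.
        return payload
    if isinstance(payload, dict):
        value = payload.get(key)
        if isinstance(value, list):
            return value
        return [] if value is None else [value]
    return []  # scalars or None: let the caller use its fallback path


# Both response shapes normalize to the same list.
for raw in ('{"actions": [{"action": "retrieve"}]}', '[{"action": "retrieve"}]'):
    print(payload_items(json.loads(raw), "actions"))  # prints [{'action': 'retrieve'}]
```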
Addressed the review feedback on
Verification:
Round 2 review on
I re-checked the two blocking issues from round 1 and did not find any additional blocking problems in the current patch.
Real Result
Recommendation: APPROVE
… simple
The approved feature branch already fixed discoverability and top-level array parsing, but the unit tests still exercised DB-backed pipeline construction for helper-level behavior. This cleanup keeps the tested behavior intact while extracting shared payload-list normalization in the pipeline and switching helper-focused unit tests to a lightweight non-persistent fixture, which makes the verification path more robust and faster without changing the feature contract.
Constraint: Must preserve the approved feature behavior and repo verification flow
Rejected: Leave helper tests DB-backed | unnecessary coupling to container state for unit-only behavior
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep helper-focused RAG-Critic tests persistence-free unless the behavior under test requires repository integration
Tested: uv run pytest tests/autorag_research/pipelines/generation/test_rag_critic_pipeline.py -q
Tested: make check
Tested: uv build
Tested: make clean-docker && make docker-up && make docker-wait && make test-only
Tested: architect verification
Not-tested: Additional LLM prompt-quality scenarios beyond the existing regression coverage
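One common way to build a lightweight, non-persistent fixture is to bypass the DB-bound constructor with `cls.__new__` and set only the attributes a helper reads. The sketch below assumes that approach; `DBBackedPipeline`, `make_helper_only`, and `_select_passages` are hypothetical stand-ins, not the repository's test code.

```python
class DBBackedPipeline:
    """Hypothetical stand-in whose constructor needs a live database session."""

    def __init__(self, session_factory, max_passages: int = 5):
        if session_factory is None:
            raise RuntimeError("a database session factory is required")
        self.session_factory = session_factory
        self.max_passages = max_passages

    def _select_passages(self, passages: list[str]) -> list[str]:
        # Pure helper: reads only in-memory attributes, never the database.
        return passages[: self.max_passages]


def make_helper_only(cls, **attrs):
    """Create an instance without running __init__, so no DB or container is touched."""
    instance = cls.__new__(cls)
    for name, value in attrs.items():
        setattr(instance, name, value)
    return instance
```

In a pytest suite this factory would typically be wrapped in a fixture. The trade-off is that attributes normally set by `__init__` must be supplied by hand, so anything that genuinely touches persistence still belongs in DB-backed tests.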
Follow-up on
Verification:
Round 3 review on
I re-checked the approved discoverability + array-payload fix set after the follow-up simplification in
Real Result
Recommendation: APPROVE
The approved RAG-Critic follow-up already covered config discovery and list-shaped planner/decomposition payloads, but a singleton recommended_actions string still reached fallback paths and expanded into per-character actions. Normalize string-or-list action metadata before storing critique state or building planner fallbacks, and lock the fix with focused regressions.
Constraint: LLM structured outputs may return singleton strings even when prompts ask for lists
Rejected: Coerce strings only in planner fallback | would leave critique metadata inconsistent for audit traces
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep critique/planner payload normalization centralized so fallback action handling stays consistent with stored metadata
Tested: uv run pytest -q tests/autorag_research/pipelines/generation/test_rag_critic_pipeline.py -k 'string_recommended_actions'
Tested: uv run pytest -q tests/autorag_research/cli/test_utils.py tests/autorag_research/pipelines/generation/test_rag_critic_pipeline.py
Tested: uv run python - <<'PY' ... singleton recommended_actions repro ... PY
Tested: make check
Tested: uv build
Tested: make clean-docker && make docker-up && make docker-wait && make test-only
Not-tested: Malformed non-string/non-list recommended_actions payloads beyond existing fallback behavior
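The per-character bug comes from treating a string as an iterable of actions (`list("retry")` yields five one-letter items). A minimal, hypothetical normalizer that coerces string-or-list metadata once, before it is stored or handed to the planner fallback, might look like this. The function name and the choice to return an empty list for other types are assumptions; the commit notes that malformed non-string/non-list payloads were not tested.

```python
from typing import Any


def normalize_recommended_actions(value: Any) -> list[str]:
    """Coerce critic/planner action metadata to a list of non-empty strings."""
    if isinstance(value, str):
        # Wrap the singleton string instead of iterating over its characters.
        stripped = value.strip()
        return [stripped] if stripped else []
    if isinstance(value, (list, tuple)):
        return [item.strip() for item in value if isinstance(item, str) and item.strip()]
    # None, numbers, dicts, ...: assumed here to mean "no usable actions".
    return []
```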
Follow-up landed on
REJECT: the RAG-Critic changes test cleanly, but PR #334 is currently not mergeable against
Short reason
Real Result
Concrete evidence
Once the
Resolve conflict in pipelines/generation/__init__.py by keeping both RAGCriticPipeline and SelfRAGPipeline imports.
This moves the local pipeline beyond a simple JSON action loop by adding an official-style planner/executor path, published critic-checkpoint integration, and stricter adapters around the public RAG-Critic outputs. The goal is to reduce the gap with the paper and official repository while keeping the implementation compatible with AutoRAG-Research's existing generation/retrieval abstractions and safety expectations.
Constraint: Must fit AutoRAG-Research BaseGenerationPipeline and existing retrieval wrappers
Constraint: Must not add new dependencies or require FlashRAG runtime integration
Rejected: Direct exec of planner-generated Python | too unsafe for repository integration
Rejected: Full FlashRAG port | too invasive for this PR and incompatible with existing pipeline abstractions
Confidence: medium
Scope-risk: moderate
Reversibility: clean
Directive: Keep official-style python_agent execution on the restricted AST path unless a broader sandbox model is introduced
Tested: uv run pytest tests/autorag_research/pipelines/generation/test_rag_critic_pipeline.py -q -k 'not integration'
Tested: uv run pytest tests/autorag_research/cli/test_utils.py tests/autorag_research/test_injection.py tests/autorag_research/pipelines/generation/test_rag_critic_pipeline.py -q -k 'not integration'
Tested: uv run ruff check autorag_research/pipelines/generation/rag_critic.py tests/autorag_research/pipelines/generation/test_rag_critic_pipeline.py
Not-tested: End-to-end vLLM smoke test against dongguanting/RAG-Critic-3B; full make check remains blocked by existing deptry noise under .omx/review-worktrees
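The "restricted AST path" for python_agent execution can be illustrated with a sketch that parses planner output with Python's `ast` module and accepts only whitelisted call statements with literal arguments, never calling `exec`. The action names in `ALLOWED_ACTIONS` and the one-call-per-statement program shape are assumptions; this is neither a full sandbox nor the repository's implementation.

```python
import ast

# Hypothetical whitelist; the real action vocabulary is not shown in this PR thread.
ALLOWED_ACTIONS = frozenset({"retrieve", "rewrite_query", "decompose", "generate"})


def parse_agent_program(source: str) -> list[tuple[str, list]]:
    """Turn planner-generated Python into (action, args) pairs without exec()."""
    actions: list[tuple[str, list]] = []
    for stmt in ast.parse(source, mode="exec").body:
        # Only bare call expressions such as retrieve("query", 5) are accepted.
        if not isinstance(stmt, ast.Expr) or not isinstance(stmt.value, ast.Call):
            raise ValueError(f"unsupported statement: {type(stmt).__name__}")
        call = stmt.value
        if not isinstance(call.func, ast.Name) or call.func.id not in ALLOWED_ACTIONS:
            raise ValueError("call target is not a whitelisted action")
        if call.keywords:
            raise ValueError("keyword arguments are not supported")
        # literal_eval raises ValueError on names, attribute access, and nested calls.
        args = [ast.literal_eval(arg) for arg in call.args]
        actions.append((call.func.id, args))
    return actions
```

Each `(action, args)` pair could then be dispatched to an existing retrieval or generation method, so the model's output selects actions without ever being executed as code.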
Implemented a closer-to-official RAG-Critic integration and pushed it on branch feature/#184 in commit 0050f44. What changed:
Validation:
Remaining gap vs official repo:
The local .omx directory is workspace/runtime state and should not affect version control or dependency scanning. This adds .omx to .gitignore and the deptry excludes so local orchestration artifacts stop polluting status output and project-level dependency checks.
Constraint: Must avoid changing unrelated tooling behavior
Rejected: Leave .omx visible and clean it manually | noisy and brittle across sessions
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep transient OMX state excluded unless a specific .omx artifact is intentionally promoted into version control
Tested: git status --short
Tested: uv run deptry .
Not-tested: Full make check after this commit
Summary
RAGCriticPipeline and config that model a critic → planner → executor refinement loop over the existing generation pipeline abstraction
Verification
make check
uv build
make test-only