(from this Bluesky thread)
IMO, it would be useful to have some kind of manual/human review checkpoint system.
For each requirement, ask a reviewer:
- Does this requirement make sense? (or any other quality-checklist criteria)
- Then, for each `impl` tag linked to that requirement, ask: "does this tagged block actually and correctly implement a part of the associated requirement?"
- Then, for ALL `impl` tags of a requirement, ask: "do these impl blocks FULLY implement the associated requirement?"
- Then, for each `verify` block, ask: "does this tagged block actually and correctly verify a part of the associated requirement?"
- Then, for ALL `verify` blocks, ask: "do these verify blocks FULLY verify the associated requirement?"
When the manual reviewer signs off on this chain, mark it as "reviewed". If any of the three aspects (the requirement, its impl blocks, or its verify blocks) changes, invalidate the chain.
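To make the sign-off/invalidate cycle concrete, here is a minimal sketch in Python. Everything in it is hypothetical (the `ReviewChain` type, the use of content hashes, the field names); it just illustrates one way a review could be recorded against a snapshot of the requirement, its impl blocks, and its verify blocks, and automatically invalidated when any of the three changes:

```python
import hashlib
from dataclasses import dataclass, field


def digest(text: str) -> str:
    """Content hash used to detect changes after sign-off."""
    return hashlib.sha256(text.encode()).hexdigest()


@dataclass
class ReviewChain:
    """One requirement plus its linked impl tags and verify blocks.

    Hypothetical data model; real tag/block storage would differ.
    """
    requirement: str
    impl_blocks: dict[str, str]    # tag id -> tagged source text
    verify_blocks: dict[str, str]  # tag id -> tagged test text
    signed_off: dict[str, str] = field(default_factory=dict)
    reviewed: bool = False

    def _snapshot(self) -> dict[str, str]:
        """Hash every aspect of the chain as it stands right now."""
        parts = {"req": digest(self.requirement)}
        parts.update({f"impl:{k}": digest(v) for k, v in self.impl_blocks.items()})
        parts.update({f"verify:{k}": digest(v) for k, v in self.verify_blocks.items()})
        return parts

    def sign_off(self) -> None:
        """Reviewer approves the whole chain; record exactly what was approved."""
        self.signed_off = self._snapshot()
        self.reviewed = True

    def check(self) -> bool:
        """Drop the 'reviewed' mark if any aspect changed since sign-off."""
        if self.reviewed and self._snapshot() != self.signed_off:
            self.reviewed = False
        return self.reviewed
```

With this shape, editing any tagged block (or the requirement text itself) makes the next `check()` flip the chain back to unreviewed, which is the invalidation behaviour described above.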
I think the web interface could be used to drive this in a PR-like manner, or as an interactive step-by-step process.
My intended use for this would be to keep a human in the loop: let the agent write/expand requirements, implement code, and write verification tests, but periodically "QA" the process and track when local changes potentially have larger ripple effects that need to be addressed.
I think this could even be fed forward into the agent (if it isn't already by the MCP?): "if you touch this code, make sure you put effort into resolving any changes in requirements or tests".
I personally think the agent shouldn't be doing the manual review step, but I think it would also be okay to have that as a configurable option for people who just want to use the metadata but still work "open loop".
CC @eikopf