
feat: add rfc for checkpoints #513

Closed
vitalii-dynamiq wants to merge 1 commit into main from add-checkpoints


@vitalii-dynamiq (Contributor) commented Jan 7, 2026

Note

Low Risk
Documentation-only change with no runtime or library behavior modifications; risk is limited to potential design misinterpretation until implemented.

Overview
Adds a new documentation-only RFC (RFC-001) proposing opt-in checkpoint/resume for Dynamiq workflows, including HITL-aware PENDING_INPUT pausing and resume semantics.

The RFC is split into multiple detailed docs covering industry research, runtime/HITL integration and API/schema implications, node-by-node state requirements, proposed Pydantic data models/protocols, and pluggable persistence backends (file/SQLite/Redis/Postgres) with retention/cleanup guidance.
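As a rough illustration of what a pluggable backend contract could look like, here is a minimal sketch. Every name in it (`CheckpointStatus`, `Checkpoint`, `CheckpointBackend`, `InMemoryBackend`, and their fields and methods) is hypothetical and not taken from the RFC; stdlib dataclasses stand in for the RFC's Pydantic models, and an in-memory dict stands in for the file/SQLite/Redis/Postgres backends.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Dict, List, Optional, Protocol


class CheckpointStatus(str, Enum):
    """Hypothetical lifecycle states for a saved checkpoint."""
    ACTIVE = "active"
    PENDING_INPUT = "pending_input"  # paused, waiting for human input (HITL)
    COMPLETED = "completed"


@dataclass
class Checkpoint:
    """Hypothetical stand-in for the RFC's Pydantic checkpoint model."""
    id: str
    flow_id: str
    status: CheckpointStatus
    node_states: Dict[str, dict] = field(default_factory=dict)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class CheckpointBackend(Protocol):
    """Structural interface any persistence backend would have to satisfy."""
    def save(self, checkpoint: Checkpoint) -> None: ...
    def load(self, checkpoint_id: str) -> Optional[Checkpoint]: ...
    def list_by_flow(self, flow_id: str) -> List[Checkpoint]: ...
    def cleanup(self, older_than_days: int) -> int: ...


class InMemoryBackend:
    """Dict-backed implementation, useful for tests."""

    def __init__(self) -> None:
        self._items: Dict[str, Checkpoint] = {}

    def save(self, checkpoint: Checkpoint) -> None:
        self._items[checkpoint.id] = checkpoint

    def load(self, checkpoint_id: str) -> Optional[Checkpoint]:
        return self._items.get(checkpoint_id)

    def list_by_flow(self, flow_id: str) -> List[Checkpoint]:
        # Oldest first, so the latest checkpoint is the last element.
        matches = (c for c in self._items.values() if c.flow_id == flow_id)
        return sorted(matches, key=lambda c: c.created_at)

    def cleanup(self, older_than_days: int) -> int:
        # Retention: drop checkpoints older than the cutoff, return the count.
        cutoff = datetime.now(timezone.utc) - timedelta(days=older_than_days)
        stale = [k for k, c in self._items.items() if c.created_at < cutoff]
        for k in stale:
            del self._items[k]
        return len(stale)
```

Because `CheckpointBackend` is a `typing.Protocol`, a file, SQLite, Redis, or Postgres backend would only need matching methods; no inheritance is required.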

Written by Cursor Bugbot for commit 8f08787.

@vitalii-dynamiq requested a review from a team as a code owner on January 7, 2026 at 12:28
```python
cutoff = datetime.utcnow()
cutoff = cutoff.replace(
    day=cutoff.day - older_than_days
)
```

Incorrect date arithmetic causes ValueError in cleanup

Medium Severity

The SQLite backend's cleanup method computes its cutoff with cutoff.replace(day=cutoff.day - older_than_days), but datetime.replace() only accepts a day value that is valid for the month (1 through the month's last day). If the current day minus older_than_days is zero or negative (e.g., January 5th with older_than_days=10 yields -5), Python raises ValueError: day is out of range for month. The correct approach is datetime.utcnow() - timedelta(days=older_than_days), which handles month and year boundaries.
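A minimal sketch of the suggested fix, assuming a hypothetical `checkpoints` table whose `created_at` column stores Unix timestamps (the RFC's actual schema is not shown here). The helper names `cleanup_cutoff` and `cleanup` are also hypothetical. It uses timezone-aware `datetime.now(timezone.utc)` rather than `datetime.utcnow()`, which is deprecated as of Python 3.12; the arithmetic is the same `timedelta` subtraction the comment recommends.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional


def cleanup_cutoff(older_than_days: int, now: Optional[datetime] = None) -> datetime:
    """Return the instant `older_than_days` before `now`.

    timedelta subtraction crosses month and year boundaries, unlike
    datetime.replace(day=...), which raises ValueError for day <= 0.
    """
    if older_than_days < 0:
        raise ValueError("older_than_days must be non-negative")
    if now is None:
        now = datetime.now(timezone.utc)
    return now - timedelta(days=older_than_days)


def cleanup(conn: sqlite3.Connection, older_than_days: int,
            now: Optional[datetime] = None) -> int:
    """Delete checkpoints created before the cutoff; return how many were removed.

    Assumes a hypothetical table: checkpoints(id TEXT PRIMARY KEY, created_at REAL),
    where created_at holds a Unix timestamp (seconds since the epoch).
    """
    cutoff = cleanup_cutoff(older_than_days, now)
    cur = conn.execute(
        "DELETE FROM checkpoints WHERE created_at < ?", (cutoff.timestamp(),)
    )
    conn.commit()
    return cur.rowcount
```

Storing `created_at` as a number rather than a formatted string keeps the `<` comparison numeric, so it does not depend on how timestamps are rendered.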

```python
        "storage_id": file_id,
    }
else:
    result["files"][fname] = value
```

Wrong variable assigned in file serialization loop

Medium Severity

In _serialize_tool_output, the loop iterates over value.items() to get (fname, fdata) pairs, but the else branch assigns value (the entire files dictionary) instead of fdata (the individual file's data). Each file entry would then contain the entire files dictionary rather than its own content, corrupting the serialized output.
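A corrected version of the loop might look like the sketch below. The function name `serialize_files`, the `store` callback, the `MAX_INLINE_BYTES` threshold, and the `size` key are all hypothetical; only the `storage_id` key and the `fname`/`fdata` loop come from the excerpt. The relevant change is the `else` branch, which now assigns `fdata`.

```python
from typing import Any, Callable, Dict

MAX_INLINE_BYTES = 1024  # hypothetical cut-off for offloading file contents


def serialize_files(files: Dict[str, Any],
                    store: Callable[[bytes], str]) -> Dict[str, Any]:
    """Serialize a {filename: data} mapping, offloading large binary payloads.

    `store` is a hypothetical callback that persists bytes and returns an id.
    """
    result: Dict[str, Any] = {"files": {}}
    for fname, fdata in files.items():
        if isinstance(fdata, (bytes, bytearray)) and len(fdata) > MAX_INLINE_BYTES:
            file_id = store(bytes(fdata))
            result["files"][fname] = {
                "size": len(fdata),
                "storage_id": file_id,
            }
        else:
            # Fix: keep this file's own data, not the whole `files` mapping.
            result["files"][fname] = fdata
    return result
```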



```python
# === CRITICAL: Set resume loop ===
# Agent._run_agent will start from this loop instead of 1
self._resume_from_loop = state.get("current_loop", 0)
```

Agent resume starts from completed loop instead of next

Medium Severity

The RFC example code has an off-by-one error in the agent loop resume logic. _resume_from_loop is set to state.get("current_loop", 0), but checkpoints are saved after each loop iteration completes, so current_loop names a loop that has already finished. When resuming, range(start_loop, max_loops + 1) re-executes that completed loop. The conversation history already contains that loop's messages (restored from the checkpoint), so this would cause duplicate LLM calls and potential message duplication. The fix is state.get("current_loop", 0) + 1, which skips the completed loop and still starts at loop 1 when no loop has completed.

Additional Locations (1)
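The resume arithmetic can be checked with a toy loop. `ToyAgent`, its `history` and `saved` attributes, and the `stop_after` interruption hook are hypothetical stand-ins for the RFC's `Agent` and `_run_agent`; only the `current_loop` / `_resume_from_loop` names and the `range(start_loop, max_loops + 1)` shape come from the excerpt. It models the comment's assumption that a checkpoint is saved after each loop completes.

```python
from typing import Any, Dict, List, Optional


class ToyAgent:
    """Hypothetical stand-in for Agent; models only loop numbering on resume."""

    def __init__(self, max_loops: int,
                 checkpoint: Optional[Dict[str, Any]] = None) -> None:
        self.max_loops = max_loops
        self.history: List[str] = []
        self.saved: Optional[Dict[str, Any]] = None
        self._resume_from_loop = 1
        if checkpoint is not None:
            self.history = list(checkpoint.get("history", []))
            # current_loop is the last *completed* loop, so start at the next one.
            # With no completed loop, the default 0 + 1 gives loop 1.
            self._resume_from_loop = checkpoint.get("current_loop", 0) + 1

    def run(self, stop_after: Optional[int] = None) -> List[str]:
        start_loop = self._resume_from_loop
        for loop in range(start_loop, self.max_loops + 1):
            self.history.append(f"llm-call-{loop}")  # one LLM round per loop
            # The checkpoint is taken only after the loop body has completed.
            self.saved = {"current_loop": loop, "history": list(self.history)}
            if stop_after is not None and loop == stop_after:
                break  # simulate a crash right after the checkpoint is saved
        return self.history
```

With the original `state.get("current_loop", 0)`, the resumed run would start at loop 2 again and append a second `llm-call-2`.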


```python
except (TypeError, ValueError):
    # Large or non-serializable results: store truncated
    if isinstance(value, str) and len(value) > 10000:
        serialized[cache_key] = value[:10000] + "...[truncated]"
```

Non-serializable tool cache entries silently dropped during checkpoint

Low Severity

In _serialize_tool_cache, when a tool result value is not JSON-serializable and is not a large string, the entry is silently dropped with no fallback or warning. On resume, these missing cache entries would cause the corresponding tools to be re-executed unnecessarily, contradicting the RFC's goal of using the tool cache to "skip re-executing identical tool calls on resume." Non-serializable results (e.g., custom objects) are never added to the serialized dict.
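One way to remove the silent drop is to check serializability per entry, log a warning, and return the skipped keys so the resume path knows which tools must run again. This is a sketch: `serialize_tool_cache` and its `(serialized, skipped)` return shape are hypothetical, and only the 10,000-character truncation and `...[truncated]` suffix come from the excerpt. Here truncation is checked before the JSON probe, since `json.dumps` accepts strings of any length.

```python
import json
import logging
from typing import Any, Dict, List, Tuple

logger = logging.getLogger(__name__)
MAX_CACHED_CHARS = 10_000  # same limit as the RFC excerpt's truncation


def serialize_tool_cache(cache: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (serialized entries, keys that could not be serialized)."""
    serialized: Dict[str, Any] = {}
    skipped: List[str] = []
    for cache_key, value in cache.items():
        if isinstance(value, str) and len(value) > MAX_CACHED_CHARS:
            serialized[cache_key] = value[:MAX_CACHED_CHARS] + "...[truncated]"
            continue
        try:
            json.dumps(value)  # probe only; raises on unsupported types or cycles
        except (TypeError, ValueError):
            # No silent drop: warn and report the key so resume can re-run that tool.
            logger.warning(
                "Tool cache entry %r is not JSON-serializable; it will not be "
                "restored on resume", cache_key)
            skipped.append(cache_key)
            continue
        serialized[cache_key] = value
    return serialized, skipped
```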



github-actions bot commented Jan 14, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|------|-------|------|-------|---------|
| TOTAL | 23616 | 7384 | 68% | |
report-only-changed-files is enabled. No files were changed during this commit :)

| Tests | Skipped | Failures | Errors | Time |
|-------|---------|----------|--------|------|
| 1288 | 38 💤 | 0 ❌ | 0 🔥 | 9m 22s ⏱️ |

@acoola added the on hold label (work currently on hold) on Mar 5, 2026
@acoola (Collaborator) commented Mar 5, 2026

Would be closed after #566 is merged.


Labels

on hold (work currently on hold)

Projects

None yet


2 participants