
feat: add checkpoints #566

Open
acoola wants to merge 29 commits into main from feat/add-checkpoints


acoola (Collaborator) commented Feb 17, 2026

Note

High Risk
High risk because it introduces a new persistence/resume layer and modifies core Flow.run_* and Node/agent/orchestrator execution loops; bugs could cause skipped work, duplicated work, or corrupted state across runs.

Overview
Adds a first-class checkpointing system that can persist and resume Flow execution, including node outputs/errors, pending human-approval prompts, and mid-loop progress for iterative nodes.
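Resuming mid-loop progress generally means an iterative node exposes its loop counter and accumulated results as serializable state, saves that state after each iteration, and on restart continues from the saved counter instead of from iteration 0. The sketch below illustrates that idea under stated assumptions: `LoopCheckpointMixin`, `SquaresLoop`, `on_checkpoint`, and `crash_at` are hypothetical names invented for illustration (not dynamiq's `IterativeCheckpointMixin` API), and the checkpoint is a plain dict rather than a `FlowCheckpoint`.

```python
from typing import Callable, Optional


class LoopCheckpointMixin:
    """Hypothetical stand-in for loop-level checkpoint support (not dynamiq's API)."""

    def __init__(self) -> None:
        self.iteration = 0  # index of the next loop iteration to run
        self.history: list[int] = []  # per-iteration results gathered so far

    def to_checkpoint_state(self) -> dict:
        # Copy the list so later iterations cannot mutate an already saved snapshot.
        return {"iteration": self.iteration, "history": list(self.history)}

    def from_checkpoint_state(self, state: dict) -> None:
        self.iteration = state["iteration"]
        self.history = list(state["history"])


class SquaresLoop(LoopCheckpointMixin):
    """Toy iterative node: records i*i for each iteration and returns the sum."""

    def __init__(self, max_loops: int) -> None:
        super().__init__()
        self.max_loops = max_loops

    def run(
        self,
        on_checkpoint: Optional[Callable[[dict], None]] = None,
        crash_at: Optional[int] = None,  # hypothetical hook to simulate a failure
    ) -> int:
        # After a restore, self.iteration is non-zero, so the loop starts mid-way.
        while self.iteration < self.max_loops:
            self.history.append(self.iteration * self.iteration)
            self.iteration += 1
            if on_checkpoint is not None:
                on_checkpoint(self.to_checkpoint_state())  # mid-loop checkpoint
            if crash_at is not None and self.iteration == crash_at:
                raise RuntimeError("simulated crash")
        return sum(self.history)
```

A fresh node restored from the last saved state finishes only the remaining iterations; the already completed ones are neither repeated nor lost.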

Introduces the dynamiq/checkpoints package with a FlowCheckpoint schema, CheckpointConfig/CheckpointContext, a backend interface, and InMemory and atomic FileSystem backend implementations (with retention/TTL cleanup and APPEND/REPLACE snapshot modes).
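As a rough illustration of the APPEND/REPLACE and retention/TTL behavior, here is a minimal in-memory backend sketch. It assumes that APPEND keeps a per-run history of snapshots while REPLACE keeps only the latest one, that retention is a per-run cap on stored snapshots, and that TTL cleanup deletes snapshots older than a fixed age. `InMemoryCheckpointBackend`, `Checkpoint`, `SnapshotMode`, and their methods are hypothetical names, not the classes added in this PR.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional


class SnapshotMode(Enum):
    APPEND = "append"  # keep every snapshot for a run (subject to retention)
    REPLACE = "replace"  # keep only the most recent snapshot for a run


@dataclass
class Checkpoint:
    run_id: str
    node_outputs: dict[str, Any]
    created_at: float = field(default_factory=time.time)


class InMemoryCheckpointBackend:
    """Hypothetical backend sketch; not dynamiq's InMemory implementation."""

    def __init__(
        self,
        mode: SnapshotMode = SnapshotMode.APPEND,
        max_per_run: Optional[int] = None,  # retention cap per run (assumed >= 1)
        ttl_seconds: Optional[float] = None,  # snapshots older than this expire
    ) -> None:
        self.mode = mode
        self.max_per_run = max_per_run
        self.ttl_seconds = ttl_seconds
        self._store: dict[str, list[Checkpoint]] = {}

    def save(self, checkpoint: Checkpoint) -> None:
        history = self._store.setdefault(checkpoint.run_id, [])
        if self.mode is SnapshotMode.REPLACE:
            history.clear()  # REPLACE: overwrite the previous snapshot
        history.append(checkpoint)
        if self.max_per_run is not None and len(history) > self.max_per_run:
            del history[: len(history) - self.max_per_run]  # drop the oldest

    def load_latest(self, run_id: str) -> Optional[Checkpoint]:
        history = self._store.get(run_id)
        return history[-1] if history else None

    def snapshots(self, run_id: str) -> list[Checkpoint]:
        return list(self._store.get(run_id, []))

    def cleanup(self, now: Optional[float] = None) -> int:
        """Delete snapshots past their TTL and return how many were removed."""
        if self.ttl_seconds is None:
            return 0
        now = time.time() if now is None else now
        removed = 0
        for run_id in list(self._store):
            kept = [c for c in self._store[run_id] if now - c.created_at <= self.ttl_seconds]
            removed += len(self._store[run_id]) - len(kept)
            if kept:
                self._store[run_id] = kept
            else:
                del self._store[run_id]
        return removed
```

A file-system backend commonly achieves atomic saves by writing each snapshot to a temporary file in the same directory and renaming it over the target with `os.replace`, so a crash mid-write never leaves a half-written checkpoint; this is a common technique, not necessarily the exact mechanism used by the FileSystem backend in this PR.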

Integrates checkpointing into the runtime. Flow now saves checkpoints after node completion and on failure, and it can resume from a checkpoint ID or checkpoint object, skipping completed nodes and restoring each node's internal state. RunnableConfig gains per-run checkpoint overrides, and YAML loading can instantiate checkpoint backends from their type. Core nodes, agents, orchestrators, LLMs, and tools implement to_checkpoint_state()/from_checkpoint_state(), and iterative nodes use IterativeCheckpointMixin to support loop-level resume and mid-loop checkpoint requests.
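The resume behavior (skip completed nodes, restore their internal state) can be sketched with a toy sequential flow. Everything here is hypothetical: `FlowCheckpoint` is a simplified stand-in for the PR's schema, `CountingNode` and `run_flow` are invented for illustration, and a real Flow also handles dependencies, parallel execution, and persistence that this sketch omits.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class FlowCheckpoint:
    """Hypothetical snapshot: output and internal state per completed node."""

    completed: dict[str, Any] = field(default_factory=dict)
    node_states: dict[str, dict] = field(default_factory=dict)


class CountingNode:
    """Toy node that counts how often it actually executes."""

    def __init__(self, node_id: str, fn: Callable[[Any], Any]) -> None:
        self.id = node_id
        self.fn = fn
        self.calls = 0

    def run(self, value: Any) -> Any:
        self.calls += 1
        return self.fn(value)

    def to_checkpoint_state(self) -> dict:
        return {"calls": self.calls}

    def from_checkpoint_state(self, state: dict) -> None:
        self.calls = state["calls"]


def run_flow(nodes: list, value: Any, checkpoint: Optional[FlowCheckpoint] = None):
    """Run nodes in sequence, skipping nodes already completed in `checkpoint`."""
    cp = FlowCheckpoint() if checkpoint is None else checkpoint
    for node in nodes:
        if node.id in cp.completed:
            node.from_checkpoint_state(cp.node_states[node.id])  # restore internal state
            value = cp.completed[node.id]  # reuse the saved output instead of re-running
            continue
        value = node.run(value)
        cp.completed[node.id] = value
        cp.node_states[node.id] = node.to_checkpoint_state()
        # A real flow would persist `cp` to a checkpoint backend at this point.
    return value, cp
```

Because the checkpoint object is updated in place after every node, a caller that catches a failure still holds the progress made so far and can pass it back in to resume with fresh node instances.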

Reviewed by Cursor Bugbot for commit cb912eb. Bugbot is set up for automated code reviews on this repo.

acoola added the WIP label Feb 17, 2026

acoola (Collaborator, Author) commented Feb 17, 2026

bugbot run

acoola force-pushed the feat/add-checkpoints branch from d25f9c9 to a92f40e on February 24, 2026 10:36

acoola (Collaborator, Author) commented Feb 24, 2026

bugbot run

acoola (Collaborator, Author) commented Feb 24, 2026

bugbot run

github-actions bot commented Feb 24, 2026

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---:|---:|---:|---|
| dynamiq/checkpoints/checkpoint.py | 341 | 49 | 85% | 38, 85, 89, 215, 219–221, 390, 396, 412, 434–435, 453, 458, 491, 501–502, 507, 513, 523, 545–546, 565, 570, 603, 613–614, 622–624, 632–634, 642–648, 650–652, 663–665, 673–675 |
| dynamiq/checkpoints/utils.py | 26 | 2 | 92% | 10, 13 |
| dynamiq/checkpoints/backends/base.py | 100 | 16 | 84% | 15, 35, 50, 55, 60, 65, 77, 82, 92–93, 101, 113, 176, 188, 201, 207 |
| dynamiq/checkpoints/backends/filesystem.py | 175 | 26 | 85% | 18, 51, 54, 106, 109, 129–132, 145–146, 151, 176, 180, 184, 199, 202, 226, 232, 234, 249–252, 254–255 |
| dynamiq/checkpoints/backends/in_memory.py | 66 | 3 | 95% | 13, 53, 111 |
| dynamiq/flows/flow.py | 346 | 67 | 80% | 134, 141, 148, 182–183, 241–243, 280–281, 285–286, 316, 319, 325, 327–331, 338, 341, 347, 349, 351–352, 354, 356–358, 508–509, 559, 672–675, 677–680, 688–689, 703–704, 706, 725–726, 728, 731, 750–751, 754–755, 758–759, 761–762, 765–767, 769–773, 775 |
| dynamiq/nodes/node.py | 719 | 98 | 86% | 23, 325, 342, 375, 401, 406–407, 411–412, 456–457, 459, 477, 498, 532, 534, 541–542, 614–615, 618–620, 627–629, 637–638, 648–653, 660–661, 736, 749, 751, 753, 757, 768–769, 790–791, 805, 810–811, 826–827, 831, 833–834, 1211, 1245–1246, 1271–1272, 1294–1295, 1323–1324, 1350–1351, 1376–1377, 1399–1400, 1420–1421, 1444–1445, 1459, 1478, 1484, 1487, 1504–1506, 1579–1585, 1587, 1589–1590, 1601, 1621, 1656, 1690, 1696, 1749, 1771–1773 |
| dynamiq/nodes/agents/agent.py | 710 | 135 | 80% | 218–221, 226, 234, 239, 257–258, 276–277, 330–332, 493–502, 505–507, 522, 531, 557, 560–561, 571–574, 579, 592, 603–606, 621, 633, 638, 654–655, 665, 668, 678, 718–720, 726–728, 837, 842, 892, 926, 928, 943–944, 951, 954, 979, 981–982, 998, 1023–1027, 1094, 1100, 1104, 1107, 1140, 1165, 1168, 1173, 1189, 1235, 1249, 1288–1289, 1315, 1329, 1343, 1346–1347, 1357, 1364, 1426–1427, 1431, 1433–1435, 1437, 1439, 1472, 1480–1481, 1483, 1486, 1489, 1540–1543, 1559, 1568, 1607, 1613–1619, 1625, 1627–1636, 1638–1639 |
| dynamiq/nodes/agents/base.py | 1016 | 309 | 69% | 64, 70, 162–164, 179, 187, 190, 358, 376, 386, 389, 504, 506, 537–539, 541–546, 551–553, 575–581, 585–590, 593–597, 609–610, 612–617, 619–620, 628, 648, 650, 652, 674, 747–748, 757–760, 774–781, 789–791, 826, 837–838, 879–882, 886, 888, 890–897, 899–904, 908–909, 911, 915–916, 918, 929, 936–941, 945–946, 951–952, 1007, 1018, 1042–1045, 1047, 1049, 1051–1053, 1055–1056, 1063, 1065–1066, 1070–1072, 1078, 1083, 1086–1088, 1109, 1111, 1129–1131, 1135–1137, 1140, 1152–1153, 1174–1179, 1196–1197, 1199–1200, 1204–1205, 1207–1214, 1217–1220, 1222–1223, 1225–1227, 1247, 1254–1255, 1260, 1277–1278, 1283, 1294–1296, 1307–1308, 1313, 1321, 1323, 1340, 1346, 1353, 1372–1374, 1376, 1397–1398, 1414, 1421–1424, 1427, 1432, 1440, 1464, 1468–1469, 1496–1498, 1500–1505, 1507–1508, 1510–1518, 1525, 1527–1530, 1532–1540, 1547, 1549, 1551–1553, 1567, 1586–1587, 1592, 1598, 1649, 1663, 1669, 1672, 1679, 1734, 1743, 1746, 1768–1770, 1775–1776, 1792–1793, 1797, 1810, 1814, 1820, 1822–1823, 1825–1826, 1828–1831, 1833, 1835, 1837–1840, 1842, 1845, 1847, 1851, 1854, 1856, 1860, 1863, 1865, 1869, 1872–1874, 1881, 1888, 1891–1892 |
| dynamiq/nodes/agents/orchestrators/adaptive.py | 195 | 143 | 26% | 64, 72–75, 89, 92–93, 98, 110–112, 119, 132, 142, 144–146, 148, 150–151, 163, 165–168, 170–175, 177, 191–193, 195–196, 198–199, 201–203, 205–209, 211–214, 216–217, 226–227, 236–238, 244–247, 249, 256, 267–270, 274, 276, 298, 309, 311–313, 315, 317, 323, 327–329, 345–353, 355–358, 361, 363, 365–368, 370–371, 375, 377, 384–387, 391–392, 394–397, 400–401, 406, 408–413, 433–439, 443–454 |
| dynamiq/nodes/agents/orchestrators/graph.py | 184 | 52 | 71% | 77, 80–81, 99, 107–110, 126, 134, 139, 161–162, 164–165, 193, 217–218, 224, 244, 254, 256–259, 261–262, 266–267, 275–277, 279–280, 282–283, 303, 314, 319, 323, 329, 345, 348–349, 354, 364, 386, 397, 421–424 |
| dynamiq/nodes/agents/orchestrators/linear.py | 214 | 143 | 33% | 88, 96–99, 130, 133–134, 139, 157–158, 160–161, 173, 175–177, 179–180, 184–187, 195–197, 199–200, 202, 206–208, 210, 215–218, 220, 222–225, 231–234, 237–238, 242–243, 245–250, 252, 257–258, 260–262, 264–266, 277, 279–280, 282–283, 285–286, 294, 298, 304–306, 312, 317–319, 321–325, 327, 345, 349–351, 353–356, 358, 364–373, 375–376, 389–391, 393–394, 396–397, 399–402, 406–408, 421–422, 424–430, 434–435, 447–454, 456–457 |
| dynamiq/nodes/agents/orchestrators/orchestrator.py | 170 | 73 | 57% | 105–106, 119–120, 122–128, 147–148, 150, 161–165, 167–172, 174–176, 184–188, 192–193, 195–196, 198–201, 204–205, 207, 229–230, 233, 235–238, 242, 302, 307, 328–329, 339, 352–353, 361–362, 364–365, 367–372, 389–392 |
| dynamiq/nodes/llms/base.py | 320 | 34 | 89% | 32, 139, 145, 320, 361–364, 371–377, 404, 443, 445, 551–560, 581–582, 624, 626, 656, 760 |
| dynamiq/nodes/tools/agent_tool.py | 129 | 6 | 95% | 17–18, 206, 267–268, 314 |
| dynamiq/nodes/tools/context_manager.py | 151 | 106 | 29% | 91, 95, 105, 113–115, 124–125, 130, 134, 148–149, 151–153, 155–157, 159–163, 165, 169, 181–183, 185–186, 188–195, 197–200, 202–203, 205–206, 208, 217, 223, 225–227, 229–232, 256–257, 259–261, 264–266, 268–269, 274–277, 280, 282–283, 288–290, 294, 298–303, 308–310, 323–325, 327–328, 330–331, 333, 335, 341–342, 344, 352–353, 360–363, 365–366 |
| dynamiq/nodes/tools/e2b_sandbox.py | 423 | 330 | 21% | 79, 99–103, 105–106, 108, 110, 141, 155, 163, 176, 190–191, 193–194, 210–222, 224, 226, 269–271, 277–278, 280, 329, 355, 360–362, 367, 371, 373–374, 376, 378–380, 385–387, 391, 395–399, 403–405, 418–423, 425–426, 439–443, 450–451, 455–465, 481–482, 484, 486–488, 491, 494, 497–498, 500, 509–513, 518, 536–537, 539–543, 545, 547–550, 552–553, 555–557, 560, 562–568, 570, 572–573, 594–595, 597–600, 602–605, 621–622, 624–626, 628, 630, 632, 634–636, 638, 651–658, 663, 676–677, 679, 681–685, 687, 689–690, 692–693, 697, 699–700, 702, 704–705, 707–711, 713, 715–717, 736–737, 739–740, 742–745, 747, 751, 757–762, 764–769, 771–772, 774–775, 777–779, 781–782, 786–787, 789–791, 793–796, 798–799, 805–806, 813–815, 817–818, 820–821, 823–824, 826, 828–833, 835, 838–843, 845, 847, 849–852, 855–856, 858–860, 863–866, 869, 871, 873–876, 878, 880–883, 885–894, 896, 898–904, 906–907, 916, 927–929, 934–937, 941–943, 945, 954–960, 964–967 |
| dynamiq/nodes/tools/file_tools.py | 768 | 255 | 66% | 91, 94–95, 101–102, 112–114, 425, 432–438, 441, 494–495, 498–499, 519–525, 532–533, 538, 540, 543, 545–547, 553–555, 560–562, 581–583, 597–598, 604–605, 609, 624, 635–636, 652, 664–668, 671, 707, 721–724, 726–727, 730–733, 735, 742–744, 823, 832–833, 840–841, 854, 863, 865–867, 885, 887, 889–890, 892, 894–897, 899, 910, 924, 926, 928–929, 931, 933–936, 938, 949, 959, 961, 964, 971–981, 985–987, 993–1001, 1009, 1025–1027, 1032, 1038–1042, 1061, 1133–1134, 1141–1144, 1146–1147, 1193, 1196–1198, 1230, 1246–1248, 1269–1270, 1366, 1368, 1373–1378, 1380–1381, 1384–1386, 1389, 1452, 1454–1455, 1457–1460, 1462–1467, 1471–1472, 1474–1476, 1485–1486, 1488, 1495, 1499, 1501–1505, 1513–1516, 1518–1519, 1532–1534, 1536, 1538–1540, 1550–1551, 1553–1560, 1570, 1572, 1576–1577, 1579–1590, 1593–1596, 1603–1607, 1660–1662 |
| dynamiq/nodes/tools/llm_summarizer.py | 69 | 10 | 85% | 135, 147–149, 166–167, 193, 228, 232–233 |
| dynamiq/nodes/tools/thinking_tool.py | 99 | 42 | 57% | 147, 152, 156–158, 162, 164–165, 167–168, 170–172, 178, 180, 199–201, 203–205, 207, 209, 211, 213, 215, 222, 224, 226–227, 229, 231–232, 242, 246, 272–275, 279–280, 284 |
| dynamiq/prompts/prompts.py | 183 | 23 | 87% | 192, 199, 219, 231, 265–267, 275, 278, 290, 299, 325, 338, 396–397, 399, 401, 424, 438, 444–447 |
| dynamiq/runnables/base.py | 83 | 2 | 97% | 235, 250 |
| dynamiq/serializers/loaders/yaml.py | 439 | 117 | 73% | 63, 68–72, 195, 217–219, 244–245, 248–251, 254, 282, 284, 291–292, 320, 332–333, 353–356, 376–378, 398–403, 434, 439, 451, 479, 485–487, 490, 508–513, 531–537, 557–562, 567–569, 574–575, 580, 719, 721, 733–734, 754–757, 795, 822, 828, 830, 850–852, 938–939, 941–942, 946–949, 951, 955, 995, 998, 1005, 1010–1011, 1032–1033, 1063, 1065, 1068, 1071, 1098, 1100, 1106–1107, 1161–1162, 1261–1265 |
| dynamiq/utils/utils.py | 216 | 30 | 86% | 59, 91, 93, 264, 291, 378–379, 385–386, 396–397, 412–418, 420, 422–423, 440, 466–472, 474 |
| dynamiq/workflow/workflow.py | 126 | 18 | 85% | 18, 44, 47, 74–76, 94, 99–100, 102–105, 155–157, 236–237 |
| tests/integration/checkpoints/conftest.py | 21 | 5 | 76% | 15, 21, 27, 37, 47 |
| TOTAL | 28084 | 9058 | 67% | |

| Tests | Skipped | Failures | Errors | Time |
|---:|---:|---:|---:|---|
| 1875 | 1 💤 | 0 ❌ | 0 🔥 | 2m 9s ⏱️ |

acoola (Collaborator, Author) commented Feb 25, 2026

bugbot run

acoola force-pushed the feat/add-checkpoints branch from f6dd827 to 1d17d51 on February 26, 2026 14:46

acoola (Collaborator, Author) commented Feb 26, 2026

bugbot run

acoola (Collaborator, Author) commented Feb 26, 2026

bugbot run

acoola (Collaborator, Author) commented Feb 26, 2026

bugbot run

acoola requested a review from a team as a code owner on February 27, 2026 23:37
- extend bugbot rules
acoola removed the WIP label on Mar 3, 2026
TrachukT added the run-integration-tests-with-creds label ("Trigger integration tests with credentials (optional)") on Apr 8, 2026
cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.



```python
def to_checkpoint_state(self) -> LLMCheckpointState:
    """Extract LLM-specific state for checkpointing."""
    return LLMCheckpointState(is_fallback_run=self._is_fallback_run)
```

BaseLLM.to_checkpoint_state() missing super() call

High Severity

BaseLLM.to_checkpoint_state() creates an LLMCheckpointState directly without calling super().to_checkpoint_state() and merging the result. Every other subclass in this PR (Agent, Orchestrator, SubAgentTool, etc.) correctly calls super().to_checkpoint_state().model_dump(exclude_none=True) and spreads the base fields into its own state. Without this, the approval_response and iteration fields from the base Node.to_checkpoint_state() are silently lost whenever an LLM node's checkpoint state is captured.
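If the base fields are meant to survive for LLM nodes, the merge pattern Bugbot describes looks roughly like this. The sketch uses standard-library dataclasses and a small `dump` helper as stand-ins for the Pydantic models and `model_dump(exclude_none=True)`; the field names come from the comment, but the class bodies and constructor parameters are hypothetical simplifications, not dynamiq's code.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class NodeCheckpointState:
    # Base fields named in the Bugbot comment (simplified types are assumptions).
    approval_response: Optional[dict] = None
    iteration: Optional[int] = None


@dataclass
class LLMCheckpointState(NodeCheckpointState):
    is_fallback_run: bool = False


def dump(state: Any, exclude_none: bool = True) -> dict:
    """Stand-in for Pydantic's model_dump(exclude_none=True)."""
    data = asdict(state)
    if exclude_none:
        data = {key: value for key, value in data.items() if value is not None}
    return data


class Node:
    def __init__(self, approval_response: Optional[dict] = None, iteration: Optional[int] = None) -> None:
        self._approval_response = approval_response
        self._iteration = iteration

    def to_checkpoint_state(self) -> NodeCheckpointState:
        return NodeCheckpointState(approval_response=self._approval_response, iteration=self._iteration)


class BaseLLM(Node):
    def __init__(self, is_fallback_run: bool = False, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self._is_fallback_run = is_fallback_run

    def to_checkpoint_state(self) -> LLMCheckpointState:
        """Extract LLM-specific state while keeping the base Node fields."""
        base_fields = dump(super().to_checkpoint_state())  # drops None-valued fields
        return LLMCheckpointState(**base_fields, is_fallback_run=self._is_fallback_run)
```

Whether this matters depends on whether approval_response and iteration ever carry values for an LLM node; if they never do, the merge mainly buys consistency with the other subclasses.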


Triggered by project rule: Bugbot Rules for Dynamiq


Contributor reply:

approval_response and iteration are not applicable to LLM nodes.


Labels

run-integration-tests-with-creds: Trigger integration tests with credentials (optional)


2 participants