
@Brijesh-Thakkar

Summary

This change completes the implementation of TTL-based cleanup for
CANCELLED triggers by integrating the existing mark_as_cancelled()
helper into the actual cancellation flow and removing all remaining
delete_many() operations that bypassed TTL handling.

With this update:

  • CANCELLED triggers now receive a valid expires_at timestamp
  • MongoDB TTL automatically removes them after the configured retention period
  • Destructive deletes are removed, preserving audit history
  • Unbounded accumulation of cancelled triggers is resolved

This finalizes the cleanup strategy introduced in PR #464.


Background

PR #464 added TTL cleanup for TRIGGERED and FAILED triggers and
introduced mark_as_cancelled(), but the helper was never actually used.

As a result:

  • CANCELLED triggers had expires_at = null
  • TTL indexes ignored them
  • They accumulated indefinitely
  • Some components continued using delete_many(), permanently deleting
    documents outside the retention lifecycle

This commit addresses those gaps.


Changes Included

1. Integrates mark_as_cancelled() into the cancellation flow

Every cancellation now applies:

  • trigger_status = CANCELLED
  • expires_at = now + retention_hours
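The update above can be written as a pure function and checked without a database. This is a minimal sketch: the status string "CANCELLED" matches the test document shown under Testing, but the function name `build_cancellation_update` and its injectable `now` parameter are illustrative assumptions, not the actual helper in trigger_cron.py.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

# Assumed stored value of TriggerStatusEnum.CANCELLED.
CANCELLED = "CANCELLED"


def build_cancellation_update(
    retention_hours: int, now: Optional[datetime] = None
) -> dict[str, Any]:
    """Return a $set update that cancels a trigger and schedules TTL removal.

    Hypothetical helper for illustration only.
    """
    if now is None:
        # Timezone-aware UTC: MongoDB stores dates as UTC and the TTL
        # monitor compares expires_at against the current UTC time.
        now = datetime.now(timezone.utc)
    return {
        "$set": {
            "trigger_status": CANCELLED,
            "expires_at": now + timedelta(hours=retention_hours),
        }
    }
```

With an async driver, this could be applied as `await collection.update_one({"_id": trigger_id}, build_cancellation_update(settings.trigger_retention_hours))`. Passing `now` explicitly keeps tests deterministic.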

2. Removes all manual delete_many() logic

Replaced with TTL-aware updates in:

  • upsert_graph_template.py
  • init_tasks.py
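A sketch of the filter and update that can stand in for a delete_many() call in upsert_graph_template.py. The field names (`graph_name`, `namespace`, `type`) and the restriction to PENDING triggers are assumptions made for illustration. Restricting to PENDING is a design choice here, so that already TRIGGERED or FAILED history is not relabelled; the PR's actual filter may differ.

```python
from datetime import datetime, timedelta, timezone
from typing import Any

# Hypothetical field names and status values; check them against
# DatabaseTriggers before reusing this.
PENDING = "PENDING"
CANCELLED = "CANCELLED"
CRON = "CRON"


def cron_trigger_cancel_args(
    graph_name: str, namespace: str, retention_hours: int, now: datetime
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Build (filter, update) for cancelling a graph's pending CRON triggers."""
    query = {
        "graph_name": graph_name,
        # Scope to one namespace, not every graph that shares this name.
        "namespace": namespace,
        "type": CRON,
        # Only pending triggers are cancelled; terminal history is untouched.
        "trigger_status": PENDING,
    }
    update = {
        "$set": {
            "trigger_status": CANCELLED,
            "expires_at": now + timedelta(hours=retention_hours),
        }
    }
    return query, update
```

The pair would then be passed to `update_many(query, update)`; the returned `UpdateResult.modified_count` is a natural value to log in place of the old deleted count.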

3. Ensures consistent retention behavior

All terminal states (TRIGGERED, FAILED, CANCELLED) now follow the
same TTL-based lifecycle.
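The shared lifecycle rests on one partial TTL index. The sketch below expresses it as the `(keys, options)` pair one might pass to PyMongo's `Collection.create_index(keys, **options)`. The index name and status strings are assumptions; the real definition lives in state-manager/app/models/db/trigger.py and may be declared through Beanie instead. Three caveats follow from MongoDB's documented behavior. `$in` inside `partialFilterExpression` is only accepted from MongoDB 6.0 onward. The TTL monitor runs roughly once a minute, so removal is not instantaneous. An existing index with the same name but a narrower filter will likely need to be dropped and recreated.

```python
from typing import Any

# Assumed stored values of the terminal TriggerStatusEnum members.
TERMINAL_STATUSES = ["TRIGGERED", "FAILED", "CANCELLED"]


def ttl_index_spec() -> tuple[list[tuple[str, int]], dict[str, Any]]:
    """Return (keys, options) for a TTL index that expires terminal triggers."""
    keys = [("expires_at", 1)]  # 1 is pymongo.ASCENDING
    options = {
        "name": "trigger_ttl_index",  # hypothetical index name
        # 0 means: delete once the clock passes the date stored in expires_at.
        "expireAfterSeconds": 0,
        # Only terminal states are eligible; PENDING triggers never expire.
        "partialFilterExpression": {"trigger_status": {"$in": TERMINAL_STATUSES}},
    }
    return keys, options
```

Documents whose `expires_at` is null or missing are never matched by a TTL index, which is why CANCELLED triggers without a timestamp accumulated before this change.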

4. Improves logging

Adds clearer logs around cancellation and state transitions.


Testing

Manual verification was performed by:

  1. Inserting a PENDING trigger
  2. Running the cancellation flow
  3. Confirming that the resulting document is updated to:
     {
       "trigger_status": "CANCELLED",
       "expires_at": "<future timestamp>"
     }
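The manual check in step 3 could be automated with a small predicate like the one below. This is a sketch, not part of the PR; `is_cancelled_with_ttl` is a hypothetical name. It accounts for PyMongo returning naive datetimes (which represent UTC) unless the client is created with `tz_aware=True`.

```python
from datetime import datetime, timezone
from typing import Any, Mapping


def is_cancelled_with_ttl(doc: Mapping[str, Any], now: datetime) -> bool:
    """Check the post-cancellation shape described in the manual test."""
    expires_at = doc.get("expires_at")
    if expires_at is None:
        # A null or missing expires_at would be ignored by the TTL index.
        return False
    if expires_at.tzinfo is None:
        # PyMongo returns naive datetimes by default; they represent UTC.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return doc.get("trigger_status") == "CANCELLED" and expires_at > now
```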

Copilot AI review requested due to automatic review settings December 10, 2025 14:41
@coderabbitai

coderabbitai bot commented Dec 10, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Triggers are now marked as cancelled and scheduled for delayed removal instead of being deleted immediately, with a configurable retention period controlled by system settings.
  • Refactor

    • Updated trigger lifecycle management to track cancelled status separately and expire old triggers based on retention hours rather than immediate removal.


Walkthrough

Replaces immediate deletion of CRON triggers with marking them as CANCELLED and assigning an expires_at computed from settings.trigger_retention_hours; updates TTL index to include CANCELLED; adds helpers to mark/cancel triggers and runs a startup task to bulk-cancel old triggers.

Changes

  • Configuration: state-manager/app/config/settings.py
    Formatting-only change: added a blank line after the Settings class docstring; no behavioral or API changes.
  • Controller (upsert & cleanup): state-manager/app/controller/upsert_graph_template.py
    Replaces immediate deletion of matching CRON triggers with a bulk update that sets trigger_status = CANCELLED and expires_at; loads settings via get_settings() and improves structured logging and error handling.
  • Background init tasks: state-manager/app/tasks/init_tasks.py
    Adds mark_old_triggers_cancelled() and a module logger; the startup init_tasks() now calls it to update old triggers to CANCELLED with expires_at instead of deleting them.
  • Trigger cron helpers: state-manager/app/tasks/trigger_cron.py
    Adds the mark_as_cancelled and cancel_trigger helpers; normalizes use of .value for trigger_status and sets expires_at on status transitions.
  • Database model / TTL: state-manager/app/models/db/trigger.py
    The TTL index partialFilterExpression is expanded to include TriggerStatusEnum.CANCELLED alongside TRIGGERED and FAILED, so expired CANCELLED documents are removed by TTL.

Sequence Diagram

sequenceDiagram
    autonumber
    participant Controller as Controller\n(upsert_graph_template)
    participant Settings as Settings\n(get_settings)
    participant DB as MongoDB\n(Triggers collection)
    participant Task as Init Task\n(mark_old_triggers_cancelled)
    participant TTL as TTL Worker

    rect rgb(220,240,220)
    Note over Controller,Settings: During upsert of graph template
    Controller->>Settings: get trigger_retention_hours
    Settings-->>Controller: retention_hours
    Controller->>DB: update_many(filter=CRON triggers for graph/namespace, set {trigger_status: CANCELLED, expires_at})
    DB-->>Controller: update result
    end

    rect rgb(220,240,240)
    Note over Task,Settings: Startup cleanup
    Task->>Settings: get trigger_retention_hours
    Settings-->>Task: retention_hours
    Task->>DB: update_many(filter=old TRIGGERED/FAILED with expires_at null, set {trigger_status: CANCELLED, expires_at})
    DB-->>Task: update result
    end

    rect rgb(240,230,220)
    Note over DB,TTL: TTL-based expiration
    TTL->>DB: remove documents where expires_at <= now() and status in [TRIGGERED, FAILED, CANCELLED]
    DB-->>TTL: deletion complete
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Areas to focus:
    • Consistency of UTC expires_at calculation across controller, tasks, and mark helpers.
    • Correctness of filter queries used in bulk update_many calls (graph/namespace scoping and trigger type).
    • TTL index partialFilterExpression now including CANCELLED — verify intended retention semantics and index rebuild implications.
    • Ensure use of status enum values (.value) matches DB schema elsewhere.

Suggested reviewers

  • NiveditJain

"I hopped through logs and code with care,
Marked cron jobs cancelled instead of bare,
Set expires_at to fade in time's art,
TTL will sweep — a soft departure, not a dart. 🐇✨"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 25.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)

  • Title check (✅ Passed): The title accurately summarizes the main change: wiring mark_as_cancelled into the cancellation flow and replacing delete operations with TTL-based cleanup.
  • Description check (✅ Passed): The description is comprehensive and directly related to the changeset, explaining the context, objectives, and implementation details of the TTL-based cleanup for cancelled triggers.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 40b1be2 and a445bec.

📒 Files selected for processing (3)
  • state-manager/app/controller/upsert_graph_template.py (3 hunks)
  • state-manager/app/tasks/init_tasks.py (1 hunks)
  • state-manager/app/tasks/trigger_cron.py (5 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-09-28T13:35:42.862Z
Learnt from: NiveditJain
Repo: exospherehost/exospherehost PR: 428
File: state-manager/app/tasks/verify_graph.py:4-5
Timestamp: 2025-09-28T13:35:42.862Z
Learning: In the cron trigger scheduling logic for state-manager/app/tasks/verify_graph.py, the system intentionally schedules at least one trigger beyond the trigger_ahead_time window to ensure continuity of scheduled executions. The current logic of appending an event then breaking is by design to guarantee "at least one next" trigger.

Applied to files:

  • state-manager/app/tasks/trigger_cron.py
📚 Learning: 2025-10-22T14:23:18.774Z
Learnt from: NiveditJain
Repo: exospherehost/exospherehost PR: 464
File: state-manager/app/tasks/init_tasks.py:16-20
Timestamp: 2025-10-22T14:23:18.774Z
Learning: In state-manager/app/tasks/init_tasks.py, the init_tasks() function intentionally uses asyncio.gather with a list pattern (even with a single task) to provide an extensible structure for easily adding future startup tasks without refactoring.

Applied to files:

  • state-manager/app/tasks/init_tasks.py
📚 Learning: 2025-08-03T16:46:04.030Z
Learnt from: NiveditJain
Repo: exospherehost/exospherehost PR: 132
File: state-manager/app/controller/upsert_graph_template.py:11-27
Timestamp: 2025-08-03T16:46:04.030Z
Learning: In the exospherehost codebase, for upsert operations on graph templates, the team prioritizes API idempotency over avoiding race conditions in the database layer implementation. The approach of separate find and insert/update operations is acceptable when the API behavior remains consistent.

Applied to files:

  • state-manager/app/controller/upsert_graph_template.py
📚 Learning: 2025-08-06T14:04:44.515Z
Learnt from: NiveditJain
Repo: exospherehost/exospherehost PR: 158
File: state-manager/app/controller/upsert_graph_template.py:49-49
Timestamp: 2025-08-06T14:04:44.515Z
Learning: In the exospherehost codebase, the team prefers to run graph template verification on every upsert operation rather than optimizing by skipping verification when templates haven't changed, prioritizing comprehensive validation over performance optimization.

Applied to files:

  • state-manager/app/controller/upsert_graph_template.py
🧬 Code graph analysis (2)
state-manager/app/tasks/trigger_cron.py (4)
state-manager/app/models/db/trigger.py (1)
  • DatabaseTriggers (9-53)
state-manager/app/models/trigger_models.py (1)
  • TriggerStatusEnum (9-14)
python-sdk/exospherehost/statemanager.py (1)
  • trigger (37-91)
state-manager/app/config/settings.py (1)
  • get_settings (34-38)
state-manager/app/tasks/init_tasks.py (4)
state-manager/app/config/settings.py (1)
  • get_settings (34-38)
state-manager/app/models/db/trigger.py (1)
  • DatabaseTriggers (9-53)
state-manager/app/models/trigger_models.py (1)
  • TriggerStatusEnum (9-14)
state-manager/app/singletons/logs_manager.py (2)
  • LogsManager (9-66)
  • get_logger (65-66)
🔇 Additional comments (13)
state-manager/app/tasks/trigger_cron.py (9)

16-28: LGTM!

The get_due_triggers function correctly uses .value for enum comparisons with the raw pymongo collection. The atomic find-and-update pattern properly prevents race conditions when multiple workers fetch triggers.


30-37: LGTM!

The helper function correctly invokes trigger_graph with required parameters and generates a unique request ID.


39-49: LGTM!

The mark_as_failed function correctly uses .value for the enum and sets expires_at for TTL-based cleanup, consistent with the PR's objective.


52-65: LGTM!

The mark_as_cancelled helper correctly uses .value for the enum (line 62) and sets expires_at for TTL cleanup. This addresses the previous review feedback.


78-103: LGTM!

The create_next_triggers function correctly uses TriggerStatusEnum.PENDING (without .value) for Beanie's insert() method, which handles enum conversion automatically. The loop correctly schedules triggers and breaks after scheduling at least one beyond the current time. Based on learnings, this "at least one next" behavior is intentional.


106-115: LGTM!

The mark_as_triggered function correctly uses .value for the raw pymongo update and sets expires_at for TTL cleanup.


118-127: LGTM!

The handle_trigger function correctly processes due triggers, marks results appropriately, and ensures create_next_triggers is always called via the finally block.


130-137: LGTM!

The trigger_cron function correctly spawns multiple concurrent workers using asyncio.gather. Based on learnings, this extensible list pattern is intentional for easily adding future tasks.


67-76: Verify that cancel_trigger is used elsewhere in the codebase.

The cancel_trigger wrapper is defined but not called within this file. Per the PR objective to "wire mark_as_cancelled into trigger cancellation flow," verify that this helper is actually used by other modules. If it's not used anywhere, consider removing it to avoid dead code.

state-manager/app/tasks/init_tasks.py (1)

51-54: LGTM!

The init_tasks function correctly uses asyncio.gather with the list pattern. Based on learnings, this structure is intentional for extensibility when adding future startup tasks.

state-manager/app/controller/upsert_graph_template.py (3)

24-26: LGTM!

Settings are now loaded at request time inside the function, addressing the prior feedback about runtime configuration changes.


84-115: LGTM!

The bulk update correctly replaces the destructive delete_many() with a TTL-aware cancellation:

  • Uses .value for enum comparisons with raw pymongo
  • Includes namespace in the filter (addressing prior feedback)
  • Sets both trigger_status and expires_at for TTL cleanup

This aligns with the PR objective of preserving audit history.


134-141: LGTM!

The exception handling correctly uses bare raise to preserve the original traceback, addressing prior feedback.



@gemini-code-assist

Summary of Changes

Hello @Brijesh-Thakkar, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request finalizes the implementation of a robust TTL-based cleanup strategy for database triggers. It addresses previous gaps where CANCELLED triggers were not being properly handled by the Time-To-Live mechanism, leading to their indefinite accumulation. By integrating the mark_as_cancelled utility and replacing direct deletion operations with status updates that leverage MongoDB's TTL, the system now ensures consistent, auditable, and automatic cleanup across all terminal trigger states.

Highlights

  • TTL-based Cleanup for CANCELLED Triggers: The mark_as_cancelled() helper is now fully integrated into the trigger cancellation flow, ensuring that CANCELLED triggers receive a valid expires_at timestamp and are automatically removed by MongoDB's TTL mechanism after the configured retention period.
  • Elimination of Destructive Deletes: All remaining delete_many() operations that bypassed TTL handling have been replaced with update_many() operations. This change ensures that triggers are marked as CANCELLED with an expires_at field rather than being immediately deleted, preserving audit history and allowing for consistent cleanup.
  • Consistent Retention Policy: All terminal trigger states, including TRIGGERED, FAILED, and now CANCELLED, consistently follow the same TTL-based lifecycle, preventing the unbounded accumulation of triggers in the database.
  • Improved Logging: Clearer logging has been added around trigger cancellation and state transitions to provide better visibility into the cleanup process.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly transitions from destructive trigger deletion to a TTL-based cleanup by marking them as CANCELLED and setting an expires_at timestamp. The changes to the MongoDB index and the logic in upsert_graph_template.py are well-implemented. However, I've identified critical issues in the newly added code in init_tasks.py and trigger_cron.py. In both files, enum members are passed directly to raw PyMongo queries instead of their string values, which will likely cause these database operations to fail. My review includes suggestions to fix these issues by using the .value attribute on the enums, ensuring the TTL cleanup mechanism works as intended.
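A small illustration of the reviewer's point, using a hypothetical plain `Enum` in place of TriggerStatusEnum, whose actual base classes are not shown in this conversation. Without a custom BSON type codec, PyMongo cannot encode a plain `Enum` member, whereas `.value` is an ordinary string. If TriggerStatusEnum also subclasses `str`, the member may encode anyway, but `.value` is safe in both cases.

```python
from enum import Enum


class TriggerStatus(Enum):
    """Hypothetical stand-in for TriggerStatusEnum (plain Enum, not a str subclass)."""

    PENDING = "PENDING"
    TRIGGERED = "TRIGGERED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


def status_filter(*statuses: TriggerStatus) -> dict:
    """Build a raw PyMongo filter from enum values rather than enum members."""
    # .value yields the plain string stored in the document, which BSON can encode.
    return {"trigger_status": {"$in": [s.value for s in statuses]}}
```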

@coderabbitai coderabbitai bot added the enhancement New feature or request label Dec 10, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f4c6fe6 and 40b1be2.

📒 Files selected for processing (5)
  • state-manager/app/config/settings.py (1 hunks)
  • state-manager/app/controller/upsert_graph_template.py (3 hunks)
  • state-manager/app/models/db/trigger.py (1 hunks)
  • state-manager/app/tasks/init_tasks.py (1 hunks)
  • state-manager/app/tasks/trigger_cron.py (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-09-28T13:35:42.862Z
Learnt from: NiveditJain
Repo: exospherehost/exospherehost PR: 428
File: state-manager/app/tasks/verify_graph.py:4-5
Timestamp: 2025-09-28T13:35:42.862Z
Learning: In the cron trigger scheduling logic for state-manager/app/tasks/verify_graph.py, the system intentionally schedules at least one trigger beyond the trigger_ahead_time window to ensure continuity of scheduled executions. The current logic of appending an event then breaking is by design to guarantee "at least one next" trigger.

Applied to files:

  • state-manager/app/tasks/trigger_cron.py
  • state-manager/app/controller/upsert_graph_template.py
📚 Learning: 2025-10-22T14:23:18.774Z
Learnt from: NiveditJain
Repo: exospherehost/exospherehost PR: 464
File: state-manager/app/tasks/init_tasks.py:16-20
Timestamp: 2025-10-22T14:23:18.774Z
Learning: In state-manager/app/tasks/init_tasks.py, the init_tasks() function intentionally uses asyncio.gather with a list pattern (even with a single task) to provide an extensible structure for easily adding future startup tasks without refactoring.

Applied to files:

  • state-manager/app/tasks/init_tasks.py
📚 Learning: 2025-08-03T16:46:04.030Z
Learnt from: NiveditJain
Repo: exospherehost/exospherehost PR: 132
File: state-manager/app/controller/upsert_graph_template.py:11-27
Timestamp: 2025-08-03T16:46:04.030Z
Learning: In the exospherehost codebase, for upsert operations on graph templates, the team prioritizes API idempotency over avoiding race conditions in the database layer implementation. The approach of separate find and insert/update operations is acceptable when the API behavior remains consistent.

Applied to files:

  • state-manager/app/controller/upsert_graph_template.py
📚 Learning: 2025-08-06T14:04:44.515Z
Learnt from: NiveditJain
Repo: exospherehost/exospherehost PR: 158
File: state-manager/app/controller/upsert_graph_template.py:49-49
Timestamp: 2025-08-06T14:04:44.515Z
Learning: In the exospherehost codebase, the team prefers to run graph template verification on every upsert operation rather than optimizing by skipping verification when templates haven't changed, prioritizing comprehensive validation over performance optimization.

Applied to files:

  • state-manager/app/controller/upsert_graph_template.py
🧬 Code graph analysis (3)
state-manager/app/tasks/trigger_cron.py (2)
state-manager/app/models/db/trigger.py (1)
  • DatabaseTriggers (9-53)
state-manager/app/models/trigger_models.py (1)
  • TriggerStatusEnum (9-14)
state-manager/app/models/db/trigger.py (1)
state-manager/app/models/trigger_models.py (1)
  • TriggerStatusEnum (9-14)
state-manager/app/tasks/init_tasks.py (4)
state-manager/app/config/settings.py (1)
  • get_settings (34-38)
state-manager/app/models/db/trigger.py (1)
  • DatabaseTriggers (9-53)
state-manager/app/models/trigger_models.py (1)
  • TriggerStatusEnum (9-14)
state-manager/app/singletons/logs_manager.py (2)
  • LogsManager (9-66)
  • get_logger (65-66)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Agent
🔇 Additional comments (4)
state-manager/app/config/settings.py (1)

7-9: LGTM!

Minor formatting adjustment. The trigger_retention_hours setting is correctly defined and will be used by other modules in this PR for TTL-based cleanup.

state-manager/app/models/db/trigger.py (1)

43-51: LGTM!

The TTL index partial filter now correctly includes CANCELLED alongside TRIGGERED and FAILED, ensuring all terminal states are subject to TTL-based cleanup. This aligns with the PR objective of replacing destructive deletes with TTL expiration.

state-manager/app/tasks/init_tasks.py (1)

19-24: Verify filter scope is intentional.

The filter targets triggers with TRIGGERED or FAILED status but without an expires_at value. This appears to be a migration path for legacy triggers that predate TTL support.

However, this filter does not include namespace or graph_name, so it will affect all matching triggers across the entire database. Confirm this global scope is intentional for the startup cleanup task.
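One way to address both the scope question here and the status concern raised in Copilot's review is sketched below: backfill `expires_at` on legacy terminal triggers without relabelling them as CANCELLED, with optional namespace scoping. This is an illustration under assumptions (status strings, the `namespace` field name, and the function name `legacy_backfill_args` are hypothetical), not the PR's mark_old_triggers_cancelled().

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def legacy_backfill_args(
    retention_hours: int, now: datetime, namespace: Optional[str] = None
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Build (filter, update) giving pre-TTL terminal triggers an expires_at."""
    query: dict[str, Any] = {
        "trigger_status": {"$in": ["TRIGGERED", "FAILED"]},
        # In MongoDB, {"field": None} matches both null and missing fields.
        "expires_at": None,
    }
    if namespace is not None:
        # Optional scoping instead of a database-wide sweep.
        query["namespace"] = namespace
    # Only expires_at is set; the original TRIGGERED/FAILED status is preserved.
    update = {"$set": {"expires_at": now + timedelta(hours=retention_hours)}}
    return query, update
```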

state-manager/app/tasks/trigger_cron.py (1)

48-61: Verify that mark_as_cancelled is wired into the trigger cancellation flow.

The function implementation is sound and follows established patterns, but its integration point needs confirmation. Ensure this function is called from the appropriate cancellation handler in the trigger processing logic.


Copilot AI left a comment


Pull request overview

This PR aims to complete the TTL-based cleanup implementation for CANCELLED triggers by wiring in the mark_as_cancelled() helper and replacing destructive delete_many() operations with TTL-aware updates. However, there are critical issues with the implementation that need to be addressed.

Key intended changes:

  • Add mark_as_cancelled() helper function for consistent cancellation handling
  • Replace delete_many() with update_many() to preserve audit history
  • Ensure all CANCELLED triggers receive expires_at timestamps for MongoDB TTL cleanup

Critical issues identified:

  1. The new mark_as_cancelled() function is defined but never actually used—cancellation logic is still implemented inline
  2. The init task incorrectly marks already-completed triggers (TRIGGERED/FAILED) as CANCELLED instead of just setting their expires_at
  3. Missing test coverage for the new function despite existing test patterns
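For the missing test coverage, a cancellation helper can be unit-tested without MongoDB by injecting a fake collection that records calls. In this sketch, `mark_as_cancelled(collection, trigger_id, retention_hours)` is a hypothetical stand-in whose signature is assumed, not taken from trigger_cron.py; the real helper may look up its collection and settings itself, in which case those would need to be patched instead.

```python
import asyncio
from datetime import datetime, timedelta, timezone
from typing import Any


async def mark_as_cancelled(collection: Any, trigger_id: Any, retention_hours: int) -> None:
    """Hypothetical helper; the real signature in trigger_cron.py may differ."""
    expires_at = datetime.now(timezone.utc) + timedelta(hours=retention_hours)
    await collection.update_one(
        {"_id": trigger_id},
        {"$set": {"trigger_status": "CANCELLED", "expires_at": expires_at}},
    )


class FakeCollection:
    """Records update_one calls instead of talking to MongoDB."""

    def __init__(self) -> None:
        self.calls: list[tuple[dict, dict]] = []

    async def update_one(self, query: dict, update: dict) -> None:
        self.calls.append((query, update))


def test_mark_as_cancelled_sets_status_and_expiry() -> None:
    fake = FakeCollection()
    before = datetime.now(timezone.utc)
    asyncio.run(mark_as_cancelled(fake, "trigger-1", retention_hours=24))
    assert len(fake.calls) == 1
    query, update = fake.calls[0]
    assert query == {"_id": "trigger-1"}
    assert update["$set"]["trigger_status"] == "CANCELLED"
    # expires_at is computed after `before`, so it is at least 24h past it.
    assert update["$set"]["expires_at"] >= before + timedelta(hours=24)
```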

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 4 comments.

Summary per file

  • state-manager/app/tasks/trigger_cron.py: Adds the mark_as_cancelled() helper function, but it is not integrated anywhere.
  • state-manager/app/tasks/init_tasks.py: Converts delete_many() to update_many(), but incorrectly transitions TRIGGERED/FAILED triggers to CANCELLED status.
  • state-manager/app/models/db/trigger.py: Updates the TTL index to include CANCELLED status (correct).
  • state-manager/app/controller/upsert_graph_template.py: Replaces delete_many() with inline cancellation logic using update_many().
  • state-manager/app/config/settings.py: Minor whitespace-only change.


…ed across init, cron, and upsert_graph_template

Labels

enhancement New feature or request
