@sonndinh commented Jun 27, 2025

  • Problem:
    A power device relies on Heartbeat messages from the microgrid controllers to determine its active controller. Heartbeats are sent with a period of 1 second. If no Heartbeat is received from the selected (or active) controller within 3 seconds of the last one, that controller is considered lost and the power device starts the process of selecting a new controller. The current implementation schedules (or reschedules, if the timer already exists) a timer with a 3-second expiration to detect missed Heartbeats. ACE_Timer_Heap, the default timer queue used by ACE_Reactor, has a problem with this setup: missed-Heartbeat timers fire even though Heartbeats are still being received from the active controller. This happens around the time the timer IDs returned by ACE_Timer_Heap reach their limit and wrap around.

  • Solution:

  1. Use ACE_Timer_Hash as the timer queue for the ACE reactor; ACE_Timer_Hash doesn't appear to have the same issue.
  2. Support a separate reactor instance for the controller selector, isolating it from the reactor used for Handshaking. This also makes the controller selector easier to debug.
  • Misc. changes:
  1. Add the tms::ActiveMicrogridControllerState topic to power devices so the CLI receives updates on each device's selected controller.
  2. Improve the printing format for the list of power devices in the CLI and show each device's selected controller.
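The following sketch makes the intended timer semantics from the Problem description concrete. It uses standard C++, not ACE, and it is not the project's implementation. It models the missed-Heartbeat timer as a deadline that each Heartbeat pushes forward. The names HeartbeatMonitor and missed_at are hypothetical. Time is passed in explicitly so that the 1-second period and 3-second timeout can be simulated.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical sketch (not the project's code, not ACE): the intended
// semantics of the missed-Heartbeat timer, modeled as a deadline that each
// Heartbeat pushes forward. Time is passed in so it can be simulated.
class HeartbeatMonitor {
public:
  using Clock = std::chrono::steady_clock;

  explicit HeartbeatMonitor(Clock::duration timeout) : timeout_(timeout) {}

  // Equivalent to (re)scheduling the one-shot timer on every Heartbeat.
  void on_heartbeat(Clock::time_point now) { deadline_ = now + timeout_; }

  // True only if no Heartbeat arrived within timeout_ of the last one,
  // i.e. the only case in which the timer should fire.
  bool missed(Clock::time_point now) const {
    return deadline_.has_value() && now >= *deadline_;
  }

private:
  Clock::duration timeout_;
  std::optional<Clock::time_point> deadline_;  // empty until the first Heartbeat
};

// Simulates Heartbeats every 1 s from t = 0 through t = last_beat_s, then
// reports whether a miss is detected at t = check_s (both in seconds).
bool missed_at(int last_beat_s, int check_s) {
  using std::chrono::seconds;
  HeartbeatMonitor mon(seconds(3));  // 3 s timeout from the PR description
  const HeartbeatMonitor::Clock::time_point t0{};
  for (int s = 0; s <= last_beat_s; ++s) {
    mon.on_heartbeat(t0 + seconds(s));
  }
  return mon.missed(t0 + seconds(check_s));
}
```

Suppose Heartbeats arrive every second until t = 10 s. Then missed_at(10, 12) is false and missed_at(10, 13) is true. The bug described above amounts to the real timer firing at a moment when this model would still report false.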

@sonndinh marked this pull request as ready for review July 2, 2025 16:32
Comment on lines 211 to 212
ACE_ERROR((LM_WARNING, "(%P|%t) WARNING: TimerHandler::handle_timeout: timer id %d does NOT exist\n",
timer_id));
Member

Is TimerId a type that can be logged with %d on all platforms? It may need a cast or a different log formatter.

Member Author

Yes, TimerId is an alias for long, so it can be logged with %d.

Member

ACE's logger passes %d through to snprintf unchanged. %d expects an int. On systems where int is 4 bytes but long is 8 bytes (such as 64-bit Linux), snprintf will read a 4-byte integer from the varargs even though an 8-byte long was passed.
https://en.cppreference.com/w/cpp/io/c/printf.html
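The reviewer's point can be reproduced with plain std::snprintf, which the comment above says ACE's logger hands %d to. This minimal sketch calls the standard C library directly rather than ACE_Log_Msg. It assumes TimerId is a long, as the author states, and shows the %ld length modifier that matches that type. It makes no claim about whether ACE's logger accepts %ld. The name format_timer_id is hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the TimerId alias (the author says it is a long).
using TimerId = long;

// %ld tells snprintf to read a long from the varargs. Passing a long where
// %d (int) is expected is a type mismatch and undefined behavior.
std::string format_timer_id(TimerId timer_id) {
  char buf[64];
  std::snprintf(buf, sizeof buf, "timer id %ld does NOT exist", timer_id);
  return buf;
}
```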

Member Author

It looks like ACE's %q logger format will cover long.

Member

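That's one option, but it doesn't match long on all platforms. %q is always 8 bytes, while long is only sometimes 8 bytes (on 64-bit Windows, for example, it is 4). So when using %q, the argument needs to be cast to an 8-byte type. Where long is 4 bytes, that cast widens the value; where it is already 8 bytes, it is a no-op.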
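Here is a minimal standard-C++ sketch of the widening the reviewer describes. It casts to std::int64_t, which is exactly 8 bytes, and prints the value with the matching PRId64 macro from <cinttypes>. In ACE terms this presumably corresponds to casting to an 8-byte type such as ACE_INT64 before using %q. That mapping is an assumption; it is not taken from the merged change. TimerId and format_widened are hypothetical names.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

using TimerId = long;  // hypothetical stand-in, per the author's description

// An 8-byte conversion must receive an 8-byte argument. The cast widens a
// 4-byte long (e.g. 64-bit Windows) and is a no-op where long is already
// 8 bytes (e.g. 64-bit Linux).
std::string format_widened(TimerId timer_id) {
  const std::int64_t wide = static_cast<std::int64_t>(timer_id);
  char buf[64];
  // PRId64 expands to the correct printf conversion for int64_t here.
  std::snprintf(buf, sizeof buf, "timer id %" PRId64, wide);
  return buf;
}
```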

Member Author

If long is 4 bytes, wouldn't it be implicitly widened, just as when a value of a smaller integer type is assigned to a variable of a larger one? Or does it work differently from assignment?

Member

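It's using varargs, so there is no declared parameter type to convert to. Only the default argument promotions apply, and they leave a long as a long.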
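The reviewer's last point can be illustrated with a hypothetical variadic helper in standard C++; sum_ll and sum_timer_ids are illustrative names. A variadic parameter has no declared type, so each argument is passed as-is, after the default argument promotions. va_arg must therefore name the type that was actually passed. The explicit casts are what make a long match the long long that the helper reads.

```cpp
#include <cassert>
#include <cstdarg>

// Hypothetical variadic helper: sums `count` arguments that the caller
// promises are long long. Nothing converts the arguments; va_arg assumes
// each one really is a long long, and anything else is undefined behavior.
long long sum_ll(int count, ...) {
  va_list args;
  va_start(args, count);
  long long total = 0;
  for (int i = 0; i < count; ++i) {
    total += va_arg(args, long long);
  }
  va_end(args);
  return total;
}

// Variadic arguments only undergo the default argument promotions (types
// narrower than int become int, float becomes double). A long stays a long,
// so it reaches long long only through an explicit cast.
long long sum_timer_ids(long a, long b) {
  return sum_ll(2, static_cast<long long>(a), static_cast<long long>(b));
}
```

Calling sum_ll(2, a, b) with two uncast long arguments would be undefined behavior wherever long is narrower than long long. This is the same mismatch as passing a 4-byte long to %q.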

@sonndinh requested a review from mitza-oci July 7, 2025 19:48
Co-authored-by: Adam Mitz <mitza@objectcomputing.com>
@jrw972 merged commit 5399e4a into main Jul 8, 2025
12 checks passed