feat(governor): group store wired + deterministic eviction tests #56

Merged
heidi-dang merged 2 commits into main from feat/w2p2-group-policy-store
Feb 19, 2026

@heidi-dang
Owner

Wave2 P2-2: wire GroupPolicyStore into ProcessGovernor, deterministic eviction unit tests, remove memset-on-non-trivial warnings, align ViolationAction to NONE/WARN/SOFT_KILL/HARD_KILL.

…ministic eviction tests

Wave2 P2-2: wire GroupPolicyStore into ProcessGovernor with:
- GroupPolicyStore member (cap=256) and pid→group map (cap=8192)
- LRU eviction for groups and pids
- apply_group_policy and apply_cgroup_policy methods
- Event callbacks for GROUP_EVICT/PIDMAP_EVICT/GOV_CGROUP_UNAVAILABLE
- Rate-limited cgroup unavailability events
- ViolationAction aligned to NONE/WARN/SOFT_KILL/HARD_KILL
- Deterministic eviction unit tests using controllable time sequence
- Fix memset-on-non-trivial warnings in GroupPolicyStore
@gemini-code-assist

Summary of Changes

Hello @heidi-dang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Process Governor by introducing a new GroupPolicyStore component. This store enables the system to manage and apply resource governance policies at a group level, rather than just per-process. It includes robust, deterministic eviction mechanisms for both group policies and PID-to-group mappings, ensuring efficient resource utilization and predictable behavior. The ProcessGovernor now leverages this store to apply cgroup policies based on defined groups, and new unit tests validate the eviction logic.

Highlights

  • Group Policy Management: Introduced a new GroupPolicyStore component to manage and store group-based resource policies and PID-to-group mappings.
  • Process Governor Integration: Integrated the GroupPolicyStore into the ProcessGovernor to enable the application of resource governance policies at a group level.
  • Deterministic Eviction Logic: Implemented deterministic, time-based eviction mechanisms within the GroupPolicyStore for both group policies and PID mappings, ensuring efficient resource management.
  • Unit Testing for Eviction: Added comprehensive unit tests to validate the deterministic eviction behavior of the GroupPolicyStore.
  • CPU Quota Support: Extended the CpuPolicy structure with a quota_us field to support CPU quota management.
Changelog
  • include/heidi-kernel/gov_rule.h
    • Added quota_us to CpuPolicy struct.
  • include/heidi-kernel/group_policy_store.h
    • Added new header defining GroupPolicyStore class and GroupPolicy struct.
    • Introduced constants kMaxGroups and kMaxPidGroupMap.
    • Defined EvictReason enum and Stats struct for the store.
  • include/heidi-kernel/process_governor.h
    • Included cgroup_driver.h and group_policy_store.h.
    • Expanded ProcessGovernor::Stats to track group and pidmap evictions, and cgroup unavailability events.
    • Added new member variables group_store_, cgroup_driver_, last_cgroup_unavailable_ns_, and kCgroupUnavailableRateLimitNs.
    • Declared new methods apply_group_policy and apply_cgroup_policy.
  • src/governor/group_policy_store.cpp
    • Added full implementation for GroupPolicyStore, including methods for upsert_group, map_pid_to_group, get_group, get_group_for_pid, get_stats, clear, evict_oldest_group, and evict_oldest_pid_entry.
    • Implemented time-based eviction logic for both groups and PID mappings.
  • src/governor/process_governor.cpp
    • Updated ProcessGovernor::get_stats to incorporate statistics from GroupPolicyStore.
    • Integrated apply_group_policy into ProcessGovernor::apply_rules.
    • Implemented apply_group_policy to manage group policies and PID mappings, triggering eviction events.
    • Implemented apply_cgroup_policy to apply resource limits via CgroupDriver based on group policies, with rate-limiting for cgroup unavailability.
  • tests/test_group_policy_store.cpp
    • Added new test file for GroupPolicyStore.
    • Included tests for group eviction when kMaxGroups is exceeded.
    • Included tests for PID mapping eviction when kMaxPidGroupMap is exceeded.
    • Verified that touching existing groups or PIDs updates their last seen time, preventing premature eviction.
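
The eviction behavior summarized above (oldest entry is dropped when capacity is exceeded, and touching an entry refreshes its last-seen time) can be sketched with an injected clock so that tests are deterministic. This is a minimal illustration, not the PR's implementation: `TinyLruStore`, `touch`, and `lru_demo` are hypothetical names, and the capacity of 2 is chosen only to keep the example short.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical fixed-capacity store (not the real GroupPolicyStore API).
// The clock is injected so a test can control every timestamp.
template <std::size_t Capacity>
class TinyLruStore {
  static_assert(Capacity > 0, "capacity must be non-zero");

 public:
  using Clock = std::function<uint64_t()>;

  explicit TinyLruStore(Clock clock) : clock_(std::move(clock)) {}

  // Inserts or refreshes `id`. Returns the evicted id, if an eviction happened.
  std::optional<std::string> touch(const std::string& id) {
    const uint64_t now = clock_();
    Slot* free_slot = nullptr;
    Slot* oldest = nullptr;
    for (auto& s : slots_) {
      if (s.in_use && s.id == id) {
        s.last_seen_ns = now;  // refresh: entry becomes the most recent
        return std::nullopt;
      }
      if (!s.in_use) {
        if (!free_slot) free_slot = &s;
      } else if (!oldest || s.last_seen_ns < oldest->last_seen_ns) {
        oldest = &s;
      }
    }
    std::optional<std::string> evicted;
    Slot* target = free_slot;
    if (!target) {  // store is full: reuse the least recently seen slot
      target = oldest;
      evicted = target->id;
      ++evictions_;
    }
    target->id = id;
    target->last_seen_ns = now;
    target->in_use = true;
    return evicted;
  }

  uint64_t evictions() const { return evictions_; }

 private:
  struct Slot {
    std::string id;
    uint64_t last_seen_ns = 0;
    bool in_use = false;
  };
  std::array<Slot, Capacity> slots_{};
  Clock clock_;
  uint64_t evictions_ = 0;
};

// Deterministic scenario: the fake clock advances by one on every call.
inline std::string lru_demo() {
  uint64_t fake_now = 0;
  TinyLruStore<2> store([&fake_now] { return ++fake_now; });
  store.touch("a");  // t=1
  store.touch("b");  // t=2
  store.touch("a");  // t=3, refreshes "a" so "b" is now the oldest
  return store.touch("c").value_or("");  // store is full, so "b" is evicted
}
```

Because the timestamps come from the injected callable rather than the system clock, the eviction order in `lru_demo` is fully determined by the call sequence.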

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a GroupPolicyStore to manage group-based resource limits and wires it into the ProcessGovernor. However, critical security vulnerabilities were identified, including a logic error in PID-to-group mapping that allows bypassing security restrictions, a potential Denial of Service via unhandled exceptions in CPU affinity parsing, and a data race due to inconsistent synchronization of the group store. Beyond security, there are also correctness issues such as the upsert_group function failing to populate policy details and omitting resource limit fields, as well as problems with statistics reporting and the use of magic numbers for event types.
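
The review does not quote the affinity-parsing code, so the following is only a sketch of one way to parse a CPU list without throwing. It assumes a comma-separated list with optional ranges (for example `0,2,4-7`) and a 64-CPU limit; both are assumptions, and `parse_affinity` and `kSketchMaxCpus` are hypothetical names. It relies on `std::from_chars`, which reports errors through a return code instead of exceptions.

```cpp
#include <bitset>
#include <cassert>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

constexpr std::size_t kSketchMaxCpus = 64;  // assumed limit for this sketch

// Parses "0,2,4-7" into a CPU mask. Returns nullopt on any malformed input
// instead of throwing, so bad input cannot terminate the caller.
inline std::optional<std::bitset<kSketchMaxCpus>> parse_affinity(std::string_view text) {
  if (text.empty()) return std::nullopt;
  std::bitset<kSketchMaxCpus> mask;
  const char* p = text.data();
  const char* end = p + text.size();
  while (p < end) {
    unsigned lo = 0;
    auto r = std::from_chars(p, end, lo);
    if (r.ec != std::errc{} || lo >= kSketchMaxCpus) return std::nullopt;
    p = r.ptr;
    unsigned hi = lo;
    if (p < end && *p == '-') {  // optional "lo-hi" range
      r = std::from_chars(p + 1, end, hi);
      if (r.ec != std::errc{} || hi >= kSketchMaxCpus || hi < lo) return std::nullopt;
      p = r.ptr;
    }
    for (unsigned c = lo; c <= hi; ++c) mask.set(c);
    if (p == end) break;
    if (*p != ',') return std::nullopt;  // unexpected separator
    ++p;
    if (p == end) return std::nullopt;  // trailing comma
  }
  return mask;
}
```

Callers can treat `std::nullopt` as a rejected policy and report it through the normal result path.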

Comment on lines +160 to +167
const char* GroupPolicyStore::get_group_for_pid(int32_t pid) const {
  for (size_t i = 0; i < kMaxPidGroupMap; ++i) {
    if (pid_map_[i].in_use && static_cast<int32_t>(i) == pid) {
      return pid_map_[i].group_id.data();
    }
  }
  return nullptr;
}

security-critical critical

The GroupPolicyStore has a critical logic error in its PID-to-group mapping, leading to a security bypass. The get_group_for_pid function incorrectly assumes the array index i corresponds directly to the pid, while map_pid_to_group inserts into the first available slot. This means PIDs greater than or equal to 8192, or PIDs stored at indices not matching their value, will not be found. This allows processes with such PIDs to bypass all group-level restrictions enforced by the governor.

Comment on lines +80 to +84
struct PidEntry {
  std::array<char, kMaxGroupIdLen + 1> group_id{};
  uint64_t last_seen_ns = 0;
  bool in_use = false;
};

critical

The PidEntry struct is missing a field to store the actual process ID. Since PIDs are sparse and can exceed the fixed size of the pid_map_ array, the mapping logic must store the PID explicitly to perform correct lookups.

Suggested change
- struct PidEntry {
-   std::array<char, kMaxGroupIdLen + 1> group_id{};
-   uint64_t last_seen_ns = 0;
-   bool in_use = false;
- };
+ struct PidEntry {
+   int32_t pid = -1;
+   std::array<char, kMaxGroupIdLen + 1> group_id{};
+   uint64_t last_seen_ns = 0;
+   bool in_use = false;
+ };

Comment on lines +111 to +112
  for (size_t i = 0; i < kMaxPidGroupMap; ++i) {
    if (pid_map_[i].in_use && static_cast<int32_t>(i) == pid) {

critical

The logic here incorrectly assumes that the array index i is the process ID. This will fail for any PID greater than or equal to kMaxPidGroupMap (8192), that is, any PID above 8191, and will cause incorrect updates if PIDs are reused or sparse. The loop should check for a matching pid field within the PidEntry.

Suggested change
-   for (size_t i = 0; i < kMaxPidGroupMap; ++i) {
-     if (pid_map_[i].in_use && static_cast<int32_t>(i) == pid) {
+   for (size_t i = 0; i < kMaxPidGroupMap; ++i) {
+     if (pid_map_[i].in_use && pid_map_[i].pid == pid) {
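
Putting the two suggestions together, a lookup keyed on a stored pid field might look like the sketch below. It is a simplified stand-in rather than the actual GroupPolicyStore: the `Sketch*` names, the capacity of 8, and the group ID length of 31 are assumptions for illustration, and eviction is omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kSketchMaxPidGroupMap = 8;  // small for illustration
constexpr std::size_t kSketchMaxGroupIdLen = 31;  // assumed length limit

struct SketchPidEntry {
  int32_t pid = -1;  // the PID is stored explicitly, not implied by the index
  std::array<char, kSketchMaxGroupIdLen + 1> group_id{};
  bool in_use = false;
};

class SketchPidMap {
 public:
  // Updates the entry for `pid` if present, otherwise uses the first free slot.
  bool map_pid_to_group(int32_t pid, const char* gid) {
    if (pid < 0 || gid == nullptr) return false;
    const std::size_t len = std::strlen(gid);
    if (len == 0 || len > kSketchMaxGroupIdLen) return false;
    SketchPidEntry* target = nullptr;
    for (auto& e : entries_) {
      if (e.in_use && e.pid == pid) {
        target = &e;  // an existing mapping wins over any free slot
        break;
      }
      if (!e.in_use && !target) target = &e;
    }
    if (!target) return false;  // full; a real store would evict here
    target->pid = pid;
    target->group_id.fill('\0');
    std::memcpy(target->group_id.data(), gid, len);
    target->in_use = true;
    return true;
  }

  // Matches on the stored pid, so PIDs larger than the table size are found.
  const char* get_group_for_pid(int32_t pid) const {
    for (const auto& e : entries_) {
      if (e.in_use && e.pid == pid) return e.group_id.data();
    }
    return nullptr;
  }

 private:
  std::array<SketchPidEntry, kSketchMaxPidGroupMap> entries_{};
};
```

With this layout a PID such as 40000 resolves correctly even though the table has far fewer slots, and an unmapped PID that happens to equal a slot index returns nullptr.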

Comment on lines +80 to +89
  for (size_t i = 0; i < kMaxGroups; ++i) {
    if (!groups_[i].in_use) {
      groups_[i] = GroupEntry{};
      std::strncpy(groups_[i].policy.group_id.data(), gid, gid_len);
      groups_[i].policy.group_id[gid_len] = '\0';
      groups_[i].policy.last_update_ns = get_time();
      groups_[i].in_use = true;
      group_count_++;
      return true;
    }

critical

When a new group is inserted, the policy fields from msg are never copied into the new GroupEntry. The current implementation only updates fields if the group already exists (lines 56-75). This means new groups will have empty policies until a second 'upsert' call is made for the same group ID.
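
One way to avoid this kind of drift between the insert and update paths is to route both through a single merge helper. The sketch below assumes the message carries an optional CPU section, as the `msg.cpu->max_pct` access in the diff suggests; all `Sketch*` types, `merge_policy`, and the field widths are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical, simplified stand-ins for the message and policy types.
struct SketchCpu {
  uint8_t max_pct = 0;
  uint32_t quota_us = 0;
  uint32_t period_us = 0;
};
struct SketchMsg {
  std::optional<SketchCpu> cpu;
};
struct SketchPolicy {
  std::string group_id;
  uint8_t cpu_max_pct = 0;
  uint32_t cpu_quota_us = 0;
  uint32_t cpu_period_us = 0;
};
struct SketchEntry {
  SketchPolicy policy;
  bool in_use = false;
};

// Single place where message fields are copied into a stored policy.
// A zero value is treated as "not set" and leaves the stored field alone.
inline void merge_policy(SketchPolicy& dst, const SketchMsg& msg) {
  if (!msg.cpu) return;
  if (msg.cpu->max_pct) dst.cpu_max_pct = msg.cpu->max_pct;
  if (msg.cpu->quota_us) dst.cpu_quota_us = msg.cpu->quota_us;
  if (msg.cpu->period_us) dst.cpu_period_us = msg.cpu->period_us;
}

class SketchGroupStore {
 public:
  bool upsert_group(const std::string& gid, const SketchMsg& msg) {
    SketchEntry* free_slot = nullptr;
    for (auto& g : groups_) {
      if (g.in_use && g.policy.group_id == gid) {
        merge_policy(g.policy, msg);  // update path
        return true;
      }
      if (!g.in_use && !free_slot) free_slot = &g;
    }
    if (!free_slot) return false;  // full; eviction is omitted in this sketch
    *free_slot = SketchEntry{};
    free_slot->policy.group_id = gid;
    merge_policy(free_slot->policy, msg);  // insert path uses the same helper
    free_slot->in_use = true;
    return true;
  }

  const SketchPolicy* get_group(const std::string& gid) const {
    for (const auto& g : groups_) {
      if (g.in_use && g.policy.group_id == gid) return &g.policy;
    }
    return nullptr;
  }

 private:
  std::array<SketchEntry, 4> groups_{};
};
```

With a shared helper, a newly inserted group carries its limits after the first upsert, and any field added later only needs to be handled in one function.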

Comment on lines +56 to +59
      if (msg.cpu) {
        if (msg.cpu->max_pct)
          groups_[i].policy.cpu_max_pct = msg.cpu->max_pct;
      }

high

The upsert_group function is missing updates for the newly added cpu_quota_us and cpu_period_us fields in CpuPolicy. These should be synchronized from the msg to the stored policy.

      if (msg.cpu) {
        if (msg.cpu->max_pct)
          groups_[i].policy.cpu_max_pct = msg.cpu->max_pct;
        if (msg.cpu->quota_us)
          groups_[i].policy.cpu_quota_us = msg.cpu->quota_us;
        if (msg.cpu->period_us)
          groups_[i].policy.cpu_period_us = msg.cpu->period_us;
      }

  auto store_stats = group_store_.get_stats();
  s.group_evictions = store_stats.group_evictions;
  s.pidmap_evictions = store_stats.pidmap_evictions;
  s.cgroup_unavailable_events = store_stats.cgroup_unavailable_count;

medium

This line overwrites the cgroup_unavailable_events counter with a value from group_store_. However, GroupPolicyStore never increments its internal cgroup_unavailable_count, while ProcessGovernor correctly increments its own stats_.cgroup_unavailable_events at line 456. This assignment effectively breaks the statistic by always reporting 0.
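
A possible shape for the fix is to copy only the counters the store actually owns and leave the governor's own cgroup counter untouched. The struct and function names below are hypothetical simplifications of the Stats types mentioned in the changelog.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, reduced versions of the two stats structs.
struct SketchStoreStats {
  uint64_t group_evictions = 0;
  uint64_t pidmap_evictions = 0;
};
struct SketchGovStats {
  uint64_t group_evictions = 0;
  uint64_t pidmap_evictions = 0;
  uint64_t cgroup_unavailable_events = 0;
};

// Eviction counters come from the store, which is the component that counts
// them. cgroup_unavailable_events is counted by the governor itself, so it is
// deliberately not overwritten here.
inline SketchGovStats merge_stats(SketchGovStats own, const SketchStoreStats& store) {
  own.group_evictions = store.group_evictions;
  own.pidmap_evictions = store.pidmap_evictions;
  return own;
}
```

Under this split, each counter has exactly one owner, which makes it harder for a later assignment to silently reset it to zero.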

Comment on lines +421 to +460
        event_callback_(4, evict_msg, 0);
      }
    }
    if (new_stats.pidmap_evictions > prev_stats.pidmap_evictions) {
      stats_.pidmap_evictions++;
      if (event_callback_) {
        GovApplyMsg evict_msg;
        evict_msg.pid = pid;
        event_callback_(5, evict_msg, 0);
      }
    }

    const char* group_id = group_store_.get_group_for_pid(pid);
    if (group_id) {
      const GroupPolicy* group_policy = group_store_.get_group(group_id);
      if (group_policy) {
        auto r = apply_cgroup_policy(pid, *group_policy);
        if (!r.success) {
          return r;
        }
      }
    }
  }

  result.success = true;
  return result;
}

ApplyResult ProcessGovernor::apply_cgroup_policy(int32_t pid, const GroupPolicy& group_policy) {
  ApplyResult result;

  if (!cgroup_driver_.is_available() || !cgroup_driver_.is_enabled()) {
    uint64_t now = get_current_time_ns();
    if (now - last_cgroup_unavailable_ns_ > kCgroupUnavailableRateLimitNs) {
      last_cgroup_unavailable_ns_ = now;
      stats_.cgroup_unavailable_events++;
      if (event_callback_) {
        GovApplyMsg msg;
        msg.pid = pid;
        event_callback_(6, msg, 0);

medium

The event types (4, 5, 6) are used as magic numbers. These should be defined in an enum (e.g., GovEventType) to improve maintainability and clarity.
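
As an illustration of the suggestion, the three literals could be given names in a scoped enum. The event names come from the PR description (GROUP_EVICT, PIDMAP_EVICT, GOV_CGROUP_UNAVAILABLE); the mapping of 4, 5, and 6 to those names is inferred from the call sites in the diff, and the enum name, underlying type, and helper functions here are assumptions rather than the merged code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical registry of governor event IDs. Values mirror the literals
// used in the diff; the name-to-value mapping is an inference.
enum class SketchGovEventType : uint8_t {
  kGroupEvict = 4,
  kPidmapEvict = 5,
  kCgroupUnavailable = 6,
};

// Converts to the raw value expected by an integer-based callback.
constexpr uint8_t to_wire(SketchGovEventType t) {
  return static_cast<uint8_t>(t);
}

// Human-readable name, useful in logs and test failure messages.
constexpr const char* event_name(SketchGovEventType t) {
  switch (t) {
    case SketchGovEventType::kGroupEvict:
      return "GROUP_EVICT";
    case SketchGovEventType::kPidmapEvict:
      return "PIDMAP_EVICT";
    case SketchGovEventType::kCgroupUnavailable:
      return "GOV_CGROUP_UNAVAILABLE";
  }
  return "UNKNOWN";
}

// Compile-time guard: changing a value by accident breaks the build.
static_assert(to_wire(SketchGovEventType::kGroupEvict) == 4, "wire value changed");
static_assert(to_wire(SketchGovEventType::kCgroupUnavailable) == 6, "wire value changed");
```

A call site would then read `event_callback_(to_wire(SketchGovEventType::kPidmapEvict), evict_msg, 0)` instead of passing a bare 5, assuming the callback keeps its integer parameter.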

…ministic eviction tests

Wave2 P2-2: wire GroupPolicyStore into ProcessGovernor with:
- GroupPolicyStore member (cap=256) and pid→group map (cap=8192)
- LRU eviction for groups and pids
- apply_group_policy and apply_cgroup_policy methods
- Event callbacks for GROUP_EVICT/PIDMAP_EVICT/GOV_CGROUP_UNAVAILABLE
- Rate-limited cgroup unavailability events
- ViolationAction aligned to NONE/WARN/SOFT_KILL/HARD_KILL
- Deterministic eviction unit tests using controllable time sequence
- Fix memset-on-non-trivial warnings in GroupPolicyStore
- Add GovEventType enum registry to prevent event ID collisions
@heidi-dang heidi-dang merged commit 33cd0cb into main Feb 19, 2026
3 of 4 checks passed
@heidi-dang heidi-dang deleted the feat/w2p2-group-policy-store branch February 19, 2026 06:54