Skip to content

Conversation

@tallpsmith
Copy link
Contributor

@tallpsmith tallpsmith commented Jan 17, 2026

Overview

Add column grouping capability to pmrep - allowing configuration files to define visual group headers that span multiple metric columns, similar to vmstat output format.

Why

When monitoring many related metrics (e.g., memory.free, memory.buff, memory.cache), they're difficult to mentally group without visual separation. Column grouping makes output more readable and easier to scan.

What

Example Output (with group separators):

 % pmrep :macstat-grouped -t1 -s3          
           load |             memory            |   paging  |     io    |      cpu     
            1min|   free   wired  active   inact| pi/k  po/k| bi/k  bo/k|  us   sy   id
06:21:47    0.89     166     957    3401    3366   N/A   N/A   N/A   N/A  N/A  N/A  N/A
06:21:48    0.89     166     958    3400    3367     0     0     0     1    2    2   96
06:21:49    0.89     169     958    3396    3369     0     0     0     0    1    2   96

Configuration Syntax:

See macstat-grouped.conf included in the PR for further example.

[vmstat-grouped]
header = yes
groupalign = center          # Optional: left|center|right
groupsep = |                 # Optional: separator between groups

# Group definitions
group.procs = running, blocked
group.procs.prefix = kernel.all
group.procs.label = procs

group.memory = swap.used, free, bufmem, allcache
group.memory.prefix = mem.util
group.memory.label = memory

# ... metric definitions follow ...

Implementation Status

  • ✅ Core parsing and rendering complete
  • ✅ 166+ unit tests passing
  • ✅ Manual VM testing successful
  • ⏳ QA integration tests to follow

Design Notes

  • Backwards compatible: Existing configs unchanged
  • Auto-detection: Group headers enable automatically if group.* entries exist
  • Prefix resolution: Leaf metric names get prefix prepended; FQDNs used as-is
  • Phase 1 scope: Stdout output; CSV support planned for Phase 2

Add unit testing infrastructure for pmrep that allows tests to run
WITHOUT requiring PCP to be installed. This enables fast TDD cycles
(tests complete in <1 second) compared to the 5-6 hour CI cycle.

Key components:
- src/pmrep/test/mock_pcp.py: Mock implementations of PCP modules
  (pmapi, pmconfig, pmi, cpmapi, cpmi) that are injected into
  sys.modules before importing pmrep
- src/pmrep/test/test_smoke.py: Initial smoke tests verifying the
  module can be imported and constants are defined
- src/pmrep/test/GNUmakefile: Test runner using unittest discover

The mock infrastructure follows the pattern established by mpstat and
pidstat tests, but goes further by stubbing out the PCP C extensions
entirely, enabling testing on systems without PCP installed.

Run tests with: cd src/pmrep/test && make test
Eliminate symlink workaround by importing pmrep directly via sys.path.
Test now cleans up __pycache__ after running.
Enable unit testing by extracting three pure functions to module level:
parse_non_number, remove_delimiter, and option_override. Class methods
now delegate to these functions, preserving the existing API.
Extract format_stdout_value as a pure module-level function that returns
(value, format_string) tuple. The class method now delegates to this
function and handles the format list mutation. Adds 15 new tests.
Replace scattered attributes with structured, immutable config objects
(OutputConfig, FilterConfig, ScaleConfig) following Single Responsibility.
Enable unit testing of header generation by extracting
HeaderFormatter class with format string building logic.
Wrap pmconfig access in a mockable interface following the dependency
injection pattern used by mpstat/pidstat for unit testing.
Add GroupConfig and GroupHeaderFormatter for vmstat-style grouped
column headers. Enables displaying related metrics under common labels.
Add groupalign, groupheader, groupsep, and groupsep_data to valid
configuration keys. Without these, the config parser treats group
options as metric names, causing PM_ERR_NAME errors.

Tests verify all four keys are recognized as valid configuration options.
Document completion of bug fix (Commit 7b4e88f) and consolidate
unit testing information from PLAN-pmrep-unit-testing.md.

Update includes:
- Implementation status section with phase completion tracking
- Bug fix details and TDD cycle documentation
- Fast unit testing workflow (146 tests in 0.002s)
- Updated implementation order reflecting actual work done
- Next steps: documentation and QA tests
This plan has been successfully completed and its content consolidated
into PLAN-pmrep-column-grouping.md. Keeping for historical reference.
Complete documentation for column grouping feature to enable users
to organize metrics visually. Adds example config, option references,
and comprehensive man page sections.
Prevent group.<handle> pattern keys from being parsed as metric
definitions by adding keys_ignore container with pattern matching.
Validate entire macstat config parses without PM_ERR_NAME by testing
all option keys, group keys, and metric definitions are correctly
recognized or ignored during config parsing.
VM testing revealed groups.py module not wired into pmrep.py despite
existing and passing all unit tests. Config parsing works but headers
don't render. Mark Step 6 incomplete with detailed next steps.
Enables group headers to appear above metric columns when configured.
Integration tested with TDD - failing tests proved missing rendering,
implementation makes tests pass. Adds GroupHeaderFormatter call in
write_header_stdout() when groupheader enabled and groups configured.
Key insights from investigation:
- Rendering code at pmrep.py:1050-1055 works correctly
- Integration tests pass because they manually set up configs
- Config parsing from file not implemented - why group headers
  don't appear in VM despite rendering code working
- Module installation missing - groups.py not installed to system
- Pylint errors from reimported sys/os in try-except block

Updated plan with:
- Bug Fix performancecopilot#3 section for module installation and linting
- Clarified Step 6 status (rendering done, parsing missing)
- Added linting requirement to testing workflow
- Updated next steps with priority order
Enable group headers to work from config files by implementing full
config parsing and module installation. Previously, rendering code
existed but wasn't wired up to parse group.* entries from configs.

Key changes:
- Install groups.py to $(PCP_SHARE_DIR)/lib/pmrep for runtime access
- Fix import logic to check installed location first
- Implement parse_group_definitions() in groups.py (pure function)
- Wire up parsing in prepare_stdout_std() with column width calculation
- Add linting to unit test workflow
- Add 6 TDD tests for config parsing (166 tests total, all passing)

Architecture: parse_group_definitions() lives in groups.py for better
separation of concerns - groups.py now owns all group logic.
The --ignore-unknown flag existed but didn't work - it still caused
fatal exits on unknown metrics. Fixed by adding per-metric failure
tracking (metric_sts dict) and checking ignore_unknown_metrics()
before exiting in exception handlers.

Fixes performancecopilot#2452
Update linting section to be more actionable and specific:
- Emphasize running linting before every commit
- Document pmrep-specific command with 10.00/10 score requirement
- Explain CI blocking behavior to encourage local testing
- Standardize format for other modules
Three related fixes for column grouping display:

1. Timestamp indentation (pcp-6n3): Group headers now account for actual timestamp
   column width instead of hardcoded 2 spaces, ensuring headers align over correct
   columns when timestamp is enabled.

2. Centering asymmetry (pcp-5st): Improved label centering with consistent
   left-biased padding when total padding is odd, instead of Python's default
   right-biased approach, for more balanced visual presentation.

3. Wide label warnings (pcp-fy3): Added check_label_widths() method to detect
   group labels wider than their spans and emit startup warnings to inform users
   that labels will be truncated.

All 32 group tests passing.
Corrects mock module loading to ensure consistent pmapi.pmErr exception
handling across test modules. Loads real pmconfig with mocked pmapi
dependencies once during test setup to prevent module reloading issues.
Also sets TRUNC constant correctly.
Provides clear directive to run 'make test' before committing changes.
Group separators were misaligned because format_header() didn't account
for the delimiter that appears between groups in actual column output.
Groupsep was being added after delimiter, creating cumulative rightward
drift. Groupsep should replace the delimiter position to maintain proper
column alignment.
Prevents awkward truncation artifacts like '1 minut' appearing in headers when instance names are too long for their column widths
Better reflects the config's purpose and removes redundant instance info when column labels already identify instances
@tallpsmith tallpsmith requested a review from myllynen January 18, 2026 06:43
@tallpsmith tallpsmith self-assigned this Jan 18, 2026
@tallpsmith
Copy link
Contributor Author

tallpsmith commented Jan 18, 2026

@myllynen - this is still draft, I just have to fix up some linting errors, and add in support for --no-group-headers CLI option to override any view configuration. But would love your eyes on this to see what you think.

I've refactored the code pulling out into different responsible code module/blocks, just because the pmrep file was getting looooong and Claude was choking on it :-D

…uppression

Allows users to disable group headers without requiring a config file, enabling
convenient command-line control over group header display. Registers new CLI
option and implements handler in option() callback, with auto-detection logic
left unchanged for backward compatibility.
@tallpsmith
Copy link
Contributor Author

--no-group-headers option added:

admin@Manageds-Virtual-Machine pmrep % pmrep --no-group-headers -s3 :macstat-grouped
            1min|   free   wired  active   inact| pi/k  po/k| bi/k  bo/k|  us   sy   id
01:06:51   10.48     361     800    3528    3488   N/A   N/A   N/A   N/A  N/A  N/A  N/A
01:06:52   10.48     361     801    3526    3490     0     0     0     0    1    2   96
01:06:53   10.48     361     801    3525    3492     0     0     0     0    2    2   96

admin@Manageds-Virtual-Machine pmrep % pmrep -s3 :macstat-grouped 
           load |             memory            |   paging  |     io    |      cpu     
            1min|   free   wired  active   inact| pi/k  po/k| bi/k  bo/k|  us   sy   id
01:06:58    9.80     258     807    3585    3527   N/A   N/A   N/A   N/A  N/A  N/A  N/A
01:06:59    9.80     258     808    3583    3528     0     0     0     0    1    1   97
01:07:00    9.02     260     807    3582    3529     0     0     0     0    2    2   97

Ensures local linting matches CI behavior by:
- Installing pylint in macOS workflow to match other platforms
- Removing deprecated suggestion-mode from .pylintrc for pylint 4.x
- Fixing no-member error in pmrep.py (invalid write_msg call)
- Removing unused variables in groups.py and header.py
…o validate this directory when working with pmrep.
@natoscott
Copy link
Member

@tallpsmith FWIW, this is looking increasingly like pcp-dstat ... would it be worth exploring that on macOS rather than adding more pmrep options? 🤷

@tallpsmith
Copy link
Contributor Author

@tallpsmith FWIW, this is looking increasingly like pcp-dstat ... would it be worth exploring that on macOS rather than adding more pmrep options? 🤷

@natoscott Ok, well, I never knew that existed! Doesn' this violate the pm* standard tooling convention? I see a number of tools in the pcp tree now I had no idea about (I thought that directory was related to a core library thing. I best get exploring...

Having said that, there may still be legitimate niceties to group metrics in 'pmrep' for whatever reason, but it may certainly reduce the likelihood of needing a more 'adhoc' tool like pmrep now ?

Still, this just opens another macOS-specific bug to fix:

Screenshot 2026-01-20 at 11 39 46 am

(sorry, copy/paste seems to have choked in my VM and can't seem to fix it ). I'll open a bug.

I can see we probably won't need the :macstat pmrep config if I can get pcp dstat to work on macOS.

@tallpsmith tallpsmith marked this pull request as ready for review January 20, 2026 00:41
@natoscott
Copy link
Member

That's coming from src/pcp/dstat/plugins/cpu:

usr = 100 * (rate(kernel.percpu.cpu.user) + rate(kernel.percpu.cpu.nice) + rate(kernel.percpu.cpu.irq.soft) + rate(kernel.percpu.cpu.irq.hard))

This is a derived metrics expression, as in pmRegisterDerived(3). Its likely the IRQ metrics that are problematic here, and I think it might be solvable with the "defined" keyword - @kmcdonell will know for sure though.

@natoscott
Copy link
Member

Doesn't this violate the pm* standard tooling convention?

Yes, so historically these tools are those that have a matching non-PCP implementation, and we had several issues to solve as a result. We wanted to match the original tool arguments as best we can (which may conflict with usual PCP options). We wanted a way to pass those standard PCP command line options (like -h/-a/-t, etc) into the PCP versions, which can be done via "pcp --host www.example.com --interval 5sec dstat" (because the pcp(1) front end handles the PCP arguments separately and magically passes that info to the tool).

Every now and then I toy with the idea of making all the tools accessible this way (e.g. "pcp info", etc) ... which could be done by a bunch of symlinks initially ... but I never get around to following up and exploring that further. I think that could be a better user experience though, if all tools were accessible this same way.

@kmcdonell
Copy link
Member

@natoscott you're correct, these metric names need to be "wrapped" in some defined() magic. I've just push commit e9d51aa that does the wrapping for the "irq", "wait" and "steal" CPU metrics.
It remains open to decide if pcp-dstat should say nothing and report 0 for "metrics" that have no underlying instrumentation, e.g. wai, hiq, siq and stl in the example below:

kenj@vm06:~$ dstat --cpu-adv
------total-cpu-usage------
usr sys idl wai hiq siq stl
  1   2  96   0   0   0   0
  0   2  96   0   0   0   0

And of course this is the tip of the iceberg, don't even think about dstat --vm for a non-Linux system, and I've not looked at any of the other dstat derived metric definitions.

@natoscott
Copy link
Member

Awesome!

And of course this is the tip of the iceberg, don't even think about dstat --vm for a non-Linux system, and I've not looked at any of the other dstat derived metric definitions.

Replaying Linux archives on macOS - and vice-versa - is another wrinkle we need to keep in mind here.

@kmcdonell
Copy link
Member

@natoscott Replaying will "just work" because the metadata will match the PMNS for the PMAPI context.

If it works on Linux for a local pmcd, an archive created there will replay just fine on macOS or *BSD. And vice versa.

Or am I missing something obvious?

@natoscott
Copy link
Member

Or am I missing something obvious?

I was just thinking in relation to your "tip of the iceberg" and "dstat --vm for a non-Linux system" comments ... some of these may be tough to solve via derived metrics (memory metrics in particular, can be totally different between platforms).

@tallpsmith
Copy link
Contributor Author

@natoscott I (personally) think this is nice to have for the next release, but if we want to hold off on a review until the next one that's also cool by me.

@natoscott
Copy link
Member

@tallpsmith @myllynen pmrep is Marko's baby so let's wait to hear his thoughts. No rush on this one though I guess, looks like 7.1.1 material to me if he's keen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants