-
-
Notifications
You must be signed in to change notification settings - Fork 262
darwin PMDA: Comprehensive metrics expansion #2442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
darwin PMDA: Comprehensive metrics expansion #2442
Conversation
- Add scripts/darwin/ directory with complete development toolkit - Standalone GNUmakefile enables 5-10 second rebuilds (vs 5-30 min full build) - Unit tests use dbpmda for pre-install testing without system installation - Integration tests validate pminfo, pmval, pmstat work correctly - Update .github/workflows/macOS.yml with 3 new CI test phases: * Unit tests (after build, before install) * Integration tests (after install) * pmstat validation - Quick-test.sh runs build and all tests in ~30 seconds total Solves fast local iteration for Darwin PMDA development by: 1. Using locally built libraries from Makepkgs output (pcp-X.Y.Z/) 2. Building only the Darwin PMDA, not entire PCP 3. Automating test execution in CI/CD 4. Providing clear error messages for missing prerequisites
Replace dbpmda-based tests with basic DSO/binary validation tests: - Verify DSO is valid Mach-O dylib - Verify binary executable was built - Check for required PMDA symbols - Verify binary responds to --help This approach works with locally-built libraries from Makepkgs without requiring PCP to be installed system-wide or having compiled namespace files. Full dbpmda testing remains available via integration tests after PCP system installation.
- Add scripts/darwin/ directory with complete development toolkit - Standalone GNUmakefile enables 5-10 second rebuilds (vs 5-30 min full build) - Unit tests use dbpmda for pre-install testing without system installation - Integration tests validate pminfo, pmval, pmstat work correctly - Update .github/workflows/macOS.yml with 3 new CI test phases: * Unit tests (after build, before install) * Integration tests (after install) * pmstat validation - Quick-test.sh runs build and all tests in ~30 seconds total Solves fast local iteration for Darwin PMDA development by: 1. Using locally built libraries from Makepkgs output (pcp-X.Y.Z/) 2. Building only the Darwin PMDA, not entire PCP 3. Automating test execution in CI/CD 4. Providing clear error messages for missing prerequisites
Replace dbpmda-based tests with basic DSO/binary validation tests: - Verify DSO is valid Mach-O dylib - Verify binary executable was built - Check for required PMDA symbols - Verify binary responds to --help This approach works with locally-built libraries from Makepkgs without requiring PCP to be installed system-wide or having compiled namespace files. Full dbpmda testing remains available via integration tests after PCP system installation.
…cally for isolation.
…art VM. It appeared that doing the `pkgbuild` inside whatever filesystem it was originally caused a .PKG corruption, so we simply write it initially to a cleaner temp area before copying it back in one hit.
…ferent branch)." This reverts commit 2dbb6d8.
… macos-darwin-pmda-qa # Conflicts: # scripts/darwin/test/integration/run-integration-tests.sh
…uild tree and not rely on (accidentally) having run configure in the root of the source tree.
…ctory, not the project source root.
BREAKING CHANGE: Memory counters now 64-bit (affects archive compatibility) - Use host_statistics64() with HOST_VM_INFO64 instead of host_statistics() - Prevents counter overflow on high-activity systems
Add 5 new metrics exclusive to vm_statistics64 API that expose macOS memory compressor functionality: - mem.util.compressed - KB of compressed memory - mem.compressions - cumulative compression operations - mem.decompressions - cumulative decompression operations - mem.compressor.pages - current pages in compressor - mem.compressor.uncompressed_pages - uncompressed size of compressed pages These metrics provide visibility into memory pressure relief via the macOS memory compression subsystem. Changes: - pmda.c: Added 5 metrictab entries (items 130-134) and fetch_vmstat case 130 - pmns: Added PMNS entries for all 5 metrics - help: Added documentation for compression metrics - test-memory-compression.txt: Unit test for new metrics
Add 7 new metrics for system resource tracking via kern.* sysctls: - vfs.files.count, vfs.files.max, vfs.files.free - vfs.vnodes.count, vfs.vnodes.max - kernel.all.nprocs, kernel.all.nthreads Implementation follows modular pattern with dedicated vfs.c/vfs.h files. Includes unit and integration tests.
…s, but oriented towards what is available for Mac. Note: Swap metrics are commented out deliberately, waiting on a previous PR to merge until they become available.
Add detailed diagnostic output to diagnose pmcd connection failures in GitHub Actions CI environment. Changes include: - Pre-startup configuration checks (pmconfig, pmcd.conf, env vars) - Post-startup verification (process status, network ports, connectivity) - Enhanced failure debugging with logs and connection tests - Network connectivity tests for both localhost and 127.0.0.1 - Port binding verification using lsof - Test pminfo with multiple host specifications This will help identify whether issues are with pmcd startup, network binding, localhost resolution, or client configuration. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add detailed pre-startup and post-startup diagnostics to identify why pmcd is running but not listening on network ports. New checks include: Pre-startup: - Binary locations for pmcd and pminfo - Critical directory existence (/var/log/pcp/pmcd, /var/lib/pcp) - PMDA installation directory contents Startup: - Explicit logging of which start method succeeded - Capture error output from all start attempts Post-startup: - Full process details (ps aux output) - Command line arguments pmcd was started with - Log directory status and permissions - All open file descriptors for pmcd process - System log entries for pmcd errors This will help identify if the issue is missing directories, incorrect startup command, or pmcd failing to bind to network sockets. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The previous build showed pmcd was starting with --verify -A flags, which puts it in verification mode instead of daemon mode. This caused pmcd to check the config but not actually listen on network ports. Root cause: - launchctl load was failing (I/O error) - Falling back to 'pmcd start' ran /etc/init.d/pmcd - The init script incorrectly launched pmcd with --verify -A Fix: - Bypass all init scripts and launchd - Run pmcd directly: /usr/local/libexec/pcp/bin/pmcd -f - The -f flag runs in foreground mode (needed for background job) - Increased sleep to 5s to allow full initialization Also added: - Display launchd plist contents for debugging - Display plist.stdout/stderr from failed launchd attempts This should make pmcd actually listen on port 44321 for client connections. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Follow the same approach as .cirrus.yml - let the installer start pmcd via launchd, then wait for it to become ready. This validates that the actual installation works correctly, rather than bypassing the normal startup mechanism. Changes: - Removed manual pmcd startup attempts (launchctl, init scripts, direct) - Added wait loop (60s timeout, 3s intervals) like .cirrus.yml pattern - Tests both 'pcp' command and 'pminfo' to verify pmcd is responding - If timeout expires, shows diagnostics and fails the build - Simplified pre-checks to focus on installation verification This properly tests that: 1. The installer creates the launchd plist correctly 2. pmcd starts automatically post-install 3. The service becomes available within reasonable time 4. All components are properly configured Rather than masking installation issues with manual workarounds. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
PCP tools (pmstat, pmrep, pmval) all support the -s flag to specify number of samples before exiting. The timeout wrapper was: - Redundant: tools already exit after N samples - Non-portable: GNU timeout doesn't exist on macOS - Unnecessary complexity Changed: - pmstat -t 1 -s 2: naturally exits after 2 samples - pmrep -t 1 -s 2: naturally exits after 2 samples - pmval -t 1 -s 1: naturally exits after 1 sample This fixes the "timeout: command not found" errors on macOS while maintaining the same test behavior. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
pmrep is failing with "ModuleNotFoundError: No module named 'pcp'" while it works fine in Cirrus CI. Adding diagnostics to identify: - Which Python interpreter pmrep is using - Python sys.path for that interpreter - Where the pcp module is installed (if anywhere) - Whether pcp module can be imported This will help identify if: 1. PCP Python module wasn't installed by the package 2. It was installed to wrong Python location 3. PYTHONPATH needs to be set 4. There's a Python version mismatch 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The pcp Python module is installed at /usr/local/lib/python3.13/site-packages but pmrep uses pmpython which has sys.path pointing to the build venv. Root cause: - pmrep shebang: #!/usr/bin/env pmpython - pmpython sys.path includes build venv, not /usr/local - pcp module is installed to /usr/local/lib/python3.13/site-packages - Module exists but isn't in Python's search path Fix: - Export PYTHONPATH=/usr/local/lib/python3.13/site-packages - Verify pcp module can be imported before running tests - This makes the installed pcp module accessible to pmrep This should fix the 2 failing pmrep :macstat tests. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Missed this script in the previous timeout removal commit. pmstat already exits after -s 2 samples, no need for timeout wrapper. Fixes: "timeout: command not found" error on macOS in test-pmstat.sh 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The test was failing because grep patterns expected column names and values on the same line, but pmstat's formatted output has: - Line 1: Column headers (loadavg, memory, io, cpu) - Line 2: Sub-headers (1 min, swpd, free, etc.) - Line 3+: Numeric data Fixed by separating checks: - Verify column headers exist - Verify numeric data is present in output - Made memory check lenient (only checks header) since some values show "?" on macOS which is expected behavior 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…w them to be exposed for things like pmstat Used Claude Code to generate this.
Now that swap metrics have been added to the darwin PMDA, enable the swap.used metric in both :macstat and :macstat-x configurations. Tested with integration tests - pmrep :macstat correctly displays the swpd column with swap usage data. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@kmcdonell to quote something @thecowan said many moons ago, I think it was "That's like saying Jack the Ripper was a bit of a lad". Understatement. I agree. I would have liked @natoscott to have had his eyes on it, or indeed anyone else with macOS interest, but if we're happy with the principles of "If you break it you fix it", I'm up for continuing with any finer touches and further future feedback from others later. Depends if there's any release planned any time soon? Maybe only @natoscott knows? |
|
Also, I'd plan to squash it as a single commit merge (just for the sanity of the git history)? |
|
@tallpsmith I'll take a look today.
IMO, its OK to have multiple independent commits introducing distinct functionality ... so either one commit if you prefer, or multiple commits with sensible boundaries (if we have lots of one-liner train-of-thought commits, ideally squash those). |
|
@natoscott dis try and commit in logical chucks (each page) but there might some follow up bug fixes and tidy ups in there. It's just a lot of change and therefore quite a lot of commits. If you don't mind all of them coming across I like the history too but your call. |
natoscott
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good. One high-level thing - I'm really not a fan of the new "dev" top level directory nor the several new top-level files. Could we move the dir to either "scripts/darwin", "build/mac", "qa/darwin"? Some PMDAs do have their unit tests below the PMDA subdir, so perhaps another option for some of that content would be src/pmdas/darwin[_proc] too. Likewise for the files, I've made some inline suggestions.
PMDA code all looks great too, nice cleanups and additions!
|
The scripts were originally in scripts/Darwin but some asked me to move them here. So maybe you two rock paper scissors it and let me know. :) |
|
I noticed in another PR there were heaps in build/mac - how about we go there by default, pmdas/darwin for unit tests and qa/darwin for anything that could/should become part of regular (installed) PCP QA over time? |
|
@natoscott do you remember which PR that was? (I honestly can't remember any in `build/mac' but there's quite a number of things going on at the moment I'm struggling to hold it in my head myself.. |
…hon wasn't building)
…this at some point.
Not sure the PR but its commit ce2f98b |
|
@natoscott with the QA/packaging issue broken out into #2459 , are you happy to approve this change to merge to |
|
I'd note that this one is worth the next release. |
Move unit tests closer to source code (src/pmdas/*/test/) while centralizing integration tests and orchestration in build/mac/test/. This improves maintainability and makes the test structure more intuitive.
The --help test requires PCP libraries to be installed system-wide, which are not available during pre-installation unit tests. Integration tests already validate PMDA functionality post-installation.
…Header files (since the split out of the PMDA).
Overview
Comprehensive expansion of darwin PMDA monitoring capabilities, adding 60+ new metrics across memory compression, VFS resources, network protocols, and process statistics.
New Metrics
Memory (5 metrics)
mem.util.compressedmem.compressionsmem.decompressionsmem.compressor.pagesmem.compressor.uncompressed_pagesVFS Resources (7 metrics)
vfs.files.countvfs.files.maxvfs.files.freevfs.vnodes.countvfs.vnodes.maxkernel.all.nprocskernel.all.nthreadsNetwork - UDP (5 metrics)
network.udp.indatagramsnetwork.udp.outdatagramsnetwork.udp.noportsnetwork.udp.inerrorsnetwork.udp.rcvbuferrorsNetwork - ICMP (8 metrics)
network.icmp.inmsgsnetwork.icmp.outmsgsnetwork.icmp.inerrorsnetwork.icmp.indestunreachsnetwork.icmp.inechosnetwork.icmp.inechorepsnetwork.icmp.outechosnetwork.icmp.outechorepsNetwork - Socket Statistics (2 metrics)
network.sockstat.tcp.inusenetwork.sockstat.udp.inuseNetwork - TCP Connection States (11 metrics)
network.tcpconn.establishednetwork.tcpconn.syn_sentnetwork.tcpconn.syn_recvnetwork.tcpconn.fin_wait1network.tcpconn.fin_wait2network.tcpconn.time_waitnetwork.tcpconn.closenetwork.tcpconn.close_waitnetwork.tcpconn.last_acknetwork.tcpconn.listennetwork.tcpconn.closingNetwork - TCP Protocol Statistics (15 metrics)
network.tcp.activeopensnetwork.tcp.passiveopensnetwork.tcp.attemptfailsnetwork.tcp.estabresetsnetwork.tcp.currestabnetwork.tcp.insegsnetwork.tcp.outsegsnetwork.tcp.retranssegsnetwork.tcp.inerrsnetwork.tcp.outrstsnetwork.tcp.incsumerrorsnetwork.tcp.rtoalgorithmnetwork.tcp.rtominnetwork.tcp.rtomaxnetwork.tcp.maxconnNote: TCP statistics require
net.inet.tcp.disable_access_to_stats=0(documented in newpmdadarwin(1)man page).Process Metrics (3 metrics)
proc.io.read_bytesproc.io.write_bytesproc.fd.countpmrep Monitoring Views
Six new/enhanced pmrep configurations for comprehensive macOS monitoring:
:macstat- Enhanced overview with aggregated network bandwidth:macstat-x- Extended view with VFS and disk byte metrics:macstat-mem- Memory deep-dive (compression, paging, swap):macstat-dsk- Disk I/O analysis (IOPS, throughput, latency):macstat-tcp- TCP connection lifecycle and health:macstat-proto- Network protocol overview (UDP, ICMP, TCP summary)These views provide out-of-the-box monitoring without custom configuration.
Testing Infrastructure
scripts/darwin/test/unit/)scripts/darwin/test/integration/)Claude Code Artifacts (For Discussion)
This PR includes AI development tooling for the first time in the PCP codebase:
Agents (
src/claude-code/agents/)macos-darwin-pmda-qa- Automated QA testing agent that builds darwin PMDA in isolation and runs full unit/integration test suitespcp-code-reviewer- Code review agent that validates PCP coding standards, style consistency, and darwin PMDA architectural patternsSkills (
src/claude-code/skills/)macos-qa-test- Skill for running darwin PMDA tests in isolated Cirrus VM environmentThese tools automate repetitive QA tasks and enforce code quality standards. Maintainers should review whether these should be retained, relocated, or removed from the main codebase.
Documentation Files
Several planning and development documents are included for transparency:
DARWIN-PMDA-ENHANCEMENT-PLAN.md- Complete development plan and implementation patternsPMDA_REFACTOR.md- Code organization and refactoring notes.plan.mdfiles - Phase-specific planning documentsNote: These documentation files can be removed before merge if desired, after maintainer review.
Code Organization
All enhancements follow modular architecture with dedicated subsystem files (e.g.,
vfs.c/h,udp.c/h,tcp.c/h), keepingpmda.cfocused on coordination rather than implementation.