@tallpsmith tallpsmith commented Jan 2, 2026

Overview

Comprehensive expansion of darwin PMDA monitoring capabilities, adding 60+ new metrics across memory compression, VFS resources, network protocols, and process statistics.

New Metrics

Memory (5 metrics)

| Metric | Description |
|---|---|
| mem.util.compressed | Compressed memory size (KB) |
| mem.compressions | Compression operations count |
| mem.decompressions | Decompression operations count |
| mem.compressor.pages | Pages held by compressor |
| mem.compressor.uncompressed_pages | Uncompressed pages in compressor |

VFS Resources (7 metrics)

| Metric | Description |
|---|---|
| vfs.files.count | Open file descriptors |
| vfs.files.max | Maximum file descriptors |
| vfs.files.free | Available file descriptors |
| vfs.vnodes.count | Active vnodes |
| vfs.vnodes.max | Maximum vnodes |
| kernel.all.nprocs | Total process count |
| kernel.all.nthreads | Total thread count |

Network - UDP (5 metrics)

| Metric | Description |
|---|---|
| network.udp.indatagrams | UDP datagrams received |
| network.udp.outdatagrams | UDP datagrams sent |
| network.udp.noports | Datagrams to closed ports |
| network.udp.inerrors | Input errors |
| network.udp.rcvbuferrors | Receive buffer full errors |

Network - ICMP (8 metrics)

| Metric | Description |
|---|---|
| network.icmp.inmsgs | ICMP messages received |
| network.icmp.outmsgs | ICMP messages sent |
| network.icmp.inerrors | Input errors |
| network.icmp.indestunreachs | Destination unreachable received |
| network.icmp.inechos | Echo requests received |
| network.icmp.inechoreps | Echo replies received |
| network.icmp.outechos | Echo requests sent |
| network.icmp.outechoreps | Echo replies sent |

Network - Socket Statistics (2 metrics)

| Metric | Description |
|---|---|
| network.sockstat.tcp.inuse | TCP sockets in use |
| network.sockstat.udp.inuse | UDP sockets in use |

Network - TCP Connection States (11 metrics)

| Metric | Description |
|---|---|
| network.tcpconn.established | Established connections |
| network.tcpconn.syn_sent | SYN sent (active open) |
| network.tcpconn.syn_recv | SYN received (passive open) |
| network.tcpconn.fin_wait1 | FIN_WAIT_1 state |
| network.tcpconn.fin_wait2 | FIN_WAIT_2 state |
| network.tcpconn.time_wait | TIME_WAIT state |
| network.tcpconn.close | CLOSED state |
| network.tcpconn.close_wait | CLOSE_WAIT state |
| network.tcpconn.last_ack | LAST_ACK state |
| network.tcpconn.listen | LISTEN state |
| network.tcpconn.closing | CLOSING state |

Network - TCP Protocol Statistics (15 metrics)

| Metric | Description |
|---|---|
| network.tcp.activeopens | Active connection attempts |
| network.tcp.passiveopens | Passive connection accepts |
| network.tcp.attemptfails | Failed connection attempts |
| network.tcp.estabresets | Established connections reset |
| network.tcp.currestab | Currently established |
| network.tcp.insegs | Segments received |
| network.tcp.outsegs | Segments sent |
| network.tcp.retranssegs | Segments retransmitted |
| network.tcp.inerrs | Input errors |
| network.tcp.outrsts | RST segments sent |
| network.tcp.incsumerrors | Checksum errors |
| network.tcp.rtoalgorithm | RTO algorithm (constant) |
| network.tcp.rtomin | Minimum RTO (ms) |
| network.tcp.rtomax | Maximum RTO (ms) |
| network.tcp.maxconn | Max connections (constant) |

Note: TCP statistics require net.inet.tcp.disable_access_to_stats=0 (documented in new pmdadarwin(1) man page).
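
A minimal sketch of how a test script might check this prerequisite before asserting on TCP metrics. The helper name `tcp_stats_accessible` and the overridable `SYSCTL_CMD` variable are assumptions made for illustration and are not part of this PR; only the sysctl name comes from the note above.

```shell
# Hypothetical helper: SYSCTL_CMD can be overridden so the check is testable off-macOS.
SYSCTL_CMD="${SYSCTL_CMD:-sysctl}"

# Succeeds only when net.inet.tcp.disable_access_to_stats is 0.
# Returns 2 if the sysctl cannot be read at all.
tcp_stats_accessible() {
    # "sysctl -n" prints just the value, without the name.
    tcp_stats_value=$("$SYSCTL_CMD" -n net.inet.tcp.disable_access_to_stats 2>/dev/null) || return 2
    [ "$tcp_stats_value" = "0" ]
}

# Example use in a test script:
#   tcp_stats_accessible || echo "skipping TCP tests: stats access is disabled"
```

Skipping rather than failing keeps the TCP tests from reporting false errors on hosts where the setting is left at a non-zero value.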

Process Metrics (3 metrics)

| Metric | Description |
|---|---|
| proc.io.read_bytes | Bytes read from disk |
| proc.io.write_bytes | Bytes written to disk |
| proc.fd.count | Open file descriptor count |

pmrep Monitoring Views

Six new/enhanced pmrep configurations for comprehensive macOS monitoring:

  • :macstat - Enhanced overview with aggregated network bandwidth
  • :macstat-x - Extended view with VFS and disk byte metrics
  • :macstat-mem - Memory deep-dive (compression, paging, swap)
  • :macstat-dsk - Disk I/O analysis (IOPS, throughput, latency)
  • :macstat-tcp - TCP connection lifecycle and health
  • :macstat-proto - Network protocol overview (UDP, ICMP, TCP summary)

These views provide out-of-the-box monitoring without custom configuration.

Testing Infrastructure

  • Unit tests: dbpmda-based tests for all new metrics (scripts/darwin/test/unit/)
  • Integration tests: PCP tool validation (scripts/darwin/test/integration/)
  • CI enhancement: macOS GitHub Actions workflow now runs darwin-specific tests post-installation

Claude Code Artifacts (For Discussion)

This PR includes AI development tooling for the first time in the PCP codebase:

Agents (src/claude-code/agents/)

  • macos-darwin-pmda-qa - Automated QA testing agent that builds darwin PMDA in isolation and runs full unit/integration test suites
  • pcp-code-reviewer - Code review agent that validates PCP coding standards, style consistency, and darwin PMDA architectural patterns

Skills (src/claude-code/skills/)

  • macos-qa-test - Skill for running darwin PMDA tests in isolated Cirrus VM environment

These tools automate repetitive QA tasks and enforce code quality standards. Maintainers should review whether these should be retained, relocated, or removed from the main codebase.

Documentation Files

Several planning and development documents are included for transparency:

  • DARWIN-PMDA-ENHANCEMENT-PLAN.md - Complete development plan and implementation patterns
  • PMDA_REFACTOR.md - Code organization and refactoring notes
  • Various .plan.md files - Phase-specific planning documents

Note: These documentation files can be removed before merge if desired, after maintainer review.

Code Organization

All enhancements follow modular architecture with dedicated subsystem files (e.g., vfs.c/h, udp.c/h, tcp.c/h), keeping pmda.c focused on coordination rather than implementation.

tallpsmith and others added 30 commits December 11, 2025 13:59
- Add scripts/darwin/ directory with complete development toolkit
- Standalone GNUmakefile enables 5-10 second rebuilds (vs 5-30 min full build)
- Unit tests use dbpmda for pre-install testing without system installation
- Integration tests validate pminfo, pmval, pmstat work correctly
- Update .github/workflows/macOS.yml with 3 new CI test phases:
  * Unit tests (after build, before install)
  * Integration tests (after install)
  * pmstat validation
- Quick-test.sh runs build and all tests in ~30 seconds total

Enables fast local iteration for Darwin PMDA development by:
1. Using locally built libraries from Makepkgs output (pcp-X.Y.Z/)
2. Building only the Darwin PMDA, not entire PCP
3. Automating test execution in CI/CD
4. Providing clear error messages for missing prerequisites
Replace dbpmda-based tests with basic DSO/binary validation tests:
- Verify DSO is valid Mach-O dylib
- Verify binary executable was built
- Check for required PMDA symbols
- Verify binary responds to --help

This approach works with locally-built libraries from Makepkgs
without requiring PCP to be installed system-wide or having
compiled namespace files.

Full dbpmda testing remains available via integration tests
after PCP system installation.
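
The validation checks listed above could be sketched roughly as follows. This is an illustration under assumptions, not the PR's actual test code: the function names, the library filename, and the symbol name in the usage comments are hypothetical.

```shell
# Succeeds when a description string, as printed by "file -b", names a Mach-O shared library.
is_macho_dylib() {
    case "$1" in
        *Mach-O*"dynamically linked shared library"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds when nm output on stdin lists the given symbol at the end of a line.
# macOS prefixes C symbols with an underscore, so both spellings are accepted.
has_symbol() {
    grep -Eq "[[:space:]]_?$1\$"
}

# Example use (file and symbol names are placeholders):
#   is_macho_dylib "$(file -b pmda_darwin.dylib)" || echo "not a Mach-O dylib"
#   nm -g pmda_darwin.dylib | has_symbol darwin_init || echo "missing init symbol"
```

Because both helpers work on text rather than on the files themselves, they need neither an installed PCP nor a compiled namespace.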
…art VM.

It appeared that running `pkgbuild` inside the original filesystem corrupted the resulting .pkg, so we now write the package to a clean temp area first and then copy it back in one step.
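
A minimal sketch of the "build in a temp area, then copy back" pattern described above. The helper name `build_in_temp` and the `pkgbuild` arguments in the usage comment are illustrative assumptions, not the PR's actual script.

```shell
# Run a builder command with a temp-dir output path appended as its last argument,
# then copy the finished file to the real destination in a single step.
build_in_temp() {
    pkg_dest=$1; shift
    pkg_tmpdir=$(mktemp -d) || return 1
    pkg_tmpfile="$pkg_tmpdir/$(basename "$pkg_dest")"
    pkg_status=1
    if "$@" "$pkg_tmpfile" && cp "$pkg_tmpfile" "$pkg_dest"; then
        pkg_status=0
    fi
    rm -rf "$pkg_tmpdir"   # clean up whether or not the build succeeded
    return "$pkg_status"
}

# Example use (paths and identifier are placeholders):
#   build_in_temp ./pcp.pkg pkgbuild --root "$ROOT" --identifier org.example.pcp --version 1.0
```

The destination is only written once the builder has succeeded, so a failed build never leaves a partial package behind.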
… macos-darwin-pmda-qa

# Conflicts:
#	scripts/darwin/test/integration/run-integration-tests.sh
…uild tree and not rely on (accidentally) having run configure in the root of the source tree.
BREAKING CHANGE: Memory counters now 64-bit (affects archive compatibility)

- Use host_statistics64() with HOST_VM_INFO64 instead of host_statistics()
- Prevents counter overflow on high-activity systems
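
To illustrate why the wider counters matter, here is a rough back-of-the-envelope sketch. It assumes the old counters were 32 bits wide and that events arrive at a steady rate; the function name is hypothetical.

```shell
# Seconds until a 32-bit unsigned counter wraps at a steady rate of
# events per second (integer division, 2^32 = 4294967296).
seconds_to_wrap32() {
    echo $(( 4294967296 / $1 ))
}

# Example: at 100000 events per second the counter wraps after
# 42949 seconds, a little under 12 hours.
#   seconds_to_wrap32 100000
```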
Add 5 new metrics exclusive to vm_statistics64 API that expose
macOS memory compressor functionality:

- mem.util.compressed - KB of compressed memory
- mem.compressions - cumulative compression operations
- mem.decompressions - cumulative decompression operations
- mem.compressor.pages - current pages in compressor
- mem.compressor.uncompressed_pages - uncompressed size of compressed pages

These metrics provide visibility into memory pressure relief via
the macOS memory compression subsystem.

Changes:
- pmda.c: Added 5 metrictab entries (items 130-134) and fetch_vmstat case 130
- pmns: Added PMNS entries for all 5 metrics
- help: Added documentation for compression metrics
- test-memory-compression.txt: Unit test for new metrics
Add 7 new metrics for system resource tracking via kern.* sysctls:
- vfs.files.count, vfs.files.max, vfs.files.free
- vfs.vnodes.count, vfs.vnodes.max
- kernel.all.nprocs, kernel.all.nthreads

Implementation follows modular pattern with dedicated vfs.c/vfs.h files.
Includes unit and integration tests.
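
As a rough cross-check from the command line, the file-table values can be approximated with `sysctl`. This sketch assumes that `kern.maxfiles` and `kern.num_files` are the relevant sysctls and that the free count is simply the maximum minus the current count; the PMDA's actual sources and formula may differ. `SYSCTL_CMD` and `files_free` are hypothetical names.

```shell
# Hypothetical helper: SYSCTL_CMD can be overridden so the arithmetic is testable off-macOS.
SYSCTL_CMD="${SYSCTL_CMD:-sysctl}"

# Print the assumed number of free file table entries: max minus in-use.
files_free() {
    files_max=$("$SYSCTL_CMD" -n kern.maxfiles) || return 1
    files_used=$("$SYSCTL_CMD" -n kern.num_files) || return 1
    echo $(( files_max - files_used ))
}

# Example use: compare against the PMDA's own value.
#   files_free
#   pminfo -f vfs.files.free
```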
…s, but oriented towards what is available for Mac.

Note: Swap metrics are deliberately commented out, pending the merge of an earlier PR that makes them available.
Add detailed diagnostic output to diagnose pmcd connection failures in
GitHub Actions CI environment. Changes include:

- Pre-startup configuration checks (pmconfig, pmcd.conf, env vars)
- Post-startup verification (process status, network ports, connectivity)
- Enhanced failure debugging with logs and connection tests
- Network connectivity tests for both localhost and 127.0.0.1
- Port binding verification using lsof
- Test pminfo with multiple host specifications

This will help identify whether issues are with pmcd startup, network
binding, localhost resolution, or client configuration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add detailed pre-startup and post-startup diagnostics to identify why
pmcd is running but not listening on network ports. New checks include:

Pre-startup:
- Binary locations for pmcd and pminfo
- Critical directory existence (/var/log/pcp/pmcd, /var/lib/pcp)
- PMDA installation directory contents

Startup:
- Explicit logging of which start method succeeded
- Capture error output from all start attempts

Post-startup:
- Full process details (ps aux output)
- Command line arguments pmcd was started with
- Log directory status and permissions
- All open file descriptors for pmcd process
- System log entries for pmcd errors

This will help identify if the issue is missing directories, incorrect
startup command, or pmcd failing to bind to network sockets.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The previous build showed pmcd was starting with --verify -A flags,
which puts it in verification mode instead of daemon mode. This caused
pmcd to check the config but not actually listen on network ports.

Root cause:
- launchctl load was failing (I/O error)
- Falling back to 'pmcd start' ran /etc/init.d/pmcd
- The init script incorrectly launched pmcd with --verify -A

Fix:
- Bypass all init scripts and launchd
- Run pmcd directly: /usr/local/libexec/pcp/bin/pmcd -f
- The -f flag runs in foreground mode (needed for background job)
- Increased sleep to 5s to allow full initialization

Also added:
- Display launchd plist contents for debugging
- Display plist.stdout/stderr from failed launchd attempts

This should make pmcd actually listen on port 44321 for client
connections.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Follow the same approach as .cirrus.yml - let the installer start pmcd
via launchd, then wait for it to become ready. This validates that the
actual installation works correctly, rather than bypassing the normal
startup mechanism.

Changes:
- Removed manual pmcd startup attempts (launchctl, init scripts, direct)
- Added wait loop (60s timeout, 3s intervals) like .cirrus.yml pattern
- Tests both 'pcp' command and 'pminfo' to verify pmcd is responding
- If timeout expires, shows diagnostics and fails the build
- Simplified pre-checks to focus on installation verification

This properly tests that:
1. The installer creates the launchd plist correctly
2. pmcd starts automatically post-install
3. The service becomes available within reasonable time
4. All components are properly configured

Rather than masking installation issues with manual workarounds.
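
The wait loop described above can be sketched as a small polling helper. This is a minimal illustration, assuming a generic readiness command; the name `wait_for` is hypothetical and the workflow's real loop may be structured differently.

```shell
# Poll a command until it succeeds or the timeout (in seconds) expires.
# Usage: wait_for TIMEOUT INTERVAL COMMAND [ARGS...]
wait_for() {
    wait_timeout=$1; wait_interval=$2; shift 2
    wait_elapsed=0
    while [ "$wait_elapsed" -lt "$wait_timeout" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0                      # service answered
        fi
        sleep "$wait_interval"
        wait_elapsed=$((wait_elapsed + wait_interval))
    done
    return 1                              # timed out: caller prints diagnostics and fails
}

# Example use with the 60s timeout and 3s interval mentioned above:
#   wait_for 60 3 pcp || { echo "pmcd did not become ready"; exit 1; }
```

Polling the client command itself exercises the same path a user would take, so a pass means the launchd-started pmcd is actually answering requests.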

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
PCP tools (pmstat, pmrep, pmval) all support the -s flag to specify
number of samples before exiting. The timeout wrapper was:
- Redundant: tools already exit after N samples
- Non-portable: GNU timeout doesn't exist on macOS
- Unnecessary complexity

Changed:
- pmstat -t 1 -s 2: naturally exits after 2 samples
- pmrep -t 1 -s 2: naturally exits after 2 samples
- pmval -t 1 -s 1: naturally exits after 1 sample

This fixes the "timeout: command not found" errors on macOS while
maintaining the same test behavior.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
pmrep is failing with "ModuleNotFoundError: No module named 'pcp'"
while it works fine in Cirrus CI. Adding diagnostics to identify:

- Which Python interpreter pmrep is using
- Python sys.path for that interpreter
- Where the pcp module is installed (if anywhere)
- Whether pcp module can be imported

This will help identify if:
1. PCP Python module wasn't installed by the package
2. It was installed to wrong Python location
3. PYTHONPATH needs to be set
4. There's a Python version mismatch

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The pcp Python module is installed at /usr/local/lib/python3.13/site-packages
but pmrep uses pmpython which has sys.path pointing to the build venv.

Root cause:
- pmrep shebang: #!/usr/bin/env pmpython
- pmpython sys.path includes build venv, not /usr/local
- pcp module is installed to /usr/local/lib/python3.13/site-packages
- Module exists but isn't in Python's search path

Fix:
- Export PYTHONPATH=/usr/local/lib/python3.13/site-packages
- Verify pcp module can be imported before running tests
- This makes the installed pcp module accessible to pmrep

This should fix the 2 failing pmrep :macstat tests.
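
A minimal sketch of the fix, written so that the directory is added only once and any existing `PYTHONPATH` entries are preserved. The helper name `prepend_pythonpath` is hypothetical, and the import check in the usage comment assumes `pmpython` accepts `-c` like a regular Python interpreter.

```shell
# Prepend a directory to PYTHONPATH unless it is already listed, then export it.
prepend_pythonpath() {
    case ":${PYTHONPATH:-}:" in
        *":$1:"*) ;;                                        # already present
        *) PYTHONPATH="$1${PYTHONPATH:+:$PYTHONPATH}" ;;    # keep existing entries
    esac
    export PYTHONPATH
}

# Example use, with the path taken from the commit message above:
#   prepend_pythonpath /usr/local/lib/python3.13/site-packages
#   pmpython -c 'import pcp' || { echo "pcp module still not importable"; exit 1; }
```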

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Missed this script in the previous timeout removal commit.
pmstat already exits after -s 2 samples, no need for timeout wrapper.

Fixes: "timeout: command not found" error on macOS in test-pmstat.sh

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The test was failing because grep patterns expected column names
and values on the same line, but pmstat's formatted output has:
- Line 1: Column headers (loadavg, memory, io, cpu)
- Line 2: Sub-headers (1 min, swpd, free, etc.)
- Line 3+: Numeric data

Fixed by separating checks:
- Verify column headers exist
- Verify numeric data is present in output
- Made memory check lenient (only checks header) since some
  values show "?" on macOS which is expected behavior
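
The separated checks can be sketched along these lines. This is an illustration that assumes the three-part layout described above (headers, sub-headers, then data rows); the function name is hypothetical and the real test script may use different patterns.

```shell
# Validate captured pmstat output passed as a single string argument.
check_pmstat_output() {
    pmstat_out=$1
    # 1. Every expected column group must appear somewhere in the output.
    #    The memory check is header-only, since some values show "?" on macOS.
    for pmstat_header in loadavg memory io cpu; do
        printf '%s\n' "$pmstat_out" | grep -q "$pmstat_header" || return 1
    done
    # 2. From line 3 onward, at least one row must start with a digit.
    printf '%s\n' "$pmstat_out" | sed -n '3,$p' | grep -Eq '^[[:space:]]*[0-9]'
}

# Example use:
#   check_pmstat_output "$(pmstat -t 1 -s 2)" || echo "unexpected pmstat output"
```

Skipping the first two lines before looking for numbers avoids matching sub-headers such as "1 min" as if they were data.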

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…w them to be exposed for things like pmstat

Used Claude Code to generate this.
Now that swap metrics have been added to the darwin PMDA, enable
the swap.used metric in both :macstat and :macstat-x configurations.

Tested with integration tests - pmrep :macstat correctly displays
the swpd column with swap usage data.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@tallpsmith
Contributor Author

> ...This PR is getting sorta long ...

@kmcdonell to quote something @thecowan said many moons ago, I think it was "That's like saying Jack the Ripper was a bit of a lad". Understatement.

I agree. I would have liked @natoscott to have had his eyes on it, or indeed anyone else with macOS interest, but if we're happy with the principles of "If you break it you fix it", I'm up for continuing with any finer touches and further future feedback from others later.

Depends if there's any release planned any time soon? Maybe only @natoscott knows?

@tallpsmith
Contributor Author

Also, I'd plan to squash it as a single commit merge (just for the sanity of the git history)?

@natoscott
Member

@tallpsmith I'll take a look today.

> Also, I'd plan to squash it as a single commit merge (just for the sanity of the git history)?

IMO, it's OK to have multiple independent commits introducing distinct functionality ... so either one commit if you prefer, or multiple commits with sensible boundaries (if we have lots of one-liner train-of-thought commits, ideally squash those).

@tallpsmith
Contributor Author

@natoscott I did try to commit in logical chunks (each page), but there might be some follow-up bug fixes and tidy-ups in there. It's just a lot of change and therefore quite a lot of commits.

If you don't mind all of them coming across, I like the history too, but it's your call.

Member

@natoscott natoscott left a comment

Looks really good. One high-level thing - I'm really not a fan of the new "dev" top level directory nor the several new top-level files. Could we move the dir to either "scripts/darwin", "build/mac", "qa/darwin"? Some PMDAs do have their unit tests below the PMDA subdir, so perhaps another option for some of that content would be src/pmdas/darwin[_proc] too. Likewise for the files, I've made some inline suggestions.

PMDA code all looks great too, nice cleanups and additions!

@tallpsmith
Contributor Author

The scripts were originally in scripts/Darwin, but someone asked me to move them here. So maybe you two can rock-paper-scissors it and let me know. :)

@natoscott
Member

I noticed in another PR there were heaps in build/mac - how about we go there by default, pmdas/darwin for unit tests and qa/darwin for anything that could/should become part of regular (installed) PCP QA over time?

@tallpsmith
Contributor Author

@natoscott do you remember which PR that was? (I honestly can't remember any in `build/mac`, but there's quite a number of things going on at the moment and I'm struggling to hold it all in my head myself.)

@natoscott
Member

> @natoscott do you remember which PR that was? (I honestly can't remember any in `build/mac`, but there's quite a number of things going on at the moment and I'm struggling to hold it all in my head myself.)

Not sure of the PR, but it's commit ce2f98b

@tallpsmith
Contributor Author

@natoscott with the QA/packaging issue broken out into #2459, are you happy to approve this change to merge to main? I'll then get on with the QA rework as a separate branch/changeset.

@tallpsmith
Contributor Author

I'd note that this one is worth the next release.

@tallpsmith
Contributor Author

I just need to attempt to address Nathan's comment on #2441 and include the fix here, as this Darwin additions branch includes that other PR (amongst other Tart/Cirrus changes, I think), so #2441 is now superseded.

Move unit tests closer to source code (src/pmdas/*/test/) while
centralizing integration tests and orchestration in build/mac/test/.
This improves maintainability and makes the test structure more intuitive.
The --help test requires PCP libraries to be installed system-wide,
which are not available during pre-installation unit tests. Integration
tests already validate PMDA functionality post-installation.
…Header files (since the split out of the PMDA).
@natoscott natoscott merged commit 2b3457b into performancecopilot:main Jan 22, 2026
3 of 20 checks passed
Labels

macOS For issues specific or related to macOS
