
Are the default Redis save settings too aggressive? #22

@mhjacks

Description

I have a group of 16 servers that I'm monitoring using ansible-pcp. I've added a few PMDAs and left the other settings (sampling interval and retention period) at their defaults. My metrics collection host was overwhelmed trying to keep up with the metrics reported, and my theory is that the Redis save interval is responsible for the high I/O rates my system reported.

The metrics collection host is now a 4 vCPU, 16 GB RAM VM with an 80 GB disk, which seems sufficient based on the sizing guidance in https://pcp.readthedocs.io/en/latest/HowTos/scaling/index.html. On a previous iteration set up with this collection (on Fedora 35), the RDB file in /var/lib/redis was being written almost continuously, and it kept growing. Beyond a certain point, depending on how much RAM I gave the VM, the redis process would get killed by systemd-oomd.

Below is a sample graph from the second setup I tried, this time running on CentOS 8-stream (and Redis 5). It uses the same default save settings, and shows steadily increasing disk I/O:

[Graph: disk I/O on the collector over the first 4 hours of reporting from a newly set up instance]

I'm currently experimenting with dialing the save interval way down (a single `save 3600 1` rule only), but I'm not sure how well that will interact with the way pmproxy and pmlogger write to Redis.
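For context, this is a sketch of the redis.conf change being tested. The commented-out rules are the stock snapshot schedule shipped with Redis 5 (they may differ slightly between distro packages); the thresholds mean "snapshot if at least N keys changed within T seconds":

```conf
# Default RDB snapshot rules shipped in redis.conf (Redis 5):
#   save 900 1      # snapshot if >= 1 key changed in 900 s
#   save 300 10     # snapshot if >= 10 keys changed in 300 s
#   save 60 10000   # snapshot if >= 10000 keys changed in 60 s
# Under a steady pmproxy write load the 60/10000 rule can fire back to
# back, rewriting the full dump.rdb almost continuously.

# Experiment: replace all of the above with a single hourly rule,
# so the RDB file is rewritten at most once per hour.
save 3600 1
```

Note that `save` directives accumulate, so the defaults must actually be removed (not just appended to) for the single hourly rule to take effect.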

I'm willing to PR some changes and take input on them as I work through this, including being told I'm off base. (I am new to the PCP toolset and to Redis.) Thanks for your time and attention!
