
andreas-mausch/grafana-prometheus-loki-alertmanager-setup


What is this?

A Grafana setup to analyse logs from docker services with

  • Loki for storing the logs.
  • Prometheus (with cadvisor) for storing docker container metrics.
  • Grafana Alloy for sending logs from docker containers to Loki and metrics to Prometheus.
  • Mailpit as SMTP server and e-mail web client.

Link to my Blog post: https://andreas-mausch.de/blog/2021-05-14-monitoring-grafana/

Run

docker compose --profile=service up

Grafana is accessible at http://localhost:3000
The login credentials are admin/password.

Grafana dashboard

I've set up a custom Grafana dashboard which shows the memory usage of all docker services and the latest log entries with warning or error.

You can find its configuration in the grafana/ folder.

To see all the logs, select the Explore menu item, make sure Code is selected (rather than Builder), and enter the query string {platform="docker"}.

This should give you all the logs, including info level logging.

Advanced queries

You can write more advanced log queries, for example:

{platform="docker"} | detected_level=`info` | pattern "<date> <time> <level> <_>"
{platform="docker", docker_compose_project="grafana-prometheus-loki-alertmanager-setup", docker_compose_service="my-service"} | pattern "<date> <time> <level> <_>"
  • detected_level:

    Explore Logs adds a special detected_level label to all log lines where Loki assigns a level of the log line, including debug, info, warn, error, fatal, critical, trace, or unknown if no level could be determined. -- https://grafana.com/docs/grafana/latest/explore/simplified-exploration/logs/labels-and-fields/

    See also here

  • pattern: See here

  • docker_compose_*: These labels are added in config.alloy and help filter by, e.g., service name.

  • docker_swarm_*: These labels are still missing, I need to add them.
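As a rough illustration of what the pattern stage captures (not LogQL — a plain shell analogue; the field names are illustrative):

```shell
# Shell analogue of pattern "<date> <time> <level> <_>": the first three
# whitespace-separated tokens map to date, time, and level; the rest of
# the line is discarded (the <_> placeholder).
echo '2024-01-01 12:00:00 WARN disk almost full' \
  | awk '{print "date=" $1, "time=" $2, "level=" $3}'
# → date=2024-01-01 time=12:00:00 level=WARN
```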

Links

After starting the services, these URLs become available:

Use example logging service

The service is started automatically if --profile=service was passed.

After the service has started, you should begin seeing logs in Grafana.

Stop the service, wait a few minutes, and you should see an alert e-mail in Mailpit.

Use the following command to attach to the service:

docker compose attach --sig-proxy=false my-service

Press CTRL+C twice to detach. Enter any string, or 1, 2, or 3, followed by Enter, to generate log entries.

Multiline

We tell Alloy how to split multiline logs into separate entries by specifying how a new entry starts. In our case, that is a regex matching a leading date (see config.alloy).

Check the Grafana docs on this topic to see allowed values.
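The idea can be sketched in plain shell (the regex here is illustrative; the actual firstline expression lives in config.alloy):

```shell
# Lines matching the date regex start a new log entry; non-matching
# lines (e.g. stack trace frames) belong to the previous entry.
# Counting the matches counts the entries: 3 physical lines, 2 entries.
printf '2024-01-01 12:00:00 ERROR boom\n  at Foo.bar(Foo.java:1)\n2024-01-01 12:00:01 INFO ok\n' \
  | grep -cE '^[0-9]{4}-[0-9]{2}-[0-9]{2} '
# → 2
```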

Check alerts

There are two types of alerts, both managed via Grafana Alerting:

  • Prometheus and cadvisor provide the metrics to check whether our services are running.
  • Loki provides the logs to check for warnings or errors.

Alerts are sent:

  • When my-service is down for more than one minute.
  • When any message is logged containing error, failure, or exception.
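The log-based alert above can be expressed as a LogQL range aggregation along these lines (a sketch only; the actual rule is provisioned in Grafana Alerting, and the interval here is illustrative):

```
count_over_time({platform="docker"} |~ `(?i)(error|failure|exception)` [1m]) > 0
```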

The target recipient is grafana-contact@test.com. You can find the e-mails in Mailpit. Note: It might take a few minutes until Grafana Alerting decides to fire them.

Especially when a container no longer exists, cadvisor seems to keep reporting metrics for it for a while. Only after a few minutes is the service marked as not running, so please be patient. Only then does the one-minute timer in Grafana Alerting start to tick, eventually switching the alert from Pending to Firing.

You are of course free to set up your own alerting rules and other services which can send alerts to Grafana Alerting. I've just set this up to show an example alert.

Clean up

To stop all services and remove the associated volumes, run:

docker compose --profile=service down --volumes --remove-orphans

Grafana Alerting

Up to the grafana-12 tag, I used Alertmanager: Loki and Prometheus had alerts configured and sent them to Alertmanager, which then sent the e-mails.

I have now switched to using Grafana Alerting instead. I think the decision is not clear-cut:

  • Pro: All alerting can be configured in one place.
  • Pro: We only need to configure contacts, silences, etc. there.
  • Pro: We can eliminate a service (Alertmanager) without losing any functionality.
  • Con: Alertmanager is widely used, and many tools can send alerts to it, which is less common with Grafana Alerting.
  • Neutral: We only need to configure contacts, silences, etc. in Grafana Alerting. That would equally be true if we used only Alertmanager; only mixing the two would result in double maintenance.

Grafana Alloy

Grafana Alloy replaces the Loki Docker driver client, which previously had to be installed as a Docker plugin.

I have written a little rant about it in the old README, check the grafana-12 tag for that.

Now Alloy is a big improvement over that: It sends all logs to the Loki service and additionally all metrics to Prometheus. It even has cadvisor built-in, so cadvisor doesn't need to run as a separate service anymore. This changes how metrics are transferred: Previously, Prometheus had a scrape config and queried it every 15 seconds. Now, Grafana Alloy actively pushes metrics to Prometheus instead.
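The push-based setup can be sketched as an Alloy component along these lines (a sketch only; the endpoint URL is illustrative, and Prometheus must be started with --web.enable-remote-write-receiver to accept pushes):

```
// Illustrative sketch: Alloy pushes scraped metrics to Prometheus'
// remote-write endpoint instead of Prometheus scraping Alloy.
prometheus.remote_write "default" {
  endpoint {
    url = "http://prometheus:9090/api/v1/write"
  }
}
```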

However, I have noticed it takes ~250 MB of RAM, which is quite a lot in my opinion.

Regarding security, I am not sure I have configured everything correctly. Grafana Alloy has access to a lot of system volumes, including all docker directories. You might want to double-check everything before using parts of it.

Regarding the multiline logging: I have found a solution to configure it (see config.alloy), but no solution to configure it differently for each docker service yet.

About

Grafana setup with a typical logging and alerting stack
