6 changes: 4 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,10 @@
# Terraform modules by FastRobot

This is a collection of modules in use by FastRobot. We typically call them
via `terragrunt`. Each top-level directory is its own self-contained module.
via `terragrunt`. Eventually we'll have some common utility modules, but for now each top-level directory is standalone and defaults to the cheapest possible way to accomplish a task.

## Modules

* `elk` - stands up an AWS ES endpoint and an instance running logstash
* `atlantis` - opinionated wrapper for the official Atlantis Terraform module that sets up GitHub webhooks and auth for selected repos, plus some ALB authentication schemes through OIDC providers.
* `elk` - stands up an AWS ES endpoint and an instance running logstash.
* `monitoring` - Prometheus with various exporters as ECS tasks, remote-writing to a central Amazon Managed Prometheus collector, fronted by your choice of three Grafanas: Amazon Managed Grafana, Grafana Cloud, or open source Grafana as an ECS task.
47 changes: 34 additions & 13 deletions monitoring/README.md
@@ -1,25 +1,46 @@
# FR approved Monitoring stack
Sets up:
Conditionally sets up most mixes of:
* AWS Managed Prometheus backend (done)
* ECS cluster
* Prometheus task
* prometheus server
* prometheus-sdconfig-reloader
* node_exporter (daemonset)
* blackbox
* ECS fargate cluster (done)
* Prometheus scraper task (done)
* prometheus server (done)
* prometheus-sdconfig-reloader (done)
* node_exporter (daemonset) TODO
* blackbox_exporter TODO
* cadvisor (daemonset)
* grafana (optional!)
* Grafana
* in ecs, cheapest?
* via aws managed (default)
* via grafana cloud (fanciest)
* As an ecs service, maybe the cheapest?
* via aws managed (default) (done)
* via grafana cloud (fanciest, TODO)
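
As a rough illustration of how the conditional pieces might be toggled from terragrunt, the sketch below sets the three flags that `ecs.tf` reads (`enable`, `enable_prometheus`, `enable_grafana_ecs`) along with `vpc_id`. The source path and the VPC ID are placeholders, the comments on what each flag controls are guesses from the names, and the module may require other inputs that are not shown here.

```hcl
# terragrunt.hcl (sketch; source path and values are hypothetical)
terraform {
  source = "../../terraform-modules//monitoring" # placeholder path, adjust to your layout
}

inputs = {
  enable             = true                    # master switch checked by local.create_ecs
  enable_prometheus  = true                    # presumably the Prometheus scraper task
  enable_grafana_ecs = false                   # presumably open source Grafana on ECS
  vpc_id             = "vpc-0123456789abcdef0" # placeholder VPC ID
}
```

With `enable = true` and either of the other two flags set, `local.create_ecs` evaluates to true and the ECS cluster is created.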

We're using an AMP workspace as the central collection point for as many Prometheus scrapers and their assorted exporters as you need. You can view the collected metrics in your choice of Grafana, alongside CloudWatch, OpenSearch and any other service you want to configure.
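
To make the "central collection point" idea concrete, here is a minimal sketch of how one scraper's `prometheus.yml` could be rendered so that it remote-writes to the workspace. The resource and local names are hypothetical and not taken from this module, and the URL assumes that `prometheus_endpoint` ends with a trailing slash.

```hcl
# Sketch only: resource and local names are hypothetical.
data "aws_region" "current" {}

resource "aws_prometheus_workspace" "central" {
  alias = "central-metrics"
}

locals {
  # Rendered prometheus.yml for one scraper. Every scraper would point at
  # the same workspace, which is what makes AMP the central collector.
  prometheus_config = yamlencode({
    global = {
      scrape_interval = "30s"
    }
    remote_write = [
      {
        # Assumes prometheus_endpoint ends with a trailing slash.
        url = "${aws_prometheus_workspace.central.prometheus_endpoint}api/v1/remote_write"
        # SigV4 signing so AMP accepts the write.
        sigv4 = {
          region = data.aws_region.current.name
        }
      }
    ]
  })
}
```

The task role that runs Prometheus would also need permission to call `aps:RemoteWrite` on the workspace.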

## AMP Metric Retention
from https://docs.aws.amazon.com/prometheus/latest/userguide/what-is-Amazon-Managed-Service-Prometheus.html
> Metrics ingested into a workspace are stored for 150 days, and are then automatically deleted.

## AMP Ruler and AlertManager

This module also allows you to import a collection of YAML files as recording and alerting rules. We store these in terragrunt's live repo structure, one file per namespace, but you can pass any map that matches the structure described in the link below.

* See https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-Ruler.html and pass in a map of namespace names to YAML strings
* Each YAML string is expected to be standard Prometheus recording and alerting rule configuration
* Express the inputs however you like. Examples include templates rendered through terragrunt, or direct YAML conversion of HCL types in the calling module
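
A minimal sketch of the map-of-YAML-strings shape described above, assuming a hypothetical input named `rule_namespaces` (the module's real variable name may differ). It uses direct YAML conversion of HCL types for one namespace and creates one AMP rule group namespace per map entry.

```hcl
# Sketch only: variable, local and resource names are hypothetical.
variable "rule_namespaces" {
  description = "Map of rule namespace name to a YAML string of Prometheus rule groups"
  type        = map(string)
  default     = {}
}

locals {
  # One example namespace built from HCL types and converted to YAML.
  example_rule_namespaces = {
    node = yamlencode({
      groups = [
        {
          name = "node-alerts"
          rules = [
            {
              alert  = "InstanceDown"
              expr   = "up == 0"
              "for"  = "5m"
              labels = { severity = "page" }
            }
          ]
        }
      ]
    })
  }
}

resource "aws_prometheus_workspace" "example" {
  alias = "example"
}

# One rule group namespace per entry: the key is the namespace name and
# the value is the YAML rules document.
resource "aws_prometheus_rule_group_namespace" "this" {
  for_each     = merge(local.example_rule_namespaces, var.rule_namespaces)
  name         = each.key
  workspace_id = aws_prometheus_workspace.example.id
  data         = each.value
}
```

Reading files instead of building the YAML in HCL would work the same way, for example with `file()` on each namespace's YAML file in the calling module.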


Roughly based off of the structure from https://github.com/aws-samples/prometheus-for-ecs

# Developing for
## All the ways to run grafana
Depending on your complexity/scale/money tradeoffs you may have a clear preference for one of these grafana interfaces:

### Amazon Managed Grafana
https://aws.amazon.com/grafana/

This one is presumably the easiest to run going forward, as it's completely hosted by Amazon, but it charges per user per month ($9 per admin, $5 per lesser user), so it is probably best suited to very small teams. Because it's charged per user, you have to set up some form of enterprise auth. In this case I defaulted to AWS SSO, which was a complicated thing I didn't want to take on AND has org-wide implications that I had to set up with manual intervention.

### Open Source Grafana in ECS Fargate
Not working yet, but it's next, as I suspect it's the cheapest AND most flexible way to run the Grafana I'm used to. I'd prefer to use Grafana's auth-against-GitHub to manage access, and AFAICT you can't do that with AMG.

### Grafana Cloud
https://grafana.com/products/cloud/

Free forever for 3 users, and probably easy to point at AMP/ES/OpenSearch. It will probably be cutting-edge Grafana, so it's worth exploring the premium tier, which is a dollar cheaper than AMG. Not working yet.
12 changes: 12 additions & 0 deletions monitoring/data.tf
@@ -0,0 +1,12 @@
data "aws_region" "current" {}

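# NOTE: the only filter is the VPC ID, so this returns every subnet in the VPC,
# not just the private ones that the name suggests.
data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [var.vpc_id]
  }
}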

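# Disabled example output. As written it references data.aws_subnet.example,
# which this file does not define.
#output "subnet_cidr_blocks" {
#  value = [for s in data.aws_subnet.example : s.cidr_block]
#}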
35 changes: 35 additions & 0 deletions monitoring/ecs.tf
@@ -0,0 +1,35 @@
locals {
  create_ecs = alltrue([var.enable, anytrue([var.enable_grafana_ecs, var.enable_prometheus])])
  # also local.create_prometheus from main.tf
}

# ECS cluster for monitoring tasks
module "ecs" {
  count   = local.create_ecs ? 1 : 0
  source  = "terraform-aws-modules/ecs/aws"
  version = "3.5.0"
  name    = "${local.full_name}-ecs"

  container_insights = true

  capacity_providers = ["FARGATE", "FARGATE_SPOT"]

  default_capacity_provider_strategy = [
    {
      capacity_provider = "FARGATE_SPOT"
    }
  ]

  tags = local.tags
}

# alternately, setup the cloudwatch agent to scrape and remote-write to AMP
# https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights-Prometheus.html

# define container definitions for a prometheus and sd sidecar and other exporters
# https://github.com/cloudposse/terraform-aws-ecs-container-definition



# combine multiple above defs into
# https://github.com/cloudposse/terraform-aws-ecs-alb-service-task