FastRobot · lamont · May 4, 2022 · May 4, 2022 · Aug 25, 2022
diff --git a/README.md b/README.md
@@ -1,8 +1,10 @@
 # Terraform modules by FastRobot
 
 This is a collection of modules in use by FastRobot. We typically call them 
-via `terragrunt`. Each top-level directory is its own self-contained module.
+via `terragrunt`. Eventually we'll have some common utility modules, but for now each top-level directory is standalone and defaults to the cheapest possible way to accomplish a task.
 
 ## Modules
 
-* `elk` - stands up an AWS ES endpoint and an instance running logstash
+* `atlantis` - opinionated wrapper for the official atlantis terraform module that sets up GitHub webhooks and auth for selected repos, plus some ALB authentication schemes through OIDC providers.
+* `elk` - stands up an AWS ES endpoint and an instance running logstash.
+* `monitoring` - prometheus with various exporters as ECS tasks, remote-writing to an Amazon Managed Prometheus central collector, fronted by your choice of three grafanas: Amazon Managed Grafana, Grafana Cloud, or Open Source Grafana as an ECS task.
diff --git a/monitoring/README.md b/monitoring/README.md
@@ -1,25 +1,46 @@
 # FR approved Monitoring stack
-Sets up:
+Conditionally sets up most mixes of:
 * AWS Managed Prometheus backend (done)
-* ECS cluster
-  * Prometheus task
-    * prometheus server
-    * prometheus-sdconfig-reloader
-  * node_exporter (daemonset)
-  * blackbox
+* ECS fargate cluster (done)
+  * Prometheus scraper task (done)
+    * prometheus server (done)
+    * prometheus-sdconfig-reloader (done)
+  * node_exporter (daemonset) TODO
+  * blackbox_exporter TODO
   * cadvisor (daemonset)
-  * grafana (optional!)
 * Grafana
-  * in ecs, cheapest?
-  * via aws managed (default)
-  * via grafana cloud (fanciest)
+  * As an ecs service, maybe the cheapest?
+  * via aws managed (default) (done)
+  * via grafana cloud (fanciest, TODO)
+
+We're using an AMP workspace as the central collection point for as many prometheus scrapers and their assorted exporters as you need. View the collected metrics with your choice of grafana, alongside cloudwatch, opensearch and any other service you want to configure.
+
+## AMP Metric Retention
+From https://docs.aws.amazon.com/prometheus/latest/userguide/what-is-Amazon-Managed-Service-Prometheus.html:
+> Metrics ingested into a workspace are stored for 150 days, and are then automatically deleted.
 
 ## AMP Ruler and AlertManager
+
+This module also allows you to import a collection of yaml files as recording and alerting rules. We store these in terragrunt's live repo structure, with each namespace in its own file, but you can pass any map that matches the structure described in the link below.
+
 * use https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-Ruler.html to pass a map of names of yaml strings in by namespace
 * Expects prometheus standard rule and alert configuration 
 * express the inputs however you like. Examples include templates through terragrunt or direct yaml conversion of hcl types in the calling module
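+
+As a rough sketch of the two approaches in the last bullet, the calling module could build the map like this. The local name `rule_group_namespaces`, the `rules/` directory and the example rule are all hypothetical; check this module's variables for the real input name.
+
+```hcl
+locals {
+  # Hypothetical local name and directory layout, for illustration only.
+  rule_group_namespaces = merge(
+    # One namespace per yaml file: rules/node.yaml becomes namespace "node".
+    {
+      for f in fileset("${path.module}/rules", "*.yaml") :
+      trimsuffix(f, ".yaml") => file("${path.module}/rules/${f}")
+    },
+    # Direct yaml conversion of hcl types, in standard prometheus rule format.
+    {
+      inline-example = yamlencode({
+        groups = [{
+          name = "example"
+          rules = [{
+            record = "job:up:sum"
+            expr   = "sum by (job) (up)"
+          }]
+        }]
+      })
+    }
+  )
+}
+```
+
+Either way the result is a map of namespace name to a yaml string, which is the shape the bullets above describe.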
 
-
 Roughly based off of the structure from https://github.com/aws-samples/prometheus-for-ecs
 
-# Developing for
+## All the ways to run grafana
+Depending on your complexity/scale/money tradeoffs you may have a clear preference for one of these grafana interfaces:
+
+### Amazon Managed Grafana
+https://aws.amazon.com/grafana/
+
+This one is presumably the easiest to run going forward, as it's completely hosted by Amazon, but it charges per user per month ($9 per admin, $5 per lesser user), so it is probably best suited to very small teams. Because it's charged per user, you have to set up some form of enterprise auth. In this case I defaulted to AWS SSO, which was a complicated thing I didn't want to take on AND has org-wide implications that I had to set up with manual intervention.
+
+### Open Source Grafana in ECS Fargate
+Not working yet, but next, as I suspect it's the cheapest AND most flexible way to run the grafana I'm used to. I'd prefer to use grafana's auth-against-github to manage access, and AFAICT you can't do that with AMG.
+
+### Grafana Cloud
+https://grafana.com/products/cloud/
+
+Free forever for 3 users, and probably easy to point at the AMP/ES/opensearch. It will probably be cutting-edge grafana, so it's worth exploring the premium tier, which is a dollar cheaper than AMG. Not working yet.
diff --git a/monitoring/data.tf b/monitoring/data.tf
@@ -0,0 +1,12 @@
+data "aws_region" "current" {}
+
+# NOTE: this filters only by VPC, so it returns every subnet in the VPC, not just private ones.
+data "aws_subnets" "private" {
+  filter {
+    name   = "vpc-id"
+    values = [var.vpc_id]
+  }
+}
+
+#output "subnet_cidr_blocks" {
+#  value = [for s in data.aws_subnet.example : s.cidr_block]
+#}
diff --git a/monitoring/ecs.tf b/monitoring/ecs.tf
@@ -0,0 +1,35 @@
+locals {
+  create_ecs = alltrue([var.enable, anytrue([var.enable_grafana_ecs, var.enable_prometheus])])
+  # also local.create_prometheus from main.tf
+}
+
+# ECS cluster for monitoring tasks
+module "ecs" {
+  count   = local.create_ecs ? 1 : 0
+  source  = "terraform-aws-modules/ecs/aws"
+  version = "3.5.0"
+  name    = "${local.full_name}-ecs"
+
+  container_insights = true
+
+  capacity_providers = ["FARGATE", "FARGATE_SPOT"]
+
+  default_capacity_provider_strategy = [
+    {
+      capacity_provider = "FARGATE_SPOT"
+    }
+  ]
+
+  tags = local.tags
+}
+
+# alternatively, set up the cloudwatch agent to scrape and remote-write to AMP
+# https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights-Prometheus.html
+
+# define container definitions for a prometheus and sd sidecar and other exporters
+# https://github.com/cloudposse/terraform-aws-ecs-container-definition
+
+
+
+# combine multiple above defs into
+# https://github.com/cloudposse/terraform-aws-ecs-alb-service-task
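+
+# A minimal sketch of the container-definition step described in the comments above.
+# ASSUMPTIONS (not from this repo): the module label "prometheus_container", the
+# image tag, the port, and the local name "prometheus_container_definitions" are
+# placeholders; no version is pinned here, so pin one before real use.
+module "prometheus_container" {
+  source = "cloudposse/ecs-container-definition/aws"
+
+  container_name  = "prometheus"
+  container_image = "quay.io/prometheus/prometheus:latest"
+  essential       = true
+
+  port_mappings = [
+    {
+      containerPort = 9090
+      hostPort      = 9090
+      protocol      = "tcp"
+    }
+  ]
+}
+
+# Each extra container (sd sidecar, exporters) would get its own module block and
+# be appended to this list; the encoded list is the kind of JSON string a task
+# definition expects as its container definitions.
+locals {
+  prometheus_container_definitions = jsonencode([
+    module.prometheus_container.json_map_object,
+  ])
+}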