From c26c571c911018bbb1f89ffddf2a6f42802a54e2 Mon Sep 17 00:00:00 2001
From: saphoooo
Date: Thu, 2 Dec 2021 07:57:35 +0100
Subject: [PATCH] Install and configure Datadog agent on Minikube

---
 README.md                                   |    1 +
 datadog_agent_on_minikube/README.md         |  812 ++++++++++
 .../datadog-agent-all-features.yaml         | 1234 ++++++++++++++
 datadog_agent_on_minikube/values.yaml       | 1413 +++++++++++++++++
 4 files changed, 3460 insertions(+)
 create mode 100644 datadog_agent_on_minikube/README.md
 create mode 100644 datadog_agent_on_minikube/datadog-agent-all-features.yaml
 create mode 100644 datadog_agent_on_minikube/values.yaml

diff --git a/README.md b/README.md
index 43c9b5a..2ff2d4c 100644
--- a/README.md
+++ b/README.md
@@ -70,6 +70,7 @@ These scripts and tools live in this repo, some scripts/tools have their own REA
 | [Packer Image Build + Terraform Deploy with AWS Secrets Manager and Datadog Agent](./agent_bootstrapping/README.md) | N/A | A guide that helps you to build an AMI with updates + Datadog Agent preinstalled using Packer, and deploy to AWS using Terraform while storing your Datadog API key in AWS Secrets Manager and using an IAM Instance Profile to retrive it at deployment time. |
 | [ddog](./ddog/README.md) | Java 1.8+ | A command line tool to troubleshoot connectivity issues for on-premises environments. [Ping](./ddog/README.md#Ping), [SendMetric](./ddog/README.md#SendMetric), [SendLog](./ddog/README.md#SendLog) and [SendTrace](./ddog/README.md#SendTrace) are the available commands. |
 | [webhooks](./webhooks/) | N/A | A collection of [Webhooks](https://docs.datadoghq.com/integrations/webhooks/) examples which use the [Datadog API](https://docs.datadoghq.com/api/latest/) to perform different actions |
+| [Install and configure Datadog agent on Minikube](./datadog_agent_on_minikube/) | N/A | A detailed guide to configuring the Datadog Agent to monitor each component of a Kubernetes cluster |
 
 ## Additional tools
 These are some additional tools and scripts written by Datadog.

diff --git a/datadog_agent_on_minikube/README.md b/datadog_agent_on_minikube/README.md
new file mode 100644
index 0000000..5a8b9b1
--- /dev/null
+++ b/datadog_agent_on_minikube/README.md
@@ -0,0 +1,812 @@
+# Datadog on Minikube
+
+**Table of contents**
+
+1. Installation with the DaemonSet
+   1. Setup your environment
+1. Installation with Helm
+   1. Setup your environment
+1. Tips
+
+## Setup your environment
+
+1. Install Minikube: [minikube.sigs.k8s.io/docs/start](https://minikube.sigs.k8s.io/docs/start/)
+1. Install Kubectl: [kubernetes.io/docs/tasks/tools](https://kubernetes.io/docs/tasks/tools/#kubectl)
+1. Install Helm (only if you target the Helm installation): [helm.sh/docs/intro/install](https://helm.sh/docs/intro/install/)
+1. Start Minikube: `minikube start --memory='8g' --cpus='2'`
+1. Check your installation: `$ kubectl get nodes`
+
+```
+NAME       STATUS   ROLES                  AGE   VERSION
+minikube   Ready    control-plane,master   9d    v1.22.2
+```
+
+## Installation with the DaemonSet
+
+### Install Datadog agent
+
+Follow the first two steps (1 and 2) from this documentation: [docs.datadoghq.com/agent/kubernetes](https://docs.datadoghq.com/agent/kubernetes/?tab=daemonset)
+
+At step 3, pick the full installation: [Manifest template](https://docs.datadoghq.com/resources/yaml/datadog-agent-all-features.yaml). The reason is that if you want to monitor your entire cluster, including Kubernetes resources, you have to enable process monitoring.
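+
+The rest of this guide edits that manifest repeatedly, so it is worth keeping a local copy. A minimal sketch for fetching it (assuming `curl` is installed; the file name matches the one used below):
+
+```bash
+curl -fLo datadog-agent-all-features.yaml \
+  https://docs.datadoghq.com/resources/yaml/datadog-agent-all-features.yaml
+```
+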
+If you don't need the full install (for example, if you don't use the APM or Security modules), you can remove these components from the YAML file, or choose another installation flavor, keeping in mind that you'll then have to implement process monitoring by hand. See [docs.datadoghq.com/agent/kubernetes/?tab=daemonset#kubernetes-resources-for-live-containers](https://docs.datadoghq.com/agent/kubernetes/?tab=daemonset#kubernetes-resources-for-live-containers).
+
+### Specific configuration for Minikube
+
+In order to be able to monitor the control plane, we need some more tuning, and that is not specific to Minikube. But Minikube (which relies on kubeadm) has its own specificities as well, and the second section addresses them.
+
+#### Initial configuration
+
+In this section, we assume that you use the [datadog-agent-all-features.yaml](https://docs.datadoghq.com/resources/yaml/datadog-agent-all-features.yaml) file as a base configuration.
+
+To be able to apply the configuration file, some customization is required:
+
+- Remove or comment out the secret part, as the secret was already created in step 2
+
+```yaml
+---
+# Source: datadog/templates/secret-api-key.yaml
+apiVersion: v1
+kind: Secret
+metadata:
+  name: datadog-agent
+  namespace: default
+  labels: {}
+type: Opaque
+data:
+  api-key: PUT_YOUR_BASE64_ENCODED_API_KEY_HERE
+```
+
+- Replace `PUT_A_BASE64_ENCODED_RANDOM_STRING_HERE` (the decoded string needs to be at least 32 characters, a-zA-Z)
+
+```bash
+$ echo -n abcdefghijklmnopqrstuvwxyz1234567890|base64
+YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXoxMjM0NTY3ODkw
+```
+
+```yaml
+---
+# Source: datadog/templates/secret-cluster-agent-token.yaml
+apiVersion: v1
+kind: Secret
+metadata:
+  name: datadog-agent-cluster-agent
+  namespace: default
+  labels: {}
+type: Opaque
+data:
+  token: YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXoxMjM0NTY3ODkw
+```
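+
+Rather than hardcoding a guessable string, you can generate a random token and encode it in one step (a small sketch, assuming `openssl` is available; `openssl rand -hex 16` yields 32 alphanumeric characters):
+
+```bash
+echo -n "$(openssl rand -hex 16)" | base64
+```
+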
+- For all the containers, add the variable `DD_CLUSTER_NAME`, as this value can't be extracted from cloud provider metadata
+
+```yaml
+- name: DD_CLUSTER_NAME
+  value: "minikube"
+```
+
+- Do the same with `DD_KUBELET_TLS_VERIFY` to allow the agent to communicate with the kubelet
+
+```yaml
+- name: DD_KUBELET_TLS_VERIFY
+  value: "false"
+```
+
+Apply the configuration file:
+
+```
+$ kubectl apply -f datadog-agent-all-features.yaml
+$ kubectl get pods
+NAME                                           READY   STATUS    RESTARTS   AGE
+datadog-agent-cluster-agent-6f66d65d7b-58lzg   1/1     Running   0          3m31s
+datadog-agent-gfhxk                            5/5     Running   0          3m23s
+```
+
+By executing `agent status` in the agent pod, you can see that many components are already running correctly, and some are not. That is expected, because monitoring the control plane requires some more configuration. For the moment, let's see what is already working:
+
+```bash
+$ kubectl exec -ti datadog-agent-gfhxk -- agent status
+=====================
+Datadog Cluster Agent
+=====================
+
+  - Datadog Cluster Agent endpoint detected: https://10.101.197.84:5005
+    Successfully connected to the Datadog Cluster Agent.
+  - Running: 1.15.1+commit.b9b97b0
+
+==========
+Logs Agent
+==========
+
+  Sending compressed logs in HTTPS to agent-http-intake.logs.datadoghq.com on port 443
+  BytesSent: 2.205683e+06
+  EncodedBytesSent: 147647
+  LogsProcessed: 2169
+  LogsSent: 2165
+
+  kubelet (7.0.0)
+  ---------------
+    Instance ID: kubelet:5bbc63f3938c02f4 [OK]
+    Configuration Source: file:/etc/datadog-agent/conf.d/kubelet.d/conf.yaml.default
+    Total Runs: 11
+    Metric Samples: Last Run: 928, Total: 10,058
+    Events: Last Run: 0, Total: 0
+    Service Checks: Last Run: 4, Total: 44
+    Average Execution Time : 295ms
+    Last Execution Date : 2021-11-19 14:44:09 UTC (1637333049000)
+    Last Successful Execution Date : 2021-11-19 14:44:09 UTC (1637333049000)
+
+  kube_apiserver_metrics (1.10.0)
+  -------------------------------
+    Instance ID: kube_apiserver_metrics:64b51327a52a8e5 [OK]
+    Configuration Source: file:/etc/datadog-agent/conf.d/kube_apiserver_metrics.d/auto_conf.yaml
+    Total Runs: 13
+    Metric Samples: Last Run: 9,274, Total: 115,119
+    Events: Last Run: 0, Total: 0
+    Service Checks: Last Run: 1, Total: 13
+    Average Execution Time : 1.002s
+    Last Execution Date : 2021-11-19 14:44:03 UTC (1637333043000)
+    Last Successful Execution Date : 2021-11-19 14:44:03 UTC (1637333043000)
+```
+
+![minikube_without_control_plane_monitoring](https://user-images.githubusercontent.com/13923756/142644417-79898351-cf5d-4417-8f28-6498397b8d85.png)
+
+In the next part, we'll configure the agent to be able to monitor the control plane:
+
+- etcd
+- kube_controller_manager
+- kube_scheduler
+
+#### Monitoring the control plane
+
+##### etcd
+
+etcd is the brain of Kubernetes, where the state of everything is stored. Obviously, you want to be able to monitor it.
+
+[docs.datadoghq.com/integrations/etcd](https://docs.datadoghq.com/integrations/etcd/?tab=containerized)
+
+To be able to monitor etcd, we need:
+
+1. the certificates to communicate with it
+1. a customized configuration for the Datadog agent, which by default relies on auto-discovery to configure etcd monitoring
+
+The certificates used by etcd can be found directly on the Minikube filesystem:
+
+```bash
+$ minikube ssh
+$ cd /var/lib/minikube/certs/etcd
+$ ls
+ca.crt  ca.key  healthcheck-client.crt  healthcheck-client.key  peer.crt  peer.key  server.crt  server.key
+```
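+
+Before wiring these files into the pod, you can optionally check that etcd answers on its metrics endpoint with those certificates (a quick probe from inside the Minikube VM, assuming `curl` is present on the node; it uses the same server certificate pair the Agent check will use):
+
+```bash
+$ minikube ssh
+$ sudo curl --cacert /var/lib/minikube/certs/etcd/ca.crt \
+    --cert /var/lib/minikube/certs/etcd/server.crt \
+    --key /var/lib/minikube/certs/etcd/server.key \
+    https://127.0.0.1:2379/metrics | head
+```
+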
+So, all we need to do is to mount them inside the agent pod. Find the volumes section in `datadog-agent-all-features.yaml`, and add a new volume of type hostPath:
+
+```yaml
+- hostPath:
+    path: /var/lib/minikube/certs/etcd
+  name: etcd-certs
+```
+
+Only the agent container needs access to etcd, so we just have to update the agent volumeMounts:
+
+```yaml
+- name: etcd-certs
+  mountPath: /etc/datadog-agent/minkikube
+  readOnly: true
+```
+
+Now, let's create a configMap to replace the current `auto_conf.yaml` file in `/etc/datadog-agent/conf.d/etcd.d` (you can place it just above the daemonset configuration):
+
+```yaml
+---
+kind: ConfigMap
+apiVersion: v1
+metadata:
+  name: ad-etcd
+  namespace: default
+data:
+  conf.yaml: |-
+    ad_identifiers:
+      - etcd
+    instances:
+      - prometheus_url: https://%%host%%:2379/metrics
+        tls_ca_cert: /etc/datadog-agent/minkikube/ca.crt
+        tls_cert: /etc/datadog-agent/minkikube/server.crt
+        tls_private_key: /etc/datadog-agent/minkikube/server.key
+```
+
+To be able to use it in the agent pod, we have to create a volume from it. Once again, in the volumes section, add the following:
+
+```yaml
+- name: dd-etcd
+  configMap:
+    name: ad-etcd
+```
+
+Then, in the agent container volumeMounts:
+
+```yaml
+- name: dd-etcd
+  mountPath: /etc/datadog-agent/conf.d/etcd.d/
+```
+
+Because we created the configuration file ourselves, we don't want to rely on the autoconf for etcd anymore. In order to disable autoconf for etcd, add this new variable in the agent container:
+
+```yaml
+- name: DD_IGNORE_AUTOCONF
+  value: etcd
+```
+
+It's now time to test our configuration, by deploying the agent one more time:
+
+```bash
+$ kubectl apply -f datadog-agent-all-features.yaml
+$ kubectl get po
+NAME                                           READY   STATUS    RESTARTS   AGE
+datadog-agent-cluster-agent-6f66d65d7b-58lzg   1/1     Running   0          53m
+datadog-agent-t9rtl                            5/5     Running   0          38s
+$ kubectl exec -ti datadog-agent-t9rtl -- agent status
+
+    etcd (2.7.1)
+    ------------
+      Instance ID: etcd:d09e4493abb0512 [OK]
+      Configuration Source: file:/etc/datadog-agent/conf.d/etcd.d/conf.yaml
+      Total Runs: 5
+      Metric Samples: Last Run: 1,059, Total: 5,295
+      Events: Last Run: 0, Total: 0
+      Service Checks: Last Run: 1, Total: 5
+      Average Execution Time : 126ms
+      Last Execution Date : 2021-11-19 15:33:31 UTC (1637336011000)
+      Last Successful Execution Date : 2021-11-19 15:33:31 UTC (1637336011000)
+      metadata:
+        version.major: 3
+        version.minor: 5
+        version.patch: 0
+        version.raw: 3.5.0
+        version.scheme: semver
+```
+
+#### Controller manager and scheduler
+
+These two components have a very similar configuration, and now that we already know how to update a configuration in the agent, let's do both in bulk.
+
+In kubeadm, the controller manager and the scheduler only listen on the host network, on 127.0.0.1.
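+
+You can confirm this from inside the node (a quick sketch, assuming the `ss` utility from iproute2 is available in the Minikube VM):
+
+```bash
+$ minikube ssh
+$ sudo ss -tlnp | grep -E '10257|10259'
+```
+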
+To be able to reach them, we also need to run the agent on the host network. We just need one more field to do so:
+
+```yaml
+hostNetwork: true
+```
+
+We have to place it at the daemonset level:
+
+```yaml
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  name: datadog-agent
+  namespace: default
+  labels: {}
+spec:
+  selector:
+    matchLabels:
+      app: datadog-agent
+  template:
+    metadata:
+      labels:
+        app: datadog-agent
+      name: datadog-agent
+    spec:
+      hostPID: true
+      hostNetwork: true
+```
+
+Next, let's create two more configmaps, one for the scheduler and one for the controller manager, in order to replace autoconf:
+
+```yaml
+kind: ConfigMap
+apiVersion: v1
+metadata:
+  name: ad-scheduler
+  namespace: default
+data:
+  conf.yaml: |-
+    ad_identifiers:
+      - kube-scheduler
+
+    init_config:
+
+    instances:
+      - prometheus_url: https://localhost:10259/metrics
+        bearer_token_auth: true
+        ssl_verify: false
+        leader_election: false
+---
+kind: ConfigMap
+apiVersion: v1
+metadata:
+  name: ad-controller-manager
+  namespace: default
+data:
+  conf.yaml: |-
+    ad_identifiers:
+      - kube-controller-manager
+
+    init_config:
+
+    instances:
+      - prometheus_url: https://localhost:10257/metrics
+        bearer_token_auth: true
+        ssl_verify: false
+        leader_election: false
+```
+
+We can place them just below the etcd configmap. These two components are autoconfigured by the agent by default, which is not what we want here, so let's update the `DD_IGNORE_AUTOCONF` variable:
+
+```yaml
+- name: DD_IGNORE_AUTOCONF
+  value: etcd kube-scheduler kube-controller-manager
+```
+
+Next, as previously, we need to create volumes to be able to use these data in our containers:
+
+```yaml
+- name: ad-scheduler
+  configMap:
+    name: ad-scheduler
+- name: ad-controller-manager
+  configMap:
+    name: ad-controller-manager
+```
+
+Finally, we need to mount the volumes in the container. Let's update the volumeMounts of the agent one more time:
+
+```yaml
+volumeMounts:
+  - name: ad-scheduler
+    mountPath: /etc/datadog-agent/conf.d/kube_scheduler.d/
+  - name: ad-controller-manager
+    mountPath: /etc/datadog-agent/conf.d/kube_controller_manager.d/
+```
+
+Now let's apply this config:
+
+```
+$ kubectl apply -f datadog-agent-all-features.yaml
+$ kubectl get po
+NAME                                           READY   STATUS    RESTARTS        AGE
+datadog-agent-cluster-agent-6f66d65d7b-58lzg   1/1     Running   1 (2d16h ago)   2d18h
+datadog-agent-k897l                            5/5     Running   0               114s
+$ kubectl exec -ti datadog-agent-k897l -- agent status
+
+    kube_controller_manager (2.0.1)
+    -------------------------------
+      Instance ID: kube_controller_manager:aa60000b603ad467 [OK]
+      Configuration Source: file:/etc/datadog-agent/conf.d/kube_controller_manager.d/conf.yaml
+      Total Runs: 2
+      Metric Samples: Last Run: 1,423, Total: 2,846
+      Events: Last Run: 0, Total: 0
+      Service Checks: Last Run: 2, Total: 4
+      Average Execution Time : 200ms
+      Last Execution Date : 2021-11-22 09:04:08 UTC (1637571848000)
+      Last Successful Execution Date : 2021-11-22 09:04:08 UTC (1637571848000)
+
+
+    kube_scheduler (2.0.1)
+    ----------------------
+      Instance ID: kube_scheduler:855aa9d114404c21 [OK]
+      Configuration Source: file:/etc/datadog-agent/conf.d/kube_scheduler.d/conf.yaml
+      Total Runs: 2
+      Metric Samples: Last Run: 49, Total: 98
+      Events: Last Run: 0, Total: 0
+      Service Checks: Last Run: 2, Total: 4
+      Average Execution Time : 74ms
+      Last Execution Date : 2021-11-22 09:04:05 UTC (1637571845000)
+      Last Successful Execution Date : 2021-11-22 09:04:05 UTC (1637571845000)
+```
+
+Voilà! Everything is working fine now, and we are able to monitor every single piece of our Kubernetes cluster.
+
+## Installation with Helm
+
+### Install Datadog agent
+
+The process is straightforward, as explained in the following documentation: [docs.datadoghq.com/agent/kubernetes](https://docs.datadoghq.com/agent/kubernetes/?tab=helm)
+
+```bash
+$ helm repo add datadog https://helm.datadoghq.com
+$ helm repo update
+$ helm install datadog --set datadog.apiKey=<DATADOG_API_KEY> datadog/datadog
+```
+
+However, to monitor Minikube properly, some changes must be made in `values.yaml`.
+
+### Retrieve the values.yaml file
+
+The easiest way to capture the values of the helm chart is to execute the following command:
+
+```bash
+$ helm show values datadog/datadog > values.yaml
+```
+
+Now that we have a complete `values.yaml`, we can start editing.
+
+### Basic configuration
+
+Around line 59, provide a name for your cluster:
+
+```yaml
+clusterName: minikube
+```
+
+Around line 147, turn `tlsVerify` to false:
+
+```yaml
+tlsVerify: false
+```
+
+Around line 221, turn logs on:
+
+```yaml
+logs:
+  # datadog.logs.enabled -- Enables this to activate Datadog Agent log collection
+  ## ref: https://docs.datadoghq.com/agent/basic_agent_usage/kubernetes/#log-collection-setup
+  enabled: true
+```
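+
+Before deploying, you can double-check what you changed against the chart defaults (a convenience sketch; `diff` should print only the lines you edited):
+
+```bash
+$ helm show values datadog/datadog | diff - values.yaml
+```
+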
+With these modifications in place, let's deploy the agent:
+
+```bash
+$ helm install datadog -f values.yaml --set datadog.apiKey=<DATADOG_API_KEY> datadog/datadog
+NAME: datadog
+LAST DEPLOYED: Fri Nov 26 10:55:58 2021
+NAMESPACE: default
+STATUS: deployed
+REVISION: 1
+TEST SUITE: None
+NOTES:
+Datadog agents are spinning up on each node in your cluster. After a few
+minutes, you should see your agents starting in your event stream:
+    https://app.datadoghq.com/event/stream
+```
+
+There are some points that we have to fix:
+
+```bash
+  Loading Errors
+  ==============
+    kube_controller_manager
+    -----------------------
+      Core Check Loader:
+        Check kube_controller_manager not found in Catalog
+
+      JMX Check Loader:
+        check is not a jmx check, or unable to determine if it's so
+
+      Python Check Loader:
+        could not configure check instance for python check kube_controller_manager: could not invoke 'kube_controller_manager' python check constructor. New constructor API returned:
+Traceback (most recent call last):
+  File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kube_controller_manager/kube_controller_manager.py", line 136, in __init__
+    if url is None and re.search(r'/metrics$', prometheus_url):
+  File "/opt/datadog-agent/embedded/lib/python3.8/re.py", line 201, in search
+    return _compile(pattern, flags).search(string)
+TypeError: expected string or bytes-like object
+Deprecated constructor API returned:
+__init__() got an unexpected keyword argument 'agentConfig'
+
+    kube_scheduler
+    --------------
+      Core Check Loader:
+        Check kube_scheduler not found in Catalog
+
+      JMX Check Loader:
+        check is not a jmx check, or unable to determine if it's so
+
+      Python Check Loader:
+        could not configure check instance for python check kube_scheduler: could not invoke 'kube_scheduler' python check constructor. New constructor API returned:
+Traceback (most recent call last):
+  File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kube_scheduler/kube_scheduler.py", line 140, in __init__
+    if url is None and re.search(r'/metrics$', prometheus_url):
+  File "/opt/datadog-agent/embedded/lib/python3.8/re.py", line 201, in search
+    return _compile(pattern, flags).search(string)
+TypeError: expected string or bytes-like object
+Deprecated constructor API returned:
+__init__() got an unexpected keyword argument 'agentConfig'
+
+    etcd (2.8.0)
+    ------------
+      Instance ID: etcd:b584110e00adcdae [ERROR]
+      Configuration Source: file:/etc/datadog-agent/conf.d/etcd.d/auto_conf.yaml
+      Total Runs: 2
+      Metric Samples: Last Run: 0, Total: 0
+      Events: Last Run: 0, Total: 0
+      Service Checks: Last Run: 0, Total: 0
+      Average Execution Time : 23ms
+      Last Execution Date : 2021-11-30 09:50:22 UTC (1638265822000)
+      Last Successful Execution Date : Never
+      Error: Detected 1 error while loading configuration model `InstanceConfig`:
+prometheus_url
+  field required
+      Traceback (most recent call last):
+        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py", line 992, in run
+          initialization()
+        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py", line 407, in load_configuration_models
+          instance_config = self.load_configuration_model(package_path, 'InstanceConfig', raw_instance_config)
+        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py", line 447, in load_configuration_model
+          raise_from(ConfigurationError('\n'.join(message_lines)), None)
+        File "", line 3, in raise_from
+      datadog_checks.base.errors.ConfigurationError: Detected 1 error while loading configuration model `InstanceConfig`:
+      prometheus_url
+        field required
+```
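+
+These errors come from the stock `auto_conf.yaml` files shipped in the image, which lack the connection details for a kubeadm-style control plane. You can list the shipped configuration directories to see which checks are involved (sketch; replace the pod name with yours):
+
+```bash
+$ kubectl exec -ti datadog-dqx48 -- ls /etc/datadog-agent/conf.d | grep -E 'etcd|kube_scheduler|kube_controller'
+```
+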
+### etcd
+
+In this guided part, we are following the instructions on monitoring the control plane with the Datadog agent: [docs.datadoghq.com/agent/kubernetes/control_plane](https://docs.datadoghq.com/agent/kubernetes/control_plane/?tab=helm#Kubeadm)
+
+Let's start with etcd. To be able to monitor etcd, we need the certificates necessary to communicate with it. In Minikube, we can find them under `/var/lib/minikube/certs/`:
+
+```bash
+$ ls /var/lib/minikube/certs/
+apiserver-etcd-client.crt  apiserver-kubelet-client.crt  apiserver.crt  ca.crt  etcd  front-proxy-ca.key  front-proxy-client.key  proxy-client-ca.key  proxy-client.key  sa.pub
+apiserver-etcd-client.key  apiserver-kubelet-client.key  apiserver.key  ca.key  front-proxy-ca.crt  front-proxy-client.crt  proxy-client-ca.crt  proxy-client.crt  sa.key
+```
+
+Since we need to mount these certificates in the agent pod, we first need to create a volume. Edit the `values.yaml` (around line 1066):
+
+```yaml
+  # agents.volumes -- Specify additional volumes to mount in the dd-agent container
+  volumes:
+    - hostPath:
+        path: /var/lib/minikube/certs/etcd
+      name: etcd-certs
+```
+
+Then, let's create the associated volumeMounts (just below in the file):
+
+```yaml
+  # agents.volumeMounts -- Specify additional volumes to mount in the dd-agent container
+  volumeMounts:
+    - name: etcd-certs
+      mountPath: /etc/datadog-agent/minkikube
+      readOnly: true
+```
+
+Now, we have to create a new entry in conf.d for etcd (around line 281):
+
+```yaml
+  # datadog.confd -- Provide additional check configurations (static and Autodiscovery)
+  ## Each key becomes a file in /conf.d
+  ## ref: https://github.com/DataDog/datadog-agent/tree/main/Dockerfiles/agent#optional-volumes
+  ## ref: https://docs.datadoghq.com/agent/autodiscovery/
+  confd:
+    etcd.yaml: |-
+      ad_identifiers:
+        - etcd
+      instances:
+        - prometheus_url: https://%%host%%:2379/metrics
+          tls_ca_cert: /etc/datadog-agent/minkikube/ca.crt
+          tls_cert: /etc/datadog-agent/minkikube/server.crt
+          tls_private_key: /etc/datadog-agent/minkikube/server.key
+```
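+
+If you want to confirm where this entry ends up before upgrading, you can render the chart locally (a sketch using `helm template`, which prints the manifests without installing anything):
+
+```bash
+$ helm template datadog -f values.yaml --set datadog.apiKey=<DATADOG_API_KEY> datadog/datadog | grep -A8 'etcd.yaml'
+```
+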
+Because we created a manual configuration for etcd, we also want to disable autodiscovery for this component. To do so, let's update `ignoreAutoConfig` around line 469:
+
+```yaml
+  # datadog.ignoreAutoConfig -- List of integration to ignore auto_conf.yaml.
+  ## ref: https://docs.datadoghq.com/agent/faq/auto_conf/
+  ignoreAutoConfig:
+    - etcd
+```
+
+With these elements in place, let's update our config:
+
+```bash
+$ helm upgrade datadog -f values.yaml --set datadog.apiKey=<DATADOG_API_KEY> datadog/datadog
+$ kubectl exec -ti datadog-dqx48 -- agent status
+
+    etcd (2.8.0)
+    ------------
+      Instance ID: etcd:ed7fa7d544bf41bd [OK]
+      Configuration Source: file:/etc/datadog-agent/conf.d/etcd.yaml
+      Total Runs: 3
+      Metric Samples: Last Run: 1,060, Total: 3,180
+      Events: Last Run: 0, Total: 0
+      Service Checks: Last Run: 1, Total: 3
+      Average Execution Time : 134ms
+      Last Execution Date : 2021-11-30 13:15:29 UTC (1638278129000)
+      Last Successful Execution Date : 2021-11-30 13:15:29 UTC (1638278129000)
+      metadata:
+        version.major: 3
+        version.minor: 5
+        version.patch: 0
+        version.raw: 3.5.0
+        version.scheme: semver
+```
+
+#### Controller manager and scheduler
+
+We will update the entry in conf.d (around line 289):
+
+```yaml
+  # datadog.confd -- Provide additional check configurations (static and Autodiscovery)
+  ## Each key becomes a file in /conf.d
+  ## ref: https://github.com/DataDog/datadog-agent/tree/main/Dockerfiles/agent#optional-volumes
+  ## ref: https://docs.datadoghq.com/agent/autodiscovery/
+  confd:
+    etcd.yaml: |-
+      ad_identifiers:
+        - etcd
+      instances:
+        - prometheus_url: https://%%host%%:2379/metrics
+          tls_ca_cert: /etc/datadog-agent/minkikube/ca.crt
+          tls_cert: /etc/datadog-agent/minkikube/server.crt
+          tls_private_key: /etc/datadog-agent/minkikube/server.key
+    kube_scheduler.yaml: |-
+      ad_identifiers:
+        - kube-scheduler
+      instances:
+        - prometheus_url: https://localhost:10259/metrics
+          ssl_verify: false
+          bearer_token_auth: true
+          leader_election: false
+    kube_controller_manager.yaml: |-
+      ad_identifiers:
+        - kube-controller-manager
+      instances:
+        - prometheus_url: https://localhost:10257/metrics
+          ssl_verify: false
+          bearer_token_auth: true
+          leader_election: false
+```
+
+And update `ignoreAutoConfig` as well (line 476):
+
+```yaml
+  # datadog.ignoreAutoConfig -- List of integration to ignore auto_conf.yaml.
+  ## ref: https://docs.datadoghq.com/agent/faq/auto_conf/
+  ignoreAutoConfig:
+    - etcd
+    - kube_scheduler
+    - kube_controller_manager
+```
+
+And finally, on line 1094, configure the agent to use the host network:
+
+```yaml
+  # agents.useHostNetwork -- Bind ports on the hostNetwork
+  ## Useful for CNI networking where hostPort might
+  ## not be supported. The ports need to be available on all hosts. It Can be
+  ## used for custom metrics instead of a service endpoint.
+  ##
+  ## WARNING: Make sure that hosts using this are properly firewalled otherwise
+  ## metrics and traces are accepted from any host able to connect to this host.
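+  # On kubeadm-based clusters like Minikube, sharing the host network is what lets
+  # the Agent reach the scheduler (127.0.0.1:10259) and the controller manager
+  # (127.0.0.1:10257), which only bind to the node's loopback interface: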
+  useHostNetwork: true
+```
+
+Then update the agent:
+
+```bash
+$ helm upgrade datadog -f values.yaml --set datadog.apiKey=<DATADOG_API_KEY> datadog/datadog
+$ kubectl exec -ti datadog-dqx48 -- agent status
+
+    kube_controller_manager (2.0.1)
+    -------------------------------
+      Instance ID: kube_controller_manager:d2f8d67dc8653df9 [OK]
+      Configuration Source: file:/etc/datadog-agent/conf.d/kube_controller_manager.yaml
+      Total Runs: 2
+      Metric Samples: Last Run: 1,424, Total: 2,848
+      Events: Last Run: 0, Total: 0
+      Service Checks: Last Run: 2, Total: 4
+      Average Execution Time : 204ms
+      Last Execution Date : 2021-11-30 13:48:21 UTC (1638280101000)
+      Last Successful Execution Date : 2021-11-30 13:48:21 UTC (1638280101000)
+
+
+    kube_scheduler (2.1.1)
+    ----------------------
+      Instance ID: kube_scheduler:b74dcfe4c1e75a03 [OK]
+      Configuration Source: file:/etc/datadog-agent/conf.d/kube_scheduler.yaml
+      Total Runs: 2
+      Metric Samples: Last Run: 68, Total: 136
+      Events: Last Run: 0, Total: 0
+      Service Checks: Last Run: 2, Total: 4
+      Average Execution Time : 117ms
+      Last Execution Date : 2021-11-30 13:48:13 UTC (1638280093000)
+      Last Successful Execution Date : 2021-11-30 13:48:13 UTC (1638280093000)
+```
+
+TADA! 🎉
+
+## Tips
+
+You probably noticed this line in the scheduler and controller-manager configmaps:
+
+```yaml
+leader_election: false
+```
+
+Leader election, in simple words, is the mechanism that guarantees that only one instance of the kube-scheduler — or one instance of the kube-controller-manager — is actively making decisions, while all the other instances are inactive, but ready to take leadership if something happens to the active one. [Leader election in Kubernetes control plane](https://blog.heptio.com/leader-election-in-kubernetes-control-plane-heptioprotip-1ed9fb0f3e6d#:~:text=Leader%20election%2C%20in%20simple%20words,happens%20to%20the%20active%20one.)
+
+Because only one instance is actively making decisions, it's crucial to always monitor that very instance. So why did we use `leader_election: false`? For two reasons:
+
+1. There is only one instance of kube-scheduler and of kube-controller-manager per master node. And because Minikube is single-master, we are sure to always monitor the leader.
+1. Leader election status is commonly detected through endpoints, but in Kubernetes the default value for `leader-elect-resource-lock` is leases ([kube-controller-manager](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/)), which means there is no endpoint to parse to capture this specific information:
+
+```bash
+$ kubectl get ep -n kube-system
+NAME                       ENDPOINTS                                      AGE
+k8s.io-minikube-hostpath   <none>                                         16d
+kube-dns                   172.17.0.2:53,172.17.0.2:53,172.17.0.2:9153    16d
+```
+
+However, it's easy to change this specific behavior (OK, for Minikube it's useless, but it's a good gotcha for your other Kubernetes implementations, if your goal is to run them in production).
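+
+You can see the lease-based locks that replaced those endpoints by listing the coordination objects (the exact names can vary with the Kubernetes version):
+
+```bash
+$ kubectl get leases -n kube-system
+```
+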
+Let's start a new Minikube instance with the following parameters:
+
+```bash
+$ minikube start -p new --memory='8g' --cpus='2' --extra-config=controller-manager.leader-elect-resource-lock=endpoints \
+  --extra-config=controller-manager.leader-elect=true \
+  --extra-config=scheduler.leader-elect-resource-lock=endpoints \
+  --extra-config=scheduler.leader-elect=true
+$ kubectl get ep -n kube-system
+NAME                       ENDPOINTS                                      AGE
+k8s.io-minikube-hostpath   <none>                                         3m34s
+kube-controller-manager    <none>                                         4m21s
+kube-dns                   172.17.0.2:53,172.17.0.2:53,172.17.0.2:9153    4m7s
+kube-scheduler             <none>                                         4m19s
+$ kubectl describe ep -n kube-system kube-controller-manager
+Name:         kube-controller-manager
+Namespace:    kube-system
+Labels:       <none>
+Annotations:  control-plane.alpha.kubernetes.io/leader:
+                {"holderIdentity":"leader-elec_d57d3e05-aec6-4a47-b6fc-354ca5b6c8a1","leaseDurationSeconds":15,"acquireTime":"2021-11-22T09:34:49Z","renew...
+Subsets:
+Events:
+  Type    Reason          Age    From                     Message
+  ----    ------          ----   ----                     -------
+  Normal  LeaderElection  5m21s  kube-controller-manager  leader-elec_d57d3e05-aec6-4a47-b6fc-354ca5b6c8a1 became leader
+```
+
+That's it! So now let's update this line in the kube-scheduler and kube-controller-manager configmaps:
+
+```yaml
+leader_election: true
+```
+
+And then, apply our config to this cluster:
+
+> Don't forget: because it's a brand new cluster, you have to go through steps 1 and 2 of [docs.datadoghq.com/agent/kubernetes](https://docs.datadoghq.com/agent/kubernetes/?tab=daemonset) to create the RBAC rules and to encode your Datadog API key in a secret.
+
+```bash
+$ kubectl get po
+NAME                                           READY   STATUS    RESTARTS   AGE
+datadog-agent-cluster-agent-6f66d65d7b-gjrpw   1/1     Running   0          59s
+datadog-agent-nbhvm                            5/5     Running   0          59s
+
+$ kubectl exec -ti datadog-agent-nbhvm -- agent status
+    kube_controller_manager (2.0.1)
+    -------------------------------
+      Instance ID: kube_controller_manager:11266a51fd1eaec8 [OK]
+      Configuration Source: file:/etc/datadog-agent/conf.d/kube_controller_manager.d/conf.yaml
+      Total Runs: 4
+      Metric Samples: Last Run: 1,426, Total: 5,704
+      Events: Last Run: 0, Total: 0
+      Service Checks: Last Run: 3, Total: 12
+      Average Execution Time : 766ms
+      Last Execution Date : 2021-11-22 09:51:43 UTC (1637574703000)
+      Last Successful Execution Date : 2021-11-22 09:51:43 UTC (1637574703000)
+
+
+    kube_scheduler (2.0.1)
+    ----------------------
+      Instance ID: kube_scheduler:50f753a83ecda252 [OK]
+      Configuration Source: file:/etc/datadog-agent/conf.d/kube_scheduler.d/conf.yaml
+      Total Runs: 3
+      Metric Samples: Last Run: 72, Total: 216
+      Events: Last Run: 0, Total: 0
+      Service Checks: Last Run: 3, Total: 9
+      Average Execution Time : 102ms
+      Last Execution Date : 2021-11-22 09:51:35 UTC (1637574695000)
+      Last Successful Execution Date : 2021-11-22 09:51:35 UTC (1637574695000)
+```
+
+This time, we are able to collect the status of the leader election, so even with a multi-master cluster, kube-scheduler and kube-controller-manager monitoring stays consistent.
+
+I know it's a lot of configuration, and because dealing with configuration is always error-prone, I'm providing you with a full example of the DaemonSet configuration: [datadog-agent-all-features](./datadog-agent-all-features.yaml).
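+
+As a final smoke test, you can run each control-plane check once from inside the agent pod (a sketch reusing the pod name from the example above; `agent check` executes a single check and prints its result):
+
+```bash
+$ for chk in etcd kube_scheduler kube_controller_manager; do
+    kubectl exec -ti datadog-agent-nbhvm -- agent check $chk | tail -n 5
+  done
+```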
\ No newline at end of file diff --git a/datadog_agent_on_minikube/datadog-agent-all-features.yaml b/datadog_agent_on_minikube/datadog-agent-all-features.yaml new file mode 100644 index 0000000..f2f0ac2 --- /dev/null +++ b/datadog_agent_on_minikube/datadog-agent-all-features.yaml @@ -0,0 +1,1234 @@ +--- +# Source: datadog/templates/cluster-agent-rbac.yaml +apiVersion: v1 +kind: ServiceAccount +metadata: + labels: + app: "datadog-agent" + chart: "datadog-2.22.15" + heritage: "Helm" + release: "datadog-agent" + name: datadog-agent-cluster-agent + namespace: default +--- +# Source: datadog/templates/secret-cluster-agent-token.yaml +apiVersion: v1 +kind: Secret +metadata: + name: datadog-agent-cluster-agent + namespace: default + labels: {} +type: Opaque +data: + token: YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXoxMjM0NTY3ODkw +--- +# Source: datadog/templates/install_info-configmap.yaml +apiVersion: v1 +kind: ConfigMap +metadata: + name: datadog-agent-installinfo + namespace: default + labels: {} + annotations: {} +data: + install_info: | + --- + install_method: + tool: kubernetes sample manifests + tool_version: kubernetes sample manifests + installer_version: kubernetes sample manifests +--- +# Source: datadog/templates/system-probe-configmap.yaml +apiVersion: v1 +kind: ConfigMap +metadata: + name: datadog-agent-system-probe-config + namespace: default + labels: {} +data: + system-probe.yaml: | + system_probe_config: + enabled: true + debug_port: 0 + sysprobe_socket: /var/run/sysprobe/sysprobe.sock + enable_conntrack: true + bpf_debug: false + enable_tcp_queue_length: false + enable_oom_kill: false + collect_dns_stats: true + max_tracked_connections: 131072 + conntrack_max_state_size: 131072 + network_config: + enabled: true + conntrack_init_timeout: 10s + runtime_security_config: + enabled: true + debug: false + socket: /var/run/sysprobe/runtime-security.sock + policies: + dir: /etc/datadog-agent/runtime-security.d + syscall_monitor: + enabled: false +--- +# Source: datadog/templates/system-probe-configmap.yaml +apiVersion: v1 +kind: ConfigMap +metadata: + name: datadog-agent-security + namespace: default + labels: {} +data: + system-probe-seccomp.json: | + { + "defaultAction": "SCMP_ACT_ERRNO", + "syscalls": [ + { + "names": [ + "accept4", + "access", + "arch_prctl", + "bind", + "bpf", + "brk", + "capget", + "capset", + "chdir", + "clock_gettime", + "clone", + "close", + "connect", + "copy_file_range", + "creat", + "dup", + "dup2", + "dup3", + "epoll_create", + "epoll_create1", + "epoll_ctl", + "epoll_ctl_old", + "epoll_pwait", + "epoll_wait", + "epoll_wait", + "epoll_wait_old", + "eventfd", + "eventfd2", + "execve", + "execveat", + "exit", + "exit_group", + "fchmod", + "fchmodat", + "fchown", + "fchown32", + "fchownat", + "fcntl", + "fcntl64", + "fstat", + "fstat64", + "fstatfs", + "fsync", + "futex", + "getcwd", + "getdents", + "getdents64", + "getegid", + "geteuid", + "getgid", + "getpeername", + "getpid", + "getppid", + "getpriority", + "getrandom", + "getresgid", + "getresgid32", + "getresuid", + "getresuid32", + "getrlimit", + "getrusage", + "getsid", + "getsockname", + "getsockopt", + "gettid", + "gettimeofday", + "getuid", + "getxattr", + "ioctl", + "ipc", + "listen", + "lseek", + "lstat", + "lstat64", + "madvise", + "mkdir", + "mkdirat", + "mmap", + "mmap2", + "mprotect", + "mremap", + "munmap", + "nanosleep", + "newfstatat", + "open", + "openat", + "pause", + "perf_event_open", + "pipe", + "pipe2", + "poll", + "ppoll", + "prctl", + "pread64", + "prlimit64", + "pselect6", + "read", + 
"readlink", + "readlinkat", + "recvfrom", + "recvmmsg", + "recvmsg", + "rename", + "restart_syscall", + "rmdir", + "rt_sigaction", + "rt_sigpending", + "rt_sigprocmask", + "rt_sigqueueinfo", + "rt_sigreturn", + "rt_sigsuspend", + "rt_sigtimedwait", + "rt_tgsigqueueinfo", + "sched_getaffinity", + "sched_yield", + "seccomp", + "select", + "semtimedop", + "send", + "sendmmsg", + "sendmsg", + "sendto", + "set_robust_list", + "set_tid_address", + "setgid", + "setgid32", + "setgroups", + "setgroups32", + "setns", + "setrlimit", + "setsid", + "setsidaccept4", + "setsockopt", + "setuid", + "setuid32", + "sigaltstack", + "socket", + "socketcall", + "socketpair", + "stat", + "stat64", + "statfs", + "sysinfo", + "tgkill", + "umask", + "uname", + "unlink", + "unlinkat", + "wait4", + "waitid", + "waitpid", + "write", + "getgroups", + "getpgrp", + "setpgid" + ], + "action": "SCMP_ACT_ALLOW", + "args": null + }, + { + "names": [ + "setns" + ], + "action": "SCMP_ACT_ALLOW", + "args": [ + { + "index": 1, + "value": 1073741824, + "valueTwo": 0, + "op": "SCMP_CMP_EQ" + } + ], + "comment": "", + "includes": {}, + "excludes": {} + } + ] + } +--- +# Source: datadog/templates/cluster-agent-rbac.yaml +apiVersion: "rbac.authorization.k8s.io/v1" +kind: ClusterRole +metadata: + labels: {} + name: datadog-agent-cluster-agent +rules: + - apiGroups: + - "" + resources: + - services + - endpoints + - pods + - nodes + - namespaces + - componentstatuses + verbs: + - get + - list + - watch + - apiGroups: + - "" + resources: + - events + verbs: + - get + - list + - watch + - create + - apiGroups: ["quota.openshift.io"] + resources: + - clusterresourcequotas + verbs: + - get + - list + - apiGroups: + - "autoscaling" + resources: + - horizontalpodautoscalers + verbs: + - list + - watch + - apiGroups: + - "" + resources: + - configmaps + resourceNames: + - datadogtoken # Kubernetes event collection state + verbs: + - get + - update + - apiGroups: + - "" + resources: + - configmaps + resourceNames: + - datadog-leader-election # Leader election token + verbs: + - get + - update + - apiGroups: # To create the leader election token and hpa events + - "" + resources: + - configmaps + - events + verbs: + - create + - nonResourceURLs: + - "/version" + - "/healthz" + verbs: + - get + - apiGroups: # to get the kube-system namespace UID and generate a cluster ID + - "" + resources: + - namespaces + resourceNames: + - "kube-system" + verbs: + - get + - apiGroups: # To create the cluster-id configmap + - "" + resources: + - configmaps + resourceNames: + - "datadog-cluster-id" + verbs: + - create + - get + - update + - apiGroups: + - "apps" + resources: + - deployments + - replicasets + - daemonsets + - statefulsets + verbs: + - list + - get + - watch + - apiGroups: + - "batch" + resources: + - cronjobs + - jobs + verbs: + - list + - get + - watch + - apiGroups: + - "" + resources: + - serviceaccounts + - namespaces + verbs: + - list + - apiGroups: + - "policy" + resources: + - podsecuritypolicies + verbs: + - get + - list + - watch + - apiGroups: + - rbac.authorization.k8s.io + resources: + - clusterrolebindings + - rolebindings + verbs: + - list + - apiGroups: + - networking.k8s.io + resources: + - networkpolicies + verbs: + - list + - apiGroups: + - policy + resources: + - podsecuritypolicies + verbs: + - use + resourceNames: + - datadog-agent-cluster-agent + - apiGroups: + - "security.openshift.io" + resources: + - securitycontextconstraints + verbs: + - use + resourceNames: + - datadog-agent-cluster-agent + - hostnetwork +--- +# 
Source: datadog/templates/cluster-agent-rbac.yaml +apiVersion: "rbac.authorization.k8s.io/v1" +kind: ClusterRoleBinding +metadata: + labels: {} + name: datadog-agent-cluster-agent +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: datadog-agent-cluster-agent +subjects: + - kind: ServiceAccount + name: datadog-agent-cluster-agent + namespace: default +--- +# Source: datadog/templates/agent-services.yaml +apiVersion: v1 +kind: Service +metadata: + name: datadog-agent-cluster-agent + namespace: default + labels: {} +spec: + type: ClusterIP + selector: + app: datadog-agent-cluster-agent + ports: + - port: 5005 + name: agentport + protocol: TCP +--- +kind: ConfigMap +apiVersion: v1 +metadata: + name: ad-etcd + namespace: default +data: + conf.yaml: |- + ad_identifiers: + - etcd + instances: + - prometheus_url: https://%%host%%:2379/metrics + tls_ca_cert: /etc/datadog-agent/minkikube/ca.crt + tls_cert: /etc/datadog-agent/minkikube/server.crt + tls_private_key: /etc/datadog-agent/minkikube/server.key +--- +kind: ConfigMap +apiVersion: v1 +metadata: + name: ad-scheduler + namespace: default +data: + conf.yaml: |- + ad_identifiers: + - kube-scheduler + + init_config: + + instances: + - prometheus_url: https://localhost:10259/metrics + bearer_token_auth: true + ssl_verify: false + leader_election: true +--- +kind: ConfigMap +apiVersion: v1 +metadata: + name: ad-controller-manager + namespace: default +data: + conf.yaml: |- + ad_identifiers: + - kube-controller-manager + + init_config: + + instances: + - prometheus_url: https://localhost:10257/metrics + bearer_token_auth: true + ssl_verify: false + leader_election: true +--- +# Source: datadog/templates/daemonset.yaml +apiVersion: apps/v1 +kind: DaemonSet +metadata: + name: datadog-agent + namespace: default + labels: {} +spec: + selector: + matchLabels: + app: datadog-agent + template: + metadata: + labels: + app: datadog-agent + name: datadog-agent + annotations: + container.apparmor.security.beta.kubernetes.io/system-probe: unconfined + container.seccomp.security.alpha.kubernetes.io/system-probe: localhost/system-probe + spec: + hostPID: true + hostNetwork: true + containers: + - name: agent + image: "gcr.io/datadoghq/agent:7.31.1" + imagePullPolicy: IfNotPresent + command: ["agent", "run"] + resources: {} + ports: + - containerPort: 8125 + name: dogstatsdport + protocol: UDP + env: + - name: DD_IGNORE_AUTOCONF + value: etcd kube-scheduler kube-controller-manager + - name: DD_KUBELET_TLS_VERIFY + value: "false" + - name: DD_CLUSTER_NAME + value: "minikube" + # Needs to be removed when Agent N-2 is built with Golang 1.17 + - name: GODEBUG + value: x509ignoreCN=0 + - name: DD_API_KEY + valueFrom: + secretKeyRef: + name: "datadog-agent" + key: api-key + - name: DD_KUBERNETES_KUBELET_HOST + valueFrom: + fieldRef: + fieldPath: status.hostIP + - name: KUBERNETES + value: "yes" + - name: DD_LOG_LEVEL + value: "INFO" + - name: DD_DOGSTATSD_PORT + value: "8125" + - name: DD_DOGSTATSD_NON_LOCAL_TRAFFIC + value: "true" + - name: DD_CLUSTER_AGENT_ENABLED + value: "true" + - name: DD_CLUSTER_AGENT_KUBERNETES_SERVICE_NAME + value: datadog-agent-cluster-agent + - name: DD_CLUSTER_AGENT_AUTH_TOKEN + valueFrom: + secretKeyRef: + name: datadog-agent-cluster-agent + key: token + - name: DD_APM_ENABLED + value: "false" + - name: DD_LOGS_ENABLED + value: "true" + - name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL + value: "true" + - name: DD_LOGS_CONFIG_K8S_CONTAINER_USE_FILE + value: "true" + - name: DD_HEALTH_PORT + value: "5555" + - name: 
DD_DOGSTATSD_SOCKET + value: "/var/run/datadog/dsd.socket" + - name: DD_EXTRA_CONFIG_PROVIDERS + value: "clusterchecks endpointschecks" + volumeMounts: + - name: ad-scheduler + mountPath: /etc/datadog-agent/conf.d/kube_scheduler.d/ + - name: ad-controller-manager + mountPath: /etc/datadog-agent/conf.d/kube_controller_manager.d/ + - name: dd-etcd + mountPath: /etc/datadog-agent/conf.d/etcd.d/ + - name: etcd-certs + mountPath: /etc/datadog-agent/minkikube + readOnly: true + - name: installinfo + subPath: install_info + mountPath: /etc/datadog-agent/install_info + readOnly: true + - name: logdatadog + mountPath: /var/log/datadog + - name: tmpdir + mountPath: /tmp + readOnly: false + - name: config + mountPath: /etc/datadog-agent + - name: runtimesocketdir + mountPath: /host/var/run + mountPropagation: None + readOnly: true + - name: dsdsocket + mountPath: /var/run/datadog + - name: sysprobe-socket-dir + mountPath: /var/run/sysprobe + readOnly: true + - name: sysprobe-config + mountPath: /etc/datadog-agent/system-probe.yaml + subPath: system-probe.yaml + - name: procdir + mountPath: /host/proc + mountPropagation: None + readOnly: true + - name: cgroups + mountPath: /host/sys/fs/cgroup + mountPropagation: None + readOnly: true + - name: pointerdir + mountPath: /opt/datadog-agent/run + mountPropagation: None + - name: logpodpath + mountPath: /var/log/pods + mountPropagation: None + readOnly: true + - name: logscontainerspath + mountPath: /var/log/containers + mountPropagation: None + readOnly: true + - name: logdockercontainerpath + mountPath: /var/lib/docker/containers + mountPropagation: None + readOnly: true + livenessProbe: + failureThreshold: 6 + httpGet: + path: /live + port: 5555 + scheme: HTTP + initialDelaySeconds: 15 + periodSeconds: 15 + successThreshold: 1 + timeoutSeconds: 5 + readinessProbe: + failureThreshold: 6 + httpGet: + path: /ready + port: 5555 + scheme: HTTP + initialDelaySeconds: 15 + periodSeconds: 15 + successThreshold: 1 + timeoutSeconds: 5 + - name: trace-agent + image: "gcr.io/datadoghq/agent:7.31.1" + imagePullPolicy: IfNotPresent + command: ["trace-agent", "-config=/etc/datadog-agent/datadog.yaml"] + resources: {} + ports: + - containerPort: 8126 + hostPort: 8126 + name: traceport + protocol: TCP + env: + - name: DD_KUBELET_TLS_VERIFY + value: "false" + - name: DD_CLUSTER_NAME + value: "minikube" + # Needs to be removed when Agent N-2 is built with Golang 1.17 + - name: GODEBUG + value: x509ignoreCN=0 + - name: DD_API_KEY + valueFrom: + secretKeyRef: + name: "datadog-agent" + key: api-key + - name: DD_KUBERNETES_KUBELET_HOST + valueFrom: + fieldRef: + fieldPath: status.hostIP + - name: KUBERNETES + value: "yes" + - name: DD_CLUSTER_AGENT_ENABLED + value: "true" + - name: DD_CLUSTER_AGENT_KUBERNETES_SERVICE_NAME + value: datadog-agent-cluster-agent + - name: DD_CLUSTER_AGENT_AUTH_TOKEN + valueFrom: + secretKeyRef: + name: datadog-agent-cluster-agent + key: token + - name: DD_LOG_LEVEL + value: "INFO" + - name: DD_APM_ENABLED + value: "true" + - name: DD_APM_NON_LOCAL_TRAFFIC + value: "true" + - name: DD_APM_RECEIVER_PORT + value: "8126" + - name: DD_APM_RECEIVER_SOCKET + value: "/var/run/datadog/apm.socket" + - name: DD_DOGSTATSD_SOCKET + value: "/var/run/datadog/dsd.socket" + volumeMounts: + - name: config + mountPath: /etc/datadog-agent + - name: logdatadog + mountPath: /var/log/datadog + - name: tmpdir + mountPath: /tmp + readOnly: false + - name: dsdsocket + mountPath: /var/run/datadog + - name: runtimesocketdir + mountPath: /host/var/run + mountPropagation: None 
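+          # the host's /var/run is mounted so the Agent can reach the container runtime socket (Docker/containerd autodetection)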
+ readOnly: true + livenessProbe: + initialDelaySeconds: 15 + periodSeconds: 15 + tcpSocket: + port: 8126 + timeoutSeconds: 5 + - name: process-agent + image: "gcr.io/datadoghq/agent:7.31.1" + imagePullPolicy: IfNotPresent + command: ["process-agent", "-config=/etc/datadog-agent/datadog.yaml"] + resources: {} + env: + - name: DD_KUBELET_TLS_VERIFY + value: "false" + - name: DD_CLUSTER_NAME + value: "minikube" + # Needs to be removed when Agent N-2 is built with Golang 1.17 + - name: GODEBUG + value: x509ignoreCN=0 + - name: DD_API_KEY + valueFrom: + secretKeyRef: + name: "datadog-agent" + key: api-key + - name: DD_KUBERNETES_KUBELET_HOST + valueFrom: + fieldRef: + fieldPath: status.hostIP + - name: KUBERNETES + value: "yes" + - name: DD_CLUSTER_AGENT_ENABLED + value: "true" + - name: DD_CLUSTER_AGENT_KUBERNETES_SERVICE_NAME + value: datadog-agent-cluster-agent + - name: DD_CLUSTER_AGENT_AUTH_TOKEN + valueFrom: + secretKeyRef: + name: datadog-agent-cluster-agent + key: token + - name: DD_PROCESS_AGENT_ENABLED + value: "true" + - name: DD_LOG_LEVEL + value: "INFO" + - name: DD_SYSTEM_PROBE_ENABLED + value: "true" + - name: DD_SYSTEM_PROBE_NETWORK_ENABLED + value: "true" + - name: DD_DOGSTATSD_SOCKET + value: "/var/run/datadog/dsd.socket" + - name: DD_ORCHESTRATOR_EXPLORER_ENABLED + value: "true" + volumeMounts: + - name: config + mountPath: /etc/datadog-agent + - name: runtimesocketdir + mountPath: /host/var/run + mountPropagation: None + readOnly: true + - name: logdatadog + mountPath: /var/log/datadog + - name: tmpdir + mountPath: /tmp + readOnly: false + - name: cgroups + mountPath: /host/sys/fs/cgroup + mountPropagation: None + readOnly: true + - name: passwd + mountPath: /etc/passwd + readOnly: true + - name: procdir + mountPath: /host/proc + mountPropagation: None + readOnly: true + - name: dsdsocket + mountPath: /var/run/datadog + readOnly: true + - name: sysprobe-socket-dir + mountPath: /var/run/sysprobe + readOnly: true + - name: sysprobe-config + mountPath: /etc/datadog-agent/system-probe.yaml + subPath: system-probe.yaml + - name: system-probe + image: "gcr.io/datadoghq/agent:7.31.1" + imagePullPolicy: IfNotPresent + securityContext: + capabilities: + add: + - SYS_ADMIN + - SYS_RESOURCE + - SYS_PTRACE + - NET_ADMIN + - NET_BROADCAST + - NET_RAW + - IPC_LOCK + privileged: false + command: ["/opt/datadog-agent/embedded/bin/system-probe", "--config=/etc/datadog-agent/system-probe.yaml"] + env: + - name: DD_KUBELET_TLS_VERIFY + value: "false" + - name: DD_CLUSTER_NAME + value: "minikube" + # Needs to be removed when Agent N-2 is built with Golang 1.17 + - name: GODEBUG + value: x509ignoreCN=0 + - name: DD_API_KEY + valueFrom: + secretKeyRef: + name: "datadog-agent" + key: api-key + - name: DD_KUBERNETES_KUBELET_HOST + valueFrom: + fieldRef: + fieldPath: status.hostIP + - name: KUBERNETES + value: "yes" + - name: DD_LOG_LEVEL + value: "INFO" + resources: {} + volumeMounts: + - name: logdatadog + mountPath: /var/log/datadog + - name: tmpdir + mountPath: /tmp + readOnly: false + - name: debugfs + mountPath: /sys/kernel/debug + mountPropagation: None + - name: config + mountPath: /etc/datadog-agent + - name: sysprobe-config + mountPath: /etc/datadog-agent/system-probe.yaml + subPath: system-probe.yaml + - name: sysprobe-socket-dir + mountPath: /var/run/sysprobe + - name: procdir + mountPath: /host/proc + mountPropagation: None + readOnly: true + - name: os-release + mountPath: /host/etc/os-release + mountPropagation: None + readOnly: true + - name: security-agent + image: 
"gcr.io/datadoghq/agent:7.31.1" + imagePullPolicy: IfNotPresent + securityContext: + capabilities: + add: ["AUDIT_CONTROL", "AUDIT_READ"] + command: ["security-agent", "start", "-c=/etc/datadog-agent/datadog.yaml"] + resources: {} + env: + - name: DD_KUBELET_TLS_VERIFY + value: "false" + - name: DD_CLUSTER_NAME + value: "minikube" + # Needs to be removed when Agent N-2 is built with Golang 1.17 + - name: GODEBUG + value: x509ignoreCN=0 + - name: DD_API_KEY + valueFrom: + secretKeyRef: + name: "datadog-agent" + key: api-key + - name: DD_KUBERNETES_KUBELET_HOST + valueFrom: + fieldRef: + fieldPath: status.hostIP + - name: KUBERNETES + value: "yes" + - name: DD_LOG_LEVEL + value: "INFO" + - name: DD_COMPLIANCE_CONFIG_ENABLED + value: "true" + - name: DD_COMPLIANCE_CONFIG_CHECK_INTERVAL + value: "20m" + - name: HOST_ROOT + value: /host/root + - name: DD_CLUSTER_AGENT_ENABLED + value: "true" + - name: DD_CLUSTER_AGENT_KUBERNETES_SERVICE_NAME + value: datadog-agent-cluster-agent + - name: DD_CLUSTER_AGENT_AUTH_TOKEN + valueFrom: + secretKeyRef: + name: datadog-agent-cluster-agent + key: token + - name: DD_RUNTIME_SECURITY_CONFIG_ENABLED + value: "true" + - name: DD_RUNTIME_SECURITY_CONFIG_POLICIES_DIR + value: "/etc/datadog-agent/runtime-security.d" + - name: DD_RUNTIME_SECURITY_CONFIG_SOCKET + value: /var/run/sysprobe/runtime-security.sock + - name: DD_RUNTIME_SECURITY_CONFIG_SYSCALL_MONITOR_ENABLED + value: "false" + - name: DD_DOGSTATSD_SOCKET + value: "/var/run/datadog/dsd.socket" + volumeMounts: + - name: config + mountPath: /etc/datadog-agent + - name: logdatadog + mountPath: /var/log/datadog + - name: tmpdir + mountPath: /tmp + readOnly: false + - name: runtimesocketdir + mountPath: /host/var/run + readOnly: true + - name: dsdsocket + mountPath: /var/run/datadog + readOnly: true + - name: cgroups + mountPath: /host/sys/fs/cgroup + readOnly: true + - name: passwd + mountPath: /etc/passwd + readOnly: true + - name: group + mountPath: /etc/group + readOnly: true + - name: hostroot + mountPath: /host/root + readOnly: true + - name: runtimesocketdir + mountPath: /host/root/var/run + readOnly: true + - name: procdir + mountPath: /host/proc + readOnly: true + - name: sysprobe-socket-dir + mountPath: /var/run/sysprobe + readOnly: true + - name: sysprobe-config + mountPath: /etc/datadog-agent/system-probe.yaml + subPath: system-probe.yaml + initContainers: + - name: init-volume + image: "gcr.io/datadoghq/agent:7.31.1" + imagePullPolicy: IfNotPresent + command: ["bash", "-c"] + args: + - cp -r /etc/datadog-agent /opt + volumeMounts: + - name: config + mountPath: /opt/datadog-agent + resources: {} + - name: init-config + image: "gcr.io/datadoghq/agent:7.31.1" + imagePullPolicy: IfNotPresent + command: ["bash", "-c"] + args: + - for script in $(find /etc/cont-init.d/ -type f -name '*.sh' | sort) ; do bash $script ; done + volumeMounts: + - name: logdatadog + mountPath: /var/log/datadog + - name: config + mountPath: /etc/datadog-agent + - name: procdir + mountPath: /host/proc + mountPropagation: None + readOnly: true + - name: runtimesocketdir + mountPath: /host/var/run + mountPropagation: None + readOnly: true + - name: sysprobe-config + mountPath: /etc/datadog-agent/system-probe.yaml + subPath: system-probe.yaml + env: + - name: DD_KUBELET_TLS_VERIFY + value: "false" + - name: DD_CLUSTER_NAME + value: "minikube" + # Needs to be removed when Agent N-2 is built with Golang 1.17 + - name: GODEBUG + value: x509ignoreCN=0 + - name: DD_API_KEY + valueFrom: + secretKeyRef: + name: "datadog-agent" + key: 
api-key + - name: DD_KUBERNETES_KUBELET_HOST + valueFrom: + fieldRef: + fieldPath: status.hostIP + - name: KUBERNETES + value: "yes" + resources: {} + - name: seccomp-setup + image: "gcr.io/datadoghq/agent:7.31.1" + command: + - cp + - /etc/config/system-probe-seccomp.json + - /host/var/lib/kubelet/seccomp/system-probe + volumeMounts: + - name: datadog-agent-security + mountPath: /etc/config + - name: seccomp-root + mountPath: /host/var/lib/kubelet/seccomp + mountPropagation: None + resources: {} + volumes: + - name: ad-scheduler + configMap: + name: ad-scheduler + - name: ad-controller-manager + configMap: + name: ad-controller-manager + - name: dd-etcd + configMap: + name: ad-etcd + - hostPath: + path: /var/lib/minikube/certs/etcd + name: etcd-certs + - name: installinfo + configMap: + name: datadog-agent-installinfo + - name: config + emptyDir: {} + - hostPath: + path: /var/run + name: runtimesocketdir + - name: logdatadog + emptyDir: {} + - name: tmpdir + emptyDir: {} + - hostPath: + path: /proc + name: procdir + - hostPath: + path: /sys/fs/cgroup + name: cgroups + - hostPath: + path: /var/run/datadog/ + type: DirectoryOrCreate + name: dsdsocket + - hostPath: + path: /var/run/datadog/ + type: DirectoryOrCreate + name: apmsocket + - name: s6-run + emptyDir: {} + - name: sysprobe-config + configMap: + name: datadog-agent-system-probe-config + - name: datadog-agent-security + configMap: + name: datadog-agent-security + - hostPath: + path: /var/lib/kubelet/seccomp + name: seccomp-root + - hostPath: + path: /sys/kernel/debug + name: debugfs + - name: sysprobe-socket-dir + emptyDir: {} + - hostPath: + path: /etc/passwd + name: passwd + - hostPath: + path: /etc/group + name: group + - hostPath: + path: / + name: hostroot + - hostPath: + path: /etc/os-release + name: os-release + - hostPath: + path: /var/lib/datadog-agent/logs + name: pointerdir + - hostPath: + path: /var/log/pods + name: logpodpath + - hostPath: + path: /var/log/containers + name: logscontainerspath + - hostPath: + path: /var/lib/docker/containers + name: logdockercontainerpath + tolerations: + affinity: {} + serviceAccountName: "datadog-agent" + nodeSelector: + kubernetes.io/os: linux + updateStrategy: + rollingUpdate: + maxUnavailable: 10% + type: RollingUpdate +--- +# Source: datadog/templates/cluster-agent-deployment.yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: datadog-agent-cluster-agent + namespace: default + labels: {} +spec: + replicas: 1 + strategy: + rollingUpdate: + maxSurge: 1 + maxUnavailable: 0 + type: RollingUpdate + selector: + matchLabels: + app: datadog-agent-cluster-agent + template: + metadata: + labels: + app: datadog-agent-cluster-agent + name: datadog-agent-cluster-agent + annotations: {} + spec: + serviceAccountName: datadog-agent-cluster-agent + containers: + - name: cluster-agent + image: "gcr.io/datadoghq/cluster-agent:1.15.1" + imagePullPolicy: IfNotPresent + resources: {} + ports: + - containerPort: 5005 + name: agentport + protocol: TCP + env: + - name: DD_HEALTH_PORT + value: "5556" + - name: DD_API_KEY + valueFrom: + secretKeyRef: + name: "datadog-agent" + key: api-key + optional: true + - name: DD_KUBELET_TLS_VERIFY + value: "false" + - name: DD_CLUSTER_NAME + value: "minikube" + - name: DD_CLUSTER_CHECKS_ENABLED + value: "true" + - name: DD_EXTRA_CONFIG_PROVIDERS + value: "kube_endpoints kube_services" + - name: DD_EXTRA_LISTENERS + value: "kube_endpoints kube_services" + - name: DD_LOG_LEVEL + value: "INFO" + - name: DD_LEADER_ELECTION + value: "true" + - name: 
DD_LEADER_LEASE_DURATION + value: "15" + - name: DD_COLLECT_KUBERNETES_EVENTS + value: "true" + - name: DD_CLUSTER_AGENT_KUBERNETES_SERVICE_NAME + value: datadog-agent-cluster-agent + - name: DD_CLUSTER_AGENT_AUTH_TOKEN + valueFrom: + secretKeyRef: + name: datadog-agent-cluster-agent + key: token + - name: DD_KUBE_RESOURCES_NAMESPACE + value: default + - name: DD_ORCHESTRATOR_EXPLORER_ENABLED + value: "true" + - name: DD_ORCHESTRATOR_EXPLORER_CONTAINER_SCRUBBING_ENABLED + value: "true" + - name: DD_COMPLIANCE_CONFIG_ENABLED + value: "true" + - name: DD_COMPLIANCE_CONFIG_CHECK_INTERVAL + value: "20m" + livenessProbe: + failureThreshold: 6 + httpGet: + path: /live + port: 5556 + scheme: HTTP + initialDelaySeconds: 15 + periodSeconds: 15 + successThreshold: 1 + timeoutSeconds: 5 + readinessProbe: + failureThreshold: 6 + httpGet: + path: /ready + port: 5556 + scheme: HTTP + initialDelaySeconds: 15 + periodSeconds: 15 + successThreshold: 1 + timeoutSeconds: 5 + volumeMounts: + - name: installinfo + subPath: install_info + mountPath: /etc/datadog-agent/install_info + readOnly: true + volumes: + - name: installinfo + configMap: + name: datadog-agent-installinfo + affinity: + # Force scheduling the cluster agents on different nodes + # to guarantee that the standby instance can immediately take the lead from a leader running of a faulty node. + podAntiAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - labelSelector: + matchLabels: + app: datadog-agent-cluster-agent + topologyKey: kubernetes.io/hostname + nodeSelector: + kubernetes.io/os: linux diff --git a/datadog_agent_on_minikube/values.yaml b/datadog_agent_on_minikube/values.yaml new file mode 100644 index 0000000..90587b7 --- /dev/null +++ b/datadog_agent_on_minikube/values.yaml @@ -0,0 +1,1413 @@ +## Default values for Datadog Agent +## See Datadog helm documentation to learn more: +## https://docs.datadoghq.com/agent/kubernetes/helm/ + +# nameOverride -- Override name of app +nameOverride: # "" + +# fullnameOverride -- Override the full qualified app name +fullnameOverride: # "" + +# targetSystem -- Target OS for this deployment (possible values: linux, windows) +targetSystem: "linux" + +# registry -- Registry to use for all Agent images (default gcr.io) +## Currently we offer Datadog Agent images on: +## GCR - use gcr.io/datadoghq (default) +## DockerHub - use docker.io/datadog +## AWS - use public.ecr.aws/datadog +registry: gcr.io/datadoghq + +datadog: + # datadog.apiKey -- Your Datadog API key + # ref: https://app.datadoghq.com/account/settings#agent/kubernetes + apiKey: + + # datadog.apiKeyExistingSecret -- Use existing Secret which stores API key instead of creating a new one + ## If set, this parameter takes precedence over "apiKey". + apiKeyExistingSecret: # + + # datadog.appKey -- Datadog APP key required to use metricsProvider + ## If you are using clusterAgent.metricsProvider.enabled = true, you must set + ## a Datadog application key for read access to your metrics. + appKey: # + + # datadog.appKeyExistingSecret -- Use existing Secret which stores APP key instead of creating a new one + ## If set, this parameter takes precedence over "appKey". 
+ appKeyExistingSecret: # + + # datadog.securityContext -- Allows you to overwrite the default PodSecurityContext on the Daemonset or Deployment + securityContext: {} + # seLinuxOptions: + # user: "system_u" + # role: "system_r" + # type: "spc_t" + # level: "s0" + + # datadog.hostVolumeMountPropagation -- Allow to specify the `mountPropagation` value on all volumeMounts using HostPath + ## ref: https://kubernetes.io/docs/concepts/storage/volumes/#mount-propagation + hostVolumeMountPropagation: None + + # datadog.clusterName -- Set a unique cluster name to allow scoping hosts and Cluster Checks easily + ## The name must be unique and must be dot-separated tokens with the following restrictions: + ## * Lowercase letters, numbers, and hyphens only. + ## * Must start with a letter. + ## * Must end with a number or a letter. + ## * Overall length should not be higher than 80 characters. + ## Compared to the rules of GKE, dots are allowed whereas they are not allowed on GKE: + ## https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#Cluster.FIELDS.name + clusterName: minikube + + # datadog.site -- The site of the Datadog intake to send Agent data to + ## Set to 'datadoghq.eu' to send data to the EU site. + site: # datadoghq.com + + # datadog.dd_url -- The host of the Datadog intake server to send Agent data to, only set this option if you need the Agent to send data to a custom URL + ## Overrides the site setting defined in "site". + dd_url: # https://app.datadoghq.com + + # datadog.logLevel -- Set logging verbosity, valid log levels are: trace, debug, info, warn, error, critical, off + logLevel: INFO + + # datadog.kubeStateMetricsEnabled -- If true, deploys the kube-state-metrics deployment + ## ref: https://github.com/kubernetes/kube-state-metrics/tree/kube-state-metrics-helm-chart-2.13.2/charts/kube-state-metrics + kubeStateMetricsEnabled: true + + kubeStateMetricsNetworkPolicy: + # datadog.kubeStateMetricsNetworkPolicy.create -- If true, create a NetworkPolicy for kube state metrics + create: false + + kubeStateMetricsCore: + # datadog.kubeStateMetricsCore.enabled -- Enable the kubernetes_state_core check in the Cluster Agent (Requires Cluster Agent 1.12.0+) + ## ref: https://docs.datadoghq.com/integrations/kubernetes_state_core + enabled: false + + # datadog.kubeStateMetricsCore.ignoreLegacyKSMCheck -- Disable the auto-configuration of legacy kubernetes_state check (taken into account only when datadog.kubeStateMetricsCore.enabled is true) + ## Disabling this field is not recommended as it results in enabling both checks, it can be useful though during the migration phase. + ## Migration guide: https://docs.datadoghq.com/integrations/kubernetes_state_core/?tab=helm#migration-from-kubernetes_state-to-kubernetes_state_core + ignoreLegacyKSMCheck: true + + # datadog.kubeStateMetricsCore.collectSecretMetrics -- Enable watching secret objects and collecting their corresponding metrics kubernetes_state.secret.* + ## Configuring this field will change the default kubernetes_state_core check configuration and the RBACs granted to Datadog Cluster Agent to run the kubernetes_state_core check. + collectSecretMetrics: true + + # datadog.kubeStateMetricsCore.useClusterCheckRunners -- For large clusters where the Kubernetes State Metrics Check Core needs to be distributed on dedicated workers. + ## Configuring this field will create a separate deployment which will run Cluster Checks, including Kubernetes State Metrics Core. 
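+    ## For example, setting useClusterCheckRunners: true is meant to be paired with
+    ## clusterChecksRunner.enabled: true (configured later in this file); on a
+    ## single-node Minikube this is not needed.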
+ ## ref: https://docs.datadoghq.com/agent/cluster_agent/clusterchecksrunner?tab=helm + useClusterCheckRunners: false + + ## Manage Cluster checks feature + ## ref: https://docs.datadoghq.com/agent/autodiscovery/clusterchecks/ + ## Autodiscovery via Kube Service annotations is automatically enabled + clusterChecks: + # datadog.clusterChecks.enabled -- Enable the Cluster Checks feature on both the cluster-agents and the daemonset + enabled: true + + # datadog.nodeLabelsAsTags -- Provide a mapping of Kubernetes Node Labels to Datadog Tags + nodeLabelsAsTags: {} + # beta.kubernetes.io/instance-type: aws-instance-type + # kubernetes.io/role: kube_role + # : + + # datadog.podLabelsAsTags -- Provide a mapping of Kubernetes Labels to Datadog Tags + podLabelsAsTags: {} + # app: kube_app + # release: helm_release + # : + + # datadog.podAnnotationsAsTags -- Provide a mapping of Kubernetes Annotations to Datadog Tags + podAnnotationsAsTags: {} + # iam.amazonaws.com/role: kube_iamrole + # : + + # datadog.namespaceLabelsAsTags -- Provide a mapping of Kubernetes Namespace Labels to Datadog Tags + namespaceLabelsAsTags: {} + # env: environment + # : + + # datadog.tags -- List of static tags to attach to every metric, event and service check collected by this Agent. + ## Learn more about tagging: https://docs.datadoghq.com/tagging/ + tags: [] + # - ":" + # - ":" + + # datadog.checksCardinality -- Sets the tag cardinality for the checks run by the Agent. + ## https://docs.datadoghq.com/getting_started/tagging/assigning_tags/?tab=containerizedenvironments#environment-variables + checksCardinality: # low, orchestrator or high (not set by default to avoid overriding existing DD_CHECKS_TAG_CARDINALITY configurations, the default value in the Agent is low) + + # kubelet configuration + kubelet: + # datadog.kubelet.host -- Override kubelet IP + host: + valueFrom: + fieldRef: + fieldPath: status.hostIP + # datadog.kubelet.tlsVerify -- Toggle kubelet TLS verification + # @default -- true + tlsVerify: false + # datadog.kubelet.hostCAPath -- Path (on host) where the Kubelet CA certificate is stored + # @default -- None (no mount from host) + hostCAPath: + # datadog.kubelet.agentCAPath -- Path (inside Agent containers) where the Kubelet CA certificate is stored + # @default -- /var/run/host-kubelet-ca.crt if hostCAPath else /var/run/secrets/kubernetes.io/serviceaccount/ca.crt + agentCAPath: + + # datadog.expvarPort -- Specify the port to expose pprof and expvar to not interfer with the agentmetrics port from the cluster-agent, which defaults to 5000 + expvarPort: 6000 + + ## dogstatsd configuration + ## ref: https://docs.datadoghq.com/agent/kubernetes/dogstatsd/ + ## To emit custom metrics from your Kubernetes application, use DogStatsD. + dogstatsd: + # datadog.dogstatsd.port -- Override the Agent DogStatsD port + ## Note: Make sure your client is sending to the same UDP port. + port: 8125 + + # datadog.dogstatsd.originDetection -- Enable origin detection for container tagging + ## https://docs.datadoghq.com/developers/dogstatsd/unix_socket/#using-origin-detection-for-container-tagging + originDetection: false + + # datadog.dogstatsd.tags -- List of static tags to attach to every custom metric, event and service check collected by Dogstatsd. 
+    ## Learn more about tagging: https://docs.datadoghq.com/tagging/
+    tags: []
+    # - "<KEY_1>:<VALUE_1>"
+    # - "<KEY_2>:<VALUE_2>"
+
+    # datadog.dogstatsd.tagCardinality -- Sets the tag cardinality relative to the origin detection
+    ## https://docs.datadoghq.com/developers/dogstatsd/unix_socket/#using-origin-detection-for-container-tagging
+    tagCardinality: low
+
+    # datadog.dogstatsd.useSocketVolume -- Enable dogstatsd over Unix Domain Socket with a HostVolume
+    ## ref: https://docs.datadoghq.com/developers/dogstatsd/unix_socket/
+    useSocketVolume: true
+
+    # datadog.dogstatsd.socketPath -- Path to the DogStatsD socket
+    socketPath: /var/run/datadog/dsd.socket
+
+    # datadog.dogstatsd.hostSocketPath -- Host path to the DogStatsD socket
+    hostSocketPath: /var/run/datadog/
+
+    # datadog.dogstatsd.useHostPort -- Sets the hostPort to the same value as the container port
+    ## Needs to be used for sending custom metrics.
+    ## The ports need to be available on all hosts.
+    ##
+    ## WARNING: Make sure that hosts using this are properly firewalled otherwise
+    ## metrics and traces are accepted from any host able to connect to this host.
+    useHostPort: false
+
+    # datadog.dogstatsd.useHostPID -- Run the agent in the host's PID namespace
+    ## This is required for Dogstatsd origin detection to work.
+    ## See https://docs.datadoghq.com/developers/dogstatsd/unix_socket/
+    useHostPID: false
+
+    # datadog.dogstatsd.nonLocalTraffic -- Enable this to make each node accept non-local statsd traffic (from outside of the pod)
+    ## ref: https://github.com/DataDog/docker-dd-agent#environment-variables
+    nonLocalTraffic: true
+
+  # datadog.collectEvents -- Enable this to start event collection from the kubernetes API
+  ## ref: https://docs.datadoghq.com/agent/kubernetes/#event-collection
+  collectEvents: true
+
+  # datadog.leaderElection -- Enables the leader election mechanism for event collection
+  leaderElection: true
+
+  # datadog.leaderLeaseDuration -- Set the lease time for leader election in seconds
+  leaderLeaseDuration: # 60
+
+  ## Enable logs agent and provide custom configs
+  logs:
+    # datadog.logs.enabled -- Enable this to activate Datadog Agent log collection
+    ## ref: https://docs.datadoghq.com/agent/basic_agent_usage/kubernetes/#log-collection-setup
+    enabled: true
+
+    # datadog.logs.containerCollectAll -- Enable this to allow log collection for all containers
+    ## ref: https://docs.datadoghq.com/agent/basic_agent_usage/kubernetes/#log-collection-setup
+    containerCollectAll: false
+
+    # datadog.logs.containerCollectUsingFiles -- Collect logs from files in /var/log/pods instead of using the container runtime API
+    ## It's usually the most efficient way of collecting logs.
+    ## ref: https://docs.datadoghq.com/agent/basic_agent_usage/kubernetes/#log-collection-setup
+    containerCollectUsingFiles: true
+
+  ## Enable apm agent and provide custom configs
+  apm:
+    # datadog.apm.socketEnabled -- Enable APM over Socket (Unix Socket or Windows named pipe)
+    ## ref: https://docs.datadoghq.com/agent/kubernetes/apm/
+    socketEnabled: true
+
+    # datadog.apm.portEnabled -- Enable APM over TCP communication (port 8126 by default)
+    ## ref: https://docs.datadoghq.com/agent/kubernetes/apm/
+    portEnabled: false
+
+    # datadog.apm.enabled -- Enable this to enable APM and tracing, on port 8126
+    # DEPRECATED. Use datadog.apm.portEnabled instead
+    ## ref: https://github.com/DataDog/docker-dd-agent#tracing-from-the-host
+    enabled: false
+
+    # datadog.apm.port -- Override the trace Agent port
+    ## Note: Make sure your client is sending to the same TCP port.
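+    ## Example client configuration (a sketch; the env vars below are standard Datadog
+    ## tracer settings, and the socket path assumes datadog.apm.socketPath above):
+    ##   env:
+    ##     - name: DD_TRACE_AGENT_URL            # UDS: mount hostPath /var/run/datadog/ in the app pod
+    ##       value: "unix:///var/run/datadog/apm.socket"
+    ##   or, if portEnabled is set to true:
+    ##     - name: DD_AGENT_HOST
+    ##       valueFrom:
+    ##         fieldRef:
+    ##           fieldPath: status.hostIP
+    ##     - name: DD_TRACE_AGENT_PORT
+    ##       value: "8126"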
+    port: 8126
+
+    # datadog.apm.useSocketVolume -- Enable APM over Unix Domain Socket
+    # DEPRECATED. Use datadog.apm.socketEnabled instead
+    ## ref: https://docs.datadoghq.com/agent/kubernetes/apm/
+    useSocketVolume: false
+
+    # datadog.apm.socketPath -- Path to the trace-agent socket
+    socketPath: /var/run/datadog/apm.socket
+
+    # datadog.apm.hostSocketPath -- Host path to the trace-agent socket
+    hostSocketPath: /var/run/datadog/
+
+  # datadog.envFrom -- Set environment variables for all Agents directly from configMaps and/or secrets
+  ## envFrom to pass configmaps or secrets as environment
+  envFrom: []
+  # - configMapRef:
+  #     name: <CONFIGMAP_NAME>
+  # - secretRef:
+  #     name: <SECRET_NAME>
+
+  # datadog.env -- Set environment variables for all Agents
+  ## The Datadog Agent supports many environment variables.
+  ## ref: https://docs.datadoghq.com/agent/docker/?tab=standard#environment-variables
+  env: []
+  # - name: <ENV_VAR_NAME>
+  #   value: <ENV_VAR_VALUE>
+
+  # datadog.confd -- Provide additional check configurations (static and Autodiscovery)
+  ## Each key becomes a file in /conf.d
+  ## ref: https://github.com/DataDog/datadog-agent/tree/main/Dockerfiles/agent#optional-volumes
+  ## ref: https://docs.datadoghq.com/agent/autodiscovery/
+  confd:
+    etcd.yaml: |-
+      ad_identifiers:
+        - etcd
+      instances:
+        - prometheus_url: https://%%host%%:2379/metrics
+          tls_ca_cert: /etc/datadog-agent/minikube/ca.crt
+          tls_cert: /etc/datadog-agent/minikube/server.crt
+          tls_private_key: /etc/datadog-agent/minikube/server.key
+    kube_scheduler.yaml: |-
+      ad_identifiers:
+        - kube-scheduler
+      instances:
+        - prometheus_url: https://localhost:10259/metrics
+          ssl_verify: false
+          bearer_token_auth: true
+          leader_election: false
+    kube_controller_manager.yaml: |-
+      ad_identifiers:
+        - kube-controller-manager
+      instances:
+        - prometheus_url: https://localhost:10257/metrics
+          ssl_verify: false
+          bearer_token_auth: true
+          leader_election: false
+  # redisdb.yaml: |-
+  #   init_config:
+  #   instances:
+  #     - host: "name"
+  #       port: "6379"
+  # kubernetes_state.yaml: |-
+  #   ad_identifiers:
+  #     - kube-state-metrics
+  #   init_config:
+  #   instances:
+  #     - kube_state_url: http://%%host%%:8080/metrics
+
+  # datadog.checksd -- Provide additional custom checks as python code
+  ## Each key becomes a file in /checks.d
+  ## ref: https://github.com/DataDog/datadog-agent/tree/main/Dockerfiles/agent#optional-volumes
+  checksd: {}
+  # service.py: |-
+
+  # datadog.dockerSocketPath -- Path to the docker socket
+  dockerSocketPath: # /var/run/docker.sock
+
+  # datadog.criSocketPath -- Path to the container runtime socket (if different from Docker)
+  criSocketPath: # /var/run/containerd/containerd.sock
+
+  ## Enable process agent and provide custom configs
+  processAgent:
+    # datadog.processAgent.enabled -- Set this to true to enable the live process monitoring agent
+    ## Note: /etc/passwd is automatically mounted to allow username resolution.
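+    ## Keeping the process agent enabled also feeds the Kubernetes resources view in
+    ## Live Containers; the orchestratorExplorer section below requires it.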
+    ## ref: https://docs.datadoghq.com/graphing/infrastructure/process/#kubernetes-daemonset
+    enabled: true
+
+    # datadog.processAgent.processCollection -- Set this to true to enable process collection in the process monitoring agent
+    ## Requires processAgent.enabled to be set to true to have any effect
+    processCollection: false
+
+    # datadog.processAgent.stripProcessArguments -- Set this to scrub all arguments from collected processes
+    ## Requires processAgent.enabled and processAgent.processCollection to be set to true to have any effect
+    ## ref: https://docs.datadoghq.com/infrastructure/process/?tab=linuxwindows#process-arguments-scrubbing
+    stripProcessArguments: false
+
+  ## Enable systemProbe agent and provide custom configs
+  systemProbe:
+
+    # datadog.systemProbe.debugPort -- Specify the port to expose pprof and expvar for the system-probe agent
+    debugPort: 0
+
+    # datadog.systemProbe.enableConntrack -- Enable the system-probe agent to connect to the netlink/conntrack subsystem to add NAT information to connection data
+    ## Ref: http://conntrack-tools.netfilter.org/
+    enableConntrack: true
+
+    # datadog.systemProbe.seccomp -- Apply an ad-hoc seccomp profile to the system-probe agent to restrict its privileges
+    ## Note that this will break `kubectl exec … -c system-probe -- /bin/bash`
+    seccomp: localhost/system-probe
+
+    # datadog.systemProbe.seccompRoot -- Specify the seccomp profile root directory
+    seccompRoot: /var/lib/kubelet/seccomp
+
+    # datadog.systemProbe.bpfDebug -- Enable logging for kernel debug
+    bpfDebug: false
+
+    # datadog.systemProbe.apparmor -- Specify an apparmor profile for system-probe
+    apparmor: unconfined
+
+    # datadog.systemProbe.enableTCPQueueLength -- Enable the TCP queue length eBPF-based check
+    enableTCPQueueLength: false
+
+    # datadog.systemProbe.enableOOMKill -- Enable the OOM kill eBPF-based check
+    enableOOMKill: false
+
+    # datadog.systemProbe.collectDNSStats -- Enable DNS stat collection
+    collectDNSStats: true
+
+    # datadog.systemProbe.maxTrackedConnections -- the maximum number of tracked connections
+    maxTrackedConnections: 131072
+
+    # datadog.systemProbe.conntrackMaxStateSize -- the maximum size of the userspace conntrack cache
+    conntrackMaxStateSize: 131072 # 2 * maxTrackedConnections by default, per https://github.com/DataDog/datadog-agent/blob/d1c5de31e1bba72dfac459aed5ff9562c3fdcc20/pkg/process/config/config.go#L229
+
+    # datadog.systemProbe.conntrackInitTimeout -- the time to wait for conntrack to initialize before failing
+    conntrackInitTimeout: 10s
+
+  orchestratorExplorer:
+    # datadog.orchestratorExplorer.enabled -- Set this to false to disable the orchestrator explorer
+    ## This requires processAgent.enabled and clusterAgent.enabled to be set to true
+    ## ref: TODO - add doc link
+    enabled: true
+
+    # datadog.orchestratorExplorer.container_scrubbing -- Enable the scrubbing of containers in the kubernetes resource YAML for sensitive information
+    ## Container scrubbing takes significant resources during data collection.
+    ## If you notice that the cluster-agent uses too much CPU in larger clusters,
+    ## turning this option off will improve the situation.
+    container_scrubbing:
+      enabled: true
+
+  networkMonitoring:
+    # datadog.networkMonitoring.enabled -- Enable network performance monitoring
+    enabled: false
+
+  ## Universal Service Monitoring is currently in private beta.
+  ## See https://www.datadoghq.com/blog/universal-service-monitoring-datadog/ for more details and private beta signup.
+  serviceMonitoring:
+    # datadog.serviceMonitoring.enabled -- Enable Universal Service Monitoring
+    enabled: false
+
+  ## Enable security agent and provide custom configs
+  securityAgent:
+    compliance:
+      # datadog.securityAgent.compliance.enabled -- Set to true to enable Cloud Security Posture Management (CSPM)
+      enabled: false
+
+      # datadog.securityAgent.compliance.configMap -- Contains CSPM compliance benchmarks that will be used
+      configMap:
+
+      # datadog.securityAgent.compliance.checkInterval -- Compliance check run interval
+      checkInterval: 20m
+
+    runtime:
+      # datadog.securityAgent.runtime.enabled -- Set to true to enable Cloud Workload Security (CWS)
+      enabled: false
+
+      policies:
+        # datadog.securityAgent.runtime.policies.configMap -- Contains CWS policies that will be used
+        configMap:
+
+      syscallMonitor:
+        # datadog.securityAgent.runtime.syscallMonitor.enabled -- Set to true to enable Syscall monitoring (recommended for troubleshooting only)
+        enabled: false
+
+  ## Manage NetworkPolicy
+  networkPolicy:
+    # datadog.networkPolicy.create -- If true, create NetworkPolicy for all the components
+    create: false
+
+    # datadog.networkPolicy.flavor -- Flavor of the network policy to use.
+    # Can be:
+    # * kubernetes for networking.k8s.io/v1/NetworkPolicy
+    # * cilium for cilium.io/v2/CiliumNetworkPolicy
+    flavor: kubernetes
+
+    cilium:
+      # datadog.networkPolicy.cilium.dnsSelector -- Cilium selector of the DNS server entity
+      # @default -- kube-dns in namespace kube-system
+      dnsSelector:
+        toEndpoints:
+          - matchLabels:
+              "k8s:io.kubernetes.pod.namespace": kube-system
+              "k8s:k8s-app": kube-dns
+
+  ## Configure prometheus scraping autodiscovery
+  ## ref: https://docs.datadoghq.com/agent/kubernetes/prometheus/
+  prometheusScrape:
+    # datadog.prometheusScrape.enabled -- Enable autodiscovering pods and services exposing prometheus metrics.
+    enabled: false
+    # datadog.prometheusScrape.serviceEndpoints -- Enable generating dedicated checks for service endpoints.
+    serviceEndpoints: false
+    # datadog.prometheusScrape.additionalConfigs -- Allows adding advanced openmetrics check configurations with custom discovery rules. (Requires Agent version 7.27+)
+    additionalConfigs: []
+    # -
+    #   autodiscovery:
+    #     kubernetes_annotations:
+    #       include:
+    #         custom_include_label: 'true'
+    #       exclude:
+    #         custom_exclude_label: 'true'
+    #     kubernetes_container_names:
+    #       - my-app
+    #   configurations:
+    #     - send_distribution_buckets: true
+    #       timeout: 5
+
+  # datadog.ignoreAutoConfig -- List of integrations whose default auto_conf.yaml should be ignored
+  ## ref: https://docs.datadoghq.com/agent/faq/auto_conf/
+  ignoreAutoConfig:
+    - etcd
+    - kube_scheduler
+    - kube_controller_manager
+
+  # datadog.containerExclude -- Exclude containers from the Agent
+  # Autodiscovery, as a space-separated list
+  ## ref: https://docs.datadoghq.com/agent/guide/autodiscovery-management/?tab=containerizedagent#exclude-containers
+  containerExclude: # "image:datadog/agent"
+
+  # datadog.containerInclude -- Include containers in the Agent Autodiscovery,
+  # as a space-separated list. If a container matches an include rule, it’s
+  # always included in the Autodiscovery
+  ## ref: https://docs.datadoghq.com/agent/guide/autodiscovery-management/?tab=containerizedagent#include-containers
+  containerInclude:
+
+  # datadog.containerExcludeLogs -- Exclude logs from the Agent Autodiscovery,
+  # as a space-separated list
+  containerExcludeLogs:
+
+  # datadog.containerIncludeLogs -- Include logs in the Agent Autodiscovery, as
+  # a space-separated list
+  containerIncludeLogs:
+
+  # datadog.containerExcludeMetrics -- Exclude metrics from the Agent
+  # Autodiscovery, as a space-separated list
+  containerExcludeMetrics:
+
+  # datadog.containerIncludeMetrics -- Include metrics in the Agent
+  # Autodiscovery, as a space-separated list
+  containerIncludeMetrics:
+
+  # datadog.excludePauseContainer -- Exclude pause containers from the Agent
+  # Autodiscovery.
+  ## ref: https://docs.datadoghq.com/agent/guide/autodiscovery-management/?tab=containerizedagent#pause-containers
+  excludePauseContainer: true
+
+## This is the Datadog Cluster Agent implementation that handles cluster-wide
+## metrics more cleanly, separates concerns for better rbac, and implements
+## the external metrics API so you can autoscale HPAs based on datadog metrics
+## ref: https://docs.datadoghq.com/agent/kubernetes/cluster/
+clusterAgent:
+  # clusterAgent.enabled -- Set this to false to disable Datadog Cluster Agent
+  enabled: true
+
+  ## Define the Datadog Cluster-Agent image to work with
+  image:
+    # clusterAgent.image.name -- Cluster Agent image name to use (relative to `registry`)
+    name: cluster-agent
+
+    # clusterAgent.image.tag -- Cluster Agent image tag to use
+    tag: 1.16.0
+
+    # clusterAgent.image.repository -- Override default registry + image.name for Cluster Agent
+    repository:
+
+    # clusterAgent.image.pullPolicy -- Cluster Agent image pullPolicy
+    pullPolicy: IfNotPresent
+
+    # clusterAgent.image.pullSecrets -- Cluster Agent repository pullSecret (ex: specify docker registry credentials)
+    ## See https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
+    pullSecrets: []
+    # - name: "<REG_SECRET>"
+
+  # clusterAgent.securityContext -- Allows you to overwrite the default PodSecurityContext on the cluster-agent pods.
+  securityContext: {}
+
+  containers:
+    clusterAgent:
+      # clusterAgent.containers.clusterAgent.securityContext -- Specify securityContext on the cluster-agent container.
+      securityContext: {}
+
+  # clusterAgent.command -- Command to run in the Cluster Agent container as entrypoint
+  command: []
+
+  # clusterAgent.token -- Cluster Agent token is a preshared key between node agents and cluster agent (autogenerated if empty, needs to be at least 32 characters a-zA-Z)
+  token: ""
+
+  # clusterAgent.tokenExistingSecret -- Existing secret name to use for Cluster Agent token
+  tokenExistingSecret: ""
+
+  # clusterAgent.replicas -- Specify the number of cluster agent replicas; if > 1, the cluster agent can work in HA mode.
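+  ## Note for Minikube: keep a single replica on a single-node cluster. The generated
+  ## Deployment uses a required podAntiAffinity on kubernetes.io/hostname, so a second
+  ## replica would stay Pending until a second node exists.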
+ replicas: 1 + + ## Provide Cluster Agent Deployment pod(s) RBAC configuration + rbac: + # clusterAgent.rbac.create -- If true, create & use RBAC resources + create: true + + # clusterAgent.rbac.serviceAccountName -- Specify a preexisting ServiceAccount to use if clusterAgent.rbac.create is false + serviceAccountName: default + + # clusterAgent.rbac.serviceAccountAnnotations -- Annotations to add to the ServiceAccount if clusterAgent.rbac.create is true + serviceAccountAnnotations: {} + + ## Provide Cluster Agent pod security configuration + podSecurity: + podSecurityPolicy: + # clusterAgent.podSecurity.podSecurityPolicy.create -- If true, create a PodSecurityPolicy resource for Cluster Agent pods + create: false + securityContextConstraints: + # clusterAgent.podSecurity.securityContextConstraints.create -- If true, create a SCC resource for Cluster Agent pods + create: false + + # Enable the metricsProvider to be able to scale based on metrics in Datadog + metricsProvider: + # clusterAgent.metricsProvider.enabled -- Set this to true to enable Metrics Provider + enabled: false + + # clusterAgent.metricsProvider.wpaController -- Enable informer and controller of the watermark pod autoscaler + ## NOTE: You need to install the `WatermarkPodAutoscaler` CRD before + wpaController: false + + # clusterAgent.metricsProvider.useDatadogMetrics -- Enable usage of DatadogMetric CRD to autoscale on arbitrary Datadog queries + ## NOTE: It will install DatadogMetrics CRD automatically (it may conflict with previous installations) + useDatadogMetrics: false + + # clusterAgent.metricsProvider.createReaderRbac -- Create `external-metrics-reader` RBAC automatically (to allow HPA to read data from Cluster Agent) + createReaderRbac: true + + # clusterAgent.metricsProvider.aggregator -- Define the aggregator the cluster agent will use to process the metrics. The options are (avg, min, max, sum) + aggregator: avg + + ## Configuration for the service for the cluster-agent metrics server + service: + # clusterAgent.metricsProvider.service.type -- Set type of cluster-agent metrics server service + type: ClusterIP + + # clusterAgent.metricsProvider.service.port -- Set port of cluster-agent metrics server service (Kubernetes >= 1.15) + port: 8443 + + # clusterAgent.metricsProvider.endpoint -- Override the external metrics provider endpoint. 
If not set, the cluster-agent defaults to `datadog.site` + endpoint: # https://api.datadoghq.com + + # clusterAgent.env -- Set environment variables specific to Cluster Agent + ## The Cluster-Agent supports many additional environment variables + ## ref: https://docs.datadoghq.com/agent/cluster_agent/commands/#cluster-agent-options + env: [] + + # clusterAgent.envFrom -- Set environment variables specific to Cluster Agent from configMaps and/or secrets + ## The Cluster-Agent supports many additional environment variables + ## ref: https://docs.datadoghq.com/agent/cluster_agent/commands/#cluster-agent-options + envFrom: [] + # - configMapRef: + # name: + # - secretRef: + # name: + + admissionController: + # clusterAgent.admissionController.enabled -- Enable the admissionController to be able to inject APM/Dogstatsd config and standard tags (env, service, version) automatically into your pods + enabled: false + + # clusterAgent.admissionController.mutateUnlabelled -- Enable injecting config without having the pod label 'admission.datadoghq.com/enabled="true"' + mutateUnlabelled: false + + # clusterAgent.confd -- Provide additional cluster check configurations + ## Each key will become a file in /conf.d + ## ref: https://docs.datadoghq.com/agent/autodiscovery/ + confd: {} + # mysql.yaml: |- + # cluster_check: true + # instances: + # - server: '' + # port: 3306 + # user: datadog + # pass: '' + + # clusterAgent.advancedConfd -- Provide additional cluster check configurations + ## Each key is an integration containing several config files, it replaces clusterAgent.confd if set + ## ref: https://docs.datadoghq.com/agent/autodiscovery/ + advancedConfd: {} + # mysql.d: + # 1.yaml: |- + # cluster_check: true + # instances: + # - server: '' + # port: 3306 + # user: datadog + # pass: '' + # 2.yaml: |- + # cluster_check: true + # instances: + # - server: '' + # port: 3306 + # user: datadog + # pass: '' + + # clusterAgent.resources -- Datadog cluster-agent resource requests and limits. + resources: {} + # requests: + # cpu: 200m + # memory: 256Mi + # limits: + # cpu: 200m + # memory: 256Mi + + # clusterAgent.priorityClassName -- Name of the priorityClass to apply to the Cluster Agent + priorityClassName: # system-cluster-critical + + # clusterAgent.nodeSelector -- Allow the Cluster Agent Deployment to be scheduled on selected nodes + ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector + ## Ref: https://kubernetes.io/docs/user-guide/node-selection/ + nodeSelector: {} + + # clusterAgent.affinity -- Allow the Cluster Agent Deployment to schedule using affinity rules + ## By default, Cluster Agent Deployment Pods are forced to run on different Nodes. 
+ ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity + affinity: {} + + # clusterAgent.healthPort -- Port number to use in the Cluster Agent for the healthz endpoint + healthPort: 5556 + + # clusterAgent.livenessProbe -- Override default Cluster Agent liveness probe settings + # @default -- Every 15s / 6 KO / 1 OK + livenessProbe: + initialDelaySeconds: 15 + periodSeconds: 15 + timeoutSeconds: 5 + successThreshold: 1 + failureThreshold: 6 + + # clusterAgent.readinessProbe -- Override default Cluster Agent readiness probe settings + # @default -- Every 15s / 6 KO / 1 OK + readinessProbe: + initialDelaySeconds: 15 + periodSeconds: 15 + timeoutSeconds: 5 + successThreshold: 1 + failureThreshold: 6 + + # clusterAgent.strategy -- Allow the Cluster Agent deployment to perform a rolling update on helm update + ## ref: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy + strategy: + type: RollingUpdate + rollingUpdate: + maxSurge: 1 + maxUnavailable: 0 + + # clusterAgent.deploymentAnnotations -- Annotations to add to the cluster-agents's deployment + deploymentAnnotations: {} + # key: "value" + + # clusterAgent.podAnnotations -- Annotations to add to the cluster-agents's pod(s) + podAnnotations: {} + # key: "value" + + # clusterAgent.useHostNetwork -- Bind ports on the hostNetwork + ## Useful for CNI networking where hostPort might + ## not be supported. The ports need to be available on all hosts. It can be + ## used for custom metrics instead of a service endpoint. + ## + ## WARNING: Make sure that hosts using this are properly firewalled otherwise + ## metrics and traces are accepted from any host able to connect to this host. + # + useHostNetwork: false + + # clusterAgent.dnsConfig -- Specify dns configuration options for datadog cluster agent containers e.g ndots + ## ref: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-dns-config + dnsConfig: {} + # options: + # - name: ndots + # value: "1" + + # clusterAgent.volumes -- Specify additional volumes to mount in the cluster-agent container + volumes: [] + + # clusterAgent.volumeMounts -- Specify additional volumes to mount in the cluster-agent container + volumeMounts: [] + + # clusterAgent.datadog_cluster_yaml -- Specify custom contents for the datadog cluster agent config (datadog-cluster.yaml) + datadog_cluster_yaml: {} + + # clusterAgent.createPodDisruptionBudget -- Create pod disruption budget for Cluster Agent deployments + createPodDisruptionBudget: false + + networkPolicy: + # clusterAgent.networkPolicy.create -- If true, create a NetworkPolicy for the cluster agent. + # DEPRECATED. 
Use datadog.networkPolicy.create instead + create: false + + # clusterAgent.additionalLabels -- Adds labels to the Cluster Agent deployment and pods + additionalLabels: {} + # key: "value" + +## This section lets you configure the agents deployed by this chart to connect to a Cluster Agent +## deployed independently +existingClusterAgent: + # existingClusterAgent.join -- set this to true if you want the agents deployed by this chart to + # connect to a Cluster Agent deployed independently + join: false + + # existingClusterAgent.tokenSecretName -- Existing secret name to use for external Cluster Agent token + tokenSecretName: # + + # existingClusterAgent.serviceName -- Existing service name to use for reaching the external Cluster Agent + serviceName: # + + # existingClusterAgent.clusterchecksEnabled -- set this to false if you don’t want the agents to run the cluster checks of the joined external cluster agent + clusterchecksEnabled: true + +agents: + # agents.enabled -- You should keep Datadog DaemonSet enabled! + ## The exceptional case could be a situation when you need to run + ## single Datadog pod per every namespace, but you do not need to + ## re-create a DaemonSet for every non-default namespace install. + ## Note: StatsD and DogStatsD work over UDP, so you may not + ## get guaranteed delivery of the metrics in Datadog-per-namespace setup! + # + enabled: true + + ## Define the Datadog image to work with + image: + # agents.image.name -- Datadog Agent image name to use (relative to `registry`) + ## use "dogstatsd" for Standalone Datadog Agent DogStatsD 7 + name: agent + + # agents.image.tag -- Define the Agent version to use + tag: 7.32.1 + + # agents.image.tagSuffix -- Suffix to append to Agent tag + ## Ex: + ## jmx to enable jmx fetch collection + ## servercore to get Windows images based on servercore + tagSuffix: "" + + # agents.image.repository -- Override default registry + image.name for Agent + repository: + + # agents.image.doNotCheckTag -- Skip the version<>chart compatibility check + ## By default, the version passed in agents.image.tag is checked + ## for compatibility with the version of the chart. + ## This boolean permits to completely skip this check. 
+ ## This is useful, for example, for custom tags that are not + ## respecting semantic versioning + doNotCheckTag: # false + + # agents.image.pullPolicy -- Datadog Agent image pull policy + pullPolicy: IfNotPresent + + # agents.image.pullSecrets -- Datadog Agent repository pullSecret (ex: specify docker registry credentials) + ## See https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod + pullSecrets: [] + # - name: "" + + ## Provide Daemonset RBAC configuration + rbac: + # agents.rbac.create -- If true, create & use RBAC resources + create: true + + # agents.rbac.serviceAccountName -- Specify a preexisting ServiceAccount to use if agents.rbac.create is false + serviceAccountName: default + + # agents.rbac.serviceAccountAnnotations -- Annotations to add to the ServiceAccount if agents.rbac.create is true + serviceAccountAnnotations: {} + + ## Provide Daemonset PodSecurityPolicy configuration + podSecurity: + podSecurityPolicy: + # agents.podSecurity.podSecurityPolicy.create -- If true, create a PodSecurityPolicy resource for Agent pods + create: false + + securityContextConstraints: + # agents.podSecurity.securityContextConstraints.create -- If true, create a SecurityContextConstraints resource for Agent pods + create: false + + # agents.podSecurity.seLinuxContext -- Provide seLinuxContext configuration for PSP/SCC + # @default -- Must run as spc_t + seLinuxContext: + rule: MustRunAs + seLinuxOptions: + user: system_u + role: system_r + type: spc_t + level: s0 + + # agents.podSecurity.privileged -- If true, Allow to run privileged containers + privileged: false + + # agents.podSecurity.capabilities -- Allowed capabilities + capabilities: + - SYS_ADMIN + - SYS_RESOURCE + - SYS_PTRACE + - NET_ADMIN + - NET_BROADCAST + - NET_RAW + - IPC_LOCK + - AUDIT_CONTROL + - AUDIT_READ + + # agents.podSecurity.volumes -- Allowed volumes types + volumes: + - configMap + - downwardAPI + - emptyDir + - hostPath + - secret + + # agents.podSecurity.seccompProfiles -- Allowed seccomp profiles + seccompProfiles: + - "runtime/default" + - "localhost/system-probe" + + apparmor: + # agents.podSecurity.apparmor.enabled -- If true, enable apparmor enforcement + ## see: https://kubernetes.io/docs/tutorials/clusters/apparmor/ + enabled: true + + # agents.podSecurity.apparmorProfiles -- Allowed apparmor profiles + apparmorProfiles: + - "runtime/default" + - "unconfined" + + # agents.podSecurity.defaultApparmor -- Default AppArmor profile for all containers but system-probe + defaultApparmor: runtime/default + + containers: + agent: + # agents.containers.agent.env -- Additional environment variables for the agent container + env: [] + + # agents.containers.agent.envFrom -- Set environment variables specific to agent container from configMaps and/or secrets + envFrom: [] + # - configMapRef: + # name: + # - secretRef: + # name: + + # agents.containers.agent.logLevel -- Set logging verbosity, valid log levels are: trace, debug, info, warn, error, critical, and off + ## If not set, fall back to the value of datadog.logLevel. + logLevel: # INFO + + # agents.containers.agent.resources -- Resource requests and limits for the agent container. 
+ resources: {} + # requests: + # cpu: 200m + # memory: 256Mi + # limits: + # cpu: 200m + # memory: 256Mi + + # agents.containers.agent.healthPort -- Port number to use in the node agent for the healthz endpoint + healthPort: 5555 + + # agents.containers.agent.livenessProbe -- Override default agent liveness probe settings + # @default -- Every 15s / 6 KO / 1 OK + livenessProbe: + initialDelaySeconds: 15 + periodSeconds: 15 + timeoutSeconds: 5 + successThreshold: 1 + failureThreshold: 6 + + # agents.containers.agent.readinessProbe -- Override default agent readiness probe settings + # @default -- Every 15s / 6 KO / 1 OK + readinessProbe: + initialDelaySeconds: 15 + periodSeconds: 15 + timeoutSeconds: 5 + successThreshold: 1 + failureThreshold: 6 + + # agents.containers.agent.securityContext -- Allows you to overwrite the default container SecurityContext for the agent container. + securityContext: {} + + # agents.containers.agent.ports -- Allows to specify extra ports (hostPorts for instance) for this container + ports: [] + + processAgent: + # agents.containers.processAgent.env -- Additional environment variables for the process-agent container + env: [] + + # agents.containers.processAgent.envFrom -- Set environment variables specific to process-agent from configMaps and/or secrets + envFrom: [] + # - configMapRef: + # name: + # - secretRef: + # name: + + # agents.containers.processAgent.logLevel -- Set logging verbosity, valid log levels are: trace, debug, info, warn, error, critical, and off + ## If not set, fall back to the value of datadog.logLevel. + logLevel: # INFO + + # agents.containers.processAgent.resources -- Resource requests and limits for the process-agent container + resources: {} + # requests: + # cpu: 100m + # memory: 200Mi + # limits: + # cpu: 100m + # memory: 200Mi + + # agents.containers.processAgent.securityContext -- Allows you to overwrite the default container SecurityContext for the process-agent container. + securityContext: {} + + # agents.containers.processAgent.ports -- Allows to specify extra ports (hostPorts for instance) for this container + ports: [] + + traceAgent: + # agents.containers.traceAgent.env -- Additional environment variables for the trace-agent container + env: + + # agents.containers.traceAgent.envFrom -- Set environment variables specific to trace-agent from configMaps and/or secrets + envFrom: [] + # - configMapRef: + # name: + # - secretRef: + # name: + + # agents.containers.traceAgent.logLevel -- Set logging verbosity, valid log levels are: trace, debug, info, warn, error, critical, and off + logLevel: # INFO + + # agents.containers.traceAgent.resources -- Resource requests and limits for the trace-agent container + resources: {} + # requests: + # cpu: 100m + # memory: 200Mi + # limits: + # cpu: 100m + # memory: 200Mi + + # agents.containers.traceAgent.livenessProbe -- Override default agent liveness probe settings + # @default -- Every 15s + livenessProbe: + initialDelaySeconds: 15 + periodSeconds: 15 + timeoutSeconds: 5 + + # agents.containers.traceAgent.securityContext -- Allows you to overwrite the default container SecurityContext for the trace-agent container. 
+      securityContext: {}
+
+      # agents.containers.traceAgent.ports -- Allows to specify extra ports (hostPorts for instance) for this container
+      ports: []
+
+    systemProbe:
+      # agents.containers.systemProbe.env -- Additional environment variables for the system-probe container
+      env: []
+
+      # agents.containers.systemProbe.envFrom -- Set environment variables specific to system-probe from configMaps and/or secrets
+      envFrom: []
+      # - configMapRef:
+      #     name: <CONFIGMAP_NAME>
+      # - secretRef:
+      #     name: <SECRET_NAME>
+
+      # agents.containers.systemProbe.logLevel -- Set logging verbosity, valid log levels are: trace, debug, info, warn, error, critical, and off.
+      ## If not set, fall back to the value of datadog.logLevel.
+      logLevel: # INFO
+
+      # agents.containers.systemProbe.resources -- Resource requests and limits for the system-probe container
+      resources: {}
+      # requests:
+      #   cpu: 100m
+      #   memory: 200Mi
+      # limits:
+      #   cpu: 100m
+      #   memory: 200Mi
+
+      # agents.containers.systemProbe.securityContext -- Allows you to overwrite the default container SecurityContext for the system-probe container.
+      securityContext:
+        privileged: false
+        capabilities:
+          add: ["SYS_ADMIN", "SYS_RESOURCE", "SYS_PTRACE", "NET_ADMIN", "NET_BROADCAST", "NET_RAW", "IPC_LOCK"]
+
+      # agents.containers.systemProbe.ports -- Allows to specify extra ports (hostPorts for instance) for this container
+      ports: []
+
+    securityAgent:
+      # agents.containers.securityAgent.env -- Additional environment variables for the security-agent container
+      env:
+
+      # agents.containers.securityAgent.envFrom -- Set environment variables specific to security-agent from configMaps and/or secrets
+      envFrom: []
+      # - configMapRef:
+      #     name: <CONFIGMAP_NAME>
+      # - secretRef:
+      #     name: <SECRET_NAME>
+
+      # agents.containers.securityAgent.logLevel -- Set logging verbosity, valid log levels are: trace, debug, info, warn, error, critical, and off
+      ## If not set, fall back to the value of datadog.logLevel.
+      logLevel: # INFO
+
+      # agents.containers.securityAgent.resources -- Resource requests and limits for the security-agent container
+      resources: {}
+      # requests:
+      #   cpu: 100m
+      #   memory: 200Mi
+      # limits:
+      #   cpu: 100m
+      #   memory: 200Mi
+
+      # agents.containers.securityAgent.ports -- Allows to specify extra ports (hostPorts for instance) for this container
+      ports: []
+
+    initContainers:
+      # agents.containers.initContainers.resources -- Resource requests and limits for the init containers
+      resources: {}
+      # requests:
+      #   cpu: 100m
+      #   memory: 200Mi
+      # limits:
+      #   cpu: 100m
+      #   memory: 200Mi
+
+  # agents.volumes -- Specify additional volumes to mount in the dd-agent container
+  volumes:
+    - hostPath:
+        path: /var/lib/minikube/certs/etcd
+      name: etcd-certs
+
+  # agents.volumeMounts -- Specify additional volumes to mount in all containers of the agent pod
+  volumeMounts:
+    - name: etcd-certs
+      mountPath: /etc/datadog-agent/minikube
+      readOnly: true
+
+  # agents.useHostNetwork -- Bind ports on the hostNetwork
+  ## Useful for CNI networking where hostPort might
+  ## not be supported. The ports need to be available on all hosts. It can be
+  ## used for custom metrics instead of a service endpoint.
+  ##
+  ## WARNING: Make sure that hosts using this are properly firewalled otherwise
+  ## metrics and traces are accepted from any host able to connect to this host.
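+  ## Note for Minikube: hostNetwork is enabled so the agent can reach control-plane
+  ## components that only listen on the node's network, e.g. the kube-scheduler
+  ## (localhost:10259) and kube-controller-manager (localhost:10257) checks configured
+  ## in datadog.confd above.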
+  useHostNetwork: true
+
+  # agents.dnsConfig -- Specify dns configuration options for the agent containers e.g. ndots
+  ## ref: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-dns-config
+  dnsConfig: {}
+  # options:
+  # - name: ndots
+  #   value: "1"
+
+  # agents.daemonsetAnnotations -- Annotations to add to the DaemonSet
+  daemonsetAnnotations: {}
+  # key: "value"
+
+  # agents.podAnnotations -- Annotations to add to the DaemonSet's Pods
+  podAnnotations: {}
+  # <POD_ANNOTATION>: '[{"key": "<KEY>", "value": "<VALUE>"}]'
+
+  # agents.tolerations -- Allow the DaemonSet to schedule on tainted nodes (requires Kubernetes >= 1.6)
+  tolerations: []
+
+  # agents.nodeSelector -- Allow the DaemonSet to schedule on selected nodes
+  ## Ref: https://kubernetes.io/docs/user-guide/node-selection/
+  nodeSelector: {}
+
+  # agents.affinity -- Allow the DaemonSet to schedule using affinity rules
+  ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
+  affinity: {}
+
+  # agents.updateStrategy -- Allow the DaemonSet to perform a rolling update on helm update
+  ## ref: https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/
+  updateStrategy:
+    type: RollingUpdate
+    rollingUpdate:
+      maxUnavailable: "10%"
+
+  # agents.priorityClassName -- Sets PriorityClassName if defined
+  priorityClassName:
+
+  # agents.podLabels -- Sets podLabels if defined
+  # Note: These labels are also used as label selectors so they are immutable.
+  podLabels: {}
+
+  # agents.additionalLabels -- Adds labels to the Agent daemonset and pods
+  additionalLabels: {}
+  # key: "value"
+
+  # agents.useConfigMap -- Configures a configmap to provide the agent configuration. Use this in combination with the `agents.customAgentConfig` parameter.
+  useConfigMap: # false
+
+  # agents.customAgentConfig -- Specify custom contents for the datadog agent config (datadog.yaml)
+  ## ref: https://docs.datadoghq.com/agent/guide/agent-configuration-files/?tab=agentv6
+  ## ref: https://github.com/DataDog/datadog-agent/blob/main/pkg/config/config_template.yaml
+  ## Note the `agents.useConfigMap` needs to be set to `true` for this parameter to be taken into account.
+  customAgentConfig: {}
+  # # Autodiscovery for Kubernetes
+  # listeners:
+  #   - name: kubelet
+  # config_providers:
+  #   - name: kubelet
+  #     polling: true
+  #   # needed to support legacy docker label config templates
+  #   - name: docker
+  #     polling: true
+  #
+  # # Enable java cgroup handling. Only one of those options should be enabled,
+  # # depending on the agent version you are using along that chart.
+  #
+  # # agent version < 6.15
+  # # jmx_use_cgroup_memory_limit: true
+  #
+  # # agent version >= 6.15
+  # # jmx_use_container_support: true
+
+  networkPolicy:
+    # agents.networkPolicy.create -- If true, create a NetworkPolicy for the agents.
+    # DEPRECATED. Use datadog.networkPolicy.create instead
+    create: false
+
+  localService:
+    # agents.localService.overrideName -- Name of the internal traffic service to target the agent running on the local node
+    overrideName: ""
+
+    # agents.localService.forceLocalServiceEnabled -- Force the creation of the internal traffic policy service to target the agent running on the local node.
+    # By default, the internal traffic service is created only on Kubernetes 1.22+ where the feature became beta and enabled by default.
+    # This option allows you to force the creation of the internal traffic service on Kubernetes 1.21 where the feature was alpha and required a feature gate to be explicitly enabled.
+ forceLocalServiceEnabled: false + +clusterChecksRunner: + # clusterChecksRunner.enabled -- If true, deploys agent dedicated for running the Cluster Checks instead of running in the Daemonset's agents. + ## ref: https://docs.datadoghq.com/agent/autodiscovery/clusterchecks/ + enabled: false + + ## Define the Datadog image to work with. + image: + # clusterChecksRunner.image.name -- Datadog Agent image name to use (relative to `registry`) + name: agent + + # clusterChecksRunner.image.tag -- Define the Agent version to use + tag: 7.32.1 + + # clusterChecksRunner.image.tagSuffix -- Suffix to append to Agent tag + ## Ex: + ## jmx to enable jmx fetch collection + ## servercore to get Windows images based on servercore + tagSuffix: "" + + # clusterChecksRunner.image.repository -- Override default registry + image.name for Cluster Check Runners + repository: + + # clusterChecksRunner.image.pullPolicy -- Datadog Agent image pull policy + pullPolicy: IfNotPresent + + # clusterChecksRunner.image.pullSecrets -- Datadog Agent repository pullSecret (ex: specify docker registry credentials) + ## See https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod + pullSecrets: [] + # - name: "" + + # clusterChecksRunner.createPodDisruptionBudget -- Create the pod disruption budget to apply to the cluster checks agents + createPodDisruptionBudget: false + + # Provide Cluster Checks Deployment pods RBAC configuration + rbac: + # clusterChecksRunner.rbac.create -- If true, create & use RBAC resources + create: true + + # clusterChecksRunner.rbac.dedicated -- If true, use a dedicated RBAC resource for the cluster checks agent(s) + dedicated: false + + # clusterChecksRunner.rbac.serviceAccountAnnotations -- Annotations to add to the ServiceAccount if clusterChecksRunner.rbac.dedicated is true + serviceAccountAnnotations: {} + + # clusterChecksRunner.rbac.serviceAccountName -- Specify a preexisting ServiceAccount to use if clusterChecksRunner.rbac.create is false + serviceAccountName: default + + # clusterChecksRunner.replicas -- Number of Cluster Checks Runner instances + ## If you want to deploy the clusterChecks agent in HA, keep at least clusterChecksRunner.replicas set to 2. + ## And increase the clusterChecksRunner.replicas according to the number of Cluster Checks. + replicas: 2 + + # clusterChecksRunner.resources -- Datadog clusterchecks-agent resource requests and limits. + resources: {} + # requests: + # cpu: 200m + # memory: 500Mi + # limits: + # cpu: 200m + # memory: 500Mi + + # clusterChecksRunner.affinity -- Allow the ClusterChecks Deployment to schedule using affinity rules. + ## By default, ClusterChecks Deployment Pods are preferred to run on different Nodes. 
+ ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity + affinity: {} + + # clusterChecksRunner.strategy -- Allow the ClusterChecks deployment to perform a rolling update on helm update + ## ref: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy + strategy: + type: RollingUpdate + rollingUpdate: + maxSurge: 1 + maxUnavailable: 0 + + # clusterChecksRunner.dnsConfig -- specify dns configuration options for datadog cluster agent containers e.g ndots + ## ref: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-dns-config + dnsConfig: {} + # options: + # - name: ndots + # value: "1" + + # clusterChecksRunner.priorityClassName -- Name of the priorityClass to apply to the Cluster checks runners + priorityClassName: # system-cluster-critical + + # clusterChecksRunner.nodeSelector -- Allow the ClusterChecks Deployment to schedule on selected nodes + ## Ref: https://kubernetes.io/docs/user-guide/node-selection/ + # + nodeSelector: {} + + # clusterChecksRunner.tolerations -- Tolerations for pod assignment + ## Ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/ + # + tolerations: [] + + # clusterChecksRunner.healthPort -- Port number to use in the Cluster Checks Runner for the healthz endpoint + healthPort: 5557 + + # clusterChecksRunner.livenessProbe -- Override default agent liveness probe settings + # @default -- Every 15s / 6 KO / 1 OK + ## In case of issues with the probe, you can disable it with the + ## following values, to allow easier investigating: + # + # livenessProbe: + # exec: + # command: ["/bin/true"] + # + livenessProbe: + initialDelaySeconds: 15 + periodSeconds: 15 + timeoutSeconds: 5 + successThreshold: 1 + failureThreshold: 6 + + # clusterChecksRunner.readinessProbe -- Override default agent readiness probe settings + # @default -- Every 15s / 6 KO / 1 OK + ## In case of issues with the probe, you can disable it with the + ## following values, to allow easier investigating: + # + # readinessProbe: + # exec: + # command: ["/bin/true"] + # + readinessProbe: + initialDelaySeconds: 15 + periodSeconds: 15 + timeoutSeconds: 5 + successThreshold: 1 + failureThreshold: 6 + + # clusterChecksRunner.deploymentAnnotations -- Annotations to add to the cluster-checks-runner's Deployment + deploymentAnnotations: {} + # key: "value" + + # clusterChecksRunner.podAnnotations -- Annotations to add to the cluster-checks-runner's pod(s) + podAnnotations: {} + # key: "value" + + # clusterChecksRunner.env -- Environment variables specific to Cluster Checks Runner + ## ref: https://github.com/DataDog/datadog-agent/tree/main/Dockerfiles/agent#environment-variables + env: [] + # - name: + # value: + + # clusterChecksRunner.envFrom -- Set environment variables specific to Cluster Checks Runner from configMaps and/or secrets + ## envFrom to pass configmaps or secrets as environment + ## ref: https://github.com/DataDog/datadog-agent/tree/main/Dockerfiles/agent#environment-variables + envFrom: [] + # - configMapRef: + # name: + # - secretRef: + # name: + + # clusterChecksRunner.volumes -- Specify additional volumes to mount in the cluster checks container + volumes: [] + + # clusterChecksRunner.volumeMounts -- Specify additional volumes to mount in the cluster checks container + volumeMounts: [] + + networkPolicy: + # clusterChecksRunner.networkPolicy.create -- If true, create a NetworkPolicy for the cluster checks runners. + # DEPRECATED. 
Use datadog.networkPolicy.create instead
+    create: false
+
+  # clusterChecksRunner.additionalLabels -- Adds labels to the cluster checks runner deployment and pods
+  additionalLabels: {}
+  # key: "value"
+
+  # clusterChecksRunner.securityContext -- Allows you to overwrite the default PodSecurityContext on the clusterchecks pods.
+  securityContext: {}
+
+  # clusterChecksRunner.ports -- Allows to specify extra ports (hostPorts for instance) for this container
+  ports: []
+
+datadog-crds:
+  crds:
+    # datadog-crds.crds.datadogMetrics -- Set to true to deploy the DatadogMetrics CRD
+    datadogMetrics: true
+
+kube-state-metrics:
+  rbac:
+    # kube-state-metrics.rbac.create -- If true, create & use RBAC resources
+    create: true
+
+  serviceAccount:
+    # kube-state-metrics.serviceAccount.create -- If true, create a ServiceAccount (requires kube-state-metrics.rbac.create to be true)
+    create: true
+
+    # kube-state-metrics.serviceAccount.name -- The name of the ServiceAccount to use.
+    ## If not set and create is true, a name is generated using the fullname template
+    name:
+
+  # kube-state-metrics.resources -- Resource requests and limits for the kube-state-metrics container.
+  resources: {}
+  # requests:
+  #   cpu: 200m
+  #   memory: 256Mi
+  # limits:
+  #   cpu: 200m
+  #   memory: 256Mi
+
+  # kube-state-metrics.nodeSelector -- Node selector for KSM. KSM only supports Linux.
+  nodeSelector:
+    kubernetes.io/os: linux
+
+  # # kube-state-metrics.image -- Override default image information for the kube-state-metrics container.
+  # image:
+  #   # kube-state-metrics.repository -- Override default image registry for the kube-state-metrics container.
+  #   repository: k8s.gcr.io/kube-state-metrics/kube-state-metrics
+  #   # kube-state-metrics.tag -- Override default image tag for the kube-state-metrics container.
+  #   tag: v1.9.8
+  #   # kube-state-metrics.pullPolicy -- Override default image pullPolicy for the kube-state-metrics container.
+  #   pullPolicy: IfNotPresent
+
+providers:
+  gke:
+    # providers.gke.autopilot -- Enables Datadog Agent deployment on GKE Autopilot
+    autopilot: false
+
+  eks:
+    ec2:
+      # providers.eks.ec2.useHostnameFromFile -- Use hostname from EC2 filesystem instead of fetching from metadata endpoint.
+      ## When deploying to EC2-backed EKS infrastructure, there are situations where the
+      ## IMDS metadata endpoint is not accessible to containers. This flag mounts the host's
+      ## `/var/lib/cloud/data/instance-id` and uses that for the Agent's hostname instead.
+      useHostnameFromFile: false
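+
+## To deploy with this file (a minimal sketch; the release name is up to you):
+##   helm repo add datadog https://helm.datadoghq.com
+##   helm repo update
+##   helm install datadog-agent -f values.yaml --set datadog.apiKey=<DATADOG_API_KEY> datadog/datadog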