Open
Conversation
Periodically emit gauge metrics tracking the TTL of the agent's TLS certificate and CA certificate. The metrics are refreshed on reload and stopped when TLS is removed or the agent shuts down.
Contributor
pkazmierczak
left a comment
There was a problem hiding this comment.
In general looks really good, but I left some comments that I'd like clarified.
| // We have successfully initialized the new TLS metrics emitter, so | ||
| // we can stop the old one (if it exists) and start the new one. | ||
| if a.tlsMetrics != nil { | ||
| a.tlsMetrics.stop() |
Contributor
There was a problem hiding this comment.
This shuts down the channel but doesn't really handle goroutines stop, vide your comment above. Doesn't a reload lead to goroutine leak in this case?
Member
Author
There was a problem hiding this comment.
The tlsMetrics.emitLoop which is run as a go-routine has a select on the channel which gets triggered when it is closed and uses a return to cause the routine to exit.
I've tested this locally using the following workflow:
- start a Nomad agent with debug and TLS enabled
- regenerate the TLS certificate and send SIGHUP to the agent
- navigate to the pprof endpoint
- ensure there is only one
tlsMetricsroutine running, probably inselectas shown below
goroutine 1135 [select]:
github.com/hashicorp/nomad/command/agent.(*tlsMetrics).emitLoop(0x1400135bb60, 0x3b9aca00)
github.com/hashicorp/nomad/command/agent/tls_metrics.go:98 +0x8c
created by github.com/hashicorp/nomad/command/agent.(*tlsMetrics).start in goroutine 1
github.com/hashicorp/nomad/command/agent/tls_metrics.go:79 +0x8c
Is there something else I am missing here?
Contributor
There was a problem hiding this comment.
All good I might've gotten confused in the emitLoop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Periodically emit gauge metrics tracking the TTL of the agent's TLS certificate and CA certificate. The metrics are refreshed on reload and stopped when TLS is removed or the agent shuts down
The metrics are emitted from the agent as the provided TLS certificates apply to both the server and client agent mode. It therefore felt like the best implementation approach to take and provides an easy to reason and understand approach.
AI Disclosure: I attempted to use Claude to generate some of the TLS metric handle boiler plate. Almost all of the generated code was deleted or reworked. I did not attempt to use any GenAI for the agent code.
Docs: I'll add as a follow up in the unified repo.
Testing & Reproduction steps
Running an agent in dev mode with TLS enabled, the new metrics can be review using the following commands (assuming JQ is available):
Links
Jira: https://hashicorp.atlassian.net/browse/NMD-1055
Closes: #26997
Contributor Checklist
changelog entry using the
make clcommand.ensure regressions will be caught.
and job configuration, please update the Nomad product documentation, which is stored in the
web-unified-docsrepo. Refer to theweb-unified-docscontributor guide for docs guidelines.Please also consider whether the change requires notes within the upgrade
guide. If you would like help with the docs, tag the
nomad-docsteam in this PR.Reviewer Checklist
backporting document.
in the majority of situations. The main exceptions are long-lived feature branches or merges where
history should be preserved.
within the public repository.