
PMM-12832 Timeouts for exporters. #5134

Open
JiriCtvrtka wants to merge 45 commits into v3 from PMM-12832-exporter-timeouts

Contributor

@JiriCtvrtka JiriCtvrtka commented Mar 12, 2026


codecov bot commented Mar 20, 2026

Codecov Report

❌ Patch coverage is 83.42541% with 60 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.66%. Comparing base (e882771) to head (0eb4514).
⚠️ Report is 1 commits behind head on v3.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| managed/services/inventory/agents.go | 70.58% | 20 Missing ⚠️ |
| managed/models/agent_model.go | 87.38% | 10 Missing and 4 partials ⚠️ |
| managed/services/converters.go | 73.33% | 4 Missing and 4 partials ⚠️ |
| managed/models/database.go | 0.00% | 1 Missing and 1 partial ⚠️ |
| .../commands/inventory/add_agent_external_exporter.go | 0.00% | 1 Missing ⚠️ |
| ...n/commands/inventory/add_agent_mongodb_exporter.go | 0.00% | 1 Missing ⚠️ |
| ...in/commands/inventory/add_agent_mysqld_exporter.go | 0.00% | 1 Missing ⚠️ |
| ...dmin/commands/inventory/add_agent_node_exporter.go | 0.00% | 1 Missing ⚠️ |
| .../commands/inventory/add_agent_postgres_exporter.go | 0.00% | 1 Missing ⚠️ |
| .../commands/inventory/add_agent_proxysql_exporter.go | 0.00% | 1 Missing ⚠️ |
... and 10 more
Additional details and impacted files
@@            Coverage Diff             @@
##               v3    #5134      +/-   ##
==========================================
+ Coverage   47.15%   48.66%   +1.51%     
==========================================
  Files         410      411       +1     
  Lines       41969    42093     +124     
==========================================
+ Hits        19790    20485     +695     
+ Misses      20234    19555     -679     
- Partials     1945     2053     +108     
| Flag | Coverage Δ |
| --- | --- |
| admin | 35.81% <39.13%> (+0.01%) ⬆️ |
| agent | 53.04% <ø> (+2.28%) ⬆️ |
| managed | 49.19% <86.43%> (+1.49%) ⬆️ |
| vmproxy | 72.09% <ø> (ø) |



// DSN returns a DSN string for accessing a given Service with this Agent (and an implicit driver).
- func (s *Agent) DSN(service *Service, dsnParams DSNParams, tdp *DelimiterPair, pmmAgentVersion *version.Parsed) string { //nolint:cyclop,maintidx
+ func (a *Agent) DSN(service *Service, dsnParams DSNParams, tdp *DelimiterPair, pmmAgentVersion *version.Parsed) string { //nolint:cyclop,maintidx
Contributor Author

Not related, but we discussed this on Go BE.

@JiriCtvrtka JiriCtvrtka marked this pull request as ready for review March 20, 2026 13:53
@JiriCtvrtka JiriCtvrtka requested a review from a team as a code owner March 20, 2026 13:53
@JiriCtvrtka JiriCtvrtka requested review from ademidoff and maxkondr and removed request for a team March 20, 2026 13:53
MongoDBExporterType,
PostgresExporterType,
ProxySQLExporterType,
RDSExporterType,
Member

QQ: Doesn't RDS Exporter have a higher default timeout?

Contributor Author

Right, it is usually around 90% of the scrape interval. I will check it.

Contributor Author

Right now only the scrape interval and scrape timeout from the configs are used. The timeout in the DSN is removed and not used, as in v3.

}

- expected := "redis://username:s3cur3%20p%40$$w0r4.@1.2.3.4:12345"
+ expected := "redis://username:s3cur3%20p%40$$w0r4.@1.2.3.4:12345?dial_timeout=1s"
Member

From what I see in the documentation, the Redis exporter does not support such a parameter in its connection string. It seems to use the --connection-timeout parameter or an environment variable instead.

Let's double check to be sure.

Contributor Author

@JiriCtvrtka JiriCtvrtka Mar 22, 2026

True, there is no DSN param, only a flag. It should be correct now. A test file for this has also been added. Thank you.

return err
}

exporterOptions := models.ExporterOptions{
Member

Is it justified to have it as a separate variable vs inline as before?

Contributor Author

Changed back to the original format.

// (node / external / RDS scrape of exporter HTTP). By Prometheus rule, scrape_timeout
// cannot exceed scrape_interval, so PMM caps it at 0.9 of scrape interval:
// https://prometheus.io/docs/prometheus/latest/configuration/configuration/
func applyExporterScrapeTimeout(cfg *config.ScrapeConfig, agent *models.Agent) {
Member

Method names in Go don't need to be that long. I guess exporterScrapeTimeout is good enough, similar to the one right below :)

Contributor Author

Sure, renamed.

ExtraDsnParams: agent.MySQLOptions.ExtraDSNParams,
}, nil
}
if agent.ExporterOptions.Timeout != 0 {
Member

I'm not sure, but I wonder whether this if statement is redundant.

Contributor Author

agent.ExporterOptions.Timeout is not a pointer, so the check could be skipped there, but exporter.Timeout is a pointer, so without the check it would be set to 0 even for timeouts that were never set. With the check, it is more visible whether the timeout was really set or not.

@JiriCtvrtka JiriCtvrtka requested a review from ademidoff March 22, 2026 14:35
CustomLabels map[string]string `mapsep:"," help:"Custom user-assigned labels"`
PushMetrics bool `help:"Enables push metrics model flow, it will be sent to the server by an agent"`
TLSSkipVerify bool `help:"Skip TLS certificate verification"`
Timeout string `help:"Connection timeout to use for exporter (e.g. 1s, 500ms)"`
Contributor

Suggested change
- Timeout string `help:"Connection timeout to use for exporter (e.g. 1s, 500ms)"`
+ Timeout *time.Duration `placeholder:"DURATION" help:"Connection timeout to use for exporter (e.g. 1s, 500ms)"`

Contributor

Here and in the rest of the commands.

// Metrics resolution for this agent.
common.MetricsResolutions metrics_resolutions = 15;
// Connection timeout for exporter (if set).
google.protobuf.Duration timeout = 16;
Contributor

There may be many timeouts: connection, fetching metrics, etc.
It is better to state explicitly which timeout is meant here, for example connection_timeout.

// Real-Time Analytics options.
inventory.v1.RTAOptions rta_options = 42;
// Connection timeout for exporter (if set).
google.protobuf.Duration timeout = 43;
Contributor

The same here and in the rest of the *.proto files.

// Maximum number of exporter connections to PostgreSQL instance.
int32 max_postgresql_exporter_connections = 33;
// Connection timeout for exporter (if set).
google.protobuf.Duration timeout = 35;
Contributor

Suggested change
- google.protobuf.Duration timeout = 35;
+ google.protobuf.Duration timeout = 34;

Comment on lines +826 to +837
switch a.AgentType {
case NodeExporterType,
MySQLdExporterType,
MongoDBExporterType,
PostgresExporterType,
ProxySQLExporterType,
AzureDatabaseExporterType,
ExternalExporterType,
ValkeyExporterType:
return 1 * time.Second
default:
}
Contributor

Should this be here if effectiveDialTimeoutFallbackSec is used as the fallback?

QANMongoDBMongologAgentType,
MongoDBExporterType,
RTAMongoDBAgentType)
MongoDBExporterType)
Contributor

  1. Why have the RTA and MongoLog agent types been removed?
  2. Why is the MongoProfiler type kept?

Contributor Author

I think I resolved a conflict incorrectly while merging main. I will fix it, thanks.

3 participants