[rush] duplicated cobuild telemetry leading to data skew #4737

@aramissennyeydd

Summary

When using cobuilds, there is no easy way to determine during the flushTelemetry hook whether a given project was cobuilt. You get the full list of operations with their timing: start and end timestamps plus nonCachedDurationMs. I opened #4680 because start and end time aren't very useful for cobuild cache hits, but that work still leaves data skew. As a custom telemetry integrator, I'd like to count each operation only once per cobuild, on the agent that actually did the building. No other agent should report on that operation.

Details

Current Skew

Main Agent

"@company/my-package (test)": {
  "startTimestampMs": 31468.20680500008,
  "endTimestampMs": 40377.58414799906,
  "nonCachedDurationMs": 8649.619353000075,
  "result": "SUCCESS",
  "dependencies": [
    "@company/my-package (build)"
  ]
},

Build Cache Restore Agent

"@company/my-package (test)": {
  "startTimestampMs": 220830.3333630003,
  "endTimestampMs": 220875.03935700096,
  "nonCachedDurationMs": 8649.619353000075,
  "result": "SUCCESS",
  "dependencies": [
    "@company/my-package (build)"
  ]
},

For the primary agent, endTimestampMs - startTimestampMs (about 8909 ms) doesn't match nonCachedDurationMs (about 8650 ms). On the build cache restore agent, the wall-clock time is only about 45 ms, yet it reports the same nonCachedDurationMs, so there's no reliable way to determine whether the package was built on this machine or another one. We currently report both entries and have been accidentally introducing data skew into our metric collection.
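To make the skew concrete, here is a minimal sketch that runs the arithmetic on the two entries above. The `OperationTelemetry` interface just mirrors the fields in the JSON samples, and `wallClockMs` and `naiveTotalMs` are hypothetical helper names for illustration, not part of Rush's API.

```typescript
// Shape of one telemetry operation entry, mirroring the JSON samples above.
interface OperationTelemetry {
  startTimestampMs: number;
  endTimestampMs: number;
  nonCachedDurationMs: number;
  result: string;
  dependencies: string[];
}

// Hypothetical helper: wall-clock time the operation took on the reporting agent.
function wallClockMs(op: OperationTelemetry): number {
  return op.endTimestampMs - op.startTimestampMs;
}

// Hypothetical helper: what a naive integrator gets by summing every agent's report.
function naiveTotalMs(reports: OperationTelemetry[]): number {
  return reports.reduce((sum, op) => sum + op.nonCachedDurationMs, 0);
}

// Entry reported by the agent that actually ran the test.
const mainAgent: OperationTelemetry = {
  startTimestampMs: 31468.20680500008,
  endTimestampMs: 40377.58414799906,
  nonCachedDurationMs: 8649.619353000075,
  result: "SUCCESS",
  dependencies: ["@company/my-package (build)"],
};

// Entry reported by the agent that only restored the result from the build cache.
const restoreAgent: OperationTelemetry = {
  startTimestampMs: 220830.3333630003,
  endTimestampMs: 220875.03935700096,
  nonCachedDurationMs: 8649.619353000075,
  result: "SUCCESS",
  dependencies: ["@company/my-package (build)"],
};

const mainWallClock = wallClockMs(mainAgent);       // roughly 8909 ms
const restoreWallClock = wallClockMs(restoreAgent); // roughly 45 ms
const naiveTotal = naiveTotalMs([mainAgent, restoreAgent]);

console.log({ mainWallClock, restoreWallClock, naiveTotal });
```

Summing nonCachedDurationMs across both agents counts the same 8650 ms of work twice, and nothing in either entry says which agent did the work.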

Recommendation

Possibly in conjunction with #4680, adding a new wasCobuiltOnThisAgent property to telemetry operation events would allow integrators to differentiate between primary and cache restore agents. I'd also recommend deprecating nonCachedDurationMs in telemetry and relying on just start and end time plus wasCobuiltOnThisAgent, as it's confusing to have multiple possible sources of truth for timing. It may be useful to capture build cache restore time elsewhere; for our use case, though, it's not yet important to track.
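As a rough sketch of how an integrator might consume this, assuming the proposed flag existed: `wasCobuiltOnThisAgent` is only the property suggested in this issue, not an existing Rush telemetry field, and `ProposedOperationTelemetry` and `durationsToReport` are hypothetical names used for illustration.

```typescript
// Assumed shape of an operation entry if the proposal were adopted.
interface ProposedOperationTelemetry {
  startTimestampMs: number;
  endTimestampMs: number;
  result: string;
  // Proposed in this issue; true only on the agent that actually ran the operation.
  wasCobuiltOnThisAgent: boolean;
}

// Hypothetical helper: keep only operations this agent built, keyed by operation
// name, using start/end time as the single source of truth for duration.
function durationsToReport(
  results: Record<string, ProposedOperationTelemetry>
): Record<string, number> {
  const durations: Record<string, number> = {};
  for (const name of Object.keys(results)) {
    const op = results[name];
    if (op !== undefined && op.wasCobuiltOnThisAgent) {
      durations[name] = op.endTimestampMs - op.startTimestampMs;
    }
  }
  return durations;
}

// Report from the agent that ran the test.
const mainReport: Record<string, ProposedOperationTelemetry> = {
  "@company/my-package (test)": {
    startTimestampMs: 31468.20680500008,
    endTimestampMs: 40377.58414799906,
    result: "SUCCESS",
    wasCobuiltOnThisAgent: true,
  },
};

// Report from the agent that only restored the cached result.
const restoreReport: Record<string, ProposedOperationTelemetry> = {
  "@company/my-package (test)": {
    startTimestampMs: 220830.3333630003,
    endTimestampMs: 220875.03935700096,
    result: "SUCCESS",
    wasCobuiltOnThisAgent: false,
  },
};

const mainDurations = durationsToReport(mainReport);
const restoreDurations = durationsToReport(restoreReport);

console.log(mainDurations);    // one entry for the test operation
console.log(restoreDurations); // empty: the restore agent reports nothing
```

With a filter like this, each cobuilt operation would be counted once, by the agent that ran it.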

Standard questions

Please answer these questions to help us investigate your issue more quickly:

Question Answer
Would you consider contributing a PR?

Status: Closed