💥Add worker heartbeat support #2186
Merged
35 commits
643a708 Add worker heartbeat support (yuandrew)
6af4db8 vendor gopsutil (yuandrew)
04ff20f PR feedback (yuandrew)
819080b Sort plugin names (yuandrew)
c156e0d Create new hostinfo package (yuandrew)
ebf1064 make methods/structs private, remove aw.workerHeartbeatManager (yuandrew)
b8893b9 tighten lock, consolidate describeNamespace calls to a single call in… (yuandrew)
a6135de simplify heartbeat metrics, decouple poller/worker type from WithTags() (yuandrew)
4f43e75 remove unused nexus worker, tighten heartbeat callback and make concu… (yuandrew)
54ddf1f Merge branch 'master' into worker-heartbeat (yuandrew)
e7fbc03 Fix tests (yuandrew)
edf6e11 Fix cursor discovered bugs, fix integ tests (yuandrew)
73f4a10 Rename hostinfo to sysinfo, add interval enforcement, rename mutexes,… (yuandrew)
972555a fix bugs cursor found, sync.oncevalue, separate poll time tracking ou… (yuandrew)
f952732 Add back resource tuner tests that got dropped (yuandrew)
a25d85d Fix tests (yuandrew)
53da340 Fix tests, disable heartbeating for normal tests, bump dev server ver… (yuandrew)
2155206 Finish renames of sysInfoProvider, handle Time.IsZero(), make pollTim… (yuandrew)
dd02159 Fix tests (yuandrew)
bb556cb remove extra default logger addition, remove dead code (yuandrew)
136d311 Merge branch 'master' into worker-heartbeat (yuandrew)
da40521 forgot a change.. (yuandrew)
04f5d4d fix unit tests (yuandrew)
a5c85d0 Fix eventually expectation for slower CI machines, fix race with hear… (yuandrew)
b68132e Merge branch 'master' into worker-heartbeat (yuandrew)
c5b49db Gate all sticky cache tests behind maxWorkflowCacheSize checks so it … (yuandrew)
faeba63 loosen workerInfo.CurrentStickyCacheSize and workerInfo.TotalStickyCa… (yuandrew)
004032a Fix up TestWorkerHeartbeatStickyCacheMiss (yuandrew)
257e264 Add comment, minor fix (yuandrew)
8d7aa2e Make SHUTTING_DOWN status atomic, plumb workerInstanceKeys to workflo… (yuandrew)
24e102a Merge branch 'master' into worker-heartbeat1, PR feedback (yuandrew)
300c71d Merge branch 'master' into worker-heartbeat1 (yuandrew)
41d6afd bring back listworkers dynamic config, fix identity in heartbeat (yuandrew)
e297ae1 Add dynamic config for listWorkers for docker test (yuandrew)
1cb7c90 server v1.29.1 still requires dynamic config for heartbeating (yuandrew)
2 changes: 1 addition & 1 deletion
contrib/resourcetuner/cgroups_common.go → contrib/sysinfo/cgroups_common.go
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -1,4 +1,4 @@ |
| 1 | | -package resourcetuner |
| | 1 | +package sysinfo |
| 2 | 2 | |
| 3 | 3 | import ( |
| 4 | 4 |     "errors" |
2 changes: 1 addition & 1 deletion
contrib/resourcetuner/cgroups_notlinux.go → contrib/sysinfo/cgroups_notlinux.go
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -1,6 +1,6 @@ |
| 1 | 1 | //go:build !linux |
| 2 | 2 | |
| 3 | | -package resourcetuner |
| | 3 | +package sysinfo |
| 4 | 4 | |
| 5 | 5 | import "errors" |
| 6 | 6 | |
2 changes: 1 addition & 1 deletion
contrib/resourcetuner/cgroups_test.go → contrib/sysinfo/cgroups_test.go
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -1,4 +1,4 @@ |
| 1 | | -package resourcetuner |
| | 1 | +package sysinfo |
| 2 | 2 | |
| 3 | 3 | import ( |
| 4 | 4 |     "errors" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,111 @@ |
| | 1 | +package sysinfo |
| | 2 | + |
| | 3 | +import ( |
| | 4 | +    "context" |
| | 5 | +    "runtime" |
| | 6 | +    "sync" |
| | 7 | +    "sync/atomic" |
| | 8 | +    "time" |
| | 9 | + |
| | 10 | +    "github.com/shirou/gopsutil/v4/cpu" |
| | 11 | +    "github.com/shirou/gopsutil/v4/mem" |
| | 12 | +    "go.temporal.io/sdk/worker" |
| | 13 | +) |
| | 14 | + |
| | 15 | +var sysInfoProvider = sync.OnceValue(func() *psUtilSystemInfoSupplier { |
| | 16 | +    return &psUtilSystemInfoSupplier{ |
| | 17 | +        cGroupInfo: newCGroupInfo(), |
| | 18 | +    } |
| | 19 | +}) |
| | 20 | + |
| | 21 | +// SysInfoProvider returns a shared SysInfoProvider using gopsutil. |
| | 22 | +// Supports cgroup metrics in containerized Linux environments. |
| | 23 | +func SysInfoProvider() worker.SysInfoProvider { |
| | 24 | +    return sysInfoProvider() |
| | 25 | +} |
| | 26 | + |
| | 27 | +type psUtilSystemInfoSupplier struct { |
| | 28 | +    mu sync.Mutex |
| | 29 | +    lastRefresh atomic.Int64 // UnixNano, atomic for lock-free reads in maybeRefresh |
| | 30 | + |
| | 31 | +    lastMemStat *mem.VirtualMemoryStat |
| | 32 | +    lastCpuUsage float64 |
| | 33 | + |
| | 34 | +    stopTryingToGetCGroupInfo bool |
| | 35 | +    cGroupInfo cGroupInfo |
| | 36 | +} |
| | 37 | + |
| | 38 | +type cGroupInfo interface { |
| | 39 | +    // Update requests an update of the cgroup stats. This is a no-op if not in a cgroup. Returns |
| | 40 | +    // true if cgroup stats should continue to be updated, false if not in a cgroup or the returned |
| | 41 | +    // error is considered unrecoverable. |
| | 42 | +    Update() (bool, error) |
| | 43 | +    // GetLastMemUsage returns last known memory usage as a fraction of the cgroup limit. 0 if not |
| | 44 | +    // in a cgroup or limit is not set. |
| | 45 | +    GetLastMemUsage() float64 |
| | 46 | +    // GetLastCPUUsage returns last known CPU usage as a fraction of the cgroup limit. 0 if not in a |
| | 47 | +    // cgroup or limit is not set. |
| | 48 | +    GetLastCPUUsage() float64 |
| | 49 | +} |
| | 50 | + |
| | 51 | +func (p *psUtilSystemInfoSupplier) MemoryUsage(infoContext *worker.SysInfoContext) (float64, error) { |
| | 52 | +    if err := p.maybeRefresh(infoContext); err != nil { |
| | 53 | +        return 0, err |
| | 54 | +    } |
| | 55 | +    p.mu.Lock() |
| | 56 | +    defer p.mu.Unlock() |
| | 57 | +    lastCGroupMem := p.cGroupInfo.GetLastMemUsage() |
| | 58 | +    if lastCGroupMem != 0 { |
| | 59 | +        return lastCGroupMem, nil |
| | 60 | +    } |
| | 61 | +    return p.lastMemStat.UsedPercent / 100, nil |
| | 62 | +} |
| | 63 | + |
| | 64 | +func (p *psUtilSystemInfoSupplier) CpuUsage(infoContext *worker.SysInfoContext) (float64, error) { |
| | 65 | +    if err := p.maybeRefresh(infoContext); err != nil { |
| | 66 | +        return 0, err |
| | 67 | +    } |
| | 68 | +    p.mu.Lock() |
| | 69 | +    defer p.mu.Unlock() |
| | 70 | +    lastCGroupCPU := p.cGroupInfo.GetLastCPUUsage() |
| | 71 | +    if lastCGroupCPU != 0 { |
| | 72 | +        return lastCGroupCPU, nil |
| | 73 | +    } |
| | 74 | +    return p.lastCpuUsage / 100, nil |
| | 75 | +} |
| | 76 | + |
| | 77 | +func (p *psUtilSystemInfoSupplier) maybeRefresh(infoContext *worker.SysInfoContext) error { |
| | 78 | +    if time.Since(time.Unix(0, p.lastRefresh.Load())) < 100*time.Millisecond { |
| | 79 | +        return nil |
| | 80 | +    } |
| | 81 | +    p.mu.Lock() |
| | 82 | +    defer p.mu.Unlock() |
| | 83 | +    // Double check refresh is still needed |
| | 84 | +    if time.Since(time.Unix(0, p.lastRefresh.Load())) < 100*time.Millisecond { |
| | 85 | +        return nil |
| | 86 | +    } |
| | 87 | +    ctx, cancelFn := context.WithTimeout(context.Background(), 1*time.Second) |
| | 88 | +    defer cancelFn() |
| | 89 | +    memStat, err := mem.VirtualMemoryWithContext(ctx) |
| | 90 | +    if err != nil { |
| | 91 | +        return err |
| | 92 | +    } |
| | 93 | +    cpuUsage, err := cpu.PercentWithContext(ctx, 0, false) |
| | 94 | +    if err != nil { |
| | 95 | +        return err |
| | 96 | +    } |
| | 97 | + |
| | 98 | +    p.lastMemStat = memStat |
| | 99 | +    p.lastCpuUsage = cpuUsage[0] |
| | 100 | + |
| | 101 | +    if runtime.GOOS == "linux" && !p.stopTryingToGetCGroupInfo { |
| | 102 | +        continueUpdates, err := p.cGroupInfo.Update() |
| | 103 | +        if err != nil { |
| | 104 | +            infoContext.Logger.Warn("Failed to get cgroup stats", "error", err) |
| | 105 | +        } |
| | 106 | +        p.stopTryingToGetCGroupInfo = !continueUpdates |
| | 107 | +    } |
| | 108 | + |
| | 109 | +    p.lastRefresh.Store(time.Now().UnixNano()) |
| | 110 | +    return nil |
| | 111 | +} |
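
For readers skimming the diff, the rate limiting in `maybeRefresh` can be illustrated in isolation. The following is only a minimal sketch of the same double-checked pattern (a lock-free timestamp check, then a re-check under the mutex); `throttledSampler`, `newThrottledSampler`, `demoInterval`, and the `sample` callback are hypothetical names introduced for illustration and are not part of this PR or the SDK.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// demoInterval is an assumed refresh interval for the demo; the PR uses 100ms.
const demoInterval = time.Hour

// throttledSampler is a hypothetical stand-in for the supplier in the diff.
type throttledSampler struct {
	mu          sync.Mutex
	lastRefresh atomic.Int64 // UnixNano of the last successful refresh
	minInterval time.Duration
	sample      func() (float64, error) // assumed expensive probe
	last        float64
	refreshes   int // number of real probes, kept only for the demo
}

func newThrottledSampler(minInterval time.Duration, sample func() (float64, error)) *throttledSampler {
	return &throttledSampler{minInterval: minInterval, sample: sample}
}

// fresh reports whether the cached value is younger than minInterval.
func (s *throttledSampler) fresh() bool {
	return time.Since(time.Unix(0, s.lastRefresh.Load())) < s.minInterval
}

func (s *throttledSampler) maybeRefresh() error {
	if s.fresh() { // fast path: no lock taken
		return nil
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.fresh() { // another goroutine may have refreshed while we waited
		return nil
	}
	v, err := s.sample()
	if err != nil {
		return err // timestamp is left untouched, so the next call retries
	}
	s.last = v
	s.refreshes++
	s.lastRefresh.Store(time.Now().UnixNano())
	return nil
}

// Value returns the cached sample, refreshing it first if it is stale.
func (s *throttledSampler) Value() (float64, error) {
	if err := s.maybeRefresh(); err != nil {
		return 0, err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.last, nil
}

func main() {
	s := newThrottledSampler(demoInterval, func() (float64, error) { return 0.42, nil })
	for i := 0; i < 3; i++ {
		v, _ := s.Value()
		fmt.Println(v, s.refreshes) // prints "0.42 1" each time
	}
}
```

Because the timestamp is only stored after a successful probe, a failed probe does not start a new quiet period, which matches the early `return err` paths in the diff.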
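
`MemoryUsage` and `CpuUsage` in the file above share one selection rule: prefer the cgroup fraction when it is non-zero, otherwise fall back to the host-wide percentage divided by 100. A small sketch of that rule follows; `pickUsage` is a hypothetical helper written for this note, not a function in the PR.

```go
package main

import "fmt"

// pickUsage is a hypothetical helper mirroring the fallback in the diff:
// cgroupFraction is already in [0, 1]; hostPercent is in [0, 100].
func pickUsage(cgroupFraction, hostPercent float64) float64 {
	if cgroupFraction != 0 {
		return cgroupFraction
	}
	return hostPercent / 100
}

func main() {
	fmt.Println(pickUsage(0.8, 25)) // 0.8  (cgroup value wins)
	fmt.Println(pickUsage(0, 25))   // 0.25 (no cgroup data, host value used)
}
```

One consequence worth keeping in mind when reviewing: a cgroup reading of exactly 0 is indistinguishable from "not in a cgroup", so the host value is reported in that case.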
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,42 @@ |
| | 1 | +package sysinfo |
| | 2 | + |
| | 3 | +import ( |
| | 4 | +    "testing" |
| | 5 | + |
| | 6 | +    "github.com/stretchr/testify/assert" |
| | 7 | +    "github.com/stretchr/testify/require" |
| | 8 | +    "go.temporal.io/sdk/internal/log" |
| | 9 | +    "go.temporal.io/sdk/worker" |
| | 10 | +) |
| | 11 | + |
| | 12 | +func TestGetMemoryCpuUsage(t *testing.T) { |
| | 13 | +    supplier := SysInfoProvider() |
| | 14 | +    ctx := &worker.SysInfoContext{Logger: log.NewNopLogger()} |
| | 15 | + |
| | 16 | +    usage, err := supplier.MemoryUsage(ctx) |
| | 17 | +    require.NoError(t, err) |
| | 18 | +    assert.GreaterOrEqual(t, usage, 0.0) |
| | 19 | +    assert.LessOrEqual(t, usage, 1.0) |
| | 20 | + |
| | 21 | +    usage, err = supplier.CpuUsage(ctx) |
| | 22 | +    require.NoError(t, err) |
| | 23 | +    assert.GreaterOrEqual(t, usage, 0.0) |
| | 24 | +    assert.LessOrEqual(t, usage, 1.0) |
| | 25 | +} |
| | 26 | + |
| | 27 | +func TestMaybeRefreshRateLimiting(t *testing.T) { |
| | 28 | +    supplier := SysInfoProvider().(*psUtilSystemInfoSupplier) |
| | 29 | +    ctx := &worker.SysInfoContext{Logger: log.NewNopLogger()} |
| | 30 | + |
| | 31 | +    // First call should refresh |
| | 32 | +    firstUsage, err := supplier.MemoryUsage(ctx) |
| | 33 | +    require.NoError(t, err) |
| | 34 | +    firstRefresh := supplier.lastRefresh.Load() |
| | 35 | + |
| | 36 | +    // Immediate second call should not refresh (rate limited) |
| | 37 | +    secondUsage, err := supplier.MemoryUsage(ctx) |
| | 38 | +    require.NoError(t, err) |
| | 39 | +    assert.Equal(t, firstRefresh, supplier.lastRefresh.Load()) |
| | 40 | + |
| | 41 | +    assert.Equal(t, firstUsage, secondUsage) |
| | 42 | +} |
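
Both tests obtain their supplier through `SysInfoProvider()`, which is backed by `sync.OnceValue`, so they operate on the same instance and the same `lastRefresh` timestamp. The sketch below shows that sharing behaviour on its own; `provider`, `created`, and `SharedProvider` are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// provider is a hypothetical stand-in for the shared supplier type.
type provider struct{ id int }

// created counts constructor runs; sync.OnceValue guarantees at most one.
var created int

// sharedProvider lazily builds a single instance on first use.
var sharedProvider = sync.OnceValue(func() *provider {
	created++
	return &provider{id: created}
})

// SharedProvider returns the same pointer on every call.
func SharedProvider() *provider { return sharedProvider() }

func main() {
	a, b := SharedProvider(), SharedProvider()
	fmt.Println(a == b, created) // true 1
}
```

A possible implication for the tests above, stated as an observation rather than a defect: if `TestGetMemoryCpuUsage` ran less than 100ms earlier in the same process, the "first call" in `TestMaybeRefreshRateLimiting` may itself be served from the cache. The assertions still hold in that case, since they only compare the two calls with each other.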