We store requests in the cache in a map (https://github.com/envoyproxy/xds-relay/blob/master/internal/app/cache/cache.go#L52) that is keyed on the DiscoveryRequest. As a result, every entry in the map is unique, and adding and deleting large numbers of unique entries bloats the map's memory. This is a known issue with Go maps. (here, here)
To test this hypothesis, I replicated the benchmark tests so that they insert an increasing number of DiscoveryRequests and then remove them. This simulates the fanout scenario (here). Even when the cache ends up holding only 1 entry, adding and deleting a growing number of map entries drives processing time up sharply: from roughly 1.5 µs/op at 1 request to roughly 4.8 ms/op at 1,000,000 requests.
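For readers who want to see the add-then-delete pattern in isolation, here is a minimal sketch. It is not the benchmark code from #196: the `churn` and `heapInUse` helpers and the `request-N` string keys are hypothetical stand-ins for DiscoveryRequest keys, and the heap numbers it prints will vary by Go version and machine.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
)

// churn inserts n unique keys into a fresh map and then deletes all but the
// first one, mimicking a fanout burst followed by cleanup.
// The string keys are a hypothetical stand-in for DiscoveryRequest keys.
func churn(n int) map[string]struct{} {
	m := make(map[string]struct{})
	for i := 0; i < n; i++ {
		m["request-"+strconv.Itoa(i)] = struct{}{}
	}
	for i := 1; i < n; i++ {
		delete(m, "request-"+strconv.Itoa(i))
	}
	return m
}

// heapInUse forces a garbage collection and reports the heap bytes in use.
func heapInUse() uint64 {
	runtime.GC()
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.HeapInuse
}

func main() {
	before := heapInUse()
	m := churn(100000)
	after := heapInUse()

	fmt.Printf("entries left: %d\n", len(m))
	fmt.Printf("heap in use before: %d bytes, after: %d bytes\n", before, after)

	// Keep the map reachable so the measurement above includes it.
	runtime.KeepAlive(m)
}
```

Comparing the two heap figures for different values of `n` gives a rough feel for how much memory the map retains even though only one entry remains.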
Benchmarking code: #196
➜ xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=1 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op
BenchmarkCacheRetrieval-8 721880 1509 ns/op 944 B/op 12 allocs/op
➜ xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=10 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op
BenchmarkCacheRetrieval-8 809854 1473 ns/op 944 B/op 12 allocs/op
➜ xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=100 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op
BenchmarkCacheRetrieval-8 658707 1641 ns/op 944 B/op 12 allocs/op
➜ xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=1000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op
BenchmarkCacheRetrieval-8 264152 4144 ns/op 944 B/op 12 allocs/op
➜ xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=10000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op
BenchmarkCacheRetrieval-8 50784 24675 ns/op 944 B/op 12 allocs/op
➜ xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=100000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op
BenchmarkCacheRetrieval-8 5220 222593 ns/op 944 B/op 12 allocs/op
➜ xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=1000000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op
BenchmarkCacheRetrieval-8 255 4825196 ns/op 944 B/op 12 allocs/op
A separate benchmark test (#198), run from the orchestrator's perspective, gave similar results.
➜ xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=1 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns
BenchmarkGoldenPath-8 69771 16503 ns/op 9408 B/op 93 allocs/op
➜ xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=10 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns
BenchmarkGoldenPath-8 64796 16518 ns/op 9408 B/op 93 allocs/op
➜ xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=100 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns
BenchmarkGoldenPath-8 68280 18062 ns/op 9408 B/op 93 allocs/op
➜ xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=1000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns
BenchmarkGoldenPath-8 50516 23984 ns/op 9408 B/op 93 allocs/op
➜ xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=10000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns
BenchmarkGoldenPath-8 28072 41137 ns/op 9408 B/op 93 allocs/op
➜ xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=100000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns
BenchmarkGoldenPath-8 4819 236752 ns/op 9426 B/op 93 allocs/op