Bug: Horde failing some gRPC calls #855

### Expected Behaviour

The agent batch logs (i.e. the ones that can be viewed by clicking the **View Log** button in an agent header in a job) should contain few, ideally no, gRPC failures.

### Current Behaviour

The agent batch logs are full of the following errors:
```
Exception on log tailing task (<log-id>): Status(StatusCode="Internal", Detail="Error starting gRPC call. HttpRequestException: The HTTP/2 server reset the stream. HTTP/2 error code 'PROTOCOL_ERROR' (0x1). (HttpProtocolError) HttpProtocolException: The HTTP/2 server reset the stream. HTTP/2 error code 'PROTOCOL_ERROR' (0x1). (HttpProtocolError)", DebugException="System.Net.Http.HttpRequestException: The HTTP/2 server reset the stream. HTTP/2 error code 'PROTOCOL_ERROR' (0x1). (HttpProtocolError)")
```

Sometimes jobs also fail completely when the following unhandled exception is thrown. It is gRPC-related as well (it is raised while the gRPC client logs an exceeded deadline):
```
Unhandled exception. System.AggregateException: An error occurred while writing to logger(s). (Object reference not set to an instance of an object.)
 ---> System.NullReferenceException: Object reference not set to an instance of an object.
   at EpicGames.Core.LoggerScopeCollection.GetProperties()+MoveNext() in <workspace>\Engine\Source\Programs\Shared\EpicGames.Core\Log.cs:line 831
   at EpicGames.Core.LogEvent.MergedPropertyList.AddRange(IEnumerable`1 properties) in <workspace>\Engine\Source\Programs\Shared\EpicGames.Core\LogEvent.cs:line 106
   at EpicGames.Core.LogEvent.AddProperties(IEnumerable`1 properties) in <workspace>\Engine\Source\Programs\Shared\EpicGames.Core\LogEvent.cs:line 177
   at EpicGames.Core.DefaultLogger.Log[TState](LogLevel logLevel, EventId eventId, TState state, Exception exception, Func`3 formatter) in <workspace>\Engine\Source\Programs\Shared\EpicGames.Core\Log.cs:line 1283
   at Microsoft.Extensions.Logging.Logger.<Log>g__LoggerLog|14_0[TState](LogLevel logLevel, EventId eventId, ILogger logger, Exception exception, Func`3 formatter, List`1& exceptions, TState& state)
   --- End of inner exception stack trace ---
   at Microsoft.Extensions.Logging.Logger.ThrowLoggingError(List`1 exceptions)
   at Microsoft.Extensions.Logging.Logger.Log[TState](LogLevel logLevel, EventId eventId, TState state, Exception exception, Func`3 formatter)
   at Microsoft.Extensions.Logging.LoggerMessage.<>c__DisplayClass8_0.<Define>g__Log|0(ILogger logger, Exception exception)
   at Grpc.Net.Client.Internal.GrpcCallLog.ErrorExceedingDeadline(ILogger logger, Exception ex)
   at Grpc.Net.Client.Internal.GrpcCall`2.DeadlineExceededCallback(Object state)
   at System.Threading.TimerQueueTimer.Fire(Boolean isThreadPool)
   at System.Threading.TimerQueue.FireNextTimers()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
Driver finished with exit code -532462766
```
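As a side note on the last line of that trace: the negative exit code is easier to recognise when reinterpreted as an unsigned 32-bit value. A minimal sketch (the helper name `to_unsigned32` is made up for illustration):

```python
def to_unsigned32(exit_code: int) -> str:
    """Reinterpret a signed 32-bit process exit code as unsigned hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_unsigned32(-532462766))  # 0xE0434352
```

`0xE0434352` is the code the .NET runtime reports on Windows for an unhandled managed exception, which matches the `Unhandled exception.` line at the top of the trace. The exit code therefore carries no extra information beyond the stack trace itself.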

The failures seem to be more frequent during long-running jobs, such as a cook with shader compilation.

I can provide logs with more information if necessary, but the errors in them are not much clearer.

### Possible Solution

I have tried changing the protocol of the gRPC target groups from `HTTP2` (the current setting) to `GRPC`, because HTTP/2 ping frames are required for the keep-alive signal. This has not fixed the issue, but I think it may point toward the correct solution: this looks like a networking issue, and I have not experienced it with other Horde setups (such as Docker Compose).

### Steps to Reproduce

- Deploy a Horde server using the CGD Toolkit
- Connect agents to it
  - Note that mine are regular EC2 machines controlled using the `AwsRecycle` strategy, but this shouldn't matter, as they are running the same Horde agent that any machine would.
- Run a packaged build job (a longer job makes the errors more likely to appear)
- Look at an agent's batch logs
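
To compare runs (for example before and after a target group change), it can help to count the failures rather than eyeball them. The following is a rough sketch, assuming the batch log has been saved as plain text with one error per line in the format shown under Current Behaviour; the function name `summarize` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Patterns based on the error line quoted under "Current Behaviour".
STATUS_CODE = re.compile(r'StatusCode="(\w+)"')
HTTP2_ERROR = re.compile(r"HTTP/2 error code '([A-Z_]+)'")


def summarize(lines):
    """Count gRPC status codes and HTTP/2 error codes, one per log line."""
    statuses = Counter()
    http2_errors = Counter()
    for line in lines:
        status = STATUS_CODE.search(line)
        if status is None:
            continue  # not a gRPC status line
        statuses[status.group(1)] += 1
        http2 = HTTP2_ERROR.search(line)
        if http2 is not None:
            http2_errors[http2.group(1)] += 1
    return statuses, http2_errors


# Hypothetical sample; in practice pass the lines of a saved log file.
sample = [
    'Exception on log tailing task (<log-id>): Status(StatusCode="Internal", '
    "Detail=\"Error starting gRPC call. HTTP/2 error code 'PROTOCOL_ERROR' (0x1).\")",
    "Compiling shaders...",
    'Exception on log tailing task (<log-id>): Status(StatusCode="Internal", '
    "Detail=\"Error starting gRPC call. HTTP/2 error code 'PROTOCOL_ERROR' (0x1).\")",
]
statuses, http2_errors = summarize(sample)
print(dict(statuses), dict(http2_errors))
```

Only the first status code and the first HTTP/2 error code on each line are counted, since the same error is repeated several times within a single log line.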

### Cloud Game Development Toolkit version

latest
