Failure to get an initial Kafka connection should terminate or cause non-liveness #29

@solsson

If the Kafka client fails to connect, we currently end up in the following state:

# curl localhost:8090/health/live
{
    "status": "UP",
    "checks": [
        {
            "name": "REST liveness",
            "status": "UP"
        }
    ]
}
# curl localhost:8090/health/ready
{
    "status": "DOWN",
    "checks": [
        {
            "name": "consume-loop",
            "status": "DOWN",
            "data": {
                "stage": "WaitingForKafkaConnection"
            }
        }
    ]
}

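This service probably needs to take a stance on https://srcco.de/posts/kubernetes-liveness-probes-are-dangerous.html from a sidecar perspective. As it stands, liveness reports UP while readiness reports DOWN, so a liveness probe will never restart the container even though the consumer has given up.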

The cause of the above state is a timeout while fetching topic metadata, after which the consumer is closed and never restarted:

2019-09-28 09:02:40,402 INFO  [se.yol.kaf.key.ConsumerAtLeastOnce] (kafkaclient) At stage Initializing before infinite polls with consumer org.apache.kafka.clients.consumer.KafkaConsumer@7f77c83013f0
2019-09-28 09:02:42,063 WARN  [org.apa.kaf.cli.NetworkClient] (kafkaclient) [Consumer clientId=consumer-1, groupId=integrations-b86db879f-r42zr] Connection to node -1 (bootstrap.kafka/10.43.84.242:9092) could not be established. Broker may not be available.
2019-09-28 09:02:45,197 WARN  [org.apa.kaf.cli.NetworkClient] (kafkaclient) [Consumer clientId=consumer-1, groupId=integrations-b86db879f-r42zr] Connection to node -1 (bootstrap.kafka/10.43.84.242:9092) could not be established. Broker may not be available.
2019-09-28 09:02:45,402 ERROR [se.yol.kaf.key.ConsumerAtLeastOnce] (kafkaclient) A Kafka timeout occured at stage WaitingForKafkaConnection: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata

Exception in thread "kafkaclient" org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
2019-09-28 09:02:45,402 INFO  [se.yol.kaf.key.ConsumerAtLeastOnce] (kafkaclient) Closing consumer ...
2019-09-28 09:02:45,407 INFO  [se.yol.kaf.key.ConsumerAtLeastOnce] (kafkaclient) Consumer closed at stage WaitingForKafkaConnection; Use liveness probes with /health for app termination
2019-09-30 11:26:24,917 ERROR [org.jbo.res.res.i18n] (executor-thread-1) RESTEASY002010: Failed to execute: javax.ws.rs.ServiceUnavailableException: Denied because cache isn't started yet, check /health for status
    at se.yolean.kafka.keyvalue.http.CacheResource.requireUpToDateCache(CacheResource.java:43)
    at se.yolean.kafka.keyvalue.http.CacheResource.keysJson(CacheResource.java:128)

Meanwhile the REST endpoints respond with 503:

# curl --verbose localhost:8090/cache/v1/keys
*   Trying ::1...
* TCP_NODELAY set
* connect to ::1 port 8090 failed: Connection refused
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8090 (#0)
> GET /cache/v1/keys HTTP/1.1
> Host: localhost:8090
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< Connection: keep-alive
< Content-Length: 0
< Date: Mon, 30 Sep 2019 11:26:30 GMT
<
* Curl_http_done: called premature == 0
* Connection #0 to host localhost left intact
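
One possible direction for the "cause non-liveness" option is to tie liveness to whether the consume thread is still running. The following is a minimal sketch under assumptions, not the service's actual code: the class ConsumerLiveness, its methods, and the Polling stage are hypothetical names made up for illustration. Only the Initializing and WaitingForKafkaConnection stages and the "kafkaclient" thread name come from the logs above. It uses plain JDK threads and leaves out the wiring to the /health endpoints.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch only: class, method and Stage names other than the two stages
// seen in the logs above are hypothetical, not taken from the service.
final class ConsumerLiveness {

    enum Stage { Initializing, WaitingForKafkaConnection, Polling }

    private final AtomicReference<Stage> stage =
            new AtomicReference<>(Stage.Initializing);
    private volatile Thread consumerThread;

    /** Starts the consume loop on its own thread, named as in the logs. */
    void start(Runnable consumeLoop) {
        Thread t = new Thread(consumeLoop, "kafkaclient");
        t.setDaemon(true);
        consumerThread = t;
        t.start();
    }

    void setStage(Stage s) { stage.set(s); }

    Stage stage() { return stage.get(); }

    /** Liveness: UP only while the consume thread is still running. */
    boolean isLive() {
        Thread t = consumerThread;
        return t != null && t.isAlive();
    }

    /** Readiness: live and past the connection stage. */
    boolean isReady() {
        return isLive() && stage.get() == Stage.Polling;
    }

    public static void main(String[] args) throws InterruptedException {
        ConsumerLiveness health = new ConsumerLiveness();
        health.start(() -> {
            health.setStage(Stage.WaitingForKafkaConnection);
            // Stand-in for the TimeoutException in the logs above.
            throw new IllegalStateException("Timeout expired while fetching topic metadata");
        });
        // Wait for the consume thread to die, as it does in the logs.
        health.consumerThread.join();
        System.out.println("stage=" + health.stage() + " live=" + health.isLive());
    }
}
```

With something like this behind /health/live, a consumer that dies at WaitingForKafkaConnection would flip liveness to DOWN and let the kubelet restart the container. Whether that is desirable for a sidecar is exactly the question raised by the article linked above.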
