kazoo's connection timeouts don't really account for the behaviour of ZooKeeper during an election. #771

@ThosRTanner

If you attempt to connect to a ZooKeeper ensemble while it's undergoing an election, your connection gets closed almost immediately, so it takes very little time for kazoo to attempt to connect to every node in the ensemble.

At this point, if you don't have any retries configured, kazoo will decide you've lost your connection, and when the election finishes you'll have lost all your ephemeral nodes.

This isn't a good thing in our environment, so we tried increasing the number of retries. However, this caused another problem: it takes a VERY long time to determine that you've lost your connection. With 9 nodes in an ensemble, and the connection timeout being applied as 10 seconds per node rather than 10/9 seconds (see #685), it takes an extra 1 1/2 minutes per retry to spot that you've disconnected, which is not acceptable for our application.
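To make the arithmetic concrete, here is a rough sketch of the worst-case time for one pass over the ensemble. It assumes every connection attempt runs for the full per-host timeout; `sweep_seconds` is a hypothetical helper written for this illustration, not anything in kazoo.

```python
def sweep_seconds(hosts, timeout, divide_by_hosts):
    """Worst-case seconds for one pass over every host in the ensemble.

    Hypothetical helper: assumes each attempt uses its whole timeout.
    """
    # Either the timeout is shared across the hosts, or each host gets all of it.
    per_host = timeout / hosts if divide_by_hosts else timeout
    return hosts * per_host


# 9 hosts, 10 second connection timeout (the numbers from this issue).
full = sweep_seconds(9, 10.0, divide_by_hosts=False)   # timeout applied per host
shared = sweep_seconds(9, 10.0, divide_by_hosts=True)  # timeout split across hosts
print(f"per-host timeout: {full:.0f}s per pass, shared timeout: {shared:.0f}s per pass")
```

With these numbers a pass takes about 90 seconds when the timeout is applied per host and about 10 seconds when it is split, which is where the order of a minute and a half per retry comes from.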

What I think is needed (and what I'm currently achieving with a nasty hack) is a way to tell KazooRetry (or something) to retry the operation until either it succeeds or the expiry time elapses. The (undocumented) deadline parameter doesn't work for that, because the connection code keeps resetting the retry object, and that clears the deadline.
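A minimal sketch of the behaviour I'm asking for, written as a standalone helper rather than against kazoo's internals. `DeadlineRetry` and its parameters are hypothetical names for illustration; this is neither kazoo's API nor the hack mentioned above. The point is that the deadline is fixed once per call and is not cleared by individual failed attempts.

```python
import time


class DeadlineRetry:
    """Hypothetical helper: retry until success or until a deadline passes."""

    def __init__(self, deadline_seconds, delay=0.1,
                 clock=time.monotonic, sleep=time.sleep):
        self.deadline_seconds = deadline_seconds
        self.delay = delay
        self._clock = clock
        self._sleep = sleep

    def __call__(self, func, *args, **kwargs):
        # Fix the deadline once, up front. Failed attempts never reset it.
        stop_at = self._clock() + self.deadline_seconds
        attempts = 0
        while True:
            attempts += 1
            try:
                return func(*args, **kwargs)
            except ConnectionError:
                remaining = stop_at - self._clock()
                if remaining <= 0:
                    raise TimeoutError(
                        f"gave up after {attempts} attempts"
                    )
                # Never sleep past the deadline.
                self._sleep(min(self.delay, remaining))
```

Usage would look something like `DeadlineRetry(session_timeout)(connect_once)`, where `session_timeout` and `connect_once` are placeholders for the session expiry time and for one pass over the ensemble that raises `ConnectionError` on failure.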
