If you attempt to connect to a ZooKeeper ensemble while it's undergoing an election, your connection gets closed almost immediately, so it takes very little time at all to attempt to connect to all the nodes.
At this point, if you don't have any retries configured, kazoo will decide you've lost your connection, and by the time the election finishes you'll have lost all your ephemeral nodes.
This isn't a good thing in our environment, so we tried increasing the number of retries. However, that caused a different problem: it then takes a VERY long time to determine that you've lost your connection. With 9 nodes in the ensemble, and the connection timeout being taken as 10 seconds per node rather than 10/9 seconds (see #685), it takes an extra 1 1/2 minutes per retry to spot that you've disconnected, which is not acceptable for our application.
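To make the arithmetic concrete, here is a rough sketch of the time one pass over the ensemble takes. It assumes each node is tried once per pass and that every attempt runs to the full timeout; the function and variable names are mine, not kazoo's.

```python
def pass_duration(nodes, timeout_per_attempt):
    """Seconds to try every node once if each attempt runs to its timeout."""
    return nodes * timeout_per_attempt


nodes = 9                   # ensemble size from our environment
connection_timeout = 10.0   # seconds

# Timeout applied in full to every node (what we observe).
observed = pass_duration(nodes, connection_timeout)

# Timeout shared out across the nodes (what #685 suggests it should be).
expected = pass_duration(nodes, connection_timeout / nodes)

extra = observed - expected
print(f"observed={observed:.0f}s expected={expected:.0f}s extra={extra:.0f}s")
```

That works out to roughly 90 seconds per pass instead of 10, which is where the figure of about a minute and a half per retry comes from.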
What I think is needed (and what I'm currently achieving with a nasty hack) is a way to tell KazooRetry (or something) to keep retrying the operation until either it succeeds or the expiry time passes. The (undocumented) deadline parameter doesn't work for this, because the connection code keeps resetting the retry object, and that clears the deadline.
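To illustrate the behaviour I'm after, here is a minimal standalone sketch. It is not kazoo code and not my actual hack: `retry_until_deadline`, `DeadlineExceeded`, and all the parameters are hypothetical names. The point is that the deadline is computed once, up front, and nothing in the loop resets it.

```python
import time


class DeadlineExceeded(Exception):
    """Raised when the deadline passes before the operation succeeds."""


def retry_until_deadline(operation, deadline_seconds, delay=0.1, backoff=2.0,
                         max_delay=1.0, retry_on=(ConnectionError,),
                         clock=time.monotonic, sleep=time.sleep):
    """Call operation() until it succeeds or deadline_seconds have elapsed."""
    # Fixed once here; unlike the behaviour described above, it is never
    # cleared or recomputed between attempts.
    deadline = clock() + deadline_seconds
    attempts = 0
    while True:
        attempts += 1
        try:
            return operation()
        except retry_on as exc:
            remaining = deadline - clock()
            if remaining <= 0:
                raise DeadlineExceeded(
                    f"gave up after {attempts} attempts") from exc
            # Never sleep past the deadline, and cap the backoff.
            sleep(min(delay, max_delay, remaining))
            delay *= backoff
```

With something like this, `deadline_seconds` would be set to the session timeout, so a short election is ridden out by fast retries, while a real outage is still noticed once the session would have expired anyway.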