When a Redis master fails, the cluster triggers an automatic failover, and requests sent to the old master node time out. At this point eredis_cluster should refresh the cluster mapping, but it currently treats the situation as a pool_busy error: it only retries the operation without refreshing the mapping. As a result, we can only wait for the TCP timeout, which takes an excessively long time.
-spec transaction(PoolName::atom(), fun((Worker::pid()) -> redis_result())) ->
    redis_result().
transaction(PoolName, Transaction) ->
    try
        poolboy:transaction(PoolName, Transaction)
    catch
        exit:{timeout, _GenServerCall} ->
            %% Poolboy checkout timeout, but the pool is consistent.
            {error, pool_busy};
        exit:_ ->
            %% Pool doesn't exist? Refresh mapping solves this.
            {error, no_connection}
    end.
poolboy:transaction involves two steps:
- Acquire a connection from the connection pool
- Send the message to the Redis node using the connection
Currently, a timeout in either step is reported as a pool_busy error. The two cases should be differentiated:
When sending the message times out, shouldn't eredis_cluster refresh the cluster mapping instead of just retrying?
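To make the proposal concrete, here is a minimal sketch of how the two timeouts could be told apart by splitting the checkout from the query instead of calling poolboy:transaction. It is not a tested patch. The module name pool_tx_sketch, the query_timeout atom, and the 5000 ms checkout timeout are my assumptions and do not come from eredis_cluster. The caller would still have to refresh the mapping when it sees the new error.

```erlang
-module(pool_tx_sketch).
-export([transaction/2]).

%% Assumed checkout timeout in milliseconds (hypothetical value).
-define(CHECKOUT_TIMEOUT, 5000).

transaction(PoolName, Transaction) ->
    %% Step 1: acquire a connection. Only failures of the checkout
    %% itself reach the outer catch, because exceptions raised in the
    %% `of` body are not handled by the catch of the same try.
    try poolboy:checkout(PoolName, true, ?CHECKOUT_TIMEOUT) of
        Worker ->
            %% Step 2: send the message to the Redis node.
            try
                Transaction(Worker)
            catch
                exit:{timeout, _GenServerCall} ->
                    %% The node did not answer in time, possibly a
                    %% failed master. query_timeout is a hypothetical
                    %% error the caller could use to refresh the mapping.
                    {error, query_timeout};
                exit:_ ->
                    {error, no_connection}
            after
                %% Always return the worker, as poolboy:transaction does.
                ok = poolboy:checkin(PoolName, Worker)
            end
    catch
        exit:{timeout, _CheckoutCall} ->
            %% Checkout timed out: the pool really is busy.
            {error, pool_busy};
        exit:_ ->
            %% Pool doesn't exist? Refresh mapping solves this.
            {error, no_connection}
    end.
```

With this split, pool_busy keeps its current meaning (retry only), while a timeout in the second step is reported separately, so the retry logic can decide to refresh the cluster mapping right away instead of waiting for the TCP timeout.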