The latest version cannot handle situations where the Redis master node goes down. #78

@conan8737

When the Redis master fails, an automatic failover is triggered, and requests sent to the old master node time out. At this point eredis_cluster should refresh the cluster mapping, but it currently treats the timeout as a pool_busy error and only retries the operation without refreshing the mapping. As a result, we can only wait for the TCP timeout, which takes far too long.

-spec transaction(PoolName::atom(), fun((Worker::pid()) -> redis_result())) ->
    redis_result().
transaction(PoolName, Transaction) ->
    try
        poolboy:transaction(PoolName, Transaction)
    catch
        exit:{timeout, _GenServerCall} ->
            %% Poolboy checkout timeout, but the pool is consistent.
            {error, pool_busy};
        exit:_ ->
            %% Pool doesn't exist? Refresh mapping solves this.
            {error, no_connection}
    end.

poolboy:transaction involves two steps:

  1. Acquire a connection from the connection pool
  2. Send the message to the Redis node using the connection

Currently, a timeout in either step is reported as a pool_busy error. The two cases should be distinguished:

When sending the message times out, shouldn't eredis_cluster refresh the cluster mapping instead of just retrying?
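One way to separate the two cases is to split the checkout from the command. This is only a rough sketch, not a tested patch. It assumes the command timeout surfaces as an exit of the form {timeout, _} inside the fun, as the catch clause above suggests. The module name pool_tx_sketch, the error atom node_timeout and the 5000 ms checkout timeout are placeholders of mine, not part of eredis_cluster. The caller's retry logic would still need to treat node_timeout as a reason to refresh the mapping.

```erlang
-module(pool_tx_sketch).
-export([transaction/2]).

%% Sketch: distinguish "pool is busy" from "Redis node did not answer".
%% node_timeout is a hypothetical error atom used for illustration only.
transaction(PoolName, Transaction) ->
    case checkout(PoolName) of
        {ok, Worker} ->
            try
                Transaction(Worker)
            catch
                exit:{timeout, _GenServerCall} ->
                    %% Step 2 timed out: the node may be a failed master,
                    %% so the caller should refresh the cluster mapping.
                    {error, node_timeout};
                exit:_ ->
                    {error, no_connection}
            after
                %% Always return the worker to the pool.
                poolboy:checkin(PoolName, Worker)
            end;
        {error, _Reason} = Error ->
            Error
    end.

%% Step 1 only: a timeout here really does mean the pool is busy.
checkout(PoolName) ->
    try poolboy:checkout(PoolName, true, 5000) of
        Worker when is_pid(Worker) -> {ok, Worker}
    catch
        exit:{timeout, _GenServerCall} -> {error, pool_busy};
        exit:_ -> {error, no_connection}
    end.
```

Alternatively, the command-timeout clause could return {error, no_connection}, since the comment in the existing code suggests that error already leads to a mapping refresh.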
