This is a meta-issue to track the remaining strong consistency work for Riak 2.0.
- There is currently a manual step that must be performed from the Erlang console to enable strong consistency. This step must be performed on one and only one node in the cluster. It should be automated, or exposed via a new `riak-admin` command; see the first sketch after this list. (Update ensemble bootstrap logic to enable consensus riak_core#571)
- Nodes cannot safely be removed from a cluster that uses strong consistency. (Make node removal work with riak_ensemble riak_core#572)
- Using strong consistency on slow disks without reducing the ring size is likely to lead to 100% CPU/disk usage and poor performance. (Prevent riak_ensemble from overwhelming slow disks riak_ensemble#15)
- There are no user-facing `riak-admin` commands to inspect the state of the consensus system. (Add console commands riak_ensemble#9)
- There are no stats for consistent K/V operations. (Adds stats for strongly consistent operations riak_kv#876)
- Pending ensemble peers are trusted by default, which makes Riak vulnerable to node failures while ownership changes are occurring. (Do not trust pending peers by default riak_ensemble#17)
- The current AAE-based ensemble syncing approach used by `riak_kv` is more sensitive to node failures and network partitions than it should be. (riak_kv_ensemble_backend syncing should check sibling peers riak_kv#908)
- Riak K/V ensemble data is hardcoded to never be trusted. This should be a configurable setting, so that users who trust their disks not to silently lose data can choose trust-by-default and need fewer online replicas. (Make consistency trust configurable riak_kv#909)
- Several consensus-subsystem settings are hardcoded but should be configurable for advanced users and support scenarios. (Make hardcoded settings configurable riak_ensemble#18)
- Writes to consistent bucket types should fail fast if strong consistency is actually disabled. Currently, `riak_client` will attempt to use `riak_ensemble_client`, which will error when consensus is disabled; see the fast-failure sketch after this list. (consistent ops should have a fast failure path when ensembles aren't enabled riak_kv#713)
- K/V ensemble peers can start up before `riak_kv` is ready, leading to various issues. (Don't start K/V ensemble peers until riak_kv is up riak_kv#984)
- K/V ensemble peers send messages to the vnode proxy without accounting for the fact that the proxy and/or vnode may crash and drop the message on the floor; see the monitoring sketch after this list. (riak_kv_ensemble_backend should monitor vnode proxy riak_kv#985)
- K/V ensemble peers send messages to the vnode proxy without accounting for the fact that the proxy may drop messages during an overload situation. (Implement overload replies for ensemble messages riak_kv#986)
- Ensemble leaders do not step down when they fail a local put, but they should. (Step down if local_put fails in leader worker riak_ensemble#27)
- Ensemble leaders do not step down when they fail a local get, but they should. (When leader fails local get, we should step down riak_ensemble#30) (Not an issue, closed)
- Riak does not gracefully handle the case where consensus is not enabled uniformly across the entire cluster. This should be handled better, since mixed configuration is a necessary evil during a rolling configuration change. (Gracefully handle mixed consensus configuration #559)
- The ensemble manager does not guarantee that state is saved when enabling consensus, which can lead to a race condition. (Ensemble manager should save state on enable riak_ensemble#34)
- New integrated integrity checking is needed for previously committed data. (New integrated integrity checking for previously committed data riak_ensemble#37)
- A small change to check the leader lease after reads is necessary to guarantee safety. We should also make the leader-only read configurable, in case we want other safety options; e.g., a user could configure Riak to always do quorum reads instead. Consider it a "get out of jail" card. See the lease sketch after this list. (Fix numerous issues with leader leases riak_ensemble#41)
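
For reference, the manual step in the first item above is a single call from the Erlang console (`riak attach`) of exactly one node, which is the bootstrap logic riak_core#571 proposes to automate or expose via `riak-admin`:

```erlang
%% Run on ONE node only, from the Erlang console of that node:
riak_ensemble_manager:enable().

%% Sanity check afterwards (should return true on that node):
riak_ensemble_manager:enabled().
```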
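A minimal sketch of the fast-failure path from riak_kv#713, assuming `riak_ensemble_manager:enabled/0` is available as a cheap local check; the surrounding function names are hypothetical and stand in for the actual `riak_client` code path:

```erlang
-module(consistent_put_sketch).
-export([consistent_put/2]).

%% Hypothetical wrapper: fail fast when consensus is disabled instead of
%% letting riak_ensemble_client error out further down the call path.
consistent_put(Key, Value) ->
    case riak_ensemble_manager:enabled() of
        false ->
            %% Fast path: consensus is off, so don't attempt an ensemble op.
            {error, strong_consistency_disabled};
        true ->
            do_ensemble_put(Key, Value)
    end.

%% Placeholder for the real ensemble write performed via riak_ensemble_client.
do_ensemble_put(_Key, _Value) ->
    ok.
```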
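The vnode proxy items (riak_kv#985 and riak_kv#986) reduce to a standard Erlang pattern: monitor the process you send to, and convert a `'DOWN'` or a timeout into an error instead of blocking forever on a dropped message. A generic sketch with a hypothetical message shape:

```erlang
-module(proxy_call_sketch).
-export([call_proxy/3]).

%% Send Msg to the proxy (pid or registered name) and wait for a reply,
%% but monitor it so a crash becomes an error instead of a hang.
call_proxy(Proxy, Msg, Timeout) ->
    MRef = erlang:monitor(process, Proxy),
    Proxy ! {self(), MRef, Msg},
    receive
        {MRef, Reply} ->
            erlang:demonitor(MRef, [flush]),
            {ok, Reply};
        {'DOWN', MRef, process, _Pid, Reason} ->
            %% Proxy or vnode died; the message was dropped on the floor.
            {error, {proxy_down, Reason}}
    after Timeout ->
        erlang:demonitor(MRef, [flush]),
        {error, timeout}
    end.
```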
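Finally, the lease change in riak_ensemble#41 amounts to re-checking the leader's lease after a local read completes, before replying to the client. A sketch with hypothetical helpers standing in for the real lease bookkeeping:

```erlang
-module(lease_read_sketch).
-export([leader_read/2]).

%% Hypothetical leader-side read: the get itself is local, but we only
%% reply if the leader lease is still valid *after* the read completes.
%% If the lease expired mid-read, another leader may already have been
%% elected and our value may be stale, so we step down instead.
leader_read(Key, Lease) ->
    Value = local_get(Key),
    case check_lease(Lease) of
        true  -> {reply, Value};
        false -> step_down
    end.

%% Placeholder local read; the real backend would consult the K/V store.
local_get(_Key) ->
    undefined.

%% Hypothetical lease bookkeeping: valid until a monotonic deadline.
check_lease({lease_until, Deadline}) ->
    erlang:monotonic_time(millisecond) < Deadline.
```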
Issues that have been punted to at least 2.0.x (if not 2.1):
- We incorrectly commit views against the joint quorum, when we only need to commit against the initial view. This is only a minor issue, but worth fixing. (2.0 view change bugfix riak_ensemble#3)
Issues that have been punted to 2.1:
- Dynamic ring resizing (esp. shrinking) is not safe in clusters that use strong consistency (Make ring resizing work with riak_ensemble [JIRA: RIAK-1652] riak_kv#900)