- [PR #1114](https://github.com/rqlite/rqlite/pull/1114): Support automatically removing non-reachable nodes after a configurable period. Fixes [issue #728](https://github.com/rqlite/rqlite/issues/728).
- [PR #1114](https://github.com/rqlite/rqlite/pull/1114), [PR #1118](https://github.com/rqlite/rqlite/pull/1118): Support automatically removing non-reachable nodes after a configurable period. Fixes [issue #728](https://github.com/rqlite/rqlite/issues/728).
- [PR #1116](https://github.com/rqlite/rqlite/pull/1116), [PR #1117](https://github.com/rqlite/rqlite/pull/1117): Support associative form for query responses. Fixes [issue #1115](https://github.com/rqlite/rqlite/issues/1115).
@@ -89,14 +89,16 @@ If you cannot bring sufficient nodes back online such that the cluster can elect
## Automatically removing failed nodes
> :warning: **This functionality was introduced in version 7.11.0. It does not exist in earlier releases.**
rqlite supports automatically removing both voting and non-voting (read-only) nodes that have been non-reachable for a configurable period of time. A non-reachable node is defined as a node that the Leader cannot heartbeat with. To enable reaping set `-raft-reap-nodes` when launching `rqlited`. The reaping timeout for each type of node can be set independently, but defaults to 72 hours. It is recommended that this is set to at least double the maximum expected recoverable outage time for a node or network partition for nodes. Note that the timeout clock is reset if a cluster elects a new Leader.
rqlite supports automatically removing both voting (the default type) and non-voting (read-only) nodes that have been non-reachable for a configurable period of time. A non-reachable node is defined as a node that the Leader cannot heartbeat with. To enable reaping of voting nodes set `-raft-reap-node-timeout` to a non-zero time interval. Likewise, to enable reaping of non-voting (read-only) nodes set `-raft-reap-read-only-node-timeout`.
It is recommended that these values be set conservatively, especially for voting nodes. Setting them too low may mean you don't account for the normal kinds of network outages and temporary failures that can affect distributed systems such as rqlite. Note that the timeout clock is reset if a cluster elects a new Leader.
### Example configuration
Enable reaping, instructing rqlite to reap non-reachable voting nodes after 2 days, and non-reachable read-only nodes after 4 hours.
Instruct rqlite to reap non-reachable voting nodes after 2 days, and non-reachable read-only nodes after 30 minutes:
```bash
rqlited -node-id 1 -raft-reap-nodes -raft-reap-node-timeout=48h -raft-reap-read-only-node-timeout=4h data
rqlited -node-id 1 -raft-reap-node-timeout=48h -raft-reap-read-only-node-timeout=30m data
```
For reaping to work properly you **must** set these flags on **every** voting node in the cluster -- in other words, every node that could potentially become the Leader. To effectively disable reaping for one type of node, but not the other, simply set the relevant timeout to a very long time.
For reaping to work consistently you **must** set these flags on **every** voting node in the cluster -- in other words, every node that could potentially become the Leader. You can also set the flags on read-only nodes, but they will be silently ignored.
# Dealing with failure
It is the nature of clustered systems that nodes can fail at any time. Depending on the size of your cluster, it will tolerate various amounts of failure. With a 3-node cluster, it can tolerate the failure of a single node, including the leader.
flag.BoolVar(&config.RaftShutdownOnRemove, "raft-remove-shutdown", false, "Shutdown Raft if node removed")
flag.BoolVar(&config.RaftNoFreelistSync, "raft-no-freelist-sync", false, "Do not sync Raft log database freelist to disk")
flag.StringVar(&config.RaftLogLevel, "raft-log-level", "INFO", "Minimum log level for Raft module")
flag.BoolVar(&config.RaftReapNodes, "raft-reap-nodes", false, "Enable reaping of non-reachable nodes")
flag.DurationVar(&config.RaftReapNodeTimeout, "raft-reap-node-timeout", 72*time.Hour, "Time after which a nonreachable voting node will be reaped")
flag.DurationVar(&config.RaftReapReadOnlyNodeTimeout, "raft-reap-read-only-node-timeout", 72*time.Hour, "Time after which a non-reachable non-voting node will be reaped")
flag.DurationVar(&config.RaftReapNodeTimeout, "raft-reap-node-timeout", 0*time.Hour, "Time after which a non-reachable voting node will be reaped. If not set, no reaping takes place")
flag.DurationVar(&config.RaftReapReadOnlyNodeTimeout, "raft-reap-read-only-node-timeout", 0*time.Hour, "Time after which a non-reachable non-voting node will be reaped. If not set, no reaping takes place")
flag.DurationVar(&config.ClusterConnectTimeout, "cluster-connect-timeout", 30*time.Second, "Timeout for initial connection to other nodes")