This is a breaking change. However this feature hasn't been tested and
allows end-users to break the system too easily. Low-level control over
the SQLite database is better done with PRAGMA commands where possible.
With this change, rqlite nodes running in "on-disk" mode build the database in memory first, and move it to the disk just before the Raft system starts. This means that on-disk nodes now initialize almost as quickly as in-memory nodes.
Don't allow control over trailing logs directly, just implement a policy
that scales the number by 1.25 relative to Snapshot threshold. Without
doing this it may be that setting the snapshot threshold lower doesn't
actually affect SQLite start up times, due to log application. Perhaps
this really needs exposure at the command line, but the command line is
starting to get complicated.
Before Raft truncates the log, it needs to retrieve a snapshot of the SQLite database to represent the log entries that are removed during that truncation. Raft then writes that snapshot to disk. To reduce both disk IO and disk space this change compresses that SQLite snapshot before it is written to disk, and decompresses it during a restore operation.
This change also deals with older SQLite snapshots, which were not compressed.
rqlite used to work like this, but suffered a regression due to a change in how Hashicorp Raft worked. The manner it changed in was not public, so relying on it was always fragile.
This PR changes Raft Log Entry encoding from JSON to Protobuf. Furthermore, larger Raft commands (which can result from batching SQL statements, or individually long SQL statements) are compressed before encoding.
This primary reason for this change is to reduce IO load since that is one of the largest performance bottlenecks. It will also reduce internode traffic.
Legacy JSON-encoded commands are still handled by this code, so this change is backwards-compatible with previous releases in the v5 series.
This change allows the read request to specify the maximum time the node
receiving the request may have last heard from the cluster leader. It
only applies to a read consistency selection of None.
A non-voting node doesn't participate in Raft consensus, but does
subscribe to the committed log entries originating with the leader.
This means a non-voting node keeps up-to-date with the state machine,
without impacting write-latency. These non-voting nodes can provide
read scalability for the cluster.
With this change the cluster metadata (arbitrary key-value data associated with each node) is now broadcast across the cluster using the standard consensus mechanism. Specifically the use case for this metadata is to allow all nodes know the HTTP API address of all other nodes, for the purpose of redirecting requests to the leader.
This change removed the need for multiplexing two logical connections
over the single Raft TCP connection, which greatly simplifies the
networking code generally.
Original PR https://github.com/rqlite/rqlite/pull/434
The Query() and Execute() functions on the Store now take a complex type
that encapsulates the statements, and associated parameters. This will
make it easier to add more control parameters when making requests.