Before Raft truncates the log, it needs to retrieve a snapshot of the SQLite database to represent the log entries that are removed during that truncation. Raft then writes that snapshot to disk. To reduce both disk IO and disk space this change compresses that SQLite snapshot before it is written to disk, and decompresses it during a restore operation.
This change also deals with older SQLite snapshots, which were not compressed.
rqlite used to work like this, but suffered a regression due to a change in how Hashicorp Raft worked. The manner it changed in was not public, so relying on it was always fragile.
This PR changes Raft Log Entry encoding from JSON to Protobuf. Furthermore, larger Raft commands (which can result from batching SQL statements, or individually long SQL statements) are compressed before encoding.
This primary reason for this change is to reduce IO load since that is one of the largest performance bottlenecks. It will also reduce internode traffic.
Legacy JSON-encoded commands are still handled by this code, so this change is backwards-compatible with previous releases in the v5 series.
This change allows the read request to specify the maximum time the node
receiving the request may have last heard from the cluster leader. It
only applies to a read consistency selection of None.
A non-voting node doesn't participate in Raft consensus, but does
subscribe to the committed log entries originating with the leader.
This means a non-voting node keeps up-to-date with the state machine,
without impacting write-latency. These non-voting nodes can provide
read scalability for the cluster.
With this change the cluster metadata (arbitrary key-value data associated with each node) is now broadcast across the cluster using the standard consensus mechanism. Specifically the use case for this metadata is to allow all nodes know the HTTP API address of all other nodes, for the purpose of redirecting requests to the leader.
This change removed the need for multiplexing two logical connections
over the single Raft TCP connection, which greatly simplifies the
networking code generally.
Original PR https://github.com/rqlite/rqlite/pull/434
The Query() and Execute() functions on the Store now take a complex type
that encapsulates the statements, and associated parameters. This will
make it easier to add more control parameters when making requests.
Writing a sufficiently sophisicated parser is too much work, and unlikely to be successful soon. Instead this change simply loads the entire dump as one command, and allows the underlying SQLite support to parse it correctly. This will definitely work, but since the load goes over Raft, it may hit limits with regards to network transfer sizes. Right now any limitations in that area are unknown.
Therefore this functionality remains somewhat experimental.
This allows us to grab a raw snapshot of the databse.
v2: s/RawDBSnapshot/Database, remove reference to raft from the comment.
v3: add a leader arg to Database to do a leader check
v4: fix doc string.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
This is useful in case the server needs to store other metadata (e.g. auth
data) along side the peer list.
The logging bit is handy in case something has its own logging framework
that it wants to use.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
This was possible previously, but would need to be set everytime on
startup via the API. This change allows it to set at startup AND enables
foreign constraint checking by default.
Integrate TCP mux with cluster and store
This change allows any node, including followers, to use the Raft log to make changes to a cluster-wide state.
This is not a great solution, and somewhat of a hack to make
unit-testing easier. However, it will allow control over "time"
in the response in the future.
It might still need to be richer, so end-users could specify an
in-memory SQLite database. Specifying a DSN would require the user
to supply the full path to the SQLite database. This is OK. However,
the code then needs to be able to parse out the path to the database
so it can remove it before start up.
This shows that passing the database into the Raft module is probably
not going to work, since the database could be opened in two ways -
directly at startup, and be completely restored from the log, or with
a combination of restoring from a snapshot, followed by the remaining
log entries. In both cases the database must be opened using the
requested DSN settings.
A detailed config object for controlling SQLite behaviour, is probably
best, and it should be passed to the Raft store on start up.
Finally, file-level copying of the SQLite file can only take place if
no transaction is in effect. This might be handled by the use of a
RWLock. The write-lock is taken during Execute() and Snapshot, but
the Read lock is taken during Query(). Unfortunately this may reduce
the concurrency of inserts and updates. Perhaps the Backup call on
the Go SQLite library might be better, but it might be slow.