# rqlite
You can find details on the design and implementation of rqlite in [these blog posts](http://www.philipotoole.com/tag/rqlite/).

The design and implementation of rqlite was also discussed at the [GoSF](http://www.meetup.com/golangsf/) [April 2016](http://www.meetup.com/golangsf/events/230127735/) Meetup. You can find the slides [here](http://www.slideshare.net/PhilipOToole/rqlite-replicating-sqlite-via-raft-consensu). A similar talk was given at the University of Pittsburgh in April 2018. Those slides are [here](https://docs.google.com/presentation/d/1lSNrZJUbAGD-ZsfD8B6_VPLVjq5zb7SlJMzDblq2yzU/edit?usp=sharing).

## Node design
The diagram below shows a high-level view of an rqlite node.

    ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐    ┌ ─ ─ ─ ─ ┐
                Clients                   Other
    └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘    │  Nodes  │
                   │                    ─ ─ ─ ─ ─
                   │                        ▲
                   │                        │
                   │                        │
                   ▼                        ▼
    ┌─────────────────────────────┐ ┌───────────────┐
    │           HTTP(S)           │ │      TCP      │
    └─────────────────────────────┘ └───────────────┘
    ┌───────────────────────────────────────────────┐
    │             Raft (hashicorp/raft)             │
    └───────────────────────────────────────────────┘
    ┌───────────────────────────────────────────────┐
    │               mattn/go-sqlite3                │
    └───────────────────────────────────────────────┘
    ┌───────────────────────────────────────────────┐
    │                   sqlite3.c                   │
    └───────────────────────────────────────────────┘
    ┌───────────────────────────────────────────────┐
    │                  RAM or disk                  │
    └───────────────────────────────────────────────┘

## File system
### Raft
The Raft layer always creates a file -- it creates the _Raft log_. The log stores the set of committed SQLite commands, in the order in which they were executed. This log is the authoritative record of every change that has happened to the system. It may also contain some read-only query entries, depending on read-consistency choices.
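
To illustrate why an ordered log of committed commands can serve as the authoritative record, the sketch below replays a small list of SQL statements into a fresh SQLite database. The `log` list and the `replay` helper are hypothetical names chosen for illustration; this is not rqlite's log format or code, only a minimal model of the idea.

```python
import sqlite3

# Hypothetical stand-in for the Raft log: committed SQL commands, in commit order.
log = [
    "CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT)",
    "INSERT INTO foo(name) VALUES('fiona')",
    "INSERT INTO foo(name) VALUES('declan')",
    "UPDATE foo SET name = 'sinead' WHERE id = 1",
]


def replay(entries):
    """Apply every log entry, in order, to a brand-new in-memory database."""
    db = sqlite3.connect(":memory:")
    for sql in entries:
        db.execute(sql)
    db.commit()
    return db


# Two independent replays of the same log produce the same rows.
node_a = replay(log)
node_b = replay(log)
rows_a = node_a.execute("SELECT id, name FROM foo ORDER BY id").fetchall()
rows_b = node_b.execute("SELECT id, name FROM foo ORDER BY id").fetchall()
print(rows_a == rows_b, rows_a)
```

Because the statements are applied in the same order on each replay, both databases end up identical, which is the property the log provides across nodes.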
### SQLite
By default the SQLite layer doesn't create a file. Instead it creates the database in RAM. rqlite can create the SQLite database on disk, if so configured at start-time.
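
The difference between the two modes can be seen with SQLite alone. The sketch below uses Python's standard `sqlite3` module rather than rqlite itself, and the file name `example.db` is an arbitrary choice for illustration.

```python
import os
import sqlite3
import tempfile

# In-memory database: no file is created, and the data is gone once the connection closes.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT)")
mem.close()

# On-disk database: SQLite creates and writes a file at the given path.
path = os.path.join(tempfile.mkdtemp(), "example.db")
disk = sqlite3.connect(path)
disk.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT)")
disk.commit()
disk.close()

print(os.path.exists(path))
```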
## Log Compaction and Truncation
rqlite automatically performs log compaction, so that disk usage due to the log remains bounded. After a configurable number of changes rqlite snapshots the SQLite database, and truncates the Raft log. This is a technical feature of the Raft consensus system, and most users of rqlite need not be concerned with this.
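
The snapshot-then-truncate cycle can be sketched with a toy model. Everything here is an assumption made for illustration: the `CompactingLog` class, its `threshold` parameter, and the use of a SQL dump as the snapshot format are not taken from rqlite or hashicorp/raft.

```python
import sqlite3


class CompactingLog:
    """Toy log that snapshots the database and truncates itself (hypothetical)."""

    def __init__(self, threshold):
        self.threshold = threshold  # entries allowed before a snapshot is taken
        self.entries = []           # commands applied since the last snapshot
        self.snapshot = None        # SQL text that recreates the database, or None
        self.db = sqlite3.connect(":memory:")

    def apply(self, sql):
        self.db.execute(sql)
        self.db.commit()
        self.entries.append(sql)
        if len(self.entries) >= self.threshold:
            # Capture the full database state, then drop the entries it already covers.
            self.snapshot = "\n".join(self.db.iterdump())
            self.entries.clear()

    def restore(self):
        """Rebuild a database from the latest snapshot plus the remaining entries."""
        db = sqlite3.connect(":memory:")
        if self.snapshot is not None:
            db.executescript(self.snapshot)
        for sql in self.entries:
            db.execute(sql)
        db.commit()
        return db


log = CompactingLog(threshold=3)
log.apply("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT)")
for i in range(6):
    log.apply(f"INSERT INTO foo(name) VALUES('row{i}')")

restored = log.restore()
count = restored.execute("SELECT COUNT(*) FROM foo").fetchone()[0]
print(len(log.entries), count)
```

The log never holds more than `threshold` entries, yet a restore from the snapshot plus the remaining entries still recovers every row.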
## Distributed Consensus
The following provides detailed information related to Raft, Distributed Consensus, and rqlite.
### rqlite and the CAP theorem
The [CAP theorem](https://en.wikipedia.org/wiki/CAP_theorem) states that it is impossible for a distributed database to provide consistency, availability, and partition tolerance simultaneously -- that, in the face of a network partition, the database can be available or consistent, but not both.

Raft is a Consistency-Partition (CP) protocol. This means that if an rqlite cluster is partitioned, only the side of the cluster that contains a majority of the nodes will be available. The other side of the cluster will not respond to writes. However, the side that remains available will return consistent results, and when the partition is healed, consistent results will continue to be returned.
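
The majority rule can be worked through with a few lines of arithmetic. The function names `quorum` and `side_is_available` below are hypothetical helpers written for this example, not part of rqlite.

```python
def quorum(cluster_size):
    """Smallest number of nodes that forms a majority of the cluster."""
    return cluster_size // 2 + 1


def side_is_available(nodes_on_side, cluster_size):
    """A side of a partition can accept writes only if it holds a majority."""
    return nodes_on_side >= quorum(cluster_size)


# A 5-node cluster split 3/2: only the 3-node side keeps a majority.
print(side_is_available(3, 5), side_is_available(2, 5))

# A 4-node cluster split 2/2: neither side has a majority.
print(side_is_available(2, 4))
```

This is also why clusters are usually run with an odd number of nodes: an even split leaves no side able to accept writes.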
### Does the protocol require consensus be reached before a commit is accepted?
Yes, this is an intrinsic part of the Raft protocol. How long it takes to reach consensus depends primarily on your network. It will take two round trips from a leader to a quorum of nodes, though each of those nodes is contacted in parallel.
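
As a rough, back-of-the-envelope model of that statement, the sketch below estimates commit latency from per-follower round-trip times. It assumes network time dominates, that the leader counts toward the quorum, and that each round waits only for the slowest follower still needed to complete the quorum. The function name and the sample numbers are made up for illustration.

```python
def estimated_commit_latency_ms(follower_rtts_ms, round_trips=2):
    """Estimate commit latency given the leader's round-trip time to each follower."""
    cluster_size = len(follower_rtts_ms) + 1  # followers plus the leader
    quorum = cluster_size // 2 + 1
    acks_needed = quorum - 1                  # the leader counts toward the quorum
    if acks_needed == 0:
        return 0.0                            # single-node cluster: no network hops
    # Followers are contacted in parallel, so each round takes as long as the
    # slowest follower among the fastest `acks_needed` of them.
    slowest_needed = sorted(follower_rtts_ms)[acks_needed - 1]
    return round_trips * slowest_needed


# Five nodes: the leader needs acknowledgements from its two fastest followers.
print(estimated_commit_latency_ms([2.0, 5.0, 40.0, 80.0]))
```

Under these assumptions, slow or distant followers do not affect the estimate as long as a quorum of nearby nodes exists.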
### Is the underlying serializable isolation level of SQLite maintained?
Yes, it is.
### Do concurrent writes block each other?
In this regard rqlite currently offers exactly the same semantics as SQLite. Each HTTP request uses the same SQLite connection. Explicit connection control will be available in a future release.
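
The effect of funnelling all requests through one connection can be approximated in plain Python. This is only an analogy under stated assumptions: the `handle_request` function, the explicit lock, and the thread counts are hypothetical and do not reflect how rqlite's HTTP layer is implemented.

```python
import sqlite3
import threading

# One shared connection, as a stand-in for the single connection used by all requests.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT)")
db.commit()
lock = threading.Lock()


def handle_request(worker, n):
    """Simulate one client issuing n writes; the lock serializes access to the connection."""
    for i in range(n):
        with lock:
            db.execute("INSERT INTO foo(name) VALUES(?)", (f"w{worker}-{i}",))
            db.commit()


threads = [threading.Thread(target=handle_request, args=(w, 25)) for w in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = db.execute("SELECT COUNT(*) FROM foo").fetchone()[0]
print(total)
```

Writes from different clients are applied one at a time, so each waits for the one before it, but none are lost.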
### How does this solution scale?
The simplest way to scale for reads and writes is to use higher-performance disks and a lower-latency network. This is known as _scaling vertically_.

rqlite doesn't scale horizontally for writes, however, as all writes must go through the leader. It can be scaled horizontally for reads, though, via [read-only nodes](https://github.com/rqlite/rqlite/blob/master/DOC/READ_ONLY_NODES.md).