Integrate SQL rewrite with rqlite for RANDOM (#1046)

master
Philip O'Toole 2 years ago committed by GitHub
parent 66b3c024dd
commit 95dfead226

@ -1,4 +1,6 @@
## 7.6.2 (unreleased)
## 7.7.0 (unreleased)
### New features
- [PR #1046](https://github.com/rqlite/rqlite/pull/1046): Add rewriting of SQLite `RANDOM()` so statements with this function are safe to use.
### Implementation changes and bug fixes
- [PR #1064](https://github.com/rqlite/rqlite/pull/1064): Upgrade dependencies, and move to requiring Go 1.18 (or later) for building.

@ -0,0 +1,62 @@
# rqlite and Non-deterministic Functions
## Contents
* [Understanding the problem](#understanding-the-problem)
* [How rqlite solves this problem](#how-rqlite-solves-this-problem)
* [What does rqlite rewrite?](#what-does-rqlite-rewrite)
* [RANDOM()](#random)
* [Date and time functions](#date-and-time-functions)
* [Credits](#credits)
## Understanding the problem
rqlite performs _statement-based replication_. This means that every SQL statement is stored in the Raft log exactly in the form it was received. Each rqlite node then reads the Raft log and applies the SQL statements it finds there to its own local copy of SQLite.
But if a SQL statement contains a [non-deterministic function](https://www.sqlite.org/deterministic.html), this type of replication can result in different SQLite data under each node -- which is not meant to happen. For example, the following statement could result in a different SQLite database under each node:
```
INSERT INTO foo (n) VALUES(random());
```
This is because `RANDOM()` is evaluated by each node independently, and `RANDOM()` will almost certainly return a different value on each node.
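To make the failure mode concrete, here is a toy Go simulation of statement-based replication. It is not rqlite code: the `replica` type and `replay` function are hypothetical, a seeded `math/rand` generator stands in for each node's independent randomness, and the string `"random()"` stands in for a `RANDOM()` call that every replica evaluates for itself. The deterministic entry converges on every replica; the `random()` entry does not.

```go
package main

import (
	"fmt"
	"math/rand"
	"strconv"
)

// replica is a toy stand-in for one node's local SQLite copy.
type replica struct {
	rng  *rand.Rand // this node's own, independent source of randomness
	rows []int64
}

// apply replays one log entry. Integer literals are deterministic; the entry
// "random()" is evaluated locally, the way SQLite evaluates RANDOM() on each node.
func (r *replica) apply(entry string) {
	if entry == "random()" {
		r.rows = append(r.rows, r.rng.Int63())
		return
	}
	v, err := strconv.ParseInt(entry, 10, 64)
	if err != nil {
		panic(err)
	}
	r.rows = append(r.rows, v)
}

// replay applies the same statement log to one replica per seed.
func replay(log []string, seeds ...int64) []*replica {
	out := make([]*replica, 0, len(seeds))
	for _, seed := range seeds {
		r := &replica{rng: rand.New(rand.NewSource(seed))}
		for _, e := range log {
			r.apply(e)
		}
		out = append(out, r)
	}
	return out
}

func main() {
	rs := replay([]string{"1234", "random()"}, 1, 2)
	fmt.Println(rs[0].rows[0] == rs[1].rows[0]) // true: deterministic entry
	fmt.Println(rs[0].rows[1] == rs[1].rows[1]) // false: each replica drew its own value
}
```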
## How rqlite solves this problem
An rqlite node addresses this issue by _rewriting_ received SQL statements that contain certain non-deterministic functions before the statement is committed to the Raft log or sent to any other node. Every node then applies exactly the same, already-evaluated statement.
## What does rqlite rewrite?
### `RANDOM()`
> :warning: **This functionality was introduced in version 7.7.0. It does not exist in earlier releases.**
Any SQL statement containing `RANDOM()` is rewritten if both of the following are true:
- the statement is part of a write-request, i.e. the request is sent to the `/db/execute` HTTP API, **or** the statement is part of a read-request sent to the `/db/query` HTTP API with _strong_ read consistency.
- `RANDOM()` is not used as an `ORDER BY` qualifier.
`RANDOM()` is replaced with a random integer between -9223372036854775808 and +9223372036854775807.
#### Examples
```bash
# Will be rewritten
curl -XPOST 'localhost:4001/db/execute' -H "Content-Type: application/json" -d '[
"INSERT INTO foo(id, age) VALUES(1234, RANDOM())"
]'
# RANDOM() rewriting explicitly disabled at request-time
curl -XPOST 'localhost:4001/db/execute?norwrandom' -H "Content-Type: application/json" -d '[
"INSERT INTO foo(id, age) VALUES(1234, RANDOM())"
]'
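# Not rewritten, since the query does not use strong read consistency
curl -G 'localhost:4001/db/query' --data-urlencode 'q=SELECT * FROM foo WHERE id = RANDOM()'
# Rewritten, since the query uses strong read consistency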
curl -G 'localhost:4001/db/query?level=strong' --data-urlencode 'q=SELECT * FROM foo WHERE id = RANDOM()'
```
### Date and time functions
rqlite does not yet rewrite [SQLite date and time functions](https://www.sqlite.org/lang_datefunc.html) that are non-deterministic in nature. An example of a statement using such functions is:
`INSERT INTO datetime_text (d1, d2) VALUES(datetime('now'),datetime('now', 'localtime'))`
Using such functions may leave each node with different data, so their behavior is undefined. Date and time functions that use absolute values will work without issue.
## Credits
Many thanks to [Ben Johnson](https://github.com/benbjohnson) who wrote the SQLite parser used by rqlite.

@ -101,10 +101,8 @@ Since the Raft log is the authoritative store for all data, and it is stored on
## Limitations
* In-memory databases are currently limited to 2GiB (2147483648 bytes) in size. You can learn more about possible ways to get around this limit in the [documentation](https://github.com/rqlite/rqlite/blob/master/DOC/PERFORMANCE.md#in-memory-database-limits).
* Only SQL statements that are [__deterministic__](https://www.sqlite.org/deterministic.html) are safe to use with rqlite, because statements are committed to the Raft log before they are sent to each node. In other words, rqlite performs _statement-based replication_. For example, the following statement could result in a different SQLite database under each node:
```
INSERT INTO foo (n) VALUES(random());
```
* Because rqlite performs _statement-based replication_, certain [_non-deterministic functions_](https://www.sqlite.org/deterministic.html), e.g. `RANDOM()`, are rewritten by rqlite before being passed to the Raft system and SQLite. To learn more about rqlite's support for non-deterministic functions, check out the [documentation](https://github.com/rqlite/rqlite/blob/master/DOC/NON_DETERMINISTIC_FUNCTIONS.md).
* This has not been extensively tested, but you can directly read the SQLite file under any node at any time, assuming you run in "on-disk" mode. However, there is no guarantee that the SQLite file reflects all the changes that have taken place on the cluster unless you are sure the host node itself has received and applied all changes.
* In case it isn't obvious, rqlite does not replicate any changes made directly to any underlying SQLite file, when run in "on disk" mode. **If you change the SQLite file directly, you may cause rqlite to fail**. Only modify the database via the HTTP API.
* SQLite dot-commands such as `.schema` or `.tables` are not directly supported by the API, but the [rqlite CLI](https://github.com/rqlite/rqlite/blob/master/DOC/CLI.md) supports some very similar functionality. This is because those commands are features of the `sqlite3` command, not SQLite itself.

@ -0,0 +1,37 @@
package command
import (
"strings"
"github.com/rqlite/sql"
)
// Rewrite rewrites the statements such that RANDOM is rewritten,
// if r is true.
func Rewrite(stmts []*Statement, r bool) error {
if !r {
return nil
}
rw := &sql.Rewriter{
RewriteRand: r,
}
for i := range stmts {
// Only replace the incoming statement with a rewritten version if
// there was no error and the rewriter actually changed something. If the
// statement is bad SQLite syntax, let SQLite deal with it -- and let its
// error be returned. Those errors will probably be clearer.
s, err := sql.NewParser(strings.NewReader(stmts[i].Sql)).ParseStatement()
if err != nil {
continue
}
s, f, err := rw.Do(s)
if err != nil || !f {
continue
}
stmts[i].Sql = s.String()
}
return nil
}
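The replacement policy in `Rewrite` keeps the original text when parsing fails or when nothing changed. It can be exercised without the `github.com/rqlite/sql` parser. The Go sketch below is hypothetical: `rewriteAll`, `rewriteFunc` and `toyRewriter` are illustrative names, `Statement` mirrors only the `Sql` field, and `toyRewriter` merely stands in for the parse-then-`Do` step.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Statement mirrors only the Sql field of command.Statement.
type Statement struct {
	Sql string
}

// rewriteFunc stands in for "parse, then rewrite": it returns the new SQL,
// whether anything changed, and any error.
type rewriteFunc func(sql string) (string, bool, error)

// rewriteAll applies the same policy as Rewrite: a statement is replaced only
// when rewriting succeeded and changed something; otherwise the original text
// is kept so that SQLite reports any syntax error itself.
func rewriteAll(stmts []*Statement, enabled bool, rw rewriteFunc) {
	if !enabled {
		return
	}
	for _, s := range stmts {
		out, changed, err := rw(s.Sql)
		if err != nil || !changed {
			continue
		}
		s.Sql = out
	}
}

// errUnparseable is what toyRewriter returns for input it "cannot parse".
var errUnparseable = errors.New("cannot parse statement")

// toyRewriter is NOT a SQL rewriter: it rejects anything not starting with
// SELECT or INSERT and replaces the literal text RANDOM() with 42.
func toyRewriter(sql string) (string, bool, error) {
	if !strings.HasPrefix(sql, "SELECT") && !strings.HasPrefix(sql, "INSERT") {
		return "", false, errUnparseable
	}
	out := strings.ReplaceAll(sql, "RANDOM()", "42")
	return out, out != sql, nil
}

func main() {
	stmts := []*Statement{{Sql: "SELECT RANDOM()"}, {Sql: "SELEC RANDOM()"}}
	rewriteAll(stmts, true, toyRewriter)
	fmt.Println(stmts[0].Sql) // SELECT 42
	fmt.Println(stmts[1].Sql) // SELEC RANDOM() (left for SQLite to reject)
}
```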

@ -0,0 +1,81 @@
package command
import (
"regexp"
"testing"
)
func Test_NoRewrites(t *testing.T) {
for _, str := range []string{
`INSERT INTO "names" VALUES (1, 'bob', '123-45-678')`,
`INSERT INTO "names" VALUES (RANDOM(), 'bob', '123-45-678')`,
`SELECT title FROM albums ORDER BY RANDOM()`,
`INSERT INTO foo(name, age) VALUES(?, ?)`,
} {
stmts := []*Statement{
{
Sql: str,
},
}
if err := Rewrite(stmts, false); err != nil {
t.Fatalf("failed to not rewrite: %s", err)
}
if stmts[0].Sql != str {
t.Fatalf("SQL is modified: %s", stmts[0].Sql)
}
}
}
func Test_NoRewritesMulti(t *testing.T) {
stmts := []*Statement{
{
Sql: `INSERT INTO "names" VALUES (1, 'bob', '123-45-678')`,
},
{
Sql: `INSERT INTO "names" VALUES (RANDOM(), 'bob', '123-45-678')`,
},
{
Sql: `SELECT title FROM albums ORDER BY RANDOM()`,
},
}
if err := Rewrite(stmts, false); err != nil {
t.Fatalf("failed to not rewrite: %s", err)
}
if len(stmts) != 3 {
t.Fatalf("returned stmts is wrong length: %d", len(stmts))
}
if stmts[0].Sql != `INSERT INTO "names" VALUES (1, 'bob', '123-45-678')` {
t.Fatalf("SQL is modified: %s", stmts[0].Sql)
}
if stmts[1].Sql != `INSERT INTO "names" VALUES (RANDOM(), 'bob', '123-45-678')` {
t.Fatalf("SQL is modified: %s", stmts[1].Sql)
}
if stmts[2].Sql != `SELECT title FROM albums ORDER BY RANDOM()` {
t.Fatalf("SQL is modified: %s", stmts[2].Sql)
}
}
func Test_Rewrites(t *testing.T) {
testSQLs := []string{
`INSERT INTO "names" VALUES (1, 'bob', '123-45-678')`, `INSERT INTO "names" VALUES \(1, 'bob', '123-45-678'\)`,
`INSERT INTO "names" VALUES (RANDOM(), 'bob', '123-45-678')`, `INSERT INTO "names" VALUES \(-?[0-9]+, 'bob', '123-45-678'\)`,
`SELECT title FROM albums ORDER BY RANDOM()`, `SELECT title FROM albums ORDER BY RANDOM\(\)`,
`SELECT RANDOM()`, `SELECT -?[0-9]+`,
}
for i := 0; i < len(testSQLs)-1; i += 2 {
stmts := []*Statement{
{
Sql: testSQLs[i],
},
}
if err := Rewrite(stmts, true); err != nil {
t.Fatalf("failed to rewrite: %s", err)
}
match := regexp.MustCompile(testSQLs[i+1])
if !match.MatchString(stmts[0].Sql) {
t.Fatalf("test %d failed, %s (rewritten as %s) does not regex-match with %s", i, testSQLs[i], stmts[0].Sql, testSQLs[i+1])
}
}
}

@ -21,6 +21,7 @@ require (
github.com/rqlite/go-sqlite3 v1.25.0
github.com/rqlite/raft-boltdb v0.0.0-20211018013422-771de01086ce
github.com/rqlite/rqlite-disco-clients v0.0.0-20220328160918-ec33ecd01491
github.com/rqlite/sql v0.0.0-20220903214138-c36cd21831bb
go.etcd.io/bbolt v1.3.6
go.etcd.io/etcd/client/v3 v3.5.4 // indirect
go.uber.org/atomic v1.10.0 // indirect

@ -66,6 +66,8 @@ github.com/go-logfmt/logfmt v0.3.0/go.mod h1:Qt1PoO58o5twSAckw1HlFXLmHsOX5/0LbT9
github.com/go-logfmt/logfmt v0.4.0/go.mod h1:3RMwSq7FuexP4Kalkev3ejPJsZTpXXBr9+V4qmtdjCk=
github.com/go-logfmt/logfmt v0.5.0/go.mod h1:wCYkCAKZfumFQihp8CzCvQ3paCTfi41vtzG1KdI/P7A=
github.com/go-stack/stack v1.8.0/go.mod h1:v0f6uXyyMGvRgIKkXu+yp6POWl0qKG85gN/melR3HDY=
github.com/go-test/deep v1.0.8 h1:TDsG77qcSprGbC6vTN8OuXp5g+J+b5Pcguhf7Zt61VM=
github.com/go-test/deep v1.0.8/go.mod h1:5C2ZWiW0ErCdrYzpqxLbTX7MG14M9iiw8DgHncVwcsE=
github.com/godbus/dbus/v5 v5.0.4/go.mod h1:xhWf0FNVPg57R7Z0UbKHbJfkEywrmjJnf7w5xrFpKfA=
github.com/gogo/protobuf v1.1.1/go.mod h1:r8qH/GZQm5c6nD/R0oafs1akxWv10x8SbQlK7atdtwQ=
github.com/gogo/protobuf v1.3.2 h1:Ov1cvc58UF3b5XjBnZv7+opcTcQFZebYjWzi34vdm4Q=
@ -271,6 +273,10 @@ github.com/rqlite/raft-boltdb v0.0.0-20211018013422-771de01086ce h1:sVlzmCJiaM0L
github.com/rqlite/raft-boltdb v0.0.0-20211018013422-771de01086ce/go.mod h1:mc+WNDHyskdViYAoPnaMXEBnSKBmoUgiEZjrlAj6G34=
github.com/rqlite/rqlite-disco-clients v0.0.0-20220328160918-ec33ecd01491 h1:EtnrSIl+hdibyxb8ykdU9BFxqKUH4c4trKWLwNQD2vM=
github.com/rqlite/rqlite-disco-clients v0.0.0-20220328160918-ec33ecd01491/go.mod h1:pym85nj6JnCI7rM9RxTZ4cubkTQyyg7uLwVydso9B80=
github.com/rqlite/sql v0.0.0-20220903151541-65e2dbe5f0de h1:brKcAsFjSwFAAW+c7OEkK6+5+1vzjHvZwyAS9vo2uAg=
github.com/rqlite/sql v0.0.0-20220903151541-65e2dbe5f0de/go.mod h1:ib9zVtNgRKiGuoMyUqqL5aNpk+r+++YlyiVIkclVqPg=
github.com/rqlite/sql v0.0.0-20220903214138-c36cd21831bb h1:0zPg8UumyXnbHSKrv4EjdJENDcNVi3RtU9ptAuejBSU=
github.com/rqlite/sql v0.0.0-20220903214138-c36cd21831bb/go.mod h1:ib9zVtNgRKiGuoMyUqqL5aNpk+r+++YlyiVIkclVqPg=
github.com/ryanuber/columnize v0.0.0-20160712163229-9b3edd62028f/go.mod h1:sm1tb6uqfes/u+d4ooFouqFdy9/2g9QGwK3SQygK0Ts=
github.com/ryanuber/columnize v2.1.0+incompatible/go.mod h1:sm1tb6uqfes/u+d4ooFouqFdy9/2g9QGwK3SQygK0Ts=
github.com/sean-/seed v0.0.0-20170313163322-e2103e2c3529 h1:nn5Wsu0esKSJiIVhscUtVbo7ada43DJhG55ua/hjS5I=

@ -1000,6 +1000,15 @@ func (s *Service) queuedExecute(w http.ResponseWriter, r *http.Request) {
return
}
}
noRewriteRandom, err := noRewriteRandom(r)
if err != nil {
http.Error(w, err.Error(), http.StatusBadRequest)
return
}
if err := command.Rewrite(stmts, !noRewriteRandom); err != nil {
http.Error(w, fmt.Sprintf("SQL rewrite: %s", err.Error()), http.StatusInternalServerError)
return
}
timeout, err := timeoutParam(r, defaultTimeout)
if err != nil {
@ -1040,7 +1049,7 @@ func (s *Service) queuedExecute(w http.ResponseWriter, r *http.Request) {
func (s *Service) execute(w http.ResponseWriter, r *http.Request) {
resp := NewResponse()
timeout, isTx, timings, redirect, err := reqParams(r, defaultTimeout)
timeout, isTx, timings, redirect, noRewriteRandom, err := reqParams(r, defaultTimeout)
if err != nil {
http.Error(w, err.Error(), http.StatusBadRequest)
return
@ -1058,6 +1067,10 @@ func (s *Service) execute(w http.ResponseWriter, r *http.Request) {
http.Error(w, err.Error(), http.StatusBadRequest)
return
}
if err := command.Rewrite(stmts, !noRewriteRandom); err != nil {
http.Error(w, fmt.Sprintf("SQL rewrite: %s", err.Error()), http.StatusInternalServerError)
return
}
er := &command.ExecuteRequest{
Request: &command.Request{
@ -1132,7 +1145,7 @@ func (s *Service) handleQuery(w http.ResponseWriter, r *http.Request) {
resp := NewResponse()
timeout, isTx, timings, redirect, err := reqParams(r, defaultTimeout)
timeout, isTx, timings, redirect, noRewriteRandom, err := reqParams(r, defaultTimeout)
if err != nil {
http.Error(w, err.Error(), http.StatusBadRequest)
return
@ -1157,6 +1170,15 @@ func (s *Service) handleQuery(w http.ResponseWriter, r *http.Request) {
return
}
// No point rewriting queries if they don't go through the Raft log, since they
// will never be replayed from the log anyway.
if lvl == command.QueryRequest_QUERY_REQUEST_LEVEL_STRONG {
if err := command.Rewrite(queries, !noRewriteRandom); err != nil {
http.Error(w, fmt.Sprintf("SQL rewrite: %s", err.Error()), http.StatusInternalServerError)
return
}
}
qr := &command.QueryRequest{
Request: &command.Request{
Transaction: isTx,
@ -1555,24 +1577,28 @@ func isQueue(req *http.Request) (bool, error) {
// reqParams is a convenience function to get a bunch of query params
// in one function call.
func reqParams(req *http.Request, def time.Duration) (timeout time.Duration, tx, timings, redirect bool, err error) {
func reqParams(req *http.Request, def time.Duration) (timeout time.Duration, tx, timings, redirect, noRwRandom bool, err error) {
timeout, err = timeoutParam(req, def)
if err != nil {
return 0, false, false, false, err
return 0, false, false, false, true, err
}
tx, err = isTx(req)
if err != nil {
return 0, false, false, false, err
return 0, false, false, false, true, err
}
timings, err = isTimings(req)
if err != nil {
return 0, false, false, false, err
return 0, false, false, false, true, err
}
redirect, err = isRedirect(req)
if err != nil {
return 0, false, false, false, err
return 0, false, false, false, true, err
}
return timeout, tx, timings, redirect, nil
noRwRandom, err = noRewriteRandom(req)
if err != nil {
return 0, false, false, false, true, err
}
return timeout, tx, timings, redirect, noRwRandom, nil
}
// noLeader returns whether processing should skip the leader check.
@ -1595,6 +1621,11 @@ func isWait(req *http.Request) (bool, error) {
return queryParam(req, "wait")
}
// noRewriteRandom returns whether a rewrite of RANDOM is disabled.
func noRewriteRandom(req *http.Request) (bool, error) {
return queryParam(req, "norwrandom")
}
// level returns the requested consistency level for a query
func level(req *http.Request) (command.QueryRequest_Level, error) {
q := req.URL.Query()

@ -162,6 +162,65 @@ func Test_MultiNodeCluster(t *testing.T) {
}
}
// Test_MultiNodeClusterRANDOM tests operation of RANDOM() SQL rewriting. It checks that a rewritten
// statement is sent to follower.
func Test_MultiNodeClusterRANDOM(t *testing.T) {
node1 := mustNewLeaderNode()
defer node1.Deprovision()
node2 := mustNewNode(false)
defer node2.Deprovision()
if err := node2.Join(node1); err != nil {
t.Fatalf("node failed to join leader: %s", err.Error())
}
_, err := node2.WaitForLeader()
if err != nil {
t.Fatalf("failed waiting for leader: %s", err.Error())
}
// Get the new leader, in case it changed.
c := Cluster{node1, node2}
leader, err := c.Leader()
if err != nil {
t.Fatalf("failed to find cluster leader: %s", err.Error())
}
_, err = leader.Execute("CREATE TABLE foo (id integer not null primary key, name text)")
if err != nil {
t.Fatalf("failed to create table: %s", err.Error())
}
_, err = leader.Execute(`INSERT INTO foo(id, name) VALUES(RANDOM(), "sinead")`)
if err != nil {
t.Fatalf("failed to INSERT record: %s", err.Error())
}
r, err := leader.Query("SELECT COUNT(*) FROM foo")
if err != nil {
t.Fatalf("failed to query for count: %s", err.Error())
}
if got, exp := r, `{"results":[{"columns":["COUNT(*)"],"types":[""],"values":[[1]]}]}`; got != exp {
t.Fatalf("wrong query results, exp %s, got %s", exp, got)
}
// Send a few Noops through to ensure SQLite database has been updated on each node.
for i := 0; i < 5; i++ {
node1.Noop("some_id")
}
// Check that row is *exactly* the same on each node. This could only happen if RANDOM was
// rewritten by the Leader before committing to the Raft log.
r1, err := node1.QueryNoneConsistency("SELECT * FROM foo")
if err != nil {
t.Fatalf("failed to query node 1: %s", err.Error())
}
r2, err := node2.QueryNoneConsistency("SELECT * FROM foo")
if err != nil {
t.Fatalf("failed to query node 2: %s", err.Error())
}
if r1 != r2 {
t.Fatalf("node 1 and node 2 do not have the same data (%s %s)", r1, r2)
}
}
// Test_MultiNodeClusterBootstrap tests formation of a 3-node cluster via bootstrapping,
// and its operation.
func Test_MultiNodeClusterBootstrap(t *testing.T) {

@ -7,6 +7,7 @@ import (
"fmt"
"os"
"path/filepath"
"regexp"
"sync"
"testing"
"time"
@ -333,6 +334,26 @@ func Test_SingleNodeParameterizedNamed(t *testing.T) {
}
}
func Test_SingleNodeRewriteRandom(t *testing.T) {
node := mustNewLeaderNode()
defer node.Deprovision()
_, err := node.Execute(`CREATE TABLE foo (id integer not null primary key, name text)`)
if err != nil {
t.Fatalf(`CREATE TABLE failed: %s`, err.Error())
}
resp, err := node.Execute(`INSERT INTO foo(id, name) VALUES(RANDOM(), "fiona")`)
if err != nil {
t.Fatalf(`INSERT failed: %s`, err.Error())
}
match := regexp.MustCompile(`\{"results":\[\{"last_insert_id":-?[0-9]+,"rows_affected":1\}\]\}`)
if !match.MatchString(resp) {
t.Fatalf("test received wrong result got %s", resp)
}
}
func Test_SingleNodeQueued(t *testing.T) {
node := mustNewLeaderNode()
defer node.Deprovision()
