{
"metadata": {
"language_info": {
"codemirror_mode": {
"name": "text/plain"
},
"file_extension": ".txt",
"mimetype": "text/plain",
"name": "cozo",
"nbconvert_exporter": "text",
"pygments_lexer": "text",
"version": "es2017"
},
"kernelspec": {
"name": "cozo",
"display_name": "CozoScript (localhost)",
"language": "text"
}
},
"nbformat_minor": 4,
"nbformat": 4,
"cells": [
{
"cell_type": "markdown",
"source": "# Air-data acrobatics",
"metadata": {}
},
{
"cell_type": "markdown",
"source": "## Hello, world!",
"metadata": {}
},
{
"cell_type": "markdown",
"source": "Let's start exploring the Cozo database by following the \"hello world\" tradition:",
"metadata": {}
},
{
"cell_type": "code",
"source": "?[a, b, c] <- [['hello', 'world', 'Cozo!']]",
"metadata": {
"trusted": true
},
"execution_count": 1,
"outputs": [
{
"execution_count": 1,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">a</td><td style=\"font-weight: bold\">b</td><td style=\"font-weight: bold\">c</td></tr></thead><tbody><tr><td>hello</td><td>world</td><td>Cozo!</td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 1ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "Let's break that down. This query consists of two parts: the part before `<-` is called its _head_, and the part after is called its _body_. The symbol `<-` itself denotes that this is a _constant rule_, or a declaration of _facts_.\n\nThe head has the special name `?`, indicating the _entry_ of the query, which has three _arguments_ `a`, `b`, and `c`.\n\nThe body consists of a list of lists (in this case a list of a single inner list). Each inner list represents a _tuple_, which is similar to a row in a relational database. The length of the inner list must match the number of arguments of the head, and each argument is then _bound_ to the corresponding value in the inner list by position.\n\nOf course, more than one inner list is allowed:",
"metadata": {}
},
{
"cell_type": "code",
"source": "?[a, b, c] <- [['hello', 'world', 'Cozo!'],\n ['hello', 'world', 'database!']]",
"metadata": {
"trusted": true
},
"execution_count": 2,
"outputs": [
{
"execution_count": 2,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">a</td><td style=\"font-weight: bold\">b</td><td style=\"font-weight: bold\">c</td></tr></thead><tbody><tr><td>hello</td><td>world</td><td>Cozo!</td></tr><tr><td>hello</td><td>world</td><td>database!</td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "Let's try the following:",
"metadata": {}
},
{
"cell_type": "code",
"source": "?[a] <- [['hello'], ['world'], ['Cozo!']]",
"metadata": {
"trusted": true
},
"execution_count": 3,
"outputs": [
{
"execution_count": 3,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">a</td></tr></thead><tbody><tr><td>Cozo!</td></tr><tr><td>hello</td></tr><tr><td>world</td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "Now we have three inner lists of length 1 each. The returned result is also _sorted_: all relations in Cozo are sorted lexicographically by position.\n\nCozo operates on _set semantics_ instead of _bag semantics_. Observe:",
"metadata": {}
},
{
"cell_type": "code",
"source": "?[a] <- [['hello'], ['world'], ['Cozo!'], ['hello'], ['world'], ['Cozo.']]",
"metadata": {
"trusted": true
},
"execution_count": 4,
"outputs": [
{
"execution_count": 4,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">a</td></tr></thead><tbody><tr><td>Cozo!</td></tr><tr><td>Cozo.</td></tr><tr><td>hello</td></tr><tr><td>world</td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "`'hello'` and `'world'` both appear only once in the result, even though they appear twice each in the input. Set semantics automatically de-duplicates based on the whole tuple.",
"metadata": {}
},
{
"cell_type": "markdown",
"source": "## Values and expressions",
"metadata": {}
},
{
"cell_type": "markdown",
"source": "The list of lists in the body of the rules certainly looks familiar to anyone who has used languages such as JavaScript or Python. In fact, with the exception of the map `{}`, valid JSON values represent valid Cozo values.\n\nAs sorting is important in Cozo, study the following example, which demonstrates how different values are sorted:",
"metadata": {}
},
{
"cell_type": "code",
"source": "?[a] <- [[true],\n [false], \n [null],\n [\"A\"], \n ['apple'], // single or double quotes are both OK \n [\"Apple juice\"], \n [['apple', 1, [2, 3]]], // this row consists of a list consisting of heterogeneous items!\n [1.0], \n [1_234_567], // you can separate digits with underscores for clarity\n [3.14159], \n [-8e-99]]",
"metadata": {
"trusted": true
},
"execution_count": 5,
"outputs": [
{
"execution_count": 5,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">a</td></tr></thead><tbody><tr><td><span style=\"color: #bf5b3d;\">null</span></td></tr><tr><td><span style=\"color: #bf5b3d;\">false</span></td></tr><tr><td><span style=\"color: #bf5b3d;\">true</span></td></tr><tr><td><span style=\"color: #307fc1;\">-8e-99</span></td></tr><tr><td><span style=\"color: #307fc1;\">1</span></td></tr><tr><td><span style=\"color: #307fc1;\">3.14159</span></td></tr><tr><td><span style=\"color: #307fc1;\">1234567</span></td></tr><tr><td>A</td></tr><tr><td>Apple juice</td></tr><tr><td>apple</td></tr><tr><td><span style=\"color: #bf5b3d;\">[&quot;apple&quot;,1,[2,3]]</span></td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "Notice how comments are entered, just like in JavaScript. `/* ... */` also works.\n\nIn the playground, literal strings appear in black, numbers in blue, and reddish entries represent values that should be parsed as JSON.\n\nEven though the kind of rule we have been using is called the _constant rule_, you can in fact compute in them:",
"metadata": {}
},
{
"cell_type": "code",
"source": "?[i, a] <- [[1, 1 + 2], \n [2, 3 * 4], \n [3, 5 / 6], \n [4, exp(7)], \n [5, uppercase('number ') ++ to_string(10)], // string concatenation\n [6, to_float('PI')]]",
"metadata": {
"trusted": true
},
"execution_count": 6,
"outputs": [
{
"execution_count": 6,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">i</td><td style=\"font-weight: bold\">a</td></tr></thead><tbody><tr><td><span style=\"color: #307fc1;\">1</span></td><td><span style=\"color: #307fc1;\">3</span></td></tr><tr><td><span style=\"color: #307fc1;\">2</span></td><td><span style=\"color: #307fc1;\">12</span></td></tr><tr><td><span style=\"color: #307fc1;\">3</span></td><td><span style=\"color: #307fc1;\">0.8333333333333334</span></td></tr><tr><td><span style=\"color: #307fc1;\">4</span></td><td><span style=\"color: #307fc1;\">1096.6331584284585</span></td></tr><tr><td><span style=\"color: #307fc1;\">5</span></td><td>NUMBER 10</td></tr><tr><td><span style=\"color: #307fc1;\">6</span></td><td><span style=\"color: #307fc1;\">3.141592653589793</span></td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "For clarity, we have used the index `i` to force the results to show in this order.\n\nFor the full list of functions you can use in expressions, consult the Manual.",
"metadata": {}
},
{
"cell_type": "markdown",
"source": "There is one thing we need to make clear at this point. In CozoScript, only `true` is true, and only `false` is false. This is not a tautology: every other value, including `null`, produces an error when put in a position requiring a truthy value. In this sense, `null` in CozoScript is only a _marker_. It has no inherent logical semantics associated with it, unlike `NULL` in SQL, `null` and `undefined` in JavaScript, and `None` in Python. An example:",
"metadata": {}
},
{
"cell_type": "code",
"source": "?[a] <- [[!null]]",
"metadata": {
"trusted": true
},
"execution_count": 7,
"outputs": [
{
"execution_count": 7,
"output_type": "execute_result",
"data": {
"text/html": "<pre style=\"font-size: small\"><span style='color:#a00'>eval::throw</span>\n\n <span style='color:#a00'>×</span> Evaluation of expression failed\n ╭────\n <span style='opacity:0.67'>1</span> │ ?[a] &lt;- [[!null]]\n · <span style='color:#a0a'><b> ─────</b></span>\n ╰────\n<span style='color:#0aa'> help: </span>&#39;negate&#39; requires booleans\n</pre>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "In this case you really need to write:",
"metadata": {}
},
{
"cell_type": "code",
"source": "?[a] <- [[!is_null(null)]]",
"metadata": {
"trusted": true
},
"execution_count": 8,
"outputs": [
{
"execution_count": 8,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">a</td></tr></thead><tbody><tr><td><span style=\"color: #bf5b3d;\">false</span></td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "This may seem a nuisance in trivial cases, but will save you a lot of hair in hairy situations. Believe me.",
"metadata": {}
},
{
"cell_type": "markdown",
"source": "## Horn-clause rules",
"metadata": {}
},
{
"cell_type": "markdown",
"source": "Usually constant rules are used to define ad-hoc facts useful for subsequent queries:",
"metadata": {}
},
{
"cell_type": "code",
"source": "?[loving, loved] := loves[loving, loved] // Yes, this is the 'subsequent query'. In a logical sense. \n // The order of rules has no significance whatsoever.\n\nloves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]",
"metadata": {
"trusted": true
},
"execution_count": 9,
"outputs": [
{
"execution_count": 9,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">loving</td><td style=\"font-weight: bold\">loved</td></tr></thead><tbody><tr><td>alice</td><td>eve</td></tr><tr><td>bob</td><td>alice</td></tr><tr><td>charlie</td><td>eve</td></tr><tr><td>david</td><td>george</td></tr><tr><td>eve</td><td>alice</td></tr><tr><td>eve</td><td>bob</td></tr><tr><td>eve</td><td>charlie</td></tr><tr><td>george</td><td>george</td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "The constant rule is now named `loves`, denoting a rather complicated relationship network (aren't 'relationship' and 'network' synonyms?). It reads like \"Alice loves Eve, Bob loves Alice\", \"nobody loves David, David loves George, but George only loves himself\", and so on. Note that for constant rules we can actually omit the arguments (but if explicitly given, the arity must match the actual data).\n\nThe entry `?` is now a _Horn-clause rule_, signified by the symbol `:=`. Its body has a single _application_ of the rule we have just defined, with _bindings_ `loving` and `loved` for the arguments. These bindings are then carried to the output via the arguments of the entry rule.\n\nHere both bindings to the rule application of `loves` are initially _unbound_, in which case all tuples of `loves` are returned. To _bind_ an argument simply pass a constant in:",
"metadata": {}
},
{
"cell_type": "code",
"source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[loved_by_eve] := loves['e' ++ 'v' ++ 'e', loved_by_eve] // Eve loves dramatic entrance",
"metadata": {
"trusted": true
},
"execution_count": 10,
"outputs": [
{
"execution_count": 10,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">loved_by_eve</td></tr></thead><tbody><tr><td>alice</td></tr><tr><td>bob</td></tr><tr><td>charlie</td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "Every argument position can be bound:",
"metadata": {}
},
{
"cell_type": "code",
"source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[loves_eve] := loves[loves_eve, 'eve']",
"metadata": {
"trusted": true
},
"execution_count": 11,
"outputs": [
{
"execution_count": 11,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">loves_eve</td></tr></thead><tbody><tr><td>alice</td></tr><tr><td>charlie</td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "Multiple clauses can appear in the body, in which case a conjunction is implied, meaning that all clauses\nmust bind for a result to return:",
"metadata": {}
},
{
"cell_type": "code",
"source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[loved_by_b_e] := loves['eve', loved_by_b_e], loves['bob', loved_by_b_e]",
"metadata": {
"trusted": true
},
"execution_count": 12,
"outputs": [
{
"execution_count": 12,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">loved_by_b_e</td></tr></thead><tbody><tr><td>alice</td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "We see that Alice is loved by both Bob and Eve. The variable `loved_by_b_e` appears in both clauses, in which case the two occurrences are _unified_, meaning that they must bind to the _same_ value for a tuple to return.",
"metadata": {}
},
{
"cell_type": "markdown",
"source": "Disjunction, meaning that _any_ clause with a successful binding potentially contributes to the results, must be specified explicitly:",
"metadata": {}
},
{
"cell_type": "code",
"source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[loved_by_b_e] := loves['eve', loved_by_b_e] or loves['bob', loved_by_b_e], \n loved_by_b_e != 'bob', \n loved_by_b_e != 'eve'",
"metadata": {
"trusted": true
},
"execution_count": 13,
"outputs": [
{
"execution_count": 13,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">loved_by_b_e</td></tr></thead><tbody><tr><td>alice</td></tr><tr><td>charlie</td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "As we can see, disjunctive clauses are connected by `or`. It binds more strongly than the implicit conjunction `,`.",
"metadata": {}
},
{
"cell_type": "markdown",
"source": "Horn clause rules (and Horn clause rules only) may have multiple definitions _having equivalent heads_. The above query is identical in every way to the following:",
"metadata": {}
},
{
"cell_type": "code",
"source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[loved_by_b_e] := loves['eve', loved_by_b_e], loved_by_b_e != 'bob', loved_by_b_e != 'eve'\n?[loved_by_b_e] := loves['bob', loved_by_b_e], loved_by_b_e != 'bob', loved_by_b_e != 'eve'",
"metadata": {
"trusted": true
},
"execution_count": 14,
"outputs": [
{
"execution_count": 14,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">loved_by_b_e</td></tr></thead><tbody><tr><td>alice</td></tr><tr><td>charlie</td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "If a Horn clause rule is not the entry, even the _names_ given to the arguments can differ. The bodies are not required to be of the same form, as long as they produce compatible outputs.",
"metadata": {}
},
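{
"cell_type": "markdown",
"source": "As a small sketch of the point about argument names (this cell was not part of the original run, so no output is shown): the hypothetical rule `involved` below has two definitions whose heads name their argument differently, and the entry rule consumes both. The rule name and the variable names are made up for illustration.",
"metadata": {}
},
{
"cell_type": "code",
"source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n// two definitions of the same (hypothetical) rule, with differently named arguments\ninvolved[lover] := loves[lover, _x]\ninvolved[beloved] := loves[_x, beloved]\n\n?[person] := involved[person]",
"metadata": {
"trusted": true
},
"execution_count": null,
"outputs": []
},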
{
"cell_type": "markdown",
"source": "Besides rule applications, _filters_ can also appear in the body:",
"metadata": {}
},
{
"cell_type": "code",
"source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[person, loved] := loves[person, loved], !ends_with(person, 'e')",
"metadata": {
"trusted": true
},
"execution_count": 15,
"outputs": [
{
"execution_count": 15,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">person</td><td style=\"font-weight: bold\">loved</td></tr></thead><tbody><tr><td>bob</td><td>alice</td></tr><tr><td>david</td><td>george</td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "In this case only people whose names do not end in `'e'` are considered for the loving position.\n\nBy the way, if you are not interested in who the person in the loving position is, you can just omit it in the arguments to the entry:",
"metadata": {}
},
{
"cell_type": "code",
"source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[loved] := loves[person, loved], !ends_with(person, 'e')",
"metadata": {
"trusted": true
},
"execution_count": 16,
"outputs": [
{
"execution_count": 16,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">loved</td></tr></thead><tbody><tr><td>alice</td></tr><tr><td>george</td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "... but every argument in the head of any Horn-clause rule must appear in the body, of course:",
"metadata": {}
},
{
"cell_type": "code",
"source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[the_alien, loved] := loves[person, loved], !ends_with(person, 'e')",
"metadata": {
"trusted": true
},
"execution_count": 17,
"outputs": [
{
"execution_count": 17,
"output_type": "execute_result",
"data": {
"text/html": "<pre style=\"font-size: small\"><span style='color:#a00'>eval::unbound_symb_in_head</span>\n\n <span style='color:#a00'>×</span> Symbol &#39;the_alien&#39; in rule head is unbound\n ╭─[9:1]\n <span style='opacity:0.67'> 9</span> │ \n <span style='opacity:0.67'>10</span> │ ?[the_alien, loved] := loves[person, loved], !ends_with(person, &#39;e&#39;)\n · <span style='color:#a0a'><b> ─────────</b></span>\n ╰────\n<span style='color:#0aa'> help: </span>Note that symbols occurring only in negated positions are not considered bound\n</pre>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "## Negation",
"metadata": {}
},
{
"cell_type": "markdown",
"source": "The next query finds those who are loved by Eve, but not by Bob:",
"metadata": {}
},
{
"cell_type": "code",
"source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[loved_by_e_not_b] := loves['eve', loved_by_e_not_b], not loves['bob', loved_by_e_not_b]",
"metadata": {
"trusted": true
},
"execution_count": 18,
"outputs": [
{
"execution_count": 18,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">loved_by_e_not_b</td></tr></thead><tbody><tr><td>bob</td></tr><tr><td>charlie</td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "Here we are using the `not` keyword to _negate_ the rule application `loves`. This negation is at the level of Horn-clauses, which is not the same as the level of expressions. In fact, there are two sets of related but inequivalent operators:\n\n* For Horn clauses: `,` (conjunction), `or` (disjunction), `not` (negation)\n* For boolean expressions: `&&` (conjunction), `||` (disjunction), `!` (negation)\n\nHopefully you are already familiar with the boolean set of operators. If you use them in the wrong way, the query compiler will yell at you. And you will comply.",
"metadata": {}
},
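{
"cell_type": "markdown",
"source": "To make the distinction concrete, here is a hedged sketch (this cell was not part of the original run, so no output is shown). The clause-level `,` joins the rule application with a filter, while `&&`, `!` and `!=` all live inside that single boolean expression. The parentheses are added only so that the example does not rely on operator precedence. Given the facts, it should keep only the tuple for Bob and Alice.",
"metadata": {}
},
{
"cell_type": "code",
"source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n// `,` separates clauses; `&&`, `!` and `!=` combine booleans within one filter\n?[person, loved] := loves[person, loved], (!ends_with(person, 'e')) && (loved != 'george')",
"metadata": {
"trusted": true
},
"execution_count": null,
"outputs": []
},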
{
"cell_type": "markdown",
"source": "Negation has to abide by the _safety rule_. Let's violate it:",
"metadata": {}
},
{
"cell_type": "code",
"source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[not_loved_by_b] := not loves['bob', not_loved_by_b]",
"metadata": {
"trusted": true
},
"execution_count": 19,
"outputs": [
{
"execution_count": 19,
"output_type": "execute_result",
"data": {
"text/html": "<pre style=\"font-size: small\"><span style='color:#a00'>eval::unbound_symb_in_head</span>\n\n <span style='color:#a00'>×</span> Symbol &#39;not_loved_by_b&#39; in rule head is unbound\n ╭─[9:1]\n <span style='opacity:0.67'> 9</span> │ \n <span style='opacity:0.67'>10</span> │ ?[not_loved_by_b] := not loves[&#39;bob&#39;, not_loved_by_b]\n · <span style='color:#a0a'><b> ──────────────</b></span>\n ╰────\n<span style='color:#0aa'> help: </span>Note that symbols occurring only in negated positions are not considered bound\n</pre>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "Oh no! The query compiler rejects our perfectly reasonable query trying to determine those poor souls not loved by Bob!\n\nBut is our query really reasonable? For example, should the query return a tuple containing 'gold', since according to facts at hand, Bob clearly has no interest in 'gold'? So should our query return every possible string except a select few? Do you want your computer to handle such a query?\n\nNow you understand what the help message above is trying to tell you.",
"metadata": {}
},
{
"cell_type": "markdown",
"source": "To make our query really reasonable, we have to explicitly give our query a _closed world_ in which to operate the negation:",
"metadata": {}
},
{
"cell_type": "code",
"source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n \nthe_population[p] := loves[p, _a]\nthe_population[p] := loves[_a, p]\n\n?[not_loved_by_b] := the_population[not_loved_by_b], not loves['bob', not_loved_by_b]",
"metadata": {
"trusted": true
},
"execution_count": 20,
"outputs": [
{
"execution_count": 20,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">not_loved_by_b</td></tr></thead><tbody><tr><td>bob</td></tr><tr><td>charlie</td></tr><tr><td>david</td></tr><tr><td>eve</td></tr><tr><td>george</td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "Now the query understands that we are asking our question _within_ the people in the love network. It then proceeds without complaints.\n\nLet's state the **safety rule for negation**: _at least one_ argument of the rule application must be bound elsewhere (otherwise the clause will produce an infinity of candidate tuples), and _all arguments_ to negated clauses are _not_ considered bound, _unless_ they also appear elsewhere in a positive context.\n\nIf you can't wrap your head around the rule yet, don't worry. Just write your query. Return here and reread this section when you encounter some error messages similar to the above.",
"metadata": {}
},
{
"cell_type": "markdown",
"source": "## Unification",
"metadata": {
"tags": []
}
},
{
"cell_type": "markdown",
"source": "We have seen that variables with repeated appearance in rule applications and predicates are implicitly unified. You can also _explicitly_ unify a variable with the unify operator `<-`:",
"metadata": {}
},
{
"cell_type": "code",
"source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[loves_eve] := eve <- 'eve', loves[loves_eve, eve]",
"metadata": {
"trusted": true
},
"execution_count": 21,
"outputs": [
{
"execution_count": 21,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">loves_eve</td></tr></thead><tbody><tr><td>alice</td></tr><tr><td>charlie</td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "By the way, the _order_ in which clauses appear in a Horn-clause rule can never affect the result in any way (provided your queries do not contain random functions):",
"metadata": {}
},
{
"cell_type": "code",
"source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[loves_eve] := loves[loves_eve, eve], eve <- 'eve'",
"metadata": {
"trusted": true
},
"execution_count": 22,
"outputs": [
{
"execution_count": 22,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">loves_eve</td></tr></thead><tbody><tr><td>alice</td></tr><tr><td>charlie</td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "... but the performance might vary, sometimes greatly. This is an advanced topic that we will come back to in a later session. For trivial examples like ours it doesn't matter. In your own explorations, just try to put more 'restrictive' rules first (meaning that they filter out a greater number of tuples), and you will be fine most of the time.",
"metadata": {}
},
{
"cell_type": "markdown",
"source": "There is also the spread-unify operator `<- ..`, which unifies the left-hand side with the values in a list, one at a time:",
"metadata": {}
},
{
"cell_type": "code",
"source": "?[u] := u <- ..['a', 'b', 'c']",
"metadata": {
"trusted": true
},
"execution_count": 23,
"outputs": [
{
"execution_count": 23,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">u</td></tr></thead><tbody><tr><td>a</td></tr><tr><td>b</td></tr><tr><td>c</td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "Another example, this time producing the \"Cartesian product\" of two lists:",
"metadata": {}
},
{
"cell_type": "code",
"source": "?[u, v] := u <- ..['a', 'b', 'c'], v <- ..['x', 'y']",
"metadata": {
"trusted": true
},
"execution_count": 24,
"outputs": [
{
"execution_count": 24,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">u</td><td style=\"font-weight: bold\">v</td></tr></thead><tbody><tr><td>a</td><td>x</td></tr><tr><td>a</td><td>y</td></tr><tr><td>b</td><td>x</td></tr><tr><td>b</td><td>y</td></tr><tr><td>c</td><td>x</td></tr><tr><td>c</td><td>y</td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "You may notice that paired with functions extracting elements from lists, we don't actually need constant rules anymore. But constant rules are more explicit when you really have _facts_ as inputs.",
"metadata": {}
},
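{
"cell_type": "markdown",
"source": "For comparison, here is a sketch of the first spread-unify query above rewritten as a constant rule. It reuses the `<-` syntax from the very first example, and it is expected to produce the same three single-element tuples:",
"metadata": {}
},
{
"cell_type": "code",
"source": "?[u] <- [['a'], ['b'], ['c']]",
"metadata": {},
"execution_count": null,
"outputs": []
},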
{
"cell_type": "markdown",
"source": "## Recursion",
"metadata": {}
},
{
"cell_type": "markdown",
"source": "Now we come to the \"poster boy\" query of classical Datalog: let's find out all the people loved by Alice, or loved by someone loved by Alice, or loved by someone loved by someone loved by Alice, _ad infinitum_:",
"metadata": {}
},
{
"cell_type": "code",
"source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\nalice_love_chain[person] := loves['alice', person]\nalice_love_chain[person] := alice_love_chain[in_person], loves[in_person, person]\n\n?[chained] := alice_love_chain[chained]",
"metadata": {
"trusted": true
},
"execution_count": 25,
"outputs": [
{
"execution_count": 25,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">chained</td></tr></thead><tbody><tr><td>alice</td></tr><tr><td>bob</td></tr><tr><td>charlie</td></tr><tr><td>eve</td></tr></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "Someone \"chained\" is either loved by Alice directly, or loved by someone already in the chain. The query as written reads very naturally. This is why this \"transitive closure\" type of query is the poster-boy query of classical Datalog.\n\nWriting the same thing in SQL requires a recursive CTE, and those CTEs get unwieldy pretty quickly. On the other hand, if well written, Datalog queries can weather very demanding situations and remain readable.\n\nRecursive queries are essential for working with graphs (networks), so they had better be easy to write _and_ read in a database claiming to be optimized for graphs.",
"metadata": {}
},
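{
"cell_type": "markdown",
"source": "As a further sketch, the same pattern generalizes from Alice to everybody by giving the recursive rule a second argument. The rule name `love_chain` and the variable names `src`, `mid` and `dst` are made up for this illustration; only syntax already shown above is used:",
"metadata": {}
},
{
"cell_type": "code",
"source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\nlove_chain[src, dst] := loves[src, dst]\nlove_chain[src, dst] := love_chain[src, mid], loves[mid, dst]\n\n?[src, dst] := love_chain[src, dst]",
"metadata": {},
"execution_count": null,
"outputs": []
},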
{
"cell_type": "markdown",
"source": "We've talked about the safety rule for negation above. You may suspect that something similar is at play here. Let's retry the above query, but omit the starting condition `alice_love_chain[person] := loves['alice', person]`:",
"metadata": {}
},
{
"cell_type": "code",
"source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\nalice_love_chain[person] := alice_love_chain[in_person], loves[in_person, person]\n\n?[chained] := alice_love_chain[chained]",
"metadata": {
"trusted": true
},
"execution_count": 26,
"outputs": [
{
"execution_count": 26,
"output_type": "execute_result",
"data": {
"text/html": "<div style=\"display: flex; align-items: end; flex-direction: row;\"><table><thead><tr><td style=\"font-weight: bold\">chained</td></tr></thead><tbody></tbody></table><span style=\"color: darkgrey; font-size: xx-small; margin: 13px;\">Took 0ms</span></div>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "Are you surprised that the compiler did not complain? Are you surprised that it returned no results? This is the _closed-world assumption_ hinted at above, at play again: if there is no way to _deduce_ a fact from the given facts, then the fact itself is false.\n\nThis so-called \"least fixed point\" semantics is the semantics of Datalog queries. It is subtly different from that of SQL, due to the existence of `UNKNOWN` in SQL, usually manifesting as `NULL`. In other words, SQL operates on [ternary logic](https://en.wikipedia.org/wiki/Three-valued_logic) whereas Datalog stays boolean all the way (under the protection of the closed-world assumption).",
"metadata": {}
},
{
"cell_type": "markdown",
"source": "Still, there are _rules_ with respect to recursion. [Bertrand Russell](https://en.wikipedia.org/wiki/Russell%27s_paradox) would rush to write:",
"metadata": {}
},
{
"cell_type": "code",
"source": "world[a] := a <- ..[1, 2]\n\np[a] := world[a], not q[a]\nq[a] := world[a], not p[a]\n\n?[a] := p[a]",
"metadata": {
"trusted": true
},
"execution_count": 27,
"outputs": [
{
"execution_count": 27,
"output_type": "execute_result",
"data": {
"text/html": "<pre style=\"font-size: small\"><span style='color:#a00'>eval::unstratifiable</span>\n\n <span style='color:#a00'>×</span> Query is unstratifiable\n<span style='color:#0aa'> help: </span>The rule &#39;q&#39; is in the strongly connected component [&quot;p&quot;, &quot;q&quot;],\n and is involved in at least one forbidden dependency\n (negation, non-meet aggregation, or algorithm-application).\n</pre>"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": "The above query does not violate the safety rule of negation (because he put a `world` in front of each negation), but the compiler still rejects it. Don't worry about the unworldly incantation in the error message. Instead, think for a moment about what the result _could_ be.",
"metadata": {}
},
{
"cell_type": "markdown",
"source": "You can verify that the result could be the single tuple `[1]` with the assignment `p[a] <- [[1]]` and `q[a] <- [[2]]`, _or_ the single tuple `[2]` with the assignment `p[a] <- [[2]]` and `q[a] <- [[1]]`. The problem is, these answers contradict each other, and neither can be deduced _constructively_. So under the least fixed point semantics, this program has no _meaning_, and the compiler rejects it.\n\nAgain, don't worry if you can't exactly follow what is going on. Just trust that the compiler is trying to prevent your computer from imploding. Real applications don't tend to produce these kinds of contrived, paradoxical queries anyway.",
"metadata": {}
},
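{
"cell_type": "markdown",
"source": "By contrast, negation is fine as long as it does not sit inside a recursive cycle. The following sketch is a variation on the rejected query, with `p` given a made-up fixed definition for illustration. Here `q` depends negatively on `p`, but `p` does not depend on `q`, so `p` can be computed in full before `q` is evaluated. The query should therefore be accepted, and `q` holds whatever is in `world` but not in `p`:",
"metadata": {}
},
{
"cell_type": "code",
"source": "world[a] := a <- ..[1, 2]\n\np[a] := a <- ..[1]\nq[a] := world[a], not p[a]\n\n?[a] := q[a]",
"metadata": {},
"execution_count": null,
"outputs": []
},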
{
"cell_type": "markdown",
"source": "## Conclusion",
"metadata": {}
},
{
"cell_type": "markdown",
"source": "That's it! You have learned the basics of Datalog in the dialect CozoScript!\n\nIf you want to play more without going further for the moment, it is recommended that you skim through the list of functions in the Manual. Those functions allow you to do much more acrobatics with pure Datalog.\n\n\"We've seen data, but where is the BASE of dataBASE?\", you ask, not content with being merely an air-datarist.\n\nI'm glad you asked. Let's go to our base now!",
"metadata": {}
},
{
"cell_type": "code",
"source": "",
"metadata": {},
"execution_count": null,
"outputs": []
}
]
}