{ "metadata": { "language_info": { "codemirror_mode": { "name": "text/plain" }, "file_extension": ".txt", "mimetype": "text/plain", "name": "cozo", "nbconvert_exporter": "text", "pygments_lexer": "text", "version": "es2017" }, "kernelspec": { "name": "cozo", "display_name": "CozoScript (localhost)", "language": "text" } }, "nbformat_minor": 4, "nbformat": 4, "cells": [ { "cell_type": "markdown", "source": "# Air-data acrobatics", "metadata": {} }, { "cell_type": "markdown", "source": "## Hello, world!", "metadata": {} }, { "cell_type": "markdown", "source": "Let's start exploring the Cozo database by following the \"hello world\" tradition:", "metadata": {} }, { "cell_type": "code", "source": "?[a, b, c] <- [['hello', 'world', 'Cozo!']]", "metadata": { "trusted": true }, "execution_count": 1, "outputs": [ { "execution_count": 1, "output_type": "execute_result", "data": { "text/html": "
abc
helloworldCozo!
Took 1ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Let's break that down. This query consists of two parts, the part before `<-` is called its _head_, and the part after is called its _body_. The symbol `<-` itself denotes that this is a _constant rule_, or a declaration of _facts_.\n\nThe head has the special name `?`, indicating the _entry_ of the query, which has three _arguments_ `a`, `b`, and `c`.\n\nThe body consists of a list of lists (in this case a list of a single inner list). Each inner list represents a _tuple_, which is similar to a row in a relational database. The length of the inner list must match the number of arguments of the head, and each argument is then _bound_ to the corresponding value in the inner list by position.\n\nOf course more than one inner list is allowed:", "metadata": {} }, { "cell_type": "code", "source": "?[a, b, c] <- [['hello', 'world', 'Cozo!'],\n ['hello', 'world', 'database!']]", "metadata": { "trusted": true }, "execution_count": 2, "outputs": [ { "execution_count": 2, "output_type": "execute_result", "data": { "text/html": "
abc
helloworldCozo!
helloworlddatabase!
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Let's try the following:", "metadata": {} }, { "cell_type": "code", "source": "?[a] <- [['hello'], ['world'], ['Cozo!']]", "metadata": { "trusted": true }, "execution_count": 3, "outputs": [ { "execution_count": 3, "output_type": "execute_result", "data": { "text/html": "
a
Cozo!
hello
world
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Now we have three inner lists of length 1 each. The returned results is also _sorted_: all relations in Cozo are sorted lexicographically by position.\n\nCozo operates on _set semantics_ instead of _bag semantics_: observe", "metadata": {} }, { "cell_type": "code", "source": "?[a] <- [['hello'], ['world'], ['Cozo!'], ['hello'], ['world'], ['Cozo.']]", "metadata": { "trusted": true }, "execution_count": 4, "outputs": [ { "execution_count": 4, "output_type": "execute_result", "data": { "text/html": "
a
Cozo!
Cozo.
hello
world
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "`'hello'` and `'world'` both appear only once in the result, even though they appear twice each in the input. Set semantics automatically de-duplicates based on the whole tuple.", "metadata": {} }, { "cell_type": "markdown", "source": "## Values and expressions", "metadata": {} }, { "cell_type": "markdown", "source": "The list of lists in the body of the rules certainly look familiar to anyone who have used languages such as JavaScript or Python. In fact, with the exception of the map `{}`, valid JSON values represent valid Cozo values.\n\nAs sorting is important in Cozo, study the following example, which demonstrates how different values are sorted:", "metadata": {} }, { "cell_type": "code", "source": "?[a] <- [[true],\n [false], \n [null],\n [\"A\"], \n ['apple'], // single or double quotes are both OK \n [\"Apple juice\"], \n [['apple', 1, [2, 3]]], // this row consists of a list consisting of heterogeneous items!\n [1.0], \n [1_234_567], // you can separate digits with underscores for clarity\n [3.14159], \n [-8e-99]]", "metadata": { "trusted": true }, "execution_count": 5, "outputs": [ { "execution_count": 5, "output_type": "execute_result", "data": { "text/html": "
a
null
false
true
-8e-99
1
3.14159
1234567
A
Apple juice
apple
["apple",1,[2,3]]
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Notice how comments are entered, just like in JavaScript. `/* ... */` also works.\n\nIn the playground, literal strings appear in black, numbers in blue, and reddish entries represent values that should be parsed as JSON.\n\nEven though the kind of rule we have been using is called the _constant rule_, you can in fact compute in them:", "metadata": {} }, { "cell_type": "code", "source": "?[i, a] <- [[1, 1 + 2], \n [2, 3 * 4], \n [3, 5 / 6], \n [4, exp(7)], \n [5, uppercase('number ') ++ to_string(10)], // string concatenation\n [6, to_float('PI')]]", "metadata": { "trusted": true }, "execution_count": 6, "outputs": [ { "execution_count": 6, "output_type": "execute_result", "data": { "text/html": "
ia
13
212
30.8333333333333334
41096.6331584284585
5NUMBER 10
63.141592653589793
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "for clarity we have used the index `i` to force the result to show in this order.\n\nFor the full list of functions you can use in expressions, consult the Manual.", "metadata": {} }, { "cell_type": "markdown", "source": "There is one thing we need to make clear at this point. In CozoScript, only `true` is true, and only `false` is false. This is not a tautology: every other value, including `null`, produces error when put in a position requiring a truthy value. In this sense, `null` in CosoScript is only a _marker_. It has no inherent logical semantics associated with it, unlike `NULL` in SQL, `null` and `undefeined` in Javascript, and `None` in Python. An example:", "metadata": {} }, { "cell_type": "code", "source": "?[a] <- [[!null]]", "metadata": { "trusted": true }, "execution_count": 7, "outputs": [ { "execution_count": 7, "output_type": "execute_result", "data": { "text/html": "
eval::throw\n\n  × Evaluation of expression failed\n   ╭────\n 1 │ ?[a] <- [[!null]]\n   ·           ─────\n   ╰────\n  help: 'negate' requires booleans\n
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "In this case you really need to write", "metadata": {} }, { "cell_type": "code", "source": "?[a] <- [[!is_null(null)]]", "metadata": { "trusted": true }, "execution_count": 8, "outputs": [ { "execution_count": 8, "output_type": "execute_result", "data": { "text/html": "
a
false
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "This may seem a nuisance in trivial cases, but will save you a lot of hair in hairy situations. Believe me.", "metadata": {} }, { "cell_type": "markdown", "source": "## Horn-clause rules", "metadata": {} }, { "cell_type": "markdown", "source": "Usually constant rules are used to define ad-hoc facts useful for subsequent queries:", "metadata": {} }, { "cell_type": "code", "source": "?[loving, loved] := loves[loving, loved] // Yes, this is the 'subsequent query'. In a logical sense. \n // The order of rules has no significance whatsoever.\n\nloves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]", "metadata": { "trusted": true }, "execution_count": 9, "outputs": [ { "execution_count": 9, "output_type": "execute_result", "data": { "text/html": "
lovingloved
aliceeve
bobalice
charlieeve
davidgeorge
evealice
evebob
evecharlie
georgegeorge
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "The constant rule is now named `loves`, denoting a rather complicated relationship network (aren't 'relationship' and 'network' synonyms?). It reads like \"Alice loves Eve, Bob loves Alice\", \"nobody loves David, David loves George, but George only loves himself\", and so on. Note that for constant rules we can actually omit the arguments (but if explicitly given, the arity must match the actual data).\n\nThe entry `?` is now a _Horn-clause rule_, signified by the symbol `:=`. Its body has a single _application_ of the rule we have just defined, with _bindings_ `loving` and `loved` for the arguments. These bindings are then carried to the output via the arguments of the entry rule.\n\nHere both bindings to the rule application of `loves` are initially _unbound_, in which case all tuples of `loves` are returned. To _bind_ an argument simply pass a constant in:", "metadata": {} }, { "cell_type": "code", "source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[loved_by_eve] := loves['e' ++ 'v' ++ 'e', loved_by_eve] // Eve loves dramatic entrance", "metadata": { "trusted": true }, "execution_count": 10, "outputs": [ { "execution_count": 10, "output_type": "execute_result", "data": { "text/html": "
loved_by_eve
alice
bob
charlie
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Every argument position can be bound:", "metadata": {} }, { "cell_type": "code", "source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[loves_eve] := loves[loves_eve, 'eve']", "metadata": { "trusted": true }, "execution_count": 11, "outputs": [ { "execution_count": 11, "output_type": "execute_result", "data": { "text/html": "
loves_eve
alice
charlie
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Multiple clauses can appear in the body, in which case an implicit conjunction is implied, meaning that all clauses\nmust bind for a result to return:", "metadata": {} }, { "cell_type": "code", "source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[loved_by_b_e] := loves['eve', loved_by_b_e], loves['bob', loved_by_b_e]", "metadata": { "trusted": true }, "execution_count": 12, "outputs": [ { "execution_count": 12, "output_type": "execute_result", "data": { "text/html": "
loved_by_b_e
alice
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "We see that Alice is loved by both Bob and Eve. The variable `loved_by_b_e` appears in both clauses, in which case they are _unified_, meaning that they must bind to the _same_ value for a tuple to return.", "metadata": {} }, { "cell_type": "markdown", "source": "Disjunction, meaning that _any_ clause with successful binding potentially contribute to results, must be specified explicitly:", "metadata": {} }, { "cell_type": "code", "source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[loved_by_b_e] := loves['eve', loved_by_b_e] or loves['bob', loved_by_b_e], \n loved_by_b_e != 'bob', \n loved_by_b_e != 'eve'", "metadata": { "trusted": true }, "execution_count": 13, "outputs": [ { "execution_count": 13, "output_type": "execute_result", "data": { "text/html": "
loved_by_b_e
alice
charlie
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "As we can see, disjunctive clauses are connected by `or`. It binds more strongly than the implicit conjunction `,`.", "metadata": {} }, { "cell_type": "markdown", "source": "Horn clause rules (and Horn clause rules only) may have multiple definitions _having equivalent heads_. The above query is identical in every way to the following:", "metadata": {} }, { "cell_type": "code", "source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[loved_by_b_e] := loves['eve', loved_by_b_e], loved_by_b_e != 'bob', loved_by_b_e != 'eve'\n?[loved_by_b_e] := loves['bob', loved_by_b_e], loved_by_b_e != 'bob', loved_by_b_e != 'eve'", "metadata": { "trusted": true }, "execution_count": 14, "outputs": [ { "execution_count": 14, "output_type": "execute_result", "data": { "text/html": "
loved_by_b_e
alice
charlie
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "If a Horn clause rule is not the entry, even the _names_ given to the arguments can differ. The bodies are not required to be of the same form, as long as they produce compatible outputs.", "metadata": {} }, { "cell_type": "markdown", "source": "Besides rule applications, _filters_ can also appear in the body:", "metadata": {} }, { "cell_type": "code", "source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[person, loved] := loves[person, loved], !ends_with(person, 'e')", "metadata": { "trusted": true }, "execution_count": 15, "outputs": [ { "execution_count": 15, "output_type": "execute_result", "data": { "text/html": "
personloved
bobalice
davidgeorge
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "In this case only people with name not ending in `'e'` are considered for the loving position.\n\nBy the way, if you are not interested in who the person in the loving position is, you can just omit it in the arguments to the entry:", "metadata": {} }, { "cell_type": "code", "source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[loved] := loves[person, loved], !ends_with(person, 'e')", "metadata": { "trusted": true }, "execution_count": 16, "outputs": [ { "execution_count": 16, "output_type": "execute_result", "data": { "text/html": "
loved
alice
george
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "... but every argument in the head of any Horn-clause rule must appear in the body, of course:", "metadata": {} }, { "cell_type": "code", "source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[the_alien, loved] := loves[person, loved], !ends_with(person, 'e')", "metadata": { "trusted": true }, "execution_count": 17, "outputs": [ { "execution_count": 17, "output_type": "execute_result", "data": { "text/html": "
eval::unbound_symb_in_head\n\n  × Symbol 'the_alien' in rule head is unbound\n    ╭─[9:1]\n  9 │ \n 10 │ ?[the_alien, loved] := loves[person, loved], !ends_with(person, 'e')\n    ·   ─────────\n    ╰────\n  help: Note that symbols occurring only in negated positions are not considered bound\n
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "## Negation", "metadata": {} }, { "cell_type": "markdown", "source": "The next query finds those who are loved by Eve, but not by Bob:", "metadata": {} }, { "cell_type": "code", "source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[loved_by_e_not_b] := loves['eve', loved_by_e_not_b], not loves['bob', loved_by_e_not_b]", "metadata": { "trusted": true }, "execution_count": 18, "outputs": [ { "execution_count": 18, "output_type": "execute_result", "data": { "text/html": "
loved_by_e_not_b
bob
charlie
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Here we are using the `not` keyword to _negate_ the rule application `loves`. This negation is at the level of Horn-clauses, which is not the same as the level of expressions. In fact, there are two sets of related but inequivalent operators:\n\n* For Horn clauses: `,` (conjunction), `or` (disjunction), `not` (negation)\n* For boolean expressions: `&&` (conjunction), `||` (disjunction), `!` (negation)\n\nHopefully you are already familiar with the boolean set of operators. If you use them in the wrong way, the query compiler will yell at you. And you will comply.", "metadata": {} }, { "cell_type": "markdown", "source": "Negation has to abide by the _safety rule_. Let's violate it:", "metadata": {} }, { "cell_type": "code", "source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[not_loved_by_b] := not loves['bob', not_loved_by_b]", "metadata": { "trusted": true }, "execution_count": 19, "outputs": [ { "execution_count": 19, "output_type": "execute_result", "data": { "text/html": "
eval::unbound_symb_in_head\n\n  × Symbol 'not_loved_by_b' in rule head is unbound\n    ╭─[9:1]\n  9 │ \n 10 │ ?[not_loved_by_b] := not loves['bob', not_loved_by_b]\n    ·   ──────────────\n    ╰────\n  help: Note that symbols occurring only in negated positions are not considered bound\n
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Oh no! The query compiler rejects our perfectly reasonable query trying to determine those poor souls not loved by Bob!\n\nBut is our query really reasonable? For example, should the query return a tuple containing 'gold', since according to facts at hand, Bob clearly has no interest in 'gold'? So should our query return every possible string except a select few? Do you want your computer to handle such a query?\n\nNow you understand what the help message above is trying to tell you.", "metadata": {} }, { "cell_type": "markdown", "source": "To make our query really reasonable, we have to explicitly give our query a _closed world_ in which to operate the negation:", "metadata": {} }, { "cell_type": "code", "source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n \nthe_population[p] := loves[p, _a]\nthe_population[p] := loves[_a, p]\n\n?[not_loved_by_b] := the_population[not_loved_by_b], not loves['bob', not_loved_by_b]", "metadata": { "trusted": true }, "execution_count": 20, "outputs": [ { "execution_count": 20, "output_type": "execute_result", "data": { "text/html": "
not_loved_by_b
bob
charlie
david
eve
george
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Now the query understands that we are asking our question _within_ the people in the love network. It then proceeds without complaints.\n\nLet's state the **safety rule for negation**: _at least one_ argument of the rule application must be bound elsewhere (otherwise the clause will produce an infinity of candidate tuples), and _all arguments_ to negated clauses are _not_ considered bound, _unless_ they also appear elsewhere in a positive context.\n\nIf you can't wrap your head around the rule yet, don't worry. Just write your query. Return here and reread this section when you encounter some error messages similar to the above.", "metadata": {} }, { "cell_type": "markdown", "source": "## Unification", "metadata": { "tags": [] } }, { "cell_type": "markdown", "source": "We have seen that variables with repeated appearance in rule applications and predicates are implicitly unified. You can also _explicitly_ unify a variable with the unify operator `<-`:", "metadata": {} }, { "cell_type": "code", "source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[loves_eve] := eve <- 'eve', loves[loves_eve, eve]", "metadata": { "trusted": true }, "execution_count": 21, "outputs": [ { "execution_count": 21, "output_type": "execute_result", "data": { "text/html": "
loves_eve
alice
charlie
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "By the way, the _order_ a clause appears in a Horn-clause rule can never affect the result in any way (provided your queries do not contain random functions):", "metadata": {} }, { "cell_type": "code", "source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\n?[loves_eve] := loves[loves_eve, eve], eve <- 'eve'", "metadata": { "trusted": true }, "execution_count": 22, "outputs": [ { "execution_count": 22, "output_type": "execute_result", "data": { "text/html": "
loves_eve
alice
charlie
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "... but the performance might vary, sometimes greatly. This is an advanced topic that we will come back to in a later session. For trivial examples like ours it doesn't matter. In your own explorations, just try to put more 'restrictive' rules first (meaning that they filter out a greater number of tuples), and you will be fine most of the time.", "metadata": {} }, { "cell_type": "markdown", "source": "There is also the spread-unify operator `<- ..`, which unifies the left hand side with values in a list one at a time:", "metadata": {} }, { "cell_type": "code", "source": "?[u] := u <- ..['a', 'b', 'c']", "metadata": { "trusted": true }, "execution_count": 23, "outputs": [ { "execution_count": 23, "output_type": "execute_result", "data": { "text/html": "
u
a
b
c
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Another example: this is the \"Cartesian product\"", "metadata": {} }, { "cell_type": "code", "source": "?[u, v] := u <- ..['a', 'b', 'c'], v <- ..['x', 'y']", "metadata": { "trusted": true }, "execution_count": 24, "outputs": [ { "execution_count": 24, "output_type": "execute_result", "data": { "text/html": "
uv
ax
ay
bx
by
cx
cy
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "You may notice that paired with functions extracting elements from lists, we don't actually need constant rules anymore. But constant rules are more explicit when you really have _facts_ as inputs.", "metadata": {} }, { "cell_type": "markdown", "source": "## Recursion", "metadata": {} }, { "cell_type": "markdown", "source": "Now we come to the \"poster boy\" query of classical Datalog: let's find out all the people loved by Alice, or loved by someone loved by Alice, or loved by someone loved by someone loved by Alice, _ad infinitum_:", "metadata": {} }, { "cell_type": "code", "source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\nalice_love_chain[person] := loves['alice', person]\nalice_love_chain[person] := alice_love_chain[in_person], loves[in_person, person]\n\n?[chained] := alice_love_chain[chained]", "metadata": { "trusted": true }, "execution_count": 25, "outputs": [ { "execution_count": 25, "output_type": "execute_result", "data": { "text/html": "
chained
alice
bob
charlie
eve
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Someone \"chained\" is either loved by Alice directly, or loved by someone already in the chain. The query as written reads very naturally. This is why this \"transitive closure\" type of query is the poster-boy query of classical Datalog. \n\nWriting the same thing in SQL requires recursive CTE, and those CTEs escalate pretty quickly. On the other hand, if well written, Datalog queries can weather very demanding situations and remain readable.\n\nRecursive queries are an essential part for graphs (networks). So they had better be easy to write _and_ read in a database claiming to be optimized for graphs.", "metadata": {} }, { "cell_type": "markdown", "source": "We've talked about the safety rule for negation above. You may suspect that something similar is at play here. Let's retry the above query, but omit the starting condition `alice_love_chain[person] := loves['alice', person]`:", "metadata": {} }, { "cell_type": "code", "source": "loves[] <- [['alice', 'eve'],\n ['bob', 'alice'],\n ['eve', 'alice'],\n ['eve', 'bob'],\n ['eve', 'charlie'],\n ['charlie', 'eve'],\n ['david', 'george'],\n ['george', 'george']]\n\nalice_love_chain[person] := alice_love_chain[in_person], loves[in_person, person]\n\n?[chained] := alice_love_chain[chained]", "metadata": { "trusted": true }, "execution_count": 26, "outputs": [ { "execution_count": 26, "output_type": "execute_result", "data": { "text/html": "
chained
Took 0ms
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Are you surprised that the compiler did not complain? Are you surprised that it returned no results? This is the _closed-world assumption_ hinted above at play again. If there is no way to _deduce_ a fact from the given facts, _then_ the fact itself is false.\n\nThis so called \"least fixed point\" semantics is the semantics of Datalog queries. This semantics is actually subtly different from SQL, due to the existence of `UNKNOWN` in SQL, usually manifesting as `NULL`. In other worlds, SQL operates on [ternary logic](https://en.wikipedia.org/wiki/Three-valued_logic) whereas Datalog stays boolean all the way (under the protection of the closed world assumptions).", "metadata": {} }, { "cell_type": "markdown", "source": "Still, there are _rules_ with respect to recursion. [Bertrand Russell](https://en.wikipedia.org/wiki/Russell%27s_paradox) would rush to write:", "metadata": {} }, { "cell_type": "code", "source": "world[a] := a <- ..[1, 2]\n\np[a] := world[a], not q[a]\nq[a] := world[a], not p[a]\n\n?[a] := p[a]", "metadata": { "trusted": true }, "execution_count": 27, "outputs": [ { "execution_count": 27, "output_type": "execute_result", "data": { "text/html": "
eval::unstratifiable\n\n  × Query is unstratifiable\n  help: The rule 'q' is in the strongly connected component ["p", "q"],\n        and is involved in at least one forbidden dependency\n        (negation, non-meet aggregation, or algorithm-application).\n
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "The above query does not violate the safety rule of negation (because he put a `world` in front of each negation), but the compiler still rejects it. Don't worry about the unworldly incantation the error makes. Instead, think for a moment what the result _could_ be.", "metadata": {} }, { "cell_type": "markdown", "source": "You can verify that the result could be the single tuple `[1]` with the assignment `p[a] <- [[1]]` and `q[a] <- [[2]]`, _or_ the single tuple `['q']` with the assignment `p[a] <- [[2]]` and `q[a] <- [[1]]`. The problem is, these answers contradict each other, and neither can be deduced _constructively_. So under the least fixed point semantics, this program has no _meaning_, and the compiler rejects it.\n\nAgain, don't worry if you can't exactly follow what is going on. Just trust that the compiler is trying to prevent your computer from imploding. Real applications don't tend to produce these kinds of contrived, paradoxical queries anyway.", "metadata": {} }, { "cell_type": "markdown", "source": "## Conclusion", "metadata": {} }, { "cell_type": "markdown", "source": "That's it! You have learned the basics of Datalog in the dialect CozoScript!\n\nIf you want to play more without going further for the moment, it is recommended that you skim through the list of functions in the Manual. Those functions allow you to do much more acrobatics with pure Datalog.\n\n\"We've seen data, but where is the BASE of dataBASE?\", you ask, not content of being merely an air-datarist.\n\nI'm glad you asked. Let's go to our base now!", "metadata": {} }, { "cell_type": "code", "source": "", "metadata": {}, "execution_count": null, "outputs": [] } ] }