==============
Queries
==============
The Cozo database system is queried using the CozoScript language.
At its core, CozoScript is a `Datalog <https://en.wikipedia.org/wiki/Datalog>`_ dialect
supporting stratified negation and stratified recursive meet-aggregations.
The built-in native algorithms (mainly graph algorithms) extend
CozoScript beyond pure Datalog, making it easier to use and applicable to a wider range of problems.

A query consists of one or more named rules.
Each named rule conceptually represents a relation, or a table with rows and columns.
The rule named ``?`` is called the entry to the query,
and its associated relation is returned as the result of the query.
Each named rule has a rule head, which names the columns of the relation,
and a rule body, which specifies the content of the relation, or how the content should be computed.

In CozoScript, relations (stored relations or relations defined by rules) follow *set semantics*:
even if a rule computes the same row multiple times, the row occurs only once in the output.
This differs from SQL, where duplicate rows are kept unless explicitly removed.

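
As a minimal sketch of set semantics (the rule name ``data`` and its rows are made up for illustration,
and the two kinds of rules used here are explained in the following sections),
projecting away the second column below produces the value ``1`` twice, yet the result contains it only once::

    data[a, b] <- [[1, 'x'], [1, 'y'], [2, 'z']]
    ?[a] := data[a, b]

The result should contain just the two rows ``[1]`` and ``[2]``.
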
There are three types of named rules in CozoScript: constant rules, Horn-clause rules and algorithm applications.

-----------------
Constant rules
-----------------
The following is an example of a constant rule::

    const_rule[a, b, c] <- [[1, 2, 3], [4, 5, 6]]

Constant rules are distinguished by the symbol ``<-`` separating the rule head and rule body.
The rule body should be an expression evaluating to a list of lists:
every sublist of the rule body must have the same length (the *arity* of the rule),
and that length must match the number of arguments in the rule head.

In general, if you are passing data into the query,
you should take advantage of named parameters::

    const_rule[a, b, c] <- $data_passed_in

and pass a map containing a key of ``"data_passed_in"`` with a value of a list of lists.

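
As a sketch of how the pieces fit together (the concrete rows are invented, and how the parameter map
is handed over depends on the client API you use, which is not covered here), the query::

    const_rule[a, b, c] <- $data_passed_in
    ?[a, c] := const_rule[a, b, c]

could be submitted together with a parameter map such as::

    {"data_passed_in": [[1, 2, 3], [4, 5, 6]]}

in which case the entry rule should return the rows ``[1, 3]`` and ``[4, 6]``.
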
The arguments in the rule head may be omitted if the rule body is not the empty list::

    const_rule[] <- [[1, 2, 3], [4, 5, 6]]

in which case the system will deduce the arity of the rule from the data.

-----------------
Horn-clause rules
-----------------
An example of a Horn-clause rule is::

    hc_rule[a, e] := rule_a['constant_string', b], rule_b[b, d, a, e]

As can be seen, Horn-clause rules are distinguished by the symbol ``:=`` separating the rule head and rule body.
The rule body of a Horn-clause rule consists of one or more *atoms* joined by commas,
and is interpreted as representing the *conjunction* of these atoms.

^^^^^^^^^^^^^^
Atoms
^^^^^^^^^^^^^^
Atoms come in various flavours.
In the example above::

    rule_a['constant_string', b]

is an atom representing a *rule application*: a rule named ``rule_a`` must exist in the same query
and have the correct arity (2 here).
Each row in the named rule is then *unified* with the bindings given as parameters in the square brackets:
here the first column is unified with a constant string, and unification succeeds only when the string
completely matches what is given;
the second column is unified with the *variable* ``b``,
and as the variable is fresh at this point (meaning that it first appears here),
the unification will always succeed and the variable will become *bound*:
from this point on, it takes the value of whatever it was
unified with in the named relation.
When a bound variable is used again later, for example in ``rule_b[b, d, a, e]``, where the variable ``b`` is already bound,
the unification will only succeed when the unified value is the same as the previously unified value.
In other words, repeated use of the same variable in named rules corresponds to inner joins in relational algebra.

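
To make the join concrete, here is a small sketch that supplies made-up rows for ``rule_a`` and ``rule_b``
(the data and the entry rule are assumptions added for illustration)::

    rule_a[x, y] <- [['constant_string', 1], ['other_string', 2]]
    rule_b[b, d, a, e] <- [[1, 'd1', 'a1', 'e1'], [2, 'd2', 'a2', 'e2']]
    hc_rule[a, e] := rule_a['constant_string', b], rule_b[b, d, a, e]
    ?[a, e] := hc_rule[a, e]

Only the first row of ``rule_a`` matches the constant, binding ``b`` to ``1``;
only the first row of ``rule_b`` then agrees on ``b``, so the result should be the single row ``['a1', 'e1']``.
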
Another flavour of atoms is the *stored relation*. It may be written similarly to a rule application::

    :stored_relation[bind1, bind2]

with the colon in front of the stored relation name to distinguish it from a rule application.
Written in this way, you must give as many bindings to the stored relation as its arity,
and the bindings are matched by argument position, which may be cumbersome and error-prone.
Alternatively, since the columns of a stored relation are always named, you may bind by name::

    :stored_relation{col1: bind1, col2: bind2}

In this case, you only need to bind as many variables as you use.
If the name you want to give the binding is the same as the name of the column, you may use the shorthand notation:
``:stored_relation{col1}`` is the same as ``:stored_relation{col1: col1}``.

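
As an illustration, assume a hypothetical stored relation ``person`` whose columns are, in order,
``name``, ``age`` and ``city`` (this relation is not part of Cozo and would have to be created beforehand).
Binding by position requires a binding for every column, even the unused ``city``::

    ?[n, a] := :person[n, a, c]

whereas binding by name can simply leave ``city`` out::

    ?[n, a] := :person{name: n, age: a}

The second form also keeps working if the order of the columns differs from what you remembered.
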
*Expressions* are also atoms, such as::

    a > b + 1

Here ``a`` and ``b`` must be bound somewhere else in the rule.
The expression must evaluate to a boolean and acts as a *filter*: only rows where the expression evaluates to true are kept.

You can also use *unification atoms* to unify explicitly::

    a = b + c + d

For such atoms, whatever appears on the left-hand side must be a single variable, which is unified with the right-hand side.
This is different from the equality operator ``==``,
where both sides are merely required to be expressions.
When the left-hand side is a single *bound* variable,
the equality and the unification operators are semantically equivalent.

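
A short sketch combining a filter with a unification (the rule ``pts`` and its rows are invented for illustration)::

    pts[a, b] <- [[5, 1], [2, 3]]
    ?[a, b, c] := pts[a, b], a > b + 1, c = a + b

The filter keeps only the row where ``a > b + 1`` holds, and the unification binds the fresh variable ``c``
to the sum, so the result should be the single row ``[5, 1, 6]``.
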
Another form of *unification atom* is the explicit multi-unification::

    a in [x, y, z]

Here the variable on the left-hand side of ``in`` is unified with each item on the right-hand side in turn,
which implies that the right-hand side must evaluate to a list
(it may also be written as a single variable or a function call).

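
For instance, in the following sketch (the literal list is chosen only for illustration)
the variable ``a`` takes each of the three values in turn, and the filter then discards the first one::

    ?[a] := a in [1, 2, 3], a > 1

The result should contain the rows ``[2]`` and ``[3]``.
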
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Head and returned relation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Atoms, as explained above, correspond to either relations (or their projections) or filters in relational algebra.
Linked by commas, they therefore represent a joined relation with named columns.
The *head* of the rule, which in the simplest case is just a list of variables,
then defines which columns to keep and their order in the output relation.
Each variable in the head must be bound in the body: this is one of the *safety rules* of Datalog.

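
As a small sketch (the rule ``pairs`` and its rows are made up), the head below swaps the order of the two columns::

    pairs[a, b] <- [[1, 'x'], [2, 'y']]
    ?[b, a] := pairs[a, b]

so the result should contain the rows ``['x', 1]`` and ``['y', 2]``.
By contrast, a head such as ``?[a, c]`` with the same body would break the safety rule,
because ``c`` is not bound anywhere in the body.
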
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Multiple definitions and disjunction
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For Horn-clause rules only, multiple rule definitions may share the same name,
with the requirement that the arity of the head in each definition must match.
The returned relation is then the *disjunction* of the multiple definitions,
which corresponds to *union* in SQL.
*Intersect* in SQL can be written in CozoScript as a single rule, since commas denote conjunction.

In complicated situations, you may instead write disjunctions in a single rule with the explicit ``or`` operator::

    rule1[a, b] := rule2[a] or rule3[a], rule4[a, b]

For completeness, there is also an explicit ``and`` operator. It is semantically identical to the comma, except that
it has higher operator precedence than ``or``, which in turn has higher operator precedence than the comma.

During evaluation, each rule is canonicalized into disjunctive normal form
and each clause of the outermost disjunction is treated as a separate rule.
The consequence is that the safety rule may be violated
even though textually every variable in the head occurs in the body.
As an example::

    rule[a, b] := rule1[a] or rule2[b]

is a violation of the safety rule since it is rewritten into two rules, each of which is missing a different binding.

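
As a sketch of this rewriting (derived from the precedence rules stated in this section rather than from actual
system output), the ``rule1`` definition using ``or`` behaves like the two definitions::

    rule1[a, b] := rule2[a], rule4[a, b]
    rule1[a, b] := rule3[a], rule4[a, b]

Both clauses bind ``a`` and ``b``, so the rule is safe. The violating example instead expands to::

    rule[a, b] := rule1[a]
    rule[a, b] := rule2[b]

where the first clause leaves ``b`` unbound and the second leaves ``a`` unbound.
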
^^^^^^^^^^^^^^^^
Negation
^^^^^^^^^^^^^^^^
Atoms in Horn clauses may be *negated* by putting ``not`` in front of them, as in::

    not rule1[a, b]

When negating rule applications and stored relations,
at least one binding must be bound somewhere else in the rule in a non-negated context:
this is another safety rule of Datalog, and it ensures that the outputs of rules are always finite.
The unbound bindings in negated rules remain unbound: negation cannot introduce bound bindings to be used in the head.

Negated expressions act as negative filters,
which is semantically equivalent to putting ``!`` in front of the expression.
Since negation does not introduce new bindings,
unifications and multi-unifications are converted to equivalent expressions and then negated.

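
A typical use of negation is a set difference. In the following sketch (both constant rules and their rows are
invented for illustration), ``a`` is bound by the non-negated atom and the negated atom only removes rows::

    all_items[a] <- [[1], [2], [3]]
    excluded[a] <- [[2]]
    ?[a] := all_items[a], not excluded[a]

The result should contain the rows ``[1]`` and ``[3]``.
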
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Recursion and stratification
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The body of a Horn-clause rule may contain rule applications of itself,
and multiple Horn-clause rules may apply each other recursively.
The only exception is the entry rule ``?``, which cannot be referred to by other rules.

Self and mutual references allow recursion to be defined easily. To guard against semantically pathological cases,
recursion cannot occur in negated positions: the Russell-style rule ``r[a] := not r[a]`` is not allowed.
This requirement creates an ordering of the rules, since
negated rules must be evaluated to completion before rules that apply them can start evaluation:
this is called *stratification* of the rules.
If no such ordering can be defined because the ordering
required by negation contains a loop, the query is deemed unstratifiable and Cozo will refuse to execute it.

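
The classic use of recursion is reachability in a graph. The sketch below uses a made-up ``edge`` relation;
``reachable`` has one non-recursive definition and one that applies itself::

    edge[a, b] <- [[1, 2], [2, 3], [3, 4]]
    reachable[a, b] := edge[a, b]
    reachable[a, b] := reachable[a, c], edge[c, b]
    ?[b] := reachable[1, b]

Starting from node ``1``, the result should contain the rows ``[2]``, ``[3]`` and ``[4]``.
Because the relation over this finite graph is finite and follows set semantics, the recursion terminates.
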
Note that since CozoScript allows unifying fresh variables, you can still easily write recursive programs that produce
infinite relations and hence never complete, but that are still accepted by the database.
One of the simplest examples is::

    r[a] := a = 0
    r[a] := r[b], a = b + 1
    ?[a] := r[a]

It is up to the user to ensure that such programs are not submitted to the database,
as it is not possible, even in principle, for the database to rule out such cases without wrongly rejecting valid queries.
If you accidentally submit one, you can refer to the system ops section for how to terminate long-running queries.
Alternatively, you can give a timeout for the query when you submit it.

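
One way to keep such a program finite is to bound the recursion with a filter.
The following variation is only a sketch, and the bound ``5`` is an arbitrary choice for illustration::

    r[a] := a = 0
    r[a] := r[b], a = b + 1, a < 5
    ?[a] := r[a]

Here no new rows can be derived once ``a`` would reach ``5``,
so the query should complete with the rows ``[0]`` through ``[4]``.
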
----------------------------------
Algorithm application
----------------------------------
-----------------------
Query options
-----------------------