hc_rule[a, e] := rule_a['constant_string', b], rule_b[b, d, a, e]
As can be seen, Horn-clause rule is distinguished by the symbol ``:=`` separating the rule head and rule body.
The rule body of a Horn-clause rule consists of multiple *atoms* joined by commas,
and is interpreted as representing the *conjunction* of these atoms.
^^^^^^^^^^^^^^
Atoms
^^^^^^^^^^^^^^
Atoms comes in various flavours.
In the example above::
rule_a['constant_string', b]
is an atom representing a *rule application*: a rule named ``rule_a`` must exists in the same query
and have the correct arity (2 here).
Each row in the named rule is then *unified* with the bindings given as parameters in the square bracket:
here the first column is unified with a constant string, and unification succeeds only when the string
completely matches what is given;
the second column is unified with the *variable*``b``,
and as the variable is fresh at this point (meaning that it first appears here),
the unification will always succeed and the variable will become *bound*:
from this point take on value of whatever it was
unified with in the named relation.
When a bound variable is used again later, for example in ``rule_b[b, d, a, e]``, the variable ``b`` was bound
at this point, this unification will only succeed when the unified value is the same as the previously unified value.
In other words, repeated use of the same variable in named rules corresponds to inner joins in relational algebra.
Another flavour of atoms is the *stored relation*. It may be written similarly to a rule application:
:stored_relation[bind1, bind2]
with the colon in front of the stored relation name to distinguish it from rule application.
Written in this way, you must give as many bindings to the stored relation as its arity,
and the bindings proceed by arguments positions, which may be cumbersome and error-prone.
So alternatively, you may use the fact that columns of a stored relation are always named and bind by name:
:stored_relation{col1:bind1, col2: bind2}
In this case, you only need to bind as many variables as you use.
If the name you want to give the binding is the same as the name of the column, you may use the shorthand notation:
``:stored_relation{col1}`` is the same as ``:stored_relation{col1: col1}``.
*Expressions* are also atoms, such as
a > b + 1
Here ``a`` and ``b`` must be bound somewhere else in the rule, and the expression must evaluate to a boolean, and act as a *filter*: only rows where the expression evaluates to true are kept.
You can also use *unification atoms* to unify explicitly::
a = b + c + d
for such atoms, whatever appears on the left hand side must be a single variable, and is unified with the right hand side.
This is different from the equality operator ``==``,
where both sides are merely required to be expressions.
When the left hand side is a single *bound* variable,
it may be shown that the equality and the unification operators are semantically equivalent.
Another form of *unification atom* is the explicit multi-unification::
a in [x, y, z]
here the variable on the left hand side of ``in`` is unified with each item in the right hand side in turn,
which in turn implies that the right hand side must evaluate to a list
(but may be represented by a single variable or a function call).
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Head and returned relation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Atoms, as explained above, corresponds to either relations (or their projections) or filters in relational algebra.
Linked by commas, they therefore represent a joined relation, with named columns.
The *head* of the rule, which in the simplest case is just a list of variables,
then defines whichever columns to keep, and their order in the output relation.
Each variable in the head must be bound in the body, this is one of the *safety rules* of Datalog.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Multiple definitions and disjunction
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For Horn-clause rules only, multiple rule definitions may share the same name,
with the requirement that the arity of the head in each definition must match.
The returned relation is then the *disjunction* of the multiple definitions,
which correspond to *union* in SQL.
*Intersect* in SQL can be written in CozoScript into a single rule, since commas denote conjunction.
In complicated situations, you may instead write disjunctions in a single rule with the explicit ``or`` operator::
rule1[a, b] := rule2[a] or rule3[a], rule4[a, b]
For completeness, there is also an explicit ``and`` operator, but it is semantically identical to the comma, except that
it has higher operator precedence than ``or``, which in turn has higher operator precedence than the comma.
During evaluation, each rule is canonicalized into disjunction normal form
and each clause of the outmost disjunction is treated as a separate rule.
The consequence is that the safety rule may be violated
even though textually every variable in the head occurs textually in the body.
As an example::
rule[a, b] := rule1[a] or rule2[b]
is a violation of the safety rule since it is rewritten into two rules, each of which is missing a different binding.