execution docs; completion of manual

main
Ziyang Hu 2 years ago
parent eff28aa470
commit 23b15509d3

@ -23,7 +23,7 @@ sometimes in rather convoluted ways.
In Cozo, no coercion is necessary since the query execution is completely
determined by how the query is written:
there is no stats-based query planning involved.
In our experience, this actually saves quite a lot of developer time,
In our experience, this saves quite a lot of developer time,
since developers eventually learn how to write efficient queries naturally,
and after they do, they no longer have to deal with endless "query de-optimizations".
@ -40,25 +40,25 @@ and negation only occurs for the leaf atoms.
The next step towards executing the query is *stratifying* the rules.
Stratification begins by making a graph of the named rules,
with the rules themselves as nodes,
and a link is added between two nodes when one of the rule applies the other.
and a link is added between two nodes when one of the rules applies the other.
This application is through atoms for inline rules, and input relations for fixed rules.
Now some of the links are labelled *strafifying*:
Now some of the links are labelled *stratifying*:
when an inline rule applies another rule through negation,
when an inline rule applies another inline rule that contains aggregations,
when an inline rule applies itself and it has non-semi-lattice,
when an inline rule applies another rule which is a fixed rule,
or when a fixed rule has another rule as input relation.
or when a fixed rule has another rule as an input relation.
The strongly connected components of this graph are then determined and tested,
and if it found that some strongly connected component contains a stratifying link,
the graph is deemed *unstratifiable*, and the execution aborts.
Otherwise Cozo will topologically sort the strongly connected components in order to
Otherwise, Cozo will topologically sort the strongly connected components to
determine a *stratification* of the rules:
rules within the same stratum are logically executed together,
and no two rules within the same stratum can have a stratifying link between them.
In this process,
Cozo will merge the strongly connected components into as few supernodes as possible
while still maintenance the restriction on stratifying links.
The resulting strata are then passed to be processed in the next step.
while still maintaining the restriction on stratifying links.
The resulting strata are then passed on to be processed in the next step.
You can see the stratum number assigned to rules by using the ``::explain`` system op.
@ -66,8 +66,8 @@ You can see the stratum number assigned to rules by using the ``::explain`` syst
Magic set rewrites
--------------------------------------
Within each stratum, the input rules are rewritten using a technique called magic sets.
In intuitive terms, this rewriting is to make sure that the query execution does not
waste its time on calculating results that are then simply discarded.
In intuitive terms, this rewriting is to ensure that the query execution does not
waste time calculating results that are then simply discarded.
As an example, consider::
reachable[a, b] := link[a, n]
@ -76,8 +76,8 @@ As an example, consider::
Without magic set rewrites, the whole ``reachable`` relation is generated first,
then most of them are thrown away, keeping only those starting from ``'A'``.
Magic set avoids this problem. How the rewrite actually proceeds is rather technical,
but you can see the end results in the output of ``::explain``.
Magic set avoids this problem. How the rewrite proceeds is rather technical,
but you can see the results in the output of ``::explain``.
The rewritten query is guaranteed to yield the same relation for ``?``,
and will in general yield fewer intermediate rows.
@ -89,17 +89,17 @@ So for the moment being, you may need to manually constrain some of your rules.
Semi-naïve evaluation
--------------------------------------
Now each stratum contains either a single fixed rule, or a set of inline rules.
Now each stratum contains either a single fixed rule or a set of inline rules.
The single fixed rule case is easy: just run the specific implementation of the rule.
For the inline rules case, each of the rule is assigned an output relation.
In the case of the inline rules, each of the rules is assigned an output relation.
Assuming we know how to evaluate each rule given all the relations it depends on,
the semi-naïve algorithm can now be applied to the rules to yield all output rows.
The semi-naïve algorithm is a bottom-up evaluation strategy, meaning that it tries to deduce
all facts from a set of given facts.
By contrast, top-down strategies start with stated goals and try to find proofs for the goals.
By contrast, top-down strategies start with stated goals and try to find proof for the goals.
Bottom-up strategies have many advantages over top-down ones when the whole output of each rule
is needed, but may waste time generating unused facts if only some of the output are kept.
is needed, but may waste time generating unused facts if only some of the output is kept.
Magic set rewrites are introduced to eliminate precisely this weakness.
@ -110,7 +110,7 @@ Ordering of atoms
Now we discuss how a single definition of an inline rule is evaluated.
We know from the query chapter that the body of the rule contains atoms,
and after conversion to disjunctive normal forms, all atoms are linked by conjunction,
and each atom can only be the one of the following:
and each atom can only be one of the following:
* an explicit unification,
* applying a rule or a stored relation,
@ -122,10 +122,10 @@ The atoms are then reordered: all atoms that introduce new bindings stay where t
whereas all atoms that do not introduce new bindings are moved to the earliest possible place
where all their bindings are bound. In fact,
all atoms that introduce bindings correspond to
joining with a pre-existing relations followed by projections
joining with a pre-existing relation followed by projections
in relational algebra, and all atoms that do not correspond to filters.
The idea is to apply filters as early as possible
so as to minimize the number of rows before joining with the next relation.
to minimize the number of rows before joining with the next relation.
This procedure is completely deterministic.
When writing the body of rules, we therefore should aim to minimize the total number of rows generated.
@ -136,13 +136,13 @@ as this can make the left relation in each join small.
Relations as indices
---------------------------------------
Next we need to understand how a single atom which generate new bindings is processed.
Next, we need to understand how a single atom which generates new bindings is processed.
For the case of unification, it is simple: the right-hand size of the unification,
For the case of unification, it is simple: the right-hand side of the unification,
which is an expression with all variables bound, is simply evaluated, and the result is joined
to the current relation (as in a ``map-cat`` operation in functional languages).
For the case of application of relations,
For the case of the application of relations,
the first thing to understand is that all relations in Cozo are conceptually trees.
All the bindings of relations generated by inline or fixed rules,
and the keys of stored relations, act as a composite key for the tree.
@ -151,7 +151,7 @@ For example, the following application::
a_rule['A', 'B', c]
with ``c`` unbound is very efficient, since this correspond to a prefix-scan in the tree with the key prefix ``['A', 'B']``,
with ``c`` unbound is very efficient, since this corresponds to a prefix scan in the tree with the key prefix ``['A', 'B']``,
whereas the following application::
a_rule[a, 'B', 'C']
@ -170,5 +170,5 @@ Within each stratum, rows are generated in a streaming fashion.
For the entry rule ``?``, if ``:limit`` is specified as a query option,
a counter is used to monitor how many valid rows are already generated.
If enough rows are generated, the query stops.
Note that this only works when the entry rule is an inline rule,
Note that this only works when the entry rule is inline,
and when you are *not* specifying ``:order``.
Loading…
Cancel
Save