==============
Aggregations
==============
Aggregations in Cozo can be thought of as functions that act on a stream of values and produce a single value (the aggregate). Due to Datalog semantics, the stream is never empty.
There are two kinds of aggregations in Cozo, *ordinary aggregations* and *meet aggregations*. They are implemented differently in Cozo, with meet aggregations generally faster and more powerful (e.g. only meet aggregations can be recursive).
The power of meet aggregations derives from the additional properties they satisfy by forming a `semilattice <https://en.wikipedia.org/wiki/Semilattice>`_:
idempotency
the aggregate of a single value ``a`` is ``a`` itself,
commutativity
the aggregate of ``a`` then ``b`` is equal to the aggregate of ``b`` then ``a``,
associativity
it is immaterial where we put the parentheses in an aggregate application.
Meet aggregations can be used as ordinary ones, but the reverse is impossible.
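The three properties can be checked mechanically on sample values. The following Python sketch is only a model of the idea and not Cozo code: ``aggregate`` and ``looks_like_semilattice`` are hypothetical helper names, and idempotency is tested in its usual semilattice form, ``op(a, a) == a``.

```python
from functools import reduce


def aggregate(op, values):
    """Fold a non-empty stream of values with a binary operation."""
    return reduce(op, values)


def looks_like_semilattice(op, samples):
    """Check idempotency, commutativity and associativity on sample values."""
    idempotent = all(op(a, a) == a for a in samples)
    commutative = all(op(a, b) == op(b, a) for a in samples for b in samples)
    associative = all(
        op(op(a, b), c) == op(a, op(b, c))
        for a in samples
        for b in samples
        for c in samples
    )
    return idempotent and commutative and associative


samples = [3, 1, 4, 1, 5]

# min satisfies all three laws on these samples.
print(looks_like_semilattice(min, samples))

# A running sum is commutative and associative but not idempotent (a + a != a),
# so it could only ever be modelled as an ordinary aggregation.
print(looks_like_semilattice(lambda a, b: a + b, samples))
```

Because an operation with these properties gives the same result regardless of the order or grouping of its inputs, the values can be folded in as they arrive, which is what makes such aggregations usable in recursion.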
------------------
Meet aggregations
------------------
.. module:: Aggr.Meet
:noindex:
.. function:: min(x)
Aggregate the minimum value of all ``x``.
.. function:: max(x)
Aggregate the maximum value of all ``x``.
.. function:: and(var)
Aggregate the logical conjunction of the variable passed in.
.. function:: or(var)
Aggregate the logical disjunction of the variable passed in.
.. function:: union(var)
Aggregate the unions of ``var``, which must be a list.
.. function:: intersection(var)
Aggregate the intersections of ``var``, which must be a list.
.. function:: choice(var)
Non-deterministically chooses one of the values of ``var`` as the aggregate. It simply chooses the first value it meets (the order in which it meets values should be considered non-deterministic).
.. function:: choice_last(var)
Non-deterministically chooses one of the values of ``var`` as the aggregate. It simply chooses the last value it meets.
.. function:: min_cost([data, cost])
The argument should be a list of two elements, ``[data, cost]``, and this aggregation chooses the list with the minimum ``cost``.
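As a rough illustration of the behaviour of ``min_cost``, the following Python sketch folds a stream of ``[data, cost]`` pairs and keeps the pair with the smallest cost. It is a model only: the function name mirrors the aggregation for readability, the sample data is made up, and keeping the first pair on a tie is an assumption of this sketch, not documented Cozo behaviour.

```python
def min_cost(pairs):
    """Return the [data, cost] pair with the smallest cost."""
    best = None
    for data, cost in pairs:
        # Assumption: on equal cost, keep the pair met first.
        if best is None or cost < best[1]:
            best = [data, cost]
    return best


# Hypothetical sample data: candidate routes with their costs.
routes = [["via A", 7], ["via B", 3], ["via C", 5]]
print(min_cost(routes))  # ['via B', 3]
```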
.. function:: shortest(var)
``var`` must be a list. Returns the shortest list among all values. Ties will be broken non-deterministically.
.. function:: coalesce(var)
Returns the first non-null value it meets. The order is non-deterministic.
.. function:: bit_and(var)
``var`` must be bytes. Returns the bitwise 'and' of the values.
.. function:: bit_or(var)
``var`` must be bytes. Returns the bitwise 'or' of the values.
---------------------
Ordinary aggregations
---------------------
.. module:: Aggr.Ord
:noindex:
.. function:: count(var)
Count how many values are generated for ``var`` (using bag instead of set semantics).
.. function:: count_unique(var)
Count how many unique values there are for ``var``.
.. function:: collect(var)
Collect all values for ``var`` into a list.
.. function:: unique(var)
Collect ``var`` into a list, keeping each unique value only once.
.. function:: group_count(var)
Count the occurrences of each unique value of ``var``, putting the result into a list of lists, e.g. when applied to ``'a'``, ``'b'``, ``'c'``, ``'c'``, ``'a'``, ``'c'``, the result is ``[['a', 2], ['b', 1], ['c', 3]]``.
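The example above can be reproduced with a short Python model of ``group_count``. This is a sketch rather than Cozo's implementation, and sorting the output by value is an assumption made so that it matches the order shown in the example.

```python
from collections import Counter


def group_count(values):
    """Return [value, count] pairs for each distinct value."""
    counts = Counter(values)
    # Assumption: results are ordered by value, as in the documented example.
    return [[value, n] for value, n in sorted(counts.items())]


print(group_count(["a", "b", "c", "c", "a", "c"]))  # [['a', 2], ['b', 1], ['c', 3]]
```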
.. function:: bit_xor(var)
``var`` must be bytes. Returns the bitwise 'xor' of the values.
.. function:: latest_by([data, time])
The argument should be a list of two elements, ``[data, time]``, and this aggregation returns the ``data`` with the maximum ``time``. This is very similar to ``min_cost``, the differences being that the maximum instead of the minimum is used, only the data itself is returned, and the aggregation is deliberately not a meet aggregation. Intended to be used in timestamped audit trails.
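A minimal Python model of ``latest_by`` for an audit trail might look as follows. It returns only the data paired with the largest second element. The sample records are hypothetical, and keeping the first entry on equal timestamps is an assumption of this sketch.

```python
def latest_by(pairs):
    """Return the data whose paired timestamp is the largest."""
    latest_data, latest_time = None, None
    for data, time in pairs:
        # Assumption: on equal timestamps, keep the entry met first.
        if latest_time is None or time > latest_time:
            latest_data, latest_time = data, time
    return latest_data


# Hypothetical audit trail: [status, timestamp] pairs in arbitrary order.
trail = [["created", 100], ["approved", 300], ["edited", 200]]
print(latest_by(trail))  # approved
```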
.. function:: choice_rand(var)
Non-deterministically chooses one of the values of ``var`` as the aggregate.
Each value the aggregation encounters has the same probability of being chosen.
This version of ``choice`` is not a meet aggregation
since it is impossible to satisfy the uniform sampling requirement while maintaining no state,
which is an implementation restriction unlikely to be lifted.
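To see why uniform sampling needs state, consider the following Python sketch, which uses reservoir sampling with a reservoir of size one. This is a well-known technique chosen here for illustration; whether Cozo implements ``choice_rand`` this way is not stated in this document. The point is that the aggregation must remember how many values it has seen so far, in addition to the current pick.

```python
import random


def choice_rand(values, rng=random):
    """Pick one value uniformly at random from a stream, in a single pass."""
    chosen = None
    seen = 0  # state that a stateless meet aggregation could not keep
    for value in values:
        seen += 1
        # Replace the current pick with probability 1/seen, which leaves
        # every value seen so far equally likely to be the final pick.
        if rng.randrange(seen) == 0:
            chosen = value
    return chosen


rng = random.Random(0)  # seeded only to make the sketch repeatable
print(choice_rand(["a", "b", "c"], rng))
```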
^^^^^^^^^^^^^^^^^^^^^^^^^
Statistical aggregations
^^^^^^^^^^^^^^^^^^^^^^^^^
.. function:: mean(x)
The mean value of ``x``.
.. function:: sum(x)
The sum of ``x``.
.. function:: product(x)
The product of ``x``.
.. function:: variance(x)
The sample variance of ``x``.
.. function:: std_dev(x)
The sample standard deviation of ``x``.
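The word "sample" matters for ``variance`` and ``std_dev``: the sum of squared deviations is divided by ``n - 1`` rather than ``n``. The following Python sketch spells this out and compares the result with the standard library's ``statistics`` module. The helper names and the data are illustrative only, and the sketch assumes at least two values.

```python
import math
import statistics


def sample_variance(xs):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_std_dev(xs):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# statistics.variance and statistics.stdev also use the n - 1 denominator.
print(math.isclose(sample_variance(data), statistics.variance(data)))
print(math.isclose(sample_std_dev(data), statistics.stdev(data)))
```

For this data the mean is 5 and the squared deviations sum to 32, so the sample variance is 32/7, whereas the population variance would be 32/8 = 4.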