Aggregations in Cozo can be thought of as a function that acts on a stream of values
and produces a single value (the aggregate).
There are two kinds of aggregations in Cozo, *ordinary aggregations* and *semi-lattice aggregations*.
They are implemented differently in Cozo, with semi-lattice aggregations generally faster and more powerful
(only the latter can be used recursively).
The power of semi-lattice aggregations derives from the additional properties they satisfy by forming a `semilattice <https://en.wikipedia.org/wiki/Semilattice>`_:
idempotency
the aggregate of a single value ``a`` is ``a`` itself,
associativity
it is immaterial where we put the parentheses in an aggregate application.
Semi-lattice aggregations can be used as ordinary ones, but the reverse is impossible.
------------------------------------
Semi-lattice aggregations
------------------------------------
.. module:: Aggr.SemiLattice
    :noindex:
.. function:: min(x)
.. function:: choice(var)

    Non-deterministically chooses one of the values of ``var`` as the aggregate.
    It simply chooses the first value it meets (the order in which it meets values should be considered non-deterministic).
.. function:: choice_last(var)

    Non-deterministically chooses one of the values of ``var`` as the aggregate.
    It simply chooses the last value it meets.
.. function:: min_cost([data, cost])
.. function:: group_count(var)

    Counts the occurrences of unique values of ``var``, putting the result into a list of lists,
    e.g. when applied to ``'a'``, ``'b'``, ``'c'``, ``'c'``, ``'a'``, ``'c'``, the result is ``[['a', 2], ['b', 1], ['c', 3]]``.
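The output shape of ``group_count`` can be reproduced with a small Python model. This is a sketch and not Cozo's implementation. ``group_count_model`` is a hypothetical name, and sorting the pairs by value is an assumption that happens to match the example above.

```python
from collections import Counter


def group_count_model(values):
    """Hypothetical model of group_count: count each distinct value.

    Returns a list of [value, count] pairs ordered by value; the ordering is
    an assumption of this sketch that happens to match the example above.
    """
    counts = Counter(values)
    return [[value, counts[value]] for value in sorted(counts)]


print(group_count_model(['a', 'b', 'c', 'c', 'a', 'c']))
# [['a', 2], ['b', 1], ['c', 3]]
```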
.. function:: bit_xor(var)
.. function:: latest_by([data, time])

    The argument should be a list of two elements, and this aggregation returns the ``data`` of the maximum ``time``.
    This is very similar to ``min_cost``, the differences being that maximum instead of minimum is used,
    only the data itself is returned, and the aggregation is deliberately not a semi-lattice aggregation.
    It is intended to be used in timestamped audit trails.
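The differences between ``latest_by`` and ``min_cost`` can be summarized with a Python model. ``latest_by_model`` and ``min_cost_model`` are hypothetical names. The claim that ``min_cost`` returns the whole pair is inferred from the remark that ``latest_by`` returns only the data. Tie-breaking is not specified in the text, so the model does not try to match Cozo there.

```python
def latest_by_model(rows):
    """Hypothetical model of latest_by over [data, time] pairs.

    Returns only the data of the pair with the largest time. Tie-breaking is
    not specified by the text; max() keeps the first maximal pair it sees.
    """
    data, _time = max(rows, key=lambda row: row[1])
    return data


def min_cost_model(rows):
    """Hypothetical model of min_cost over [data, cost] pairs, for contrast.

    Returns the whole pair with the smallest cost.
    """
    return min(rows, key=lambda row: row[1])


audit_trail = [['draft', 1], ['final', 3], ['review', 2]]
print(latest_by_model(audit_trail))  # final
print(min_cost_model(audit_trail))   # ['draft', 1]
```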
.. function:: choice_rand(var)

    Non-deterministically chooses one of the values of ``var`` as the aggregate.
    Each value the aggregation encounters has the same probability of being chosen.

    .. NOTE::
        This version of ``choice`` is not a semi-lattice aggregation
        since it is impossible to satisfy the uniform sampling requirement while maintaining no state,
        which is an implementation restriction unlikely to be lifted.
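Reservoir sampling is a standard one-pass technique for uniform sampling, and it shows concretely why state is needed. The Python sketch below is only an illustration. It is not claimed to be how Cozo implements ``choice_rand``, and ``choice_rand_model`` is a hypothetical name. The running count ``seen`` is the extra state: without it, a newly encountered value cannot be given the right probability of replacing the current choice.

```python
import random


def choice_rand_model(values, rng=random):
    """Uniformly pick one value from a non-empty stream in a single pass.

    Reservoir sampling with a reservoir of size 1: the i-th value replaces
    the current choice with probability 1/i. The running count `seen` is the
    state that a stateless semi-lattice aggregation could not keep.
    """
    chosen = None
    seen = 0
    for value in values:
        seen += 1
        if rng.randrange(seen) == 0:  # happens with probability 1/seen
            chosen = value
    if seen == 0:
        raise ValueError("empty stream")
    return chosen


print(choice_rand_model(['a', 'b', 'c']))  # one of 'a', 'b', 'c'
```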
* Lists are ordered lexicographically by their elements;
* Bytes are compared lexicographically;
* Strings are compared lexicographically by their UTF-8 byte representations;
* UUIDs are sorted in a way that UUIDv1 with similar timestamps are near each other.
  This is to improve data locality and should be considered an implementation detail.
  Depending on the order of UUIDs in your application is not recommended.
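The string and list rules can be tried out in Python. The sketch below sorts strings by their UTF-8 bytes, using a hypothetical key function ``utf8_sort_key``. For valid UTF-8, byte order coincides with code point order, so Python's default string comparison gives the same result. The list example assumes the usual lexicographic convention that a proper prefix sorts first.

```python
def utf8_sort_key(s: str) -> bytes:
    # Order strings by the bytes of their UTF-8 encoding.
    return s.encode("utf-8")


words = ["b", "\u00e9", "a", "Z"]
print(sorted(words, key=utf8_sort_key))  # ['Z', 'a', 'b', 'é']

# Python compares lists element by element, with a proper prefix first.
print(sorted([[1, 2], [1], [0, 5]]))     # [[0, 5], [1], [1, 2]]
```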
.. WARNING::
    ``1 == 1.0`` evaluates to ``true``, but ``1`` and ``1.0`` are distinct values,
    meaning that a relation can contain both as keys according to set semantics.
    This is especially confusing when using JavaScript, which converts all numbers to floats,
    and Python, which does not show a difference between the two when printing.
    Using floating point numbers in keys is not recommended if the rows are accessed by these keys
    (instead of accessed by iteration).
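Python's own sets and dicts treat ``1`` and ``1.0`` as the same key, which is one reason the behaviour above can surprise Python users. The sketch below contrasts that with a hypothetical type-tagged key, ``cozo_like_key``. The tagged key only models how the two values stay distinct in a Cozo relation. It is not Cozo's actual key encoding.

```python
def cozo_like_key(x):
    # Hypothetical model: tag each number with its internal type so that
    # 1 and 1.0 remain distinct keys, as they do in a Cozo relation.
    return ("Float" if isinstance(x, float) else "Int", x)


print(1 == 1.0)                                     # True
print(len({1, 1.0}))                                # 1: Python merges them
print(len({cozo_like_key(1), cozo_like_key(1.0)}))  # 2: distinct keys
```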
----------------
Literals
----------------
The standard notations ``null`` for the type ``Null``, ``false`` and ``true`` for the type ``Bool`` are used.
Besides the usual decimal notation for signed integers,
you can prefix a number with ``0x`` or ``-0x`` for hexadecimal representation,
with ``0o`` or ``-0o`` for octal,
or with ``0b`` or ``-0b`` for binary.
Floating point numbers include the decimal dot (may be trailing),
and may be in scientific notation.
All numbers may include underscores ``_`` in their representation for clarity.
For example, ``299_792_458`` is the speed of light in meters per second.
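Python's number parsing accepts nearly the same forms, so it is a handy way to check which value a literal denotes. In the hypothetical helper ``parse_number_literal`` below, ``int(text, 0)`` handles the signed ``0x``/``0o``/``0b`` prefixes and underscores, and ``float`` handles a trailing dot and scientific notation. Python's grammar is close to but not guaranteed identical with Cozo's. For example, ``int("010", 0)`` is rejected, so this helper would fall back to a float for it.

```python
def parse_number_literal(text: str):
    """Approximate a Cozo number literal using Python's parsing rules.

    int(text, 0) understands 0x/0o/0b prefixes (with an optional sign) and
    underscores; anything it rejects is tried as a float, which accepts a
    trailing dot and scientific notation.
    """
    try:
        return int(text, 0)
    except ValueError:
        return float(text)


for literal in ["-0x1F", "0o17", "-0b101", "299_792_458", "1.", "6.02e23"]:
    print(literal, "->", parse_number_literal(literal))
```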
Strings can be written in the same way as in JSON, using double quotes ``""``,
with the same escape rules.
You can also use single quotes ``''``, in which case the roles of double quotes and single quotes are switched.
There is also a "raw string" notation::

    ___"I'm a raw string"___

A raw string starts with an arbitrary number of underscores, and then a double quote.
It terminates when followed by a double quote and the same number of underscores.
Everything in between is interpreted exactly as typed, including any newlines.
By varying the number of underscores, you can represent any string without quoting.
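The delimiter rule can be written out as a small Python parser. ``parse_raw_string`` is a hypothetical helper, not part of Cozo. It assumes at least one leading underscore, since a literal with none would read as an ordinary string. It ignores any text after the closing delimiter.

```python
def parse_raw_string(src: str) -> str:
    """Return the contents of a raw string literal such as ___"..."___.

    Counts the leading underscores, expects a double quote, then stops at
    the first double quote followed by the same number of underscores.
    """
    n = len(src) - len(src.lstrip("_"))
    if n == 0 or src[n:n + 1] != '"':
        raise ValueError("not a raw string literal")
    closing = '"' + "_" * n
    end = src.find(closing, n + 1)
    if end == -1:
        raise ValueError("unterminated raw string literal")
    return src[n + 1:end]


print(parse_raw_string('___"I\'m a raw string"___'))  # I'm a raw string
print(parse_raw_string('__"say "hi"_ twice"__'))     # say "hi"_ twice
```

A quote followed by fewer underscores than the opening run does not terminate the string, which is why adding underscores lets any content through unquoted.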
There is no literal representation for ``Bytes`` or ``Uuid``.
Use the appropriate functions to create them.
If you are inserting data into a stored relation with a column specified to contain bytes or UUIDs,
auto-coercion will kick in and use ``decode_base64`` and ``to_uuid`` for conversion.
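From a client program, such values travel as text. The Python sketch below prepares them using only the standard library. It assumes, without confirmation from the text above, that ``decode_base64`` accepts standard padded Base64 and that ``to_uuid`` accepts the canonical hyphenated UUID form.

```python
import base64
import uuid

payload = b"\x00\xffraw bytes"
payload_b64 = base64.b64encode(payload).decode("ascii")  # plain text for a query

user_id = uuid.uuid4()
user_id_text = str(user_id)  # canonical hyphenated form, 36 characters

# Round-tripping confirms that no information is lost in the text forms.
assert base64.b64decode(payload_b64) == payload
assert uuid.UUID(user_id_text) == user_id
print(payload_b64, user_id_text)
```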
Lists are items enclosed between square brackets ``[]``, separated by commas.
Functions can be used to build expressions.
Internally, all function arguments are partially evaluated before binding variables to input tuples. For example, the regular expression in ``regex_matches(var, '[a-zA-Z]+')`` will only be compiled once during the execution of the query, instead of being repeatedly compiled for every input tuple.
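The effect of partial evaluation can be illustrated in Python with a closure that evaluates the constant argument once and binds only the variable argument per row. ``make_regex_matches`` and ``compile_pattern`` are hypothetical names. The use of ``re.search`` (match anywhere in the string) is an assumption made for illustration, not a statement about Cozo's matching semantics.

```python
import re

compile_calls = 0


def compile_pattern(pattern):
    # Counts compilations so the "only once" claim can be checked.
    global compile_calls
    compile_calls += 1
    return re.compile(pattern)


def make_regex_matches(pattern):
    # The constant argument is evaluated (compiled) once, up front...
    compiled = compile_pattern(pattern)
    # ...and only the variable argument is bound for each input tuple.
    return lambda value: compiled.search(value) is not None


matcher = make_regex_matches("[a-zA-Z]+")
results = [matcher(v) for v in ["abc", "123", "x1"]]
print(results, compile_calls)  # [True, False, True] 1
```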
In the following, all functions except those having names starting with ``rand_`` are deterministic.
------------------------
Equality and Comparisons
.. function:: eq(x, y)

    Equality comparison. The operator form is ``x == y``. The two arguments of the equality can be of different types, in which case the result is ``false``.
.. function:: neq(x, y)
Equivalent to ``x <= y``
.. NOTE::
    The four comparison operators can only compare values of the same runtime type. Integers and floats are of the same type ``Number``.
Computes with the `haversine formula <https://en.wikipedia.org/wiki/Haversine_formula>`_
the angle measured in radians between two points ``a`` and ``b`` on a sphere
specified by their latitudes and longitudes. The inputs are in radians.
You probably want the next function when you are dealing with maps,
since most maps measure angles in degrees instead of radians.
Same as the previous function, but the inputs are in degrees instead of radians.
The return value is still in radians.
If you want the approximate distance measured on the surface of the earth instead of the angle between two points,
multiply the result by the radius of the earth,
which is about ``6371`` kilometres, ``3959`` miles, or ``3440`` nautical miles.
.. NOTE::
    The earth is not a perfect sphere, so applying the haversine formula to its surface can result in an error of less than one percent.
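The two haversine functions and the conversion to a surface distance can be sketched in Python using the standard formula. The function names and the ``(lat1, lon1, lat2, lon2)`` argument order are hypothetical choices for this sketch, not taken from Cozo's signatures. The radius constant is the approximate value quoted above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # approximate radius of the earth quoted above


def haversine(lat1, lon1, lat2, lon2):
    """Central angle in radians between two points; inputs in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to guard against floating point overshoot slightly above 1.
    return 2 * math.asin(math.sqrt(min(1.0, a)))


def haversine_degrees(lat1, lon1, lat2, lon2):
    """Same as haversine, but inputs in degrees; the result is still in radians."""
    return haversine(*(math.radians(x) for x in (lat1, lon1, lat2, lon2)))


angle = haversine_degrees(0, 0, 0, 90)  # a quarter of the equator
print(angle)                    # approximately pi / 2
print(angle * EARTH_RADIUS_KM)  # approximately 10007.54 km
```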
------------------------
String functions
Can also be applied to a list or a byte array.
.. WARNING::
    ``length(str)`` does not return the number of bytes of the string representation.
    Also, the result depends on the normalization of the string,
    so if such details are important, apply ``unicode_normalize`` before ``length``.
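Python's ``len`` counts code points, which makes the normalization pitfall easy to see. The sketch below assumes, as an illustration only, that Cozo's ``length`` also counts code points. Python's ``unicodedata.normalize`` stands in for ``unicode_normalize``.

```python
import unicodedata

composed = "\u00e9"     # é as a single code point
decomposed = "e\u0301"  # e followed by a combining acute accent

print(composed == decomposed)                         # False
print(len(composed), len(decomposed))                 # 1 2
print(len(unicodedata.normalize("NFC", decomposed)))  # 1
print(len(composed.encode("utf-8")))                  # 2 (bytes, not characters)
```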
.. function:: concat(x, ...)
Tests if ``x`` starts with ``y``.
.. TIP::
    ``starts_with(var, str)`` is preferred over equivalent (e.g. regex) conditions,
    since the compiler may more easily compile the clause into a range scan.
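Prefix tests suit range scans because all keys sharing a prefix are contiguous in sorted order. The Python sketch below illustrates the idea over a sorted list. ``prefix_range_scan`` is a hypothetical helper, not Cozo's storage code. A general regex condition offers no such shortcut and must be tested against every key.

```python
import bisect


def prefix_range_scan(sorted_keys, prefix):
    """Return all keys that start with `prefix` from a sorted list.

    A binary search finds the first candidate, and the scan stops at the
    first key that no longer has the prefix.
    """
    result = []
    i = bisect.bisect_left(sorted_keys, prefix)
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        result.append(sorted_keys[i])
        i += 1
    return result


keys = sorted(["banana", "apple", "apex", "apricot", "cherry"])
print(prefix_range_scan(keys, "ap"))  # ['apex', 'apple', 'apricot']
```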
.. function:: ends_with(x, y)
.. function:: unicode_normalize(str, norm)

    Converts ``str`` to the `normalization <https://en.wikipedia.org/wiki/Unicode_equivalence>`_ specified by ``norm``.
    The valid values of ``norm`` are ``'nfc'``, ``'nfd'``, ``'nfkc'`` and ``'nfkd'``.
.. function:: chars(str)
Combines the strings in ``list`` into a big string. In a sense, it is the inverse function of ``chars``.
.. WARNING::
    If you want substring slices, indexing strings, etc., first convert the string to a list with ``chars``,
    do the manipulation on the list, and then recombine with ``from_substring``.
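The split, manipulate and recombine pattern looks like this in Python. ``chars_model`` and ``recombine_model`` are hypothetical stand-ins for ``chars`` and the recombining function. They assume that ``chars`` splits by code point, as Python's ``list(str)`` does.

```python
def chars_model(s):
    # Hypothetical stand-in for chars: one list element per code point.
    return list(s)


def recombine_model(parts):
    # Hypothetical stand-in for recombining a list of strings into one.
    return "".join(parts)


word = "h\u00e9llo"
print(recombine_model(chars_model(word)[1:4]))  # éll
```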
--------------------------
List functions
.. function:: get(l, n)

    Returns the element at index ``n`` in the list ``l``. Raises an error if the access is out of bounds. Indices start with 0.
.. function:: maybe_get(l, n)

    Returns the element at index ``n`` in the list ``l``. Returns ``null`` if the access is out of bounds. Indices start with 0.
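The contrast between ``get`` and ``maybe_get`` can be modelled in Python. In the sketch, Python's ``None`` plays the role of ``null``. ``get_model`` and ``maybe_get_model`` are hypothetical names. Only non-negative indices are modelled, because the text above does not say how negative indices are handled.

```python
def get_model(lst, n):
    # Hypothetical model of get: an error when the index is out of bounds.
    if not 0 <= n < len(lst):
        raise IndexError(f"index {n} out of bounds for list of length {len(lst)}")
    return lst[n]


def maybe_get_model(lst, n):
    # Hypothetical model of maybe_get: None (standing in for null) instead of an error.
    if not 0 <= n < len(lst):
        return None
    return lst[n]


print(get_model([10, 20, 30], 0))        # 10
print(maybe_get_model([10, 20, 30], 3))  # None
```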
.. function:: length(list)
.. function:: slice(l, start, end)

    Returns the slice of list between the index ``start`` (inclusive) and ``end`` (exclusive).
    Negative numbers may be used, in which case they are interpreted as counting from the end of the list.