complete algo docs

main
Ziyang Hu 2 years ago
parent 5c24d4baa2
commit 4772b2a4ce

@ -7,7 +7,7 @@ Aggregations in Cozo can be thought of as a function that acts on a string of va
There are two kinds of aggregations in Cozo, *ordinary aggregations* and *meet aggregations*. They are implemented differently in Cozo, with meet aggregations generally faster and more powerful (e.g. only meet aggregations can be recursive).
The power of meet aggregations derive from the additional properties they satisfy (see also https://en.wikipedia.org/wiki/Semilattice):
The power of meet aggregations derives from the additional properties they satisfy by forming a `semilattice <https://en.wikipedia.org/wiki/Semilattice>`_:
idempotency
the aggregate of a single value ``a`` is ``a`` itself,

@ -4,7 +4,193 @@ Algorithms
.. module:: Algo
.. function:: reorder_sort(rel[...], out: [...], sort_by: [...], descending: false, break_ties: false, skip: 0, take: 0)
------------------
Connectedness
------------------
.. function:: ConnectedComponents(edges[from, to])
Computes the `connected components <https://en.wikipedia.org/wiki/Connected_component_(graph_theory)>`_ of a graph with the provided edges.
:return: Pairs containing the node index, and its component index.
.. function:: StronglyConnectedComponents(edges[from, to])
Computes the `strongly connected components <https://en.wikipedia.org/wiki/Strongly_connected_component>`_ of a graph with the provided edges.
:return: Pairs containing the node index, and its component index.
.. function:: SCC(...)
See :func:`Algo.StronglyConnectedComponents`.
.. function:: MinimumSpanningForestKruskal(edges[from, to, weight?])
Runs `Kruskal's algorithm <https://en.wikipedia.org/wiki/Kruskal%27s_algorithm>`_ on the provided edges to compute a `minimum spanning forest <https://en.wikipedia.org/wiki/Minimum_spanning_tree>`_. Negative weights are fine.
:return: Triples containing the from-node, the to-node, and the cost from the tree root to the to-node. Which nodes are chosen as the roots is non-deterministic. Multiple roots imply the graph is disconnected.
.. function:: MinimumSpanningTreePrim(edges[from, to, weight?], starting?[idx])
Runs `Prim's algorithm <https://en.wikipedia.org/wiki/Prim%27s_algorithm>`_ on the provided edges to compute a `minimum spanning tree <https://en.wikipedia.org/wiki/Minimum_spanning_tree>`_. ``starting`` should be a relation producing exactly one node index as the starting node. Only the connected component of the starting node is returned. If ``starting`` is omitted, which component is returned is arbitrary.
:return: Triples containing the from-node, the to-node, and the cost from the tree root to the to-node.
.. function:: TopSort(edges[from, to])
Performs `topological sorting <https://en.wikipedia.org/wiki/Topological_sorting>`_ on the graph with the provided edges. The graph is required to be connected in the first place.
:return: Pairs containing the sort order and the node index.
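As an illustration of what a topological sort computes, here is a minimal standalone sketch of Kahn's algorithm. It is not Cozo's implementation: the function name ``top_sort``, the use of dense ``usize`` node indices in ``0..n``, and the choice to return ``None`` on a cycle are all assumptions made for the example.

```rust
use std::collections::VecDeque;

/// Hypothetical helper (not part of Cozo): topologically sorts nodes `0..n`.
/// Returns `None` if the edges contain a cycle, in which case no valid order exists.
fn top_sort(n: usize, edges: &[(usize, usize)]) -> Option<Vec<usize>> {
    let mut adjacency = vec![Vec::new(); n];
    let mut in_degree = vec![0usize; n];
    for &(from, to) in edges {
        adjacency[from].push(to);
        in_degree[to] += 1;
    }
    // Nodes with no incoming edges can be emitted immediately.
    let mut ready: VecDeque<usize> = (0..n).filter(|&v| in_degree[v] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while let Some(node) = ready.pop_front() {
        order.push(node);
        // Emitting `node` removes its outgoing edges from consideration.
        for &next in &adjacency[node] {
            in_degree[next] -= 1;
            if in_degree[next] == 0 {
                ready.push_back(next);
            }
        }
    }
    // Any node left unprocessed lies on or behind a cycle.
    if order.len() == n {
        Some(order)
    } else {
        None
    }
}
```

Enumerating the returned vector yields pairs of sort order and node index, mirroring the shape of the result described above.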
------------------
Pathfinding
------------------
.. function:: ShortestPathDijkstra(edges[from, to, weight?], starting[idx], goals[idx], undirected: false, keep_ties: false)
Runs `Dijkstra's algorithm <https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm>`_ to determine the shortest paths between the ``starting`` nodes and the ``goals``. Weights, if given, must be non-negative.
:param undirected: Whether the graph should be interpreted as undirected. Defaults to ``false``.
:param keep_ties: Whether to return all paths with the same lowest cost. Defaults to ``false``, in which case any one path of the lowest cost could be returned.
:return: 4-tuples containing the starting node, the goal, the lowest cost, and a path with the lowest cost.
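The following is a minimal sketch of Dijkstra's algorithm for a single start-goal pair, included only to illustrate the cost-and-path result described above. It is not the code behind ``ShortestPathDijkstra``: the name ``shortest_path``, the adjacency-list input, and the integer weights are assumptions chosen to keep the example short.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical helper (not part of Cozo). `graph[u]` lists `(v, weight)` pairs
/// for directed edges `u -> v`; `start` and `goal` must be valid indices.
/// Returns the lowest cost and one path achieving it, or `None` if unreachable.
fn shortest_path(
    graph: &[Vec<(usize, u64)>],
    start: usize,
    goal: usize,
) -> Option<(u64, Vec<usize>)> {
    let mut dist: Vec<Option<u64>> = vec![None; graph.len()];
    let mut prev: Vec<Option<usize>> = vec![None; graph.len()];
    let mut heap = BinaryHeap::new();
    dist[start] = Some(0);
    heap.push(Reverse((0u64, start)));
    while let Some(Reverse((d, u))) = heap.pop() {
        if u == goal {
            break; // the first time the goal is popped, its distance is final
        }
        if dist[u].map_or(false, |best| d > best) {
            continue; // stale queue entry, a shorter route to `u` was already found
        }
        for &(v, w) in &graph[u] {
            let candidate = d + w;
            if dist[v].map_or(true, |best| candidate < best) {
                dist[v] = Some(candidate);
                prev[v] = Some(u);
                heap.push(Reverse((candidate, v)));
            }
        }
    }
    let cost = dist[goal]?;
    // Walk the predecessor links back from the goal, then reverse.
    let mut path = vec![goal];
    let mut current = goal;
    while let Some(p) = prev[current] {
        path.push(p);
        current = p;
    }
    path.reverse();
    Some((cost, path))
}
```

Unsigned weights make the non-negativity requirement a property of the type; with negative weights the early exit at the goal would no longer be sound.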
.. function:: KShortestPathYen(edges[from, to, weight?], starting[idx], goals[idx], k: expr, undirected: false)
Runs `Yen's algorithm <https://en.wikipedia.org/wiki/Yen%27s_algorithm>`_ (backed by Dijkstra's algorithm) to find the k-shortest paths between nodes in ``starting`` and nodes in ``goals``.
:param required k: How many routes to return for each start-goal pair.
:param undirected: Whether the graph should be interpreted as undirected. Defaults to ``false``.
:return: 4-tuples containing the starting node, the goal, the cost, and a path with the cost.
.. function:: BreadthFirstSearch(edges[from, to], nodes[idx, ...], starting?[idx], condition: expr, limit: 1)
Runs breadth first search on the directed graph with the given edges and nodes, starting at the nodes in ``starting``. If ``starting`` is not given, it will default to all of ``nodes``, which may be quite a lot to calculate.
:param required condition: The stopping condition, evaluated with the bindings given to ``nodes``. Should evaluate to a boolean, with ``true`` indicating an acceptable answer was found.
:param limit: How many answers to produce for each starting node. Defaults to 1.
:return: Triples containing the starting node, the answer node, and the found path connecting them.
.. function:: BFS(...)
See :func:`Algo.BreadthFirstSearch`.
.. function:: DepthFirstSearch(edges[from, to], nodes[idx, ...], starting?[idx], condition: expr, limit: 1)
Runs depth first search on the directed graph with the given edges and nodes, starting at the nodes in ``starting``. If ``starting`` is not given, it will default to all of ``nodes``, which may be quite a lot to calculate.
:param required condition: The stopping condition, evaluated with the bindings given to ``nodes``. Should evaluate to a boolean, with ``true`` indicating an acceptable answer was found.
:param limit: How many answers to produce for each starting node. Defaults to 1.
:return: Triples containing the starting node, the answer node, and the found path connecting them.
.. TIP::
You probably don't want to use depth first search for path finding unless you have a really niche use case.
.. function:: DFS(...)
See :func:`Algo.DepthFirstSearch`.
.. function:: ShortestPathAStar(edges[from, to, weight], nodes[idx, ...], starting[idx], goals[idx], heuristic: expr)
Computes the shortest path from every node in ``starting`` to every node in ``goals`` by the `A\* algorithm <https://en.wikipedia.org/wiki/A*_search_algorithm>`_.
``edges`` are interpreted as directed, weighted edges with non-negative weights.
:param required heuristic: The search heuristic expression. It will be evaluated with the bindings from ``goals`` and ``nodes``. It should return a number which is a lower bound of the true shortest distance from a node to the goal node. If the estimate is not a valid lower-bound, i.e. it over-estimates, the results returned may not be correct.
:return: 4-tuples containing the starting node index, the goal node index, the lowest cost, and a path with the lowest cost.
.. TIP::
The performance of the A\* algorithm heavily depends on how good your heuristic function is. Passing in ``0`` as the estimate is always valid, but then you really should be using Dijkstra's algorithm.
Good heuristics usually come about from a metric in the ambient space in which your data lives, e.g. spherical distance on the surface of a sphere, or Manhattan distance on a grid. :func:`Func.Math.haversine_deg_input` could be helpful for the spherical case. Note that you must use the correct units for the distance.
Providing a heuristic that is not guaranteed to be a lower-bound *might* be acceptable if you are fine with inaccuracies. The errors in the answers are bounded by the sum of the margins of your over-estimates.
-------------------
Community detection
-------------------
.. function:: ClusteringCoefficients(edges[from, to, weight?])
Computes the `clustering coefficients <https://en.wikipedia.org/wiki/Clustering_coefficient>`_ of the graph with the provided edges.
:return: 4-tuples containing the node index, the clustering coefficient, the number of triangles attached to the node, and the total degree of the node.
.. function:: CommunityDetectionLouvain(edges[from, to, weight?], undirected: false, max_iter: 10, delta: 0.0001, keep_depth?: depth)
Runs the `Louvain algorithm <https://en.wikipedia.org/wiki/Louvain_method>`_ on the graph with the provided edges, optionally non-negatively weighted.
:param undirected: Whether the graph should be interpreted as undirected. Defaults to ``false``.
:param max_iter: The maximum number of iterations to run within each epoch of the algorithm. Defaults to 10.
:param delta: How much the `modularity <https://en.wikipedia.org/wiki/Modularity_(networks)>`_ has to change before a step in the algorithm is considered to be an improvement. Defaults to 0.0001.
:param keep_depth: How many levels in the hierarchy of communities to keep in the final result. If omitted, all levels are kept.
:return: Pairs containing the label for a community, and a node index belonging to the community. Each label is a list of integers with maximum length constrained by the parameter ``keep_depth``. This list represents the hierarchy of sub-communities containing the node.
.. function:: LabelPropagation(edges[from, to, weight?], undirected: false, max_iter: 10)
Runs the `label propagation algorithm <https://en.wikipedia.org/wiki/Label_propagation_algorithm>`_ on the graph with the provided edges, optionally weighted.
:param undirected: Whether the graph should be interpreted as undirected. Defaults to ``false``.
:param max_iter: The maximum number of iterations to run. Defaults to 10.
:return: Pairs containing the integer label for a community, and a node index belonging to the community.
-------------------
Centrality measures
-------------------
.. function:: DegreeCentrality(edges[from, to])
Computes the degree centrality of the nodes in the graph with the given edges. The computation is trivial, so this should be the first thing you try when exploring new data.
:return: 4-tuples containing the node index, the total degree (how many edges involve this node), the out-degree (how many edges point away from this node), and the in-degree (how many edges point to this node).
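To make the shape of these 4-tuples concrete, here is a small standalone sketch that computes them from an edge list. The function name ``degree_centrality`` and the ``u32`` node indices are assumptions for the example, and the sketch counts a self-loop once as outgoing and once as incoming, which may differ from Cozo's behaviour.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not part of Cozo): returns
/// `(node, total degree, out-degree, in-degree)` for every node that appears in an edge.
fn degree_centrality(edges: &[(u32, u32)]) -> Vec<(u32, usize, usize, usize)> {
    // Maps each node to its (out-degree, in-degree); a BTreeMap keeps the output sorted.
    let mut degrees: BTreeMap<u32, (usize, usize)> = BTreeMap::new();
    for &(from, to) in edges {
        degrees.entry(from).or_default().0 += 1;
        degrees.entry(to).or_default().1 += 1;
    }
    degrees
        .into_iter()
        .map(|(node, (out_deg, in_deg))| (node, out_deg + in_deg, out_deg, in_deg))
        .collect()
}
```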
.. function:: PageRank(edges[from, to, weight?], undirected: false, theta: 0.8, epsilon: 0.05, iterations: 20)
Computes the `PageRank <https://en.wikipedia.org/wiki/PageRank>`_ from the given graph with the provided edges, optionally weighted.
:param undirected: Whether the graph should be interpreted as undirected. Defaults to ``false``.
:param theta: A number between 0 and 1 indicating how much weight in the PageRank matrix is due to the explicit edges. A number of 1 indicates no random restarts. Defaults to 0.8.
:param epsilon: Minimum PageRank change in any node for an iteration to be considered an improvement. Defaults to 0.05.
:param iterations: How many iterations to run. Fewer iterations are run if convergence is reached. Defaults to 20.
:return: Pairs containing the node label and its PageRank. For a graph with uniform edges, the PageRank of every node is 1. The `L2-norm <https://en.wikipedia.org/wiki/Norm_(mathematics)>`_ of the results is forced to be invariant, i.e. in the results those nodes with a PageRank greater than 1 are "more central" than the average node in a certain sense.
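The roles of ``theta``, ``epsilon`` and ``iterations`` can be illustrated with a minimal power-iteration sketch. This is not Cozo's implementation: the function below is a hypothetical, unweighted version over dense node indices, and it keeps the *average* rank at 1 (rather than reproducing the normalization described above) so that a uniform graph gives every node a rank of 1. Sharing the rank of nodes without outgoing edges evenly is likewise an assumption of the sketch.

```rust
/// Hypothetical helper (not part of Cozo): unweighted PageRank over nodes `0..n`.
/// `theta` is the share of rank that flows along edges; the rest is spread uniformly.
fn pagerank(
    n: usize,
    edges: &[(usize, usize)],
    theta: f64,
    epsilon: f64,
    iterations: usize,
) -> Vec<f64> {
    let mut out_degree = vec![0usize; n];
    for &(from, _) in edges {
        out_degree[from] += 1;
    }
    let mut rank = vec![1.0; n];
    for _ in 0..iterations {
        // Assumption: rank held by nodes without outgoing edges is shared evenly.
        let dangling: f64 = (0..n)
            .filter(|&u| out_degree[u] == 0)
            .map(|u| rank[u])
            .sum();
        let mut next = vec![(1.0 - theta) + theta * dangling / n as f64; n];
        for &(from, to) in edges {
            next[to] += theta * rank[from] / out_degree[from] as f64;
        }
        // Stop early once no node moved by at least `epsilon`.
        let change = rank
            .iter()
            .zip(&next)
            .map(|(old, new)| (old - new).abs())
            .fold(0.0, f64::max);
        rank = next;
        if change < epsilon {
            break;
        }
    }
    rank
}
```

On a directed cycle every node keeps a rank of 1, while in a star whose edges all point at one hub the hub ends above 1 and the leaves below it.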
.. function:: ClosenessCentrality(edges[from, to, weight?], undirected: false)
Computes the `closeness centrality <https://en.wikipedia.org/wiki/Closeness_centrality>`_ of the graph. The input relation represents edges connecting node indices which are optionally weighted.
:param undirected: Whether the edges should be interpreted as undirected. Defaults to ``false``.
:return: Node index together with its centrality.
.. function:: BetweennessCentrality(edges[from, to, weight?], undirected: false)
Computes the `betweenness centrality <https://en.wikipedia.org/wiki/Betweenness_centrality>`_ of the graph. The input relation represents edges connecting node indices which are optionally weighted.
:param undirected: Whether the edges should be interpreted as undirected. Defaults to ``false``.
:return: Node index together with its centrality.
.. WARNING::
``BetweennessCentrality`` is very expensive to compute. Plan resources accordingly.
------------------
Miscellaneous
------------------
.. function:: RandomWalk(edges[from, to, ...], nodes[idx, ...], starting[idx], steps: 10, weight?: expr, iterations: 1)
Performs random walk on the graph with the provided edges and nodes, starting at the nodes in ``starting``.
:param required steps: How many steps to walk for each node in ``starting``. Produced paths may be shorter if dead ends are reached.
:param weight: An expression evaluated against bindings of ``nodes`` and bindings of ``edges``, at a time when the walk is at a node and choosing between multiple edges to follow. It should evaluate to a non-negative number indicating the weight of the given choice of edge to follow. If omitted, which edge to follow is chosen uniformly.
:param iterations: How many times walking is repeated for each starting node. Defaults to 1.
:return: Triples containing a numerical index for the walk, the starting node, and the path followed.
.. function:: ReorderSort(rel[...], out: [...], sort_by: [...], descending: false, break_ties: false, skip: 0, take: 0)
Sorts and then extracts new columns of the passed-in relation ``rel``.

@ -8,7 +8,7 @@ Functions can be used in expressions in Cozo. All function arguments in Cozo are
Equality and Comparisons
------------------------
.. module:: Function.EqCmp
.. module:: Func.EqCmp
.. function:: eq(x, y)
@ -55,7 +55,7 @@ Equality and Comparisons
Boolean functions
------------------------
.. module:: Function.Bool
.. module:: Func.Bool
.. function:: and(...)
@ -77,7 +77,7 @@ Boolean functions
Mathematics
------------------------
.. module:: Function.Math
.. module:: Func.Math
.. function:: add(...)
@ -149,15 +149,15 @@ Mathematics
.. function:: sin(x)
The sine trigonometric function.
The sine trigonometric Func.
.. function:: cos(x)
The cosine trigonometric function.
The cosine trigonometric Func.
.. function:: tan(x)
The tangent trigonometric function.
The tangent trigonometric Func.
.. function:: asin(x)
@ -173,7 +173,7 @@ Mathematics
.. function:: atan2(x, y)
The inverse tangent but passing `x` and `y` separately, see https://en.wikipedia.org/wiki/Atan2.
The inverse tangent `atan2 <https://en.wikipedia.org/wiki/Atan2>`_ by passing `x` and `y` separately.
.. function:: sinh(x)
@ -209,7 +209,7 @@ Mathematics
.. function:: haversine(a_lat, a_lon, b_lat, b_lon)
Returns the angle measured in radians between two points ``a`` and ``b`` on a sphere specified by their latitudes and longitudes. The inputs are in radians. You probably want the next function since most maps measure angles in radians. See https://en.wikipedia.org/wiki/Haversine_formula.
Uses the `Haversine formula <https://en.wikipedia.org/wiki/Haversine_formula>`_ to compute the angle measured in radians between two points ``a`` and ``b`` on a sphere specified by their latitudes and longitudes. The inputs are in radians. You probably want the next function since most maps measure angles in degrees.
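For readers who want to see the formula itself, here is a minimal standalone sketch. The names ``haversine_rad`` and ``haversine_deg`` are hypothetical and chosen so as not to be mistaken for Cozo's own functions; the clamp guards against rounding pushing the intermediate value slightly above 1.

```rust
/// Hypothetical helper (not part of Cozo): central angle in radians between two
/// points given by latitude and longitude in radians.
fn haversine_rad(a_lat: f64, a_lon: f64, b_lat: f64, b_lon: f64) -> f64 {
    let h = ((b_lat - a_lat) / 2.0).sin().powi(2)
        + a_lat.cos() * b_lat.cos() * ((b_lon - a_lon) / 2.0).sin().powi(2);
    // Clamp so that floating-point rounding cannot make `asin` return NaN.
    2.0 * h.sqrt().min(1.0).asin()
}

/// Same computation with inputs in degrees; the result is still in radians.
fn haversine_deg(a_lat: f64, a_lon: f64, b_lat: f64, b_lon: f64) -> f64 {
    haversine_rad(
        a_lat.to_radians(),
        a_lon.to_radians(),
        b_lat.to_radians(),
        b_lon.to_radians(),
    )
}
```

Multiplying the returned angle by the sphere's radius gives a distance in the radius's units, which is what a pathfinding heuristic would need.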
.. function:: haversine_deg_input(a_lat, a_lon, b_lat, b_lon)
@ -220,7 +220,7 @@ Mathematics
String functions
------------------------
.. module:: Function.String
.. module:: Func.String
.. function:: length(str)
@ -254,15 +254,15 @@ String functions
.. function:: trim(x)
Removes whitespace from both ends of the string. "Whitespace" is defined as in https://en.wikipedia.org/wiki/Whitespace_character.
Removes `whitespace <https://en.wikipedia.org/wiki/Whitespace_character>`_ from both ends of the string.
.. function:: trim_start(x)
Removes whitespace from the start of the string.
Removes `whitespace <https://en.wikipedia.org/wiki/Whitespace_character>`_ from the start of the string.
.. function:: trim_end(x)
Removes whitespace from the end of the string.
Removes `whitespace <https://en.wikipedia.org/wiki/Whitespace_character>`_ from the end of the string.
.. function:: starts_with(x, y)
@ -278,7 +278,7 @@ String functions
.. function:: unicode_normalize(str, norm)
Converts ``str`` to the normalization specified by ``norm``. The valid values of ``norm`` are ``'nfc'``, ``'nfd'``, ``'nfkc'`` and ``'nfkd'``. See https://en.wikipedia.org/wiki/Unicode_equivalence.
Converts ``str`` to the `normalization <https://en.wikipedia.org/wiki/Unicode_equivalence>`_ specified by ``norm``. The valid values of ``norm`` are ``'nfc'``, ``'nfd'``, ``'nfkc'`` and ``'nfkd'``.
.. function:: chars(str)
@ -297,7 +297,7 @@ String functions
List functions
--------------------------
.. module:: Function.List
.. module:: Func.List
.. function:: list(x, ...)
@ -389,7 +389,7 @@ List functions
Binary functions
----------------
.. module:: Function.Bin
.. module:: Func.Bin
.. function:: length(bytes)
@ -423,21 +423,21 @@ Binary functions
.. function:: encode_base64(b)
Encodes the byte array ``b`` into the Base64 encoded string. See https://en.wikipedia.org/wiki/Base64.
Encodes the byte array ``b`` into the `Base64 <https://en.wikipedia.org/wiki/Base64>`_-encoded string.
.. NOTE::
``encode_base64`` is automatically applied when output to JSON since JSON cannot represent bytes natively.
.. function:: decode_base64(str)
Tries to decode the ``str`` as a Base64-encoded byte array.
Tries to decode the ``str`` as a `Base64 <https://en.wikipedia.org/wiki/Base64>`_-encoded byte array.
--------------------------------
Type checking and conversions
--------------------------------
.. module:: Function.Typing
.. module:: Func.Typing
.. function:: to_string(x)
@ -498,7 +498,7 @@ Type checking and conversions
Random functions
-----------------
.. module:: Function.Rand
.. module:: Func.Rand
.. function:: rand_float()
@ -521,7 +521,7 @@ Random functions
Regex functions
------------------
.. module:: Function.Regex
.. module:: Func.Regex
.. function:: regex_matches(x, reg)

@ -84,14 +84,14 @@ impl AlgoImpl for Bfs {
for (starting, ending) in found {
let mut route = vec![];
let mut current = ending;
let mut current = ending.clone();
while current != starting {
route.push(current.clone());
current = backtrace.get(&current).unwrap().clone();
}
route.push(starting);
route.push(starting.clone());
route.reverse();
let tuple = Tuple(route);
let tuple = Tuple(vec![starting, ending, DataValue::List(route)]);
out.put(tuple, 0);
}
Ok(())
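The route reconstruction touched by this hunk can be illustrated in isolation. The sketch below is a simplified stand-in, not the code from the commit: it uses ``u32`` node ids instead of ``DataValue``, a hypothetical function name ``reconstruct_route``, and returns ``None`` where the original unwraps.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: `backtrace` maps each visited node to the node it was
/// reached from. Assumes that following it from `ending` eventually reaches
/// `starting` or a node with no entry (i.e. the map contains no cycle).
fn reconstruct_route(
    backtrace: &BTreeMap<u32, u32>,
    starting: u32,
    ending: u32,
) -> Option<(u32, u32, Vec<u32>)> {
    let mut route = vec![];
    let mut current = ending;
    // Walk predecessor links from the answer node back to the starting node.
    while current != starting {
        route.push(current);
        current = *backtrace.get(&current)?;
    }
    route.push(starting);
    route.reverse();
    // Same shape as the new output tuple: start, answer, and the path between them.
    Some((starting, ending, route))
}
```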

@ -86,14 +86,14 @@ impl AlgoImpl for Dfs {
for (starting, ending) in found {
let mut route = vec![];
let mut current = ending;
let mut current = ending.clone();
while current != starting {
route.push(current.clone());
current = backtrace.get(&current).unwrap().clone();
}
route.push(starting);
route.push(starting.clone());
route.reverse();
let tuple = Tuple(route);
let tuple = Tuple(vec![starting, ending, DataValue::List(route)]);
out.put(tuple, 0);
poison.check()?;
}

@ -14,9 +14,9 @@ use crate::runtime::db::Poison;
use crate::runtime::derived::DerivedRelStore;
use crate::runtime::transact::SessionTx;
pub(crate) struct MinimumSpanningTreeKruskal;
pub(crate) struct MinimumSpanningForestKruskal;
impl AlgoImpl for MinimumSpanningTreeKruskal {
impl AlgoImpl for MinimumSpanningForestKruskal {
fn run(
&mut self,
tx: &SessionTx,

@ -11,7 +11,7 @@ use crate::algo::astar::ShortestPathAStar;
use crate::algo::bfs::Bfs;
use crate::algo::degree_centrality::DegreeCentrality;
use crate::algo::dfs::Dfs;
use crate::algo::kruskal::MinimumSpanningTreeKruskal;
use crate::algo::kruskal::MinimumSpanningForestKruskal;
use crate::algo::label_propagation::LabelPropagation;
use crate::algo::louvain::CommunityDetectionLouvain;
use crate::algo::pagerank::PageRank;
@ -126,7 +126,7 @@ impl AlgoHandle {
"ShortestPathAStar" => Box::new(ShortestPathAStar),
"KShortestPathYen" => Box::new(KShortestPathYen),
"MinimumSpanningTreePrim" => Box::new(MinimumSpanningTreePrim),
"MinimumSpanningTreeKruskal" => Box::new(MinimumSpanningTreeKruskal),
"MinimumSpanningForestKruskal" => Box::new(MinimumSpanningForestKruskal),
"TopSort" => Box::new(TopSort),
"ConnectedComponents" => Box::new(StronglyConnectedComponent::new(false)),
"StronglyConnectedComponents" | "SCC" => {

@ -27,7 +27,7 @@ impl AlgoImpl for PageRank {
let edges = algo.relation(0)?;
let undirected = algo.bool_option("undirected", Some(false))?;
let theta = algo.unit_interval_option("theta", Some(0.8))? as f32;
let epsilon = algo.unit_interval_option("epsilon", Some(0.8))? as f32;
let epsilon = algo.unit_interval_option("epsilon", Some(0.05))? as f32;
let iterations = algo.pos_integer_option("iterations", Some(20))?;
let (graph, indices, _) = edges.convert_edge_to_graph(undirected, tx, stores)?;
let res = pagerank(&graph, theta, epsilon, iterations, poison)?;

@ -1,14 +1,17 @@
use std::cmp::Reverse;
use std::collections::BTreeMap;
use miette::Diagnostic;
use miette::Result;
use ordered_float::OrderedFloat;
use priority_queue::PriorityQueue;
use thiserror::Error;
use crate::algo::AlgoImpl;
use crate::data::program::{MagicAlgoApply, MagicSymbol};
use crate::data::tuple::Tuple;
use crate::data::value::DataValue;
use crate::parse::SourceSpan;
use crate::runtime::db::Poison;
use crate::runtime::derived::DerivedRelStore;
use crate::runtime::transact::SessionTx;
@ -25,12 +28,34 @@ impl AlgoImpl for MinimumSpanningTreePrim {
poison: Poison,
) -> Result<()> {
let edges = algo.relation(0)?;
let (graph, indices, _, _) =
let (graph, indices, inv_indices, _) =
edges.convert_edge_to_weighted_graph(true, true, tx, stores)?;
if graph.is_empty() {
return Ok(());
}
let msp = prim(&graph, poison)?;
let starting = match algo.relation(1) {
Err(_) => 0,
Ok(rel) => {
let tuple = rel.iter(tx, stores)?.next().ok_or_else(|| {
#[derive(Debug, Error, Diagnostic)]
#[error("The provided starting nodes relation is empty")]
#[diagnostic(code(algo::empty_starting))]
struct EmptyStarting(#[label] SourceSpan);
EmptyStarting(rel.span())
})??;
let dv = &tuple.0[0];
*inv_indices.get(dv).ok_or_else(|| {
#[derive(Debug, Error, Diagnostic)]
#[error("The requested starting node {0:?} is not found")]
#[diagnostic(code(algo::starting_node_not_found))]
struct StartingNodeNotFound(DataValue, #[label] SourceSpan);
StartingNodeNotFound(dv.clone(), rel.span())
})?
}
};
let msp = prim(&graph, starting, poison)?;
for (src, dst, cost) in msp {
out.put(
Tuple(vec![
@ -45,7 +70,11 @@ impl AlgoImpl for MinimumSpanningTreePrim {
}
}
fn prim(graph: &[Vec<(usize, f64)>], poison: Poison) -> Result<Vec<(usize, usize, f64)>> {
fn prim(
graph: &[Vec<(usize, f64)>],
starting: usize,
poison: Poison,
) -> Result<Vec<(usize, usize, f64)>> {
let mut visited = vec![false; graph.len()];
let mut mst_edges = Vec::with_capacity(graph.len() - 1);
let mut pq = PriorityQueue::new();
@ -61,7 +90,7 @@ fn prim(graph: &[Vec<(usize, f64)>], poison: Poison) -> Result<Vec<(usize, usize
}
};
relax_edges_at_node(0, &mut pq);
relax_edges_at_node(starting, &mut pq);
while let Some((to_node, (Reverse(OrderedFloat(cost)), from_node))) = pq.pop() {
if mst_edges.len() == graph.len() - 1 {

@ -28,7 +28,7 @@ impl AlgoImpl for RandomWalk {
let nodes = algo.relation(1)?;
let starting = algo.relation(2)?;
let iterations = algo.pos_integer_option("iterations", Some(1))?;
let steps = algo.pos_integer_option("steps", Some(1))?;
let steps = algo.pos_integer_option("steps", None)?;
let mut maybe_weight = algo.expr_option("weight", None).ok();
if let Some(weight) = &mut maybe_weight {
