diff --git a/IN_PROG.md b/IN_PROG.md
deleted file mode 100644
index 72f6a7f7..00000000
--- a/IN_PROG.md
+++ /dev/null
@@ -1,200 +0,0 @@
-[![GitHub](https://img.shields.io/github/license/cozodb/cozo)](https://github.com/cozodb/cozo/blob/main/LICENSE.txt)
-[![GitHub Workflow Status](https://img.shields.io/github/workflow/status/cozodb/cozo/Build)](https://github.com/cozodb/cozo/actions/workflows/build.yml)
-[![Discord](https://img.shields.io/discord/1034464550927478886)](https://discord.gg/GFpjQ9m866)
-
-# `cozo`
-
-A general-purpose, transactional, relational database
-that uses Datalog for query and focuses on graph data.
-
-## Features
-
-* Relational database with [Datalog](https://en.wikipedia.org/wiki/Datalog) as the query language
-* Recursive queries, especially recursion through (safe) aggregation, capable of expressing complex graph operations and algorithms
-* Fixed rules providing efficient whole-graph algorithms which integrate seamlessly with Datalog
-* Rich set of built-in functions and aggregations
-* Only a single executable, trivial to deploy and run
-* Easy to use from any programming language
-* Special support for [Jupyter](https://jupyter.org/) notebooks for integration with the Python DataScience ecosystem
-* Modern, clean, flexible syntax, informative error messages
-
-## Teasers
-
-Here `*route` is a relation with two columns `src` and `dst`,
-representing a route between those airports.
-
-Find airports reachable by one stop from Frankfurt Airport (code `FRA`):
-
-```js
-?[dst] := *route{src: 'FRA', dst: stop},
-          *route{src: stop, dst}
-```
-
-Find airports reachable from Frankfurt with any number of stops
-with code starting with the letter `A`:
-
-```js
-reachable[dst] := *route{src: 'FRA', dst}
-reachable[dst] := reachable[src], *route{src, dst}
-?[airport] := reachable[airport], starts_with(airport, 'A')
-```
-
-Compute the shortest path between Frankfurt and all airports in the world:
-
-```js
-shortest_paths[dst, shortest(path)] := *route{src: 'FRA', dst},
-                                       path = ['FRA', dst]
-shortest_paths[dst, shortest(path)] := shortest_paths[stop, prev_path],
-                                       *route{src: stop, dst},
-                                       path = append(prev_path, dst)
-?[dst, path] := shortest_paths[dst, path]
-```
-
-Compute the shortest path again, but with built-in algorithm:
-
-```js
-starting[airport] := airport = 'FRA'
-?[src, dst, cost, path] <~ ShortestPathDijkstra(*route[], starting[])
-```
-
-## Learning Cozo
-
-* Start with the [Tutorial](https://cozodb.github.io/current/tutorial.html) to learn the basics;
-* Continue with the [Manual](https://cozodb.github.io/current/manual/) to understand the fine points.
-
-## Bug reports, discussions
-
-If you encounter a bug, first search for [past issues](https://github.com/cozodb/cozo/issues) to see
-if it has already been reported. If not, open a new issue.
-Please provide sufficient information so that we can diagnose the problem faster.
-
-Other discussions about Cozo should be in [GitHub discussions](https://github.com/cozodb/cozo/discussions).
-We also have a [Discord channel](https://discord.gg/GFpjQ9m866).
-
-## Use cases
-
-As Cozo is a general-purpose database,
-it can be used in situations
-where traditional databases such as PostgreSQL and SQLite
-are used.
-However, Cozo is designed to overcome several shortcomings
-of traditional databases, and hence fares especially well
-in specific situations:
-
-* You have a lot of interconnected relations
-  and the usual queries need to relate many relations together.
-  In other words, you need to query a complex graph.
-    * An example is a system granting permissions to users for specific tasks.
-      In this case, users may have roles,
-      belong to an organization hierarchy, and tasks similarly have organizations
-      and special provisions associated with them.
-      The granting process itself may also be a complicated rule encoded as data
-      within the database.
-    * With a traditional database,
-      the corresponding SQL tend to become
-      an entangled web of nested queries, with many tables joined together,
-      and maybe even with some recursive CTE thrown in. This is hard to maintain,
-      and worse, the performance is unpredictable since query optimizers in general
-      fail when you have over twenty tables joined together.
-    * With Cozo, on the other hand, [Horn clauses](https://en.wikipedia.org/wiki/Horn_clause)
-      make it easy to break
-      the logic into smaller pieces and write clear, easily testable queries.
-      Furthermore, the deterministic evaluation order makes identifying and solving
-      performance problems easier.
-* Your data may be simple, even a single table, but it is inherently a graph.
-    * We have seen an example in the [Tutorial](https://cozodb.github.io/current/tutorial.html):
-      the air route dataset, where the key relation contains the routes connecting airports.
-    * In traditional databases, when you are given a new relation,
-      you try to understand it by running aggregations on it to collect statistics:
-      what is the distribution of values, how are the columns correlated, etc.
-    * In Cozo you can do the same exploratory analysis,
-      except now you also have graph algorithms that you can
-      easily apply to understand things such as: what is the most _connected_ entity,
-      how are the nodes connected, and what are the _communities_ structure within the nodes.
-* Your data contains hidden structures that only become apparent when you
-  identify the _scales_ of the relevant structures.
-    * Examples are most real networks, such as social networks,
-      which have a very rich hierarchy of structures
-    * In a traditional database, you are limited to doing nested aggregations and filtering,
-      i.e. a form of multifaceted data analysis. For example, you can analyze by gender, geography,
-      job or combinations of them. For structures hidden in other ways,
-      or if such categorizing tags are not already present in your data,
-      you are out of luck.
-    * With Cozo, you can now deal with emergent and fuzzy structures by using e.g.
-      community detection algorithms, and collapse the original graph into a coarse-grained
-      graph consisting of super-nodes and super-edges.
-      The process can be iterated to gain insights into even higher-order emergent structures.
-      This is possible in a social network with only edges and _no_ categorizing tags
-      associated with nodes at all,
-      and the discovered structures almost always have meanings correlated to real-world events and
-      organizations, for example, forms of collusion and crime rings.
-      Also, from a performance perspective,
-      coarse-graining is a required step in analyzing the so-called big data,
-      since many graph algorithms have high complexity and are only applicable to
-      the coarse-grained small or medium networks.
-* You want to understand your live business data better by augmenting it into a _knowledge graph_.
-    * For example, your sales database contains product, buyer, inventory, and invoice tables.
-      The augmentation is external data about the entities in your data in the form of _taxonomies_
-      and _ontologies_ in layers.
-    * This is inherently a graph-theoretic undertaking and traditional databases are not suitable.
-      Usually, a dedicated graph processing engine is used, separate from the main database.
-    * With Cozo, it is possible to keep your live data and knowledge graph analysis together,
-      and importing new external data and doing analysis is just a few lines of code away.
-      This ease of use means that you will do the analysis much more often, with a perhaps much wider scope.
-
-## Status of the project
-
-Cozo is very young and **not** production-ready yet,
-but we encourage you to try it out for your use case.
-Any feedback is welcome.
-
-Versions before 1.0 do not promise syntax/API stability or storage compatibility.
-We promise that when you try to open database files created with an incompatible version,
-Cozo will at least refuse to start instead of silently corrupting your data.
-
-## Plans for development
-
-In the near term, before we reach version 1.0:
-
-* Backup/restore functionality
-* Many, many more tests to ensure correctness
-* Benchmarks
-
-Further down the road:
-
-* More tuning options
-* Streaming/reactive data
-* Extension system
-    * The core of Cozo should be kept small at all times. Additional functionalities should be in extensions for the user to choose from.
-    * What can be extended: datatypes, functions, aggregations, and fixed algorithms.
-    * Extensions should be written in a compiled language such as Rust or C++ and compiled into a dynamic library, to be loaded by Cozo at runtime.
-    * There will probably be a few "official" extension bundles, such as
-        * arbitrary precision arithmetic
-        * full-text "indexing" and searching
-        * relations that can emulate spatial and other types of non-lexicographic indices
-        * reading from external databases directly
-        * more exotic graph algorithms
-
-Ideas and discussions are welcome.
-
-## Storage engine
-
-Cozo is written in Rust, with [RocksDB](http://rocksdb.org/) as the storage engine
-(this may change in the future).
-We manually wrote the C++/Rust bindings for RocksDB with [cxx](https://cxx.rs/).
-
-## Contributing
-
-Contributions to code or other materials
-should be done via [pull requests](https://github.com/cozodb/cozo/pulls).
-
-For code contributions other than simple bug fixes, please
-[discuss](https://github.com/cozodb/cozo/discussions) it
-with the maintainer first before opening a pull request,
-otherwise it is unlikely to be accepted.
-
-## Licensing
-
-The contents of this project are licensed under AGPL-3.0 or later, except:
-* Files under `cozorocks/` are licensed under MIT, or Apache-2.0, or BSD-3-Clause;
-* Files under `docs/` are licensed under CC BY-SA 4.0.
\ No newline at end of file
diff --git a/README.md b/README.md
index 1a1aebed..72f6a7f7 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,200 @@
-# WIP
\ No newline at end of file
+[![GitHub](https://img.shields.io/github/license/cozodb/cozo)](https://github.com/cozodb/cozo/blob/main/LICENSE.txt)
+[![GitHub Workflow Status](https://img.shields.io/github/workflow/status/cozodb/cozo/Build)](https://github.com/cozodb/cozo/actions/workflows/build.yml)
+[![Discord](https://img.shields.io/discord/1034464550927478886)](https://discord.gg/GFpjQ9m866)
+
+# `cozo`
+
+A general-purpose, transactional, relational database
+that uses Datalog for query and focuses on graph data.
+
+## Features
+
+* Relational database with [Datalog](https://en.wikipedia.org/wiki/Datalog) as the query language
+* Recursive queries, especially recursion through (safe) aggregation, capable of expressing complex graph operations and algorithms
+* Fixed rules providing efficient whole-graph algorithms which integrate seamlessly with Datalog
+* Rich set of built-in functions and aggregations
+* Only a single executable, trivial to deploy and run
+* Easy to use from any programming language
+* Special support for [Jupyter](https://jupyter.org/) notebooks for integration with the Python DataScience ecosystem
+* Modern, clean, flexible syntax, informative error messages
+
+## Teasers
+
+Here `*route` is a relation with two columns `src` and `dst`,
+representing a route between those airports.
+
+Find airports reachable by one stop from Frankfurt Airport (code `FRA`):
+
+```js
+?[dst] := *route{src: 'FRA', dst: stop},
+          *route{src: stop, dst}
+```
+
+Find airports reachable from Frankfurt with any number of stops
+with code starting with the letter `A`:
+
+```js
+reachable[dst] := *route{src: 'FRA', dst}
+reachable[dst] := reachable[src], *route{src, dst}
+?[airport] := reachable[airport], starts_with(airport, 'A')
+```
+
+Compute the shortest path between Frankfurt and all airports in the world:
+
+```js
+shortest_paths[dst, shortest(path)] := *route{src: 'FRA', dst},
+                                       path = ['FRA', dst]
+shortest_paths[dst, shortest(path)] := shortest_paths[stop, prev_path],
+                                       *route{src: stop, dst},
+                                       path = append(prev_path, dst)
+?[dst, path] := shortest_paths[dst, path]
+```
+
+Compute the shortest path again, but with built-in algorithm:
+
+```js
+starting[airport] := airport = 'FRA'
+?[src, dst, cost, path] <~ ShortestPathDijkstra(*route[], starting[])
+```
+
+## Learning Cozo
+
+* Start with the [Tutorial](https://cozodb.github.io/current/tutorial.html) to learn the basics;
+* Continue with the [Manual](https://cozodb.github.io/current/manual/) to understand the fine points.
+
+## Bug reports, discussions
+
+If you encounter a bug, first search for [past issues](https://github.com/cozodb/cozo/issues) to see
+if it has already been reported. If not, open a new issue.
+Please provide sufficient information so that we can diagnose the problem faster.
+
+Other discussions about Cozo should be in [GitHub discussions](https://github.com/cozodb/cozo/discussions).
+We also have a [Discord channel](https://discord.gg/GFpjQ9m866).
+
+## Use cases
+
+As Cozo is a general-purpose database,
+it can be used in situations
+where traditional databases such as PostgreSQL and SQLite
+are used.
+However, Cozo is designed to overcome several shortcomings
+of traditional databases, and hence fares especially well
+in specific situations:
+
+* You have a lot of interconnected relations
+  and the usual queries need to relate many relations together.
+  In other words, you need to query a complex graph.
+    * An example is a system granting permissions to users for specific tasks.
+      In this case, users may have roles,
+      belong to an organization hierarchy, and tasks similarly have organizations
+      and special provisions associated with them.
+      The granting process itself may also be a complicated rule encoded as data
+      within the database.
+    * With a traditional database,
+      the corresponding SQL tend to become
+      an entangled web of nested queries, with many tables joined together,
+      and maybe even with some recursive CTE thrown in. This is hard to maintain,
+      and worse, the performance is unpredictable since query optimizers in general
+      fail when you have over twenty tables joined together.
+    * With Cozo, on the other hand, [Horn clauses](https://en.wikipedia.org/wiki/Horn_clause)
+      make it easy to break
+      the logic into smaller pieces and write clear, easily testable queries.
+      Furthermore, the deterministic evaluation order makes identifying and solving
+      performance problems easier.
+* Your data may be simple, even a single table, but it is inherently a graph.
+    * We have seen an example in the [Tutorial](https://cozodb.github.io/current/tutorial.html):
+      the air route dataset, where the key relation contains the routes connecting airports.
+    * In traditional databases, when you are given a new relation,
+      you try to understand it by running aggregations on it to collect statistics:
+      what is the distribution of values, how are the columns correlated, etc.
+    * In Cozo you can do the same exploratory analysis,
+      except now you also have graph algorithms that you can
+      easily apply to understand things such as: what is the most _connected_ entity,
+      how are the nodes connected, and what are the _communities_ structure within the nodes.
+* Your data contains hidden structures that only become apparent when you
+  identify the _scales_ of the relevant structures.
+    * Examples are most real networks, such as social networks,
+      which have a very rich hierarchy of structures
+    * In a traditional database, you are limited to doing nested aggregations and filtering,
+      i.e. a form of multifaceted data analysis. For example, you can analyze by gender, geography,
+      job or combinations of them. For structures hidden in other ways,
+      or if such categorizing tags are not already present in your data,
+      you are out of luck.
+    * With Cozo, you can now deal with emergent and fuzzy structures by using e.g.
+      community detection algorithms, and collapse the original graph into a coarse-grained
+      graph consisting of super-nodes and super-edges.
+      The process can be iterated to gain insights into even higher-order emergent structures.
+      This is possible in a social network with only edges and _no_ categorizing tags
+      associated with nodes at all,
+      and the discovered structures almost always have meanings correlated to real-world events and
+      organizations, for example, forms of collusion and crime rings.
+      Also, from a performance perspective,
+      coarse-graining is a required step in analyzing the so-called big data,
+      since many graph algorithms have high complexity and are only applicable to
+      the coarse-grained small or medium networks.
+* You want to understand your live business data better by augmenting it into a _knowledge graph_.
+    * For example, your sales database contains product, buyer, inventory, and invoice tables.
+      The augmentation is external data about the entities in your data in the form of _taxonomies_
+      and _ontologies_ in layers.
+    * This is inherently a graph-theoretic undertaking and traditional databases are not suitable.
+      Usually, a dedicated graph processing engine is used, separate from the main database.
+    * With Cozo, it is possible to keep your live data and knowledge graph analysis together,
+      and importing new external data and doing analysis is just a few lines of code away.
+      This ease of use means that you will do the analysis much more often, with a perhaps much wider scope.
+
+## Status of the project
+
+Cozo is very young and **not** production-ready yet,
+but we encourage you to try it out for your use case.
+Any feedback is welcome.
+
+Versions before 1.0 do not promise syntax/API stability or storage compatibility.
+We promise that when you try to open database files created with an incompatible version,
+Cozo will at least refuse to start instead of silently corrupting your data.
+
+## Plans for development
+
+In the near term, before we reach version 1.0:
+
+* Backup/restore functionality
+* Many, many more tests to ensure correctness
+* Benchmarks
+
+Further down the road:
+
+* More tuning options
+* Streaming/reactive data
+* Extension system
+    * The core of Cozo should be kept small at all times. Additional functionalities should be in extensions for the user to choose from.
+    * What can be extended: datatypes, functions, aggregations, and fixed algorithms.
+    * Extensions should be written in a compiled language such as Rust or C++ and compiled into a dynamic library, to be loaded by Cozo at runtime.
+    * There will probably be a few "official" extension bundles, such as
+        * arbitrary precision arithmetic
+        * full-text "indexing" and searching
+        * relations that can emulate spatial and other types of non-lexicographic indices
+        * reading from external databases directly
+        * more exotic graph algorithms
+
+Ideas and discussions are welcome.
+
+## Storage engine
+
+Cozo is written in Rust, with [RocksDB](http://rocksdb.org/) as the storage engine
+(this may change in the future).
+We manually wrote the C++/Rust bindings for RocksDB with [cxx](https://cxx.rs/).
+
+## Contributing
+
+Contributions to code or other materials
+should be done via [pull requests](https://github.com/cozodb/cozo/pulls).
+
+For code contributions other than simple bug fixes, please
+[discuss](https://github.com/cozodb/cozo/discussions) it
+with the maintainer first before opening a pull request,
+otherwise it is unlikely to be accepted.
+
+## Licensing
+
+The contents of this project are licensed under AGPL-3.0 or later, except:
+* Files under `cozorocks/` are licensed under MIT, or Apache-2.0, or BSD-3-Clause;
+* Files under `docs/` are licensed under CC BY-SA 4.0.
\ No newline at end of file