{ "cells": [ { "cell_type": "markdown", "id": "3df9ce6e-3722-4334-abf9-703d62c27eab", "metadata": {}, "source": [ "# The Cozo database tutorial" ] }, { "cell_type": "markdown", "id": "63c695dc-6e85-42bb-b39b-1f0fd14ed5eb", "metadata": {}, "source": [ "This tutorial will teach you the basics of using the Cozo database with the query language CozoScript.\n", "There are no database-specific prerequisites, \n", "though it would be helpful if you already know some other databases, \n", "especially SQL databases." ] }, { "cell_type": "markdown", "id": "c2bfcfe4-c06c-4b02-a040-6fd840b77820", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "markdown", "id": "eb1fefcf-81b5-4ccf-8179-631fdb011bae", "metadata": {}, "source": [ "The best way to learn from this tutorial is to run the queries as they are introduced.\n", "For this, you need to install the Cozo database on your local machine." ] }, { "cell_type": "markdown", "id": "7f26f0f5-a3ff-4145-a9d5-84c65e368afa", "metadata": {}, "source": [ "Cozo is distributed as a single gzipped/zipped binary. 
\n", "Go to https://github.com/cozodb/cozo/releases and download the latest release binary for your operating system to a local directory.\n", "\n", "If your operating system is Linux/Mac, \n", "open a terminal, \n", "`cd` into the directory of your download,\n", "and run" ] }, { "cell_type": "markdown", "id": "341d8539-d6e1-4610-bd1e-6c29568573a8", "metadata": {}, "source": [ "```bash\n", "gunzip cozoserver-*.gz\n", "mv cozoserver-* cozoserver\n", "chmod +x cozoserver\n", "```" ] }, { "cell_type": "markdown", "id": "536fb265-e25f-469d-843e-935b272c7298", "metadata": {}, "source": [ "If you are on Windows instead, open PowerShell and run" ] }, { "cell_type": "markdown", "id": "9809d37e-b1b3-4419-8c19-5fe7bc8994a6", "metadata": {}, "source": [ "```powershell\n", "Expand-Archive -Path .\\cozoserver-*.zip -DestinationPath .\n", "```" ] }, { "cell_type": "markdown", "id": "12da9726-3c18-4011-98ea-e2aae105ee08", "metadata": {}, "source": [ "To run the server, you need to specify a directory to store persistent data on your file system. \n", "In the following, we will use a directory called `tutorial-data` in the same directory as the binary executable.\n", "In the terminal, run" ] }, { "cell_type": "markdown", "id": "59c6fdfc-1a3c-4da3-8138-6c653592995f", "metadata": {}, "source": [ "```bash\n", "./cozoserver ./tutorial-data\n", "```" ] }, { "cell_type": "markdown", "id": "41820cf5-a800-4c81-8a17-227e9692aae2", "metadata": {}, "source": [ "The same command should work in PowerShell as well.\n", "\n", "If you see something like `Database web API running at ...` displayed in your terminal, \n", "then the server has started successfully. \n", "Keep the server running while you are following the tutorial.\n", "When you are done, press `CTRL-C` in the terminal to stop the server.\n", "You can restart the server by running the same command again.\n", "\n", "More options are available when starting the server. 
Run" ] }, { "cell_type": "markdown", "id": "e975d0a6-0851-4eed-9a24-d6b3d55fa09f", "metadata": {}, "source": [ "```bash\n", "./cozoserver -h\n", "```" ] }, { "cell_type": "markdown", "id": "f192cf0b-a30d-46ad-85a1-d81f7590b78b", "metadata": {}, "source": [ "for more details." ] }, { "cell_type": "markdown", "id": "2ec06d2e-1187-48c8-8777-f84b4c71c741", "metadata": {}, "source": [ "## A place to run queries\n", "\n", "Cozo exposes an HTTP API, so theoretically you can follow along using tools like `curl`. \n", "If you are interested, consult the [manual](https://cozodb.github.io/current/manual/setup.html#the-query-api) for the request format the API expects.\n", "For a better user experience, we suggest using one of the two options below instead." ] }, { "cell_type": "markdown", "id": "c4e8b790-7c0f-4538-b20e-2c3fb4065dc8", "metadata": {}, "source": [ "### Option 1: the JupyterLab notebook\n", "\n", "This option provides the best user experience but also requires you to install quite a lot of things, \n", "though you may already have them installed on your computer if you use the Python data science stack.\n", "First, you will need Python installed. \n", "Then install JupyterLab by following the instructions at https://jupyter.org/install.\n", "Next, run the following to install a Jupyter extension to help query Cozo:" ] }, { "cell_type": "markdown", "id": "37b47a25-3708-490d-ba08-975771dc469d", "metadata": {}, "source": [ "```bash\n", "pip install pycozo pandas\n", "```" ] }, { "cell_type": "markdown", "id": "9c07f306-9abc-4dd2-b78c-5b46a823982e", "metadata": {}, "source": [ "While you are at it, right-click on https://raw.githubusercontent.com/cozodb/cozo/main/docs/tutorial/tutorial.ipynb \n", "and save the notebook of this tutorial to your disk.\n", "\n", "Then run JupyterLab, open the saved tutorial document, and follow along." 
] }, { "cell_type": "markdown", "id": "5154e084-d997-4f22-97fb-ce5808fb1da3", "metadata": {}, "source": [ "We need to enable the extension in the notebook. Run" ] }, { "cell_type": "code", "execution_count": 1, "id": "2e0d11d7-64d7-4577-a28d-7541ee8875fa", "metadata": {}, "outputs": [], "source": [ "%load_ext pycozo.ipyext_direct" ] }, { "cell_type": "markdown", "id": "c3f0d80d-9854-4a21-9121-3220e7bcc73b", "metadata": {}, "source": [ "Then the \"hello world\" query:" ] }, { "cell_type": "code", "execution_count": 2, "id": "f3dfb8a1-35f0-4dc2-b8d7-e81fa3d45b75", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 2ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
       0      1      2
0  hello  world  Cozo!
\n" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[] <- [['hello', 'world', 'Cozo!']]" ] }, { "cell_type": "markdown", "id": "7e6dda2a-393a-4d87-8ba1-41c5d0b229d5", "metadata": {}, "source": [ "If you get the same words back formatted in a table, congratulations! \n", "You can skip to the section \"Your first relations\", where we start learning CozoScript proper.\n", "If you want to know more about what the `pycozo` extension does and what else you can do with it, read the [manual](https://cozodb.github.io/current/manual/setup.html#jupyterlab)." ] }, { "cell_type": "markdown", "id": "ac86f14f-72c7-4206-9e6b-6daf9e77b67a", "metadata": {}, "source": [ "### Option 2: the JavaScript console in your browser" ] }, { "cell_type": "markdown", "id": "8ddc55d2-604b-4218-af45-8c62d79db201", "metadata": {}, "source": [ "If you have never used Python before, the first option may be overwhelming. \n", "Or perhaps you just want to try Cozo out first to decide quickly if it has anything interesting for you.\n", "Whatever your reason for not wanting to install the whole Python toolchain, we have you covered." ] }, { "cell_type": "markdown", "id": "1147466d-d925-4c32-8b6c-28bdabc656fb", "metadata": {}, "source": [ "Your local machine at least has a modern web browser, like a recent version of Firefox, Chrome, or Edge, right?\n", "Good. \n", "\n", "Use your browser to navigate to http://127.0.0.1:9070 (the address shown in your terminal when you run `cozoserver`).\n", "You should be greeted by a page saying that the server is running.\n", "Now open the developer tools of your browser by right-clicking the page and selecting \"Inspect\" from the menu \n", "(if you cannot find it, you may need to fiddle with your browser settings to enable the developer tools).\n", "Switch to the \"Console\" tab of the developer tools if it is not already open. 
\n", "\n", "If you see some messages where \n", "the \"Cozo Makeshift Javascript Console\" welcomes you, you are ready. Run the \"hello world\" query by typing the following into the console and pressing enter:" ] }, { "cell_type": "markdown", "id": "97547c5d-e2b4-4205-ac10-b2aeb5855782", "metadata": {}, "source": [ "```javascript\n", "await run(`?[] <- [['hello', 'world', 'Cozo!']]`)\n", "```" ] }, { "cell_type": "markdown", "id": "81fe58e2-3c52-4859-828d-533b25a62468", "metadata": {}, "source": [ "If you see the three words echoed back in a table, you are successful. When following the tutorial, you have to wrap all queries within the backticks `` in the above command to run them in the JavaScript console." ] }, { "cell_type": "markdown", "id": "545ed77c-eeb9-4008-972e-0bdc1d6b7478", "metadata": {}, "source": [ "## Your first relations" ] }, { "cell_type": "markdown", "id": "61d8ce38-d2e0-4b5f-a9f7-64e9a65e0fe0", "metadata": {}, "source": [ "Cozo is a relational database. The \"hello world\" query" ] }, { "cell_type": "code", "execution_count": 3, "id": "30b8148b-4daa-4de0-a844-d8254a07d990", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
       0      1      2
0  hello  world  Cozo!
\n" ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[] <- [['hello', 'world', 'Cozo!']]" ] }, { "cell_type": "markdown", "id": "a7de1353-7970-41ee-843e-cbe6287ded8d", "metadata": {}, "source": [ "as you might have guessed, simply passes an ad hoc relation, here represented by a list of lists, to the database and asks it to return the relation to you.\n", "\n", "You can pass more rows, or a different number of columns, to further corroborate your guess:" ] }, { "cell_type": "code", "execution_count": 4, "id": "27995b4b-35b3-427f-af32-c3945d177463", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   0  1  2
0  1  2  3
1  a  b  c
\n" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[] <- [[1, 2, 3], ['a', 'b', 'c']]" ] }, { "cell_type": "markdown", "id": "f56c15a0-8359-4d90-b35d-aa21991acb96", "metadata": {}, "source": [ "This example shows how to enter literals for numbers, strings, booleans and `null`:" ] }, { "cell_type": "code", "execution_count": 5, "id": "54901c40-ff7e-4c26-bfed-0d38aa7a80dc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
          0         1     2          3                            4
0      True     False  None  -0.014000  A string with double quotes
1  1.500000  2.500000     3          4                     5.500000
2        aA        bB    cC         dD                           eE
\n" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[] <- [[1.5, 2.5, 3, 4, 5.5], \n", " ['aA', 'bB', 'cC', 'dD', 'eE'], \n", " [true, false, null, -1.4e-2, \"A string with double quotes\"]]" ] }, { "cell_type": "markdown", "id": "615e35d7-f5c0-4d71-ac62-6934c658a715", "metadata": {}, "source": [ "The literal representations are similar to those in JavaScript. \n", "In particular, strings in double quotes are guaranteed to be interpreted in the same way as in JSON." ] }, { "cell_type": "markdown", "id": "1d76ab84-36a3-4d1d-9da2-a891de52b011", "metadata": {}, "source": [ "You may be surprised by the order of the returned rows in the last example: the returned order is not the same as the input order.\n", "This is because in Cozo relations are stored (either in memory or on disk) as trees, and trees are always sorted.\n", "\n", "Another consequence of trees is that you can have no duplicate rows:" ] }, { "cell_type": "code", "execution_count": 6, "id": "28ab3f6e-0e32-486d-83fb-4fd5581e20dc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   0
0  1
1  2
\n" ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[] <- [[1], [2], [1], [2], [1]]" ] }, { "cell_type": "markdown", "id": "5a5d9022-c9b2-48c9-b447-ae69a2fc3a21", "metadata": {}, "source": [ "We say that relations in Cozo follow _set semantics_, where de-duplication is automatic. \n", "By contrast, SQL usually follows _bag semantics_ (some databases do this by secretly having a unique internal key for every row; in Cozo you must add such a key explicitly if you need to simulate duplicate rows).\n", "\n", "Why does Cozo break tradition and go with set semantics?\n", "Set semantics is much more convenient when recursion between relations is involved,\n", "and Cozo is designed to deal with very complicated recursions." ] }, { "cell_type": "markdown", "id": "bcdfc9e4-1bc4-4473-8dd5-44097a164fbe", "metadata": {}, "source": [ "## Expressions" ] }, { "cell_type": "markdown", "id": "09dd8c4c-fb0e-4f53-9dec-b3fbcb6dce5b", "metadata": {}, "source": [ "The next example shows the use of various expressions and comments:" ] }, { "cell_type": "code", "execution_count": 7, "id": "dff14dab-2cc1-4917-aaa5-8feb1f075cf9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   0         1      2      3     4      5      6         7                      8
0  3  0.750000  False  False  True  False  hello  0.947226  [1, 2, 3, 4, 5, 6, 7]
\n" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[] <- [[\n", " 1 + 2, # addition\n", " 3 / 4, # division\n", " 5 == 6, # equality\n", " 7 > 8, # greater\n", " true || false, # or\n", " false && true, # and\n", " lowercase('HELLO'), # function\n", " rand_float(), # function taking no argument\n", " union([1, 2, 3], [3, 4, 5], [5, 6, 7]), # variadic function\n", " ]]" ] }, { "cell_type": "markdown", "id": "4c5ee182-2ca8-49b3-8937-19da63124143", "metadata": {}, "source": [ "Notice in the last column the use of list literals within expressions. \n", "See [here](https://cozodb.github.io/current/manual/functions.html) for the full list of functions.\n", "The syntax is deliberately made almost identical to that of C-like languages." ] }, { "cell_type": "markdown", "id": "1bcc26ff-9901-4d6e-a895-cb77dbfafe3a", "metadata": {}, "source": [ "## Rules and relations" ] }, { "cell_type": "markdown", "id": "0eefa79d-a14e-4179-81b0-04da2b47c498", "metadata": {}, "source": [ "All previous examples start with `?[] <-`, which denotes a _rule_ named `?`. It is a _constant rule_: when evaluated, it just echoes the list of lists back as a relation.\n", "\n", "Rules can have other names, but the rule named `?` is special in that its evaluation determines the return relation of the query.\n", "\n", "Before we go beyond constant rules, note that we can give _bindings_ in the _head_ of rules:" ] }, { "cell_type": "code", "execution_count": 8, "id": "2a35a414-7037-4a2a-a07b-086463a4d4ef", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   first  second  third
0      1       2      3
1      a       b      c
\n" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[first, second, third] <- [[1, 2, 3], ['a', 'b', 'c']]" ] }, { "cell_type": "markdown", "id": "ba32d27c-0787-48d3-94fb-79e77a8cce0c", "metadata": {}, "source": [ "If you give bindings, the number of bindings must match the actual data, otherwise, you will get an error:" ] }, { "cell_type": "code", "execution_count": 9, "id": "b25901e4-fc6a-4cb0-9ca2-b9aeac80cc83", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[31mparser::fixed_rule_head_arity_mismatch\u001b[0m\n", "\n", " \u001b[31m×\u001b[0m Fixed rule head arity mismatch\n", " ╭────\n", " \u001b[2m1\u001b[0m │ ?[first, second] <- [[1, 2, 3], ['a', 'b', 'c']]\n", " · \u001b[35;1m─────────────────────────────────────────────────\u001b[0m\n", " ╰────\n", "\u001b[36m help: \u001b[0mExpected arity: 3, number of arguments given: 2\n" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[first, second] <- [[1, 2, 3], ['a', 'b', 'c']]" ] }, { "cell_type": "markdown", "id": "c71fb449-6a9c-4a7e-bed9-44230a41d5dd", "metadata": {}, "source": [ "Now let's define rules that use other rules:" ] }, { "cell_type": "code", "execution_count": 10, "id": "bc63b0b2-fa28-4d02-be90-23690f6021aa", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   a  b  c
0  1  2  3
1  a  b  c
\n" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rule[first, second, third] <- [[1, 2, 3], ['a', 'b', 'c']]\n", "?[a, b, c] := rule[a, b, c]" ] }, { "cell_type": "markdown", "id": "d86d57af-fadb-441a-926f-142c613ca1f7", "metadata": {}, "source": [ "This first defines a constant rule named `rule`. The `?` rule is now an _inline rule_, denoted by the connecting symbol `:=`. In its body it _applies_ the constant rule `rule` by giving its name followed by three _fresh bindings_, which are the _variables_ `a`, `b` and `c`.\n", "\n", "With inline rules, you can change the order of the columns, or choose which columns are returned:" ] }, { "cell_type": "code", "execution_count": 11, "id": "d5d52704-e2ed-430c-b9da-939964f7b8e0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   c  b
0  3  2
1  c  b
\n" ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rule[first, second, third] <- [[1, 2, 3], ['a', 'b', 'c']]\n", "?[c, b] := rule[a, b, c]" ] }, { "cell_type": "markdown", "id": "fb8a68dc-ab27-494b-86cd-b38502ea1a7c", "metadata": {}, "source": [ "The body of an inline rule, which are the things to the right of the connecting symbol `:=`, consists of _atoms_. \n", "The previous example has a single rule application atom as the body. Multiple atoms are connected by commas:" ] }, { "cell_type": "code", "execution_count": 12, "id": "657050ce-5c99-494b-b2e7-283ae73de8fe", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   c  b
0  3  2
\n" ], "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[c, b] := rule[a, b, c], is_num(a)\n", "rule[first, second, third] <- [[1, 2, 3], ['a', 'b', 'c']]" ] }, { "cell_type": "markdown", "id": "15c68fce-5318-453d-9e59-aee862ed5036", "metadata": {}, "source": [ "Here the second atom is an _expression_ `is_num(a)`. \n", "Only rows for which the expression evaluates to `true` are returned, so expression atoms act as filters. \n", "By the way, we see that the order in which the rules are given is immaterial." ] }, { "cell_type": "markdown", "id": "7793c44a-302c-4ed1-985e-4951c0a78b20", "metadata": {}, "source": [ "You can also bind constants to rule applications directly:" ] }, { "cell_type": "code", "execution_count": 13, "id": "44745ced-563b-41ce-aace-ce43f7ada7f7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   c  b
0  c  b
\n" ], "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rule[first, second, third] <- [[1, 2, 3], ['a', 'b', 'c']]\n", "?[c, b] := rule['a', b, c]" ] }, { "cell_type": "markdown", "id": "40b1fc69-c118-4876-a2d5-5fd2d8931811", "metadata": {}, "source": [ "You introduce additional bindings with the _unification operator_ `=`:" ] }, { "cell_type": "code", "execution_count": 14, "id": "cf32547a-7def-4dec-8d05-77c7c910be2c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   c  b  d
0  3  2  9
\n" ], "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rule[first, second, third] <- [[1, 2, 3], ['a', 'b', 'c']]\n", "?[c, b, d] := rule[a, b, c], is_num(a), d = a + b + 2*c" ] }, { "cell_type": "markdown", "id": "c5384db7-8cab-4262-94f5-a2b7298417e8", "metadata": {}, "source": [ "Having multiple rule applications in the body generates every combination of the bindings:" ] }, { "cell_type": "code", "execution_count": 15, "id": "59d2a912-50d6-474a-ae9b-74f024a58bdc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   l1  l2
0   a   B
1   a   C
2   b   B
3   b   C
\n" ], "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r1[] <- [[1, 'a'], [2, 'b']]\n", "r2[] <- [[2, 'B'], [3, 'C']]\n", "\n", "?[l1, l2] := r1[a, l1], \n", " r2[b, l2]" ] }, { "cell_type": "markdown", "id": "b26be73d-dc5c-400b-9af4-d3aa9af0c430", "metadata": {}, "source": [ "This corresponds to a Cartesian join in relational algebra. \n", "Notice the bindings in the rule applications are all distinct.\n", "If bindings are reused, then we get the effect of _unification_,\n", "which corresponds to joins in relational algebra:" ] }, { "cell_type": "code", "execution_count": 16, "id": "20256fd6-b5c0-4947-a860-021846bef5fa", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   l1  l2
0   b   B
\n" ], "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r1[] <- [[1, 'a'], [2, 'b']]\n", "r2[] <- [[2, 'B'], [3, 'C']]\n", "\n", "?[l1, l2] := r1[a, l1], \n", " r2[a, l2] # reused `a`" ] }, { "cell_type": "markdown", "id": "bf4be53c-118f-46b8-9c42-cb259aed6e87", "metadata": {}, "source": [ "The explicit unification `=` unifies with a single value. There is another kind of unification that unifies values within a list. Observe:" ] }, { "cell_type": "code", "execution_count": 17, "id": "5c7d2b90-8494-4521-aac6-2445473e50d7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   x  y
0  1  x
1  1  y
2  2  x
3  2  y
4  3  x
5  3  y
\n" ], "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[x, y] := x in [1, 2, 3], y in ['x', 'y']" ] }, { "cell_type": "markdown", "id": "af092ab5-2692-4b12-a6a7-eae8def60cac", "metadata": {}, "source": [ "For the head of inline rules, you do not need to use all variables that appear in the body. However, every variable you use in the head must appear in the body; this is called the _safety rule_:" ] }, { "cell_type": "code", "execution_count": 18, "id": "5a13303d-4f61-4bc5-85d1-391773a9bc7b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[31meval::unbound_symb_in_head\u001b[0m\n", "\n", " \u001b[31m×\u001b[0m Symbol 'x' in rule head is unbound\n", " ╭─[3:1]\n", " \u001b[2m3\u001b[0m │ \n", " \u001b[2m4\u001b[0m │ ?[l1, l2, x] := r1[a, l1], \n", " · \u001b[35;1m ─\u001b[0m\n", " \u001b[2m5\u001b[0m │ r2[a, l2]\n", " ╰────\n", "\u001b[36m help: \u001b[0mNote that symbols occurring only in negated positions are not considered bound\n" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r1[] <- [[1, 'a'], [2, 'b']]\n", "r2[] <- [[2, 'B'], [3, 'C']]\n", "\n", "?[l1, l2, x] := r1[a, l1], \n", " r2[a, l2]" ] }, { "cell_type": "markdown", "id": "e0c1b973-4ecf-459d-99d9-bab9665a7d23", "metadata": {}, "source": [ "## Stored relations" ] }, { "cell_type": "markdown", "id": "cb7e8795-7b95-4653-958c-c9682f25e137", "metadata": {}, "source": [ "The constructs that we have introduced already cover most of what relational algebra can do. That's some economy of syntax! \n", "However, since Cozo is a database, we also need to know how to store data persistently. In Cozo, persistent relations are called _stored relations_." 
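, "\n", "\n", "As a rough recap of the rule constructs introduced so far, the sketch below mimics them in plain Python. It is an analogy only: it is not CozoScript and says nothing about how Cozo actually evaluates queries. Relations are modelled as Python sets of tuples (an assumption made purely for illustration), so duplicates collapse as they do under set semantics, and reusing a binding across two rule applications behaves like an equality filter on that column.

```python
# Relations modelled as sets of tuples (illustrative assumption, not Cozo's API).
r1 = {(1, 'a'), (2, 'b')}
r2 = {(2, 'B'), (3, 'C')}

# ?[l1, l2] := r1[a, l1], r2[b, l2]
# Distinct bindings `a` and `b`: every combination is produced (Cartesian join).
cartesian = {(l1, l2) for (a, l1) in r1 for (b, l2) in r2}

# ?[l1, l2] := r1[a, l1], r2[a, l2]
# Reused binding `a`: only rows agreeing on the first column survive (join).
joined = {(l1, l2) for (a, l1) in r1 for (b, l2) in r2 if a == b}

# ?[] <- [[1], [2], [1], [2], [1]]
# Set semantics: repeated rows are stored only once.
dedup = {(1,), (2,), (1,), (2,), (1,)}

print(sorted(cartesian))
print(sorted(joined))
print(sorted(dedup))
```

The names `cartesian`, `joined` and `dedup` are hypothetical and exist only in this sketch; the row values are the ones from the `r1`/`r2` examples above."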
] }, { "cell_type": "markdown", "id": "79357544-8eeb-4b33-b49c-bd6cb4de57cb", "metadata": {}, "source": [ "There is no ceremony required at all when you want to store data in Cozo:" ] }, { "cell_type": "code", "execution_count": 19, "id": "2944f226-c865-4fd6-b412-cb34e2e50705", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   status
0      OK
\n" ], "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r1[] <- [[1, 'a'], [2, 'b']]\n", "r2[] <- [[2, 'B'], [3, 'C']]\n", "\n", "?[l1, l2] := r1[a, l1], \n", " r2[a, l2]\n", " \n", ":create stored {l1, l2}" ] }, { "cell_type": "markdown", "id": "6e911e87-a80e-4b1b-a878-613b61e339bc", "metadata": {}, "source": [ "The query itself is identical to the one which we have run before, except we have added the `:create` query option, instructing the system to store the result in a stored relation named `stored`, containing the columns `l1` and `l2`.\n", "\n", "By the way, if you just want to create the relation without adding any data, you can omit the queries. No need to have an empty `?` query." ] }, { "cell_type": "markdown", "id": "60c8509a-0aea-4d0b-9ca3-e8d0b00e73ad", "metadata": {}, "source": [ "You can verify that you now have the required stored relation in your system by running a _system op_:" ] }, { "cell_type": "code", "execution_count": 20, "id": "e151e2e6-f185-4659-89c6-a274c643a03a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
     name  arity  access_level  n_keys  n_non_keys  n_put_triggers  n_rm_triggers  n_replace_triggers
0  stored      2        normal       2           0               0              0                   0
\n" ], "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "::relations" ] }, { "cell_type": "markdown", "id": "30d7999f-6939-497d-98d6-e5edc6af44a1", "metadata": {}, "source": [ "You can also investigate the columns of the stored relation:" ] }, { "cell_type": "code", "execution_count": 21, "id": "fd8c4f13-96c8-4f3c-b295-ea493d3875f6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   column  is_key  index  type  has_default
0      l1    True      0  Any?        False
1      l2    True      1  Any?        False
\n" ], "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "::columns stored " ] }, { "cell_type": "markdown", "id": "79d320aa-bf41-4676-a778-7ff00b704ced", "metadata": {}, "source": [ "Stored relations can be used in a similar way to relations defined via inline rules or fixed rules. The only difference is that you prefix the relation name with an asterisk `*`:" ] }, { "cell_type": "code", "execution_count": 22, "id": "3d4a2c70-7ed3-403b-9528-a696fe7b08a6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   a  b
0  b  B
\n" ], "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[a, b] := *stored[a, b]" ] }, { "cell_type": "markdown", "id": "821c1a1f-ef11-4429-ba99-c2599bc18857", "metadata": {}, "source": [ "Unlike relations defined inline, the columns of stored relations have fixed names. You can use this to your advantage by selectively referring to columns by name.\n", "This is especially useful if you have a lot of columns:" ] }, { "cell_type": "code", "execution_count": 23, "id": "dba56d3f-8162-4519-8934-12d5c6935239", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   a  b
0  b  B
\n" ], "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[a, b] := *stored{l2: b, l1: a}" ] }, { "cell_type": "markdown", "id": "135b9fb5-96fd-444b-aa44-ad8dd754318d", "metadata": {}, "source": [ "If you are fine with using the name of the column as the binding, a shorthand is available:" ] }, { "cell_type": "code", "execution_count": 24, "id": "a5a29a9c-5320-4c9f-a153-a96c662cf5d6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   l2
0   B
\n" ], "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[l2] := *stored{l2}" ] }, { "cell_type": "markdown", "id": "2424a684-89fb-449e-a5e6-931353bb482f", "metadata": {}, "source": [ "You can insert more data into a stored relation with the `:put` query option:" ] }, { "cell_type": "code", "execution_count": 25, "id": "eb8b4f2c-a2e7-45e3-8c18-6b6b8329344c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 status
0OK
\n" ], "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[l1, l2] <- [['e', 'E']]\n", " \n", ":put stored {l1, l2}" ] }, { "cell_type": "code", "execution_count": 26, "id": "ef21c6f3-ce5c-434c-a580-3d1442ebe7c6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 l1l2
0bB
1eE
\n" ], "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[l1, l2] := *stored[l1, l2]" ] }, { "cell_type": "markdown", "id": "a58b487a-6a44-43c7-9351-b78a49770a46", "metadata": {}, "source": [ "To remove rows, use the `:rm` query option:" ] }, { "cell_type": "code", "execution_count": 27, "id": "2c1d522b-286f-41be-9796-da165999e1df", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 status
0OK
\n" ], "text/plain": [ "" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[l1, l2] <- [['e', 'E']]\n", " \n", ":rm stored {l1, l2}" ] }, { "cell_type": "code", "execution_count": 28, "id": "d9ce93f8-dc37-4c37-9f47-904066c1341a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 l1l2
0bB
\n" ], "text/plain": [ "" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[l1, l2] := *stored[l1, l2]" ] }, { "cell_type": "markdown", "id": "1f70e3b5-0b96-42ce-8a34-9d45beaf1772", "metadata": {}, "source": [ "You can get rid of a stored relation with the following:" ] }, { "cell_type": "code", "execution_count": 29, "id": "f27e0eb3-711f-4d68-bd02-1e4cfddfd9ac", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 status
0OK
\n" ], "text/plain": [ "" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "::remove stored" ] }, { "cell_type": "code", "execution_count": 30, "id": "c938ecd5-8166-4f78-8fa1-234c75691d5e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 namearityaccess_leveln_keysn_non_keysn_put_triggersn_rm_triggersn_replace_triggers
\n" ], "text/plain": [ "" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "::relations" ] }, { "cell_type": "markdown", "id": "df2bff7d-3164-409a-a7fd-4e074be7ecbc", "metadata": {}, "source": [ "As we have mentioned, every relation in Cozo is a tree. Stored relations are no exception.\n", "So far, our trees store all the data in their keys.\n", "You can instruct Cozo to treat only some of the data as keys, thereby indicating a _functional dependency_:" ] }, { "cell_type": "code", "execution_count": 31, "id": "3a133dc0-5334-4acb-972a-305b5927941a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 status
0OK
\n" ], "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[a, b, c] <- [[1, 'a', 'A'],\n", " [2, 'b', 'B'],\n", " [3, 'c', 'C'],\n", " [4, 'd', 'D']]\n", "\n", ":create fd {a, b => c}" ] }, { "cell_type": "code", "execution_count": 32, "id": "3ff9ac04-531d-4726-aeb2-d8feab7bc149", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 abc
01aA
12bB
23cC
34dD
\n" ], "text/plain": [ "" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[a, b, c] := *fd[a, b, c]" ] }, { "cell_type": "markdown", "id": "53128ced-3edb-4f8a-bbff-506e7139711f", "metadata": {}, "source": [ "Now if you insert another row with an existing key, that row will be updated:" ] }, { "cell_type": "code", "execution_count": 33, "id": "fd442640-4664-4f1c-a0a7-0a265d19ca74", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 status
0OK
\n" ], "text/plain": [ "" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[a, b, c] <- [[3, 'c', 'CCCCCCC']]\n", "\n", ":put fd {a, b => c}" ] }, { "cell_type": "code", "execution_count": 34, "id": "980e4530-baec-40c5-9d68-e60d3dd8fb73", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 abc
01aA
12bB
23cCCCCCCC
34dD
\n" ], "text/plain": [ "" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[a, b, c] := *fd[a, b, c]" ] }, { "cell_type": "markdown", "id": "4f84ee9b-46b2-448a-b399-914cf5d4d810", "metadata": {}, "source": [ "You can easily check whether a column is in a key position by looking at the `is_key` column in the following:" ] }, { "cell_type": "code", "execution_count": 35, "id": "07bdf041-6b4f-4686-9aa5-3d95f323a397", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 columnis_keyindextypehas_default
0aTrue0Any?False
1bTrue1Any?False
2cFalse2Any?False
\n" ], "text/plain": [ "" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "::columns fd" ] }, { "cell_type": "markdown", "id": "183002b2-1041-4843-a3a8-d1496ec6a4b6", "metadata": {}, "source": [ "You may have noticed that columns also have types and default values associated with them, and stored relations can have triggers. These are discussed in the [manual](https://cozodb.github.io/current/manual/stored.html).\n", "We won't overload you with all the complexities in this tutorial." ] }, { "cell_type": "markdown", "id": "726ae4cf-d643-4c27-abe2-56c04050e02e", "metadata": {}, "source": [ "The stored relations API is designed to be easy to use. However, being too easy can sometimes be bad: for example, it is possible to wipe an important stored relation full of data with a `:replace` erroneously introduced in a rarely used piece of code. For valuable stored relations, we therefore recommend that you set the appropriate _access level_, as explained [here](https://cozodb.github.io/current/manual/stored.html#stored-relations)." ] }, { "cell_type": "markdown", "id": "30504cd6-7fb6-4ae3-ba2c-79eefa288095", "metadata": {}, "source": [ "Before continuing, let's remove the stored relation we introduced:" ] }, { "cell_type": "code", "execution_count": 36, "id": "8e70a093-600c-48c5-90f5-ff3f340287d2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 status
0OK
\n" ], "text/plain": [ "" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "::remove fd" ] }, { "cell_type": "markdown", "id": "4b702c08-b59a-451c-826c-919a533b9094", "metadata": {}, "source": [ "## Graphs" ] }, { "cell_type": "markdown", "id": "f22421f2-6492-4e0a-a66b-672ff7a3d3b5", "metadata": {}, "source": [ "Now let's consider a graph, stored as a relation. Let's first make it a stored relation:" ] }, { "cell_type": "code", "execution_count": 37, "id": "fba82f6e-124b-478d-893d-b69bc79680cc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 status
0OK
\n" ], "text/plain": [ "" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[loving, loved] <- [['alice', 'eve'],\n", " ['bob', 'alice'],\n", " ['eve', 'alice'],\n", " ['eve', 'bob'],\n", " ['eve', 'charlie'],\n", " ['charlie', 'eve'],\n", " ['david', 'george'],\n", " ['george', 'george']]\n", "\n", ":replace love {loving, loved}" ] }, { "cell_type": "markdown", "id": "8b1cf760-0989-48b3-aaed-3270c2d188bf", "metadata": {}, "source": [ "The graph we have created reads like \"Alice loves Eve, Bob loves Alice\", \"nobody loves David, David loves George, but George only loves himself\", and so on. \n", "Here we used `:replace` instead of `:create`. The difference is that if `love` already exists, it will be wiped and replaced with the new data given." ] }, { "cell_type": "markdown", "id": "59d559f0-3325-4bee-b209-3dbe26c3b90c", "metadata": {}, "source": [ "With the graph available, we can investigate competing interests:" ] }, { "cell_type": "code", "execution_count": 38, "id": "64fb79e4-381e-4190-ac19-05b53e76cf6b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 loved_by_b_e
0alice
\n" ], "text/plain": [ "" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[loved_by_b_e] := *love['eve', loved_by_b_e], *love['bob', loved_by_b_e]" ] }, { "cell_type": "markdown", "id": "48fa9636-5727-4984-af6e-f318c3e98b85", "metadata": {}, "source": [ "So far we have only seen bodies consisting of a _conjunction_ of atoms. Disjunction is also available, by using the `or` keyword:" ] }, { "cell_type": "code", "execution_count": 39, "id": "132c8855-cca2-4cb3-ae53-53d2ff92f4a9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 loved_by_b_e
0alice
1charlie
\n" ], "text/plain": [ "" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[loved_by_b_e] := *love['eve', loved_by_b_e] or *love['bob', loved_by_b_e], \n", " loved_by_b_e != 'bob', \n", " loved_by_b_e != 'eve'" ] }, { "cell_type": "markdown", "id": "d8ebc9a1-356e-4ba1-952c-c66920551381", "metadata": {}, "source": [ "Another way to write the same query is to have multiple definitions of the same rule, with different bodies:" ] }, { "cell_type": "code", "execution_count": 40, "id": "4e7ba20f-2a10-41f0-9062-cc269b36554d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 loved_by_b_e
0alice
1charlie
\n" ], "text/plain": [ "" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[loved_by_b_e] := *love['eve', loved_by_b_e], \n", " loved_by_b_e != 'bob', \n", " loved_by_b_e != 'eve'\n", "?[loved_by_b_e] := *love['bob', loved_by_b_e], \n", " loved_by_b_e != 'bob', \n", " loved_by_b_e != 'eve'" ] }, { "cell_type": "markdown", "id": "0b859b50-a748-4e20-9900-7ee053078a7f", "metadata": {}, "source": [ "The first way of writing the query (using `or`) is just syntax sugar for the second way. When you have multiple definitions of the same inline rule, the rule heads must be compatible. Fixed rules cannot have multiple definitions." ] }, { "cell_type": "markdown", "id": "5f357511-fb25-441f-bf9f-d1e583368928", "metadata": {}, "source": [ "## Negation" ] }, { "cell_type": "markdown", "id": "78242c93-bf6e-4e72-84e1-e7706f84e633", "metadata": {}, "source": [ "The next example demonstrates filters using negated expressions, which should already be familiar now:" ] }, { "cell_type": "code", "execution_count": 41, "id": "ff02b41d-d3aa-4267-b7e9-b56874f37bf9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 loved
0alice
1george
\n" ], "text/plain": [ "" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[loved] := *love[person, loved], !ends_with(person, 'e')" ] }, { "cell_type": "markdown", "id": "433386de-59b5-4169-b0fb-20ad28df401b", "metadata": {}, "source": [ "Rule applications can also be negated, not with the `!` operator but with the `not` keyword:" ] }, { "cell_type": "code", "execution_count": 42, "id": "5eda38db-ddba-45e1-8bd5-4b39116fd81f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 loved_by_e_not_b
0bob
1charlie
\n" ], "text/plain": [ "" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[loved_by_e_not_b] := *love['eve', loved_by_e_not_b], not *love['bob', loved_by_e_not_b]" ] }, { "cell_type": "markdown", "id": "5b2c4fa2-1f07-440a-a7fa-1f9a3cbfdc3c", "metadata": {}, "source": [ "You can say that there are two sets of logical operations in Cozo, one set that acts on the level of expressions, and another set that acts on the level of atoms:\n", "\n", "* For atoms: `,` or `and` (conjunction), `or` (disjunction), `not` (negation)\n", "* For expressions: `&&` (conjunction), `||` (disjunction), `!` (negation)\n", "\n", "The difference between `,` and `and` is operator precedence: `and` has higher precedence than `or`, whereas `,` has lower precedence than `or`." ] }, { "cell_type": "markdown", "id": "0d382525-25a5-49d1-9d8e-4a057de933a0", "metadata": {}, "source": [ "Negation of atoms has to abide by the _safety rule_. Let's violate it:" ] }, { "cell_type": "code", "execution_count": 43, "id": "7db09fb7-25a5-424a-a66a-0bc24d6d4fc1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[31meval::unbound_symb_in_head\u001b[0m\n", "\n", " \u001b[31m×\u001b[0m Symbol 'not_loved_by_b' in rule head is unbound\n", " ╭────\n", " \u001b[2m1\u001b[0m │ ?[not_loved_by_b] := not *love['bob', not_loved_by_b]\n", " · \u001b[35;1m ──────────────\u001b[0m\n", " ╰────\n", "\u001b[36m help: \u001b[0mNote that symbols occurring only in negated positions are not considered bound\n" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[not_loved_by_b] := not *love['bob', not_loved_by_b]" ] }, { "cell_type": "markdown", "id": "44d83981-21eb-406c-9287-0e50aff5c423", "metadata": {}, "source": [ "Why is this query not allowed? Well, what can it possibly return?\n", "For example, should the query return 'gold', since according to the facts at hand, \n", "Bob has no interest in 'gold'? 
\n", "So should our query return every possible string except a select few? \n", "That's not reasonable." ] }, { "cell_type": "markdown", "id": "204fbb34-293e-4c63-a1c4-14e5a48491d7", "metadata": {}, "source": [ "To make our query reasonable, we have to explicitly give our query a _closed world_ in which to operate the negation:" ] }, { "cell_type": "code", "execution_count": 44, "id": "225a4625-b99a-41a8-a2ef-fa553065e8aa", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 not_loved_by_b
0bob
1charlie
2david
3eve
4george
\n" ], "text/plain": [ "" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "the_population[p] := *love[p, _a]\n", "the_population[p] := *love[_a, p]\n", "\n", "?[not_loved_by_b] := the_population[not_loved_by_b], not *love['bob', not_loved_by_b]" ] }, { "cell_type": "markdown", "id": "5ac5f2cd-5a05-4b84-898d-f5a08c15907a", "metadata": {}, "source": [ "## Recursion" ] }, { "cell_type": "markdown", "id": "6a61abee-f5ae-4447-ae7b-d857bfe5b870", "metadata": {}, "source": [ "Inline rules can refer to other rules by applying them. Inline rules can have multiple definitions. If you combine these two, you get recursion:" ] }, { "cell_type": "code", "execution_count": 45, "id": "7a63d36e-a03d-4b4a-a825-cdc3ed036f1b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 chained
0alice
1bob
2charlie
3eve
\n" ], "text/plain": [ "" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alice_love_chain[person] := *love['alice', person]\n", "alice_love_chain[person] := alice_love_chain[in_person], *love[in_person, person]\n", "\n", "?[chained] := alice_love_chain[chained]" ] }, { "cell_type": "markdown", "id": "00db7e33-13e8-4b5f-9438-b14ec8b37bf5", "metadata": {}, "source": [ "Someone \"chained\" is either loved by Alice directly or loved by someone already in the chain. The query as written reads very naturally." ] }, { "cell_type": "markdown", "id": "1666bf8f-4e41-4428-bdc8-c1d05f6783df", "metadata": {}, "source": [ "You may object that you only need to be able to refer to other rules by applying them to have recursion, and multiple definitions are not required. Technically, true, but the resulting queries are not useful. Observe:" ] }, { "cell_type": "code", "execution_count": 46, "id": "cbf96780-d71b-471e-8add-dbd2d0f13f24", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 chained
\n" ], "text/plain": [ "" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alice_love_chain[person] := alice_love_chain[in_person], *love[in_person, person]\n", "\n", "?[chained] := alice_love_chain[chained]" ] }, { "cell_type": "markdown", "id": "ac4a9d13-3360-487d-81c8-8da33f781bae", "metadata": {}, "source": [ "This is the _closed-world assumption_. If there is no way to _deduce_ a fact from the given facts, _then_ the fact itself is false. You need multiple definitions to \"bootstrap\" the query." ] }, { "cell_type": "markdown", "id": "18f93e8f-d0d7-4238-b057-682a2e4daa70", "metadata": {}, "source": [ "You can do crazy things with recursion and negation. Fortunately, Cozo will try to stop you when you want to run something unreasonable:" ] }, { "cell_type": "code", "execution_count": 47, "id": "21d2a0da-9c12-454f-b3d7-43a13496cdef", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[31meval::unstratifiable\u001b[0m\n", "\n", " \u001b[31m×\u001b[0m Query is unstratifiable\n", "\u001b[36m help: \u001b[0mThe rule 'q' is in the strongly connected component [\"p\", \"q\"],\n", " and is involved in at least one forbidden dependency\n", " (negation, non-meet aggregation, or algorithm-application).\n" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "world[a] := a in [1, 2]\n", "\n", "p[a] := world[a], not q[a]\n", "q[a] := world[a], not p[a]\n", "\n", "?[a] := p[a]" ] }, { "cell_type": "markdown", "id": "51c7dd9e-51f2-4bb0-9e9b-c912b3f87036", "metadata": {}, "source": [ "Never mind the error message. If you consider the query as an equation to be solved, then `p[a] <- [[1]]` and `q[a] <- [[2]]` is a solution. But there is no way to _deduce_ this solution constructively. Furthermore, `q[a] <- [[1]]` and `p[a] <- [[2]]` is also a solution which is incompatible with the first." 
] }, { "cell_type": "markdown", "id": "72cfad4e-20de-4700-a191-03647fd6cdf8", "metadata": {}, "source": [ "## Aggregation" ] }, { "cell_type": "markdown", "id": "62149092-fb2f-47b2-bb0d-404fc17c28c1", "metadata": {}, "source": [ "For computing statistics, _aggregations_ are useful. In Cozo, aggregations are applied in the head of inline rules:" ] }, { "cell_type": "code", "execution_count": 48, "id": "70705529-80be-4990-9ea5-9f7842ba1543", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 personcount(loved_by)
0alice2
1bob1
2charlie1
3eve2
4george2
\n" ], "text/plain": [ "" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[person, count(loved_by)] := *love[loved_by, person]" ] }, { "cell_type": "markdown", "id": "19f47c1e-ffb6-4531-b181-0995bfeb63df", "metadata": {}, "source": [ "The usual `sum`, `mean`, etc. are all available. Having aggregations apply in the head of the rule instead of in the body is powerful, as we will see later in the extended examples.\n", "\n", "Here is the [full list](https://cozodb.github.io/current/manual/aggregations.html) of aggregations." ] }, { "cell_type": "markdown", "id": "9e39b8d9-0d52-433e-bec4-d856afd83606", "metadata": {}, "source": [ "## Query options" ] }, { "cell_type": "markdown", "id": "921d1c1a-3396-41d2-9010-1f3b9318cce8", "metadata": {}, "source": [ "We already know how to use query options to manipulate stored relations. There are also query options for controlling what is returned. For example:" ] }, { "cell_type": "code", "execution_count": 49, "id": "542e2947-95b3-41d1-8f91-007aa353d09d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 lovingloved
0aliceeve
1bobalice
2charlieeve
3davidgeorge
4evealice
5evebob
6evecharlie
7georgegeorge
\n" ], "text/plain": [ "" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[loving, loved] := *love{ loving, loved }" ] }, { "cell_type": "markdown", "id": "5ef5dc9c-1c58-49de-8cd6-4f725b001551", "metadata": {}, "source": [ "returns all rows. If we only want one row:" ] }, { "cell_type": "code", "execution_count": 50, "id": "58f1d43f-3f50-42df-8e27-8f6e549cbc6e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 lovingloved
0aliceeve
\n" ], "text/plain": [ "" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[loving, loved] := *love{ loving, loved }\n", "\n", ":limit 1" ] }, { "cell_type": "markdown", "id": "fe622726-e0b0-4ea3-878a-75e7de93a669", "metadata": {}, "source": [ "To sort by `loved` in descending order, then by `loving` in ascending order, and skip the first row:" ] }, { "cell_type": "code", "execution_count": 51, "id": "60f3673b-3273-49a3-b893-1236faf64490", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 lovingloved
0georgegeorge
1aliceeve
2charlieeve
3evecharlie
4evebob
5bobalice
6evealice
\n" ], "text/plain": [ "" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[loving, loved] := *love{ loving, loved }\n", "\n", ":order -loved, loving\n", ":offset 1" ] }, { "cell_type": "markdown", "id": "2c481e69-9640-4246-8ab8-e66549653c0d", "metadata": {}, "source": [ "Putting `-` in front of a variable in the `:order` clause denotes descending order. No prefix or `+` denotes ascending order." ] }, { "cell_type": "markdown", "id": "40e456b1-2030-4013-894d-a45e81e6b74e", "metadata": {}, "source": [ "There are many more query options, as explained [here](https://cozodb.github.io/current/manual/queries.html#query-options)." ] }, { "cell_type": "markdown", "id": "f6aebc44-2523-4bc6-8c65-3554e985aef1", "metadata": {}, "source": [ "## Fixed rules" ] }, { "cell_type": "markdown", "id": "cf08e8f6-ce6a-4212-a456-98fa57d457b4", "metadata": {}, "source": [ "You may be wondering why we are calling rules defined with `:=` _inline_ rules. \n", "Well, the logic that defines how the output relation is computed is given _inline_, as a series of atoms.\n", "\n", "By contrast, rules defined using `<-` are called _constant_ rules, which are special cases of _fixed rules_:\n", "rules whose logic is defined in fixed implementations hidden from the user.\n", "\n", "The `<-` syntax is syntactic sugar. The full syntax is:" ] }, { "cell_type": "code", "execution_count": 52, "id": "91414374-23cb-4310-ac36-a372e151ef6c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 012
0helloworldCozo!
\n" ], "text/plain": [ "" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[] <~ Constant(data: [['hello', 'world', 'Cozo!']])" ] }, { "cell_type": "markdown", "id": "5b836e0b-1b38-485a-ade2-69742b29d09f", "metadata": {}, "source": [ "Here we are using the fixed rule `Constant`, which takes one _option_ named `data`. Note the curly tail of the arrow.\n", "\n", "Fixed rules take in some input relations, and by applying custom logic, produce their output relation. The `Constant` fixed rule takes zero input relations.\n", "\n", "As an example of a less trivial fixed rule, let's say we want to find out who is most popular in the `love` graph. How do we define popularity? \n", "One way is to say that the higher a person's [PageRank](https://en.wikipedia.org/wiki/PageRank), the more popular they are. Calculating PageRank using inline rules\n", "is very awkward (but doable). Fortunately, one of the fixed rules is an optimized PageRank implementation, so let's just use it:" ] }, { "cell_type": "code", "execution_count": 53, "id": "f596d73d-1879-49e7-834a-5c489752a152", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 personpage_rank
0alice1.191497
1eve1.191497
2george1.064742
3bob0.921087
4charlie0.921087
5david0.574623
\n" ], "text/plain": [ "" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[person, page_rank] <~ PageRank(*love[])\n", "\n", ":order -page_rank" ] }, { "cell_type": "markdown", "id": "394a62e3-ad1c-49a0-944f-f0de0150d2e0", "metadata": {}, "source": [ "Here the input relation is a stored relation. Input relations are distinguished from options by not having any names preceding them.\n", "\n", "Each fixed rule is different, and you must read their [documentation](https://cozodb.github.io/current/manual/algorithms.html) to learn how to correctly use them." ] }, { "cell_type": "code", "execution_count": 54, "id": "db73e2f7-61a1-4824-9c47-c6da461b632b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 status
0OK
\n" ], "text/plain": [ "" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "::remove love" ] }, { "cell_type": "markdown", "id": "a16bee55-9802-4089-a0c2-c6f85213cf13", "metadata": {}, "source": [ "## Extended example: the air routes dataset" ] }, { "cell_type": "markdown", "id": "72e46708-93cd-45de-8d9e-79adb8cc1da0", "metadata": {}, "source": [ "Now that you have a basic understanding of the various constructs of Cozo, let's deal with a less trivial dataset.\n", "\n", "The data we are going to use, and many examples that we will present, are adapted from the book [Practical Gremlin](https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html), which teaches the Gremlin graph query language, a very different, imperative take on graphs (Datalog, by contrast, is declarative)." ] }, { "cell_type": "markdown", "id": "54e3ec1d-c139-48ab-8205-49b71aba4c67", "metadata": {}, "source": [ "First, let's import the data into our database. We will use fixed rules to do that. We start by defining the `airport` relation:" ] }, { "cell_type": "code", "execution_count": 55, "id": "dd70c091-063d-4f0e-995d-06b7f428a4b9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 33ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 status
0OK
\n" ], "text/plain": [ "" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res[idx, label, typ, code, icao, desc, region, runways, longest, elev, country, city, lat, lon] <~\n", " CsvReader(types: ['Int', 'Any', 'Any', 'Any', 'Any', 'Any', 'Any', 'Int?', 'Float?', 'Float?', 'Any', 'Any', 'Float?', 'Float?'],\n", " url: 'https://github.com/cozodb/cozo/raw/main/tests/air-routes-latest-nodes.csv', \n", " # url: 'file://./tests/air-routes-latest-nodes.csv', \n", " has_headers: true)\n", "\n", "?[code, icao, desc, region, runways, longest, elev, country, city, lat, lon] :=\n", " res[idx, label, typ, code, icao, desc, region, runways, longest, elev, country, city, lat, lon],\n", " label == 'airport'\n", "\n", ":replace airport {\n", " code: String \n", " => \n", " icao: String, \n", " desc: String, \n", " region: String, \n", " runways: Int, \n", " longest: Float, \n", " elev: Float, \n", " country: String, \n", " city: String, \n", " lat: Float, \n", " lon: Float\n", "}" ] }, { "cell_type": "markdown", "id": "708cd3df-4da7-42e2-a1a3-f36e37fed7f4", "metadata": {}, "source": [ "The `CsvReader` utility downloads a CSV file from the internet and attempts to parse its content into a relation.\n", "When we store the relation, we specify types for the columns. The `code` column acts as a primary key for the `airport` stored relation.\n", "\n", "If your internet connection is slow, it may help to download the CSV file manually to your disk and load the local file. \n", "The commented-out line shows how to do it. The path is relative to the directory in which you run the `cozoserver` executable.\n", "As the same file will be downloaded multiple times below, you may also want to download it just once to the local disk if your connection is metered."
] }, { "cell_type": "markdown", "id": "79d41137-1936-4023-a870-91158693c6bb", "metadata": {}, "source": [ "Next is `country`:" ] }, { "cell_type": "code", "execution_count": 56, "id": "51e71275-3911-42c0-b993-8916ba953fdc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 11ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 status
0OK
\n" ], "text/plain": [ "" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res[idx, label, typ, code, icao, desc] <~\n", " CsvReader(types: ['Int', 'Any', 'Any', 'Any', 'Any', 'Any'],\n", " url: 'https://github.com/cozodb/cozo/raw/main/tests/air-routes-latest-nodes.csv', \n", " # url: 'file://./tests/air-routes-latest-nodes.csv', \n", " has_headers: true)\n", "?[code, desc] :=\n", " res[idx, label, typ, code, icao, desc],\n", " label == 'country'\n", "\n", ":replace country {\n", " code: String\n", " =>\n", " desc: String\n", "}" ] }, { "cell_type": "markdown", "id": "72901087-cebd-472e-ba31-adee8dc72abc", "metadata": {}, "source": [ "`continent`:" ] }, { "cell_type": "code", "execution_count": 57, "id": "d5660e58-0fec-4929-941d-0debb9234e50", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 11ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 status
0OK
\n" ], "text/plain": [ "" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res[idx, label, typ, code, icao, desc] <~\n", " CsvReader(types: ['Int', 'Any', 'Any', 'Any', 'Any', 'Any'],\n", " url: 'https://github.com/cozodb/cozo/raw/main/tests/air-routes-latest-nodes.csv', \n", " # url: 'file://./tests/air-routes-latest-nodes.csv', \n", " has_headers: true)\n", "?[idx, code, desc] :=\n", " res[idx, label, typ, code, icao, desc],\n", " label == 'continent'\n", "\n", ":replace continent {\n", " code: String\n", " =>\n", " desc: String\n", "}" ] }, { "cell_type": "markdown", "id": "6e2d30c1-90d3-4403-b816-a21d0baf0670", "metadata": {}, "source": [ "We need to make a translation table for the indices the original data use:" ] }, { "cell_type": "code", "execution_count": 58, "id": "49433f68-afcd-4788-9e82-a8f57c24ee78", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 24ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 status
0OK
\n" ], "text/plain": [ "" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res[idx, label, typ, code] <~\n", " CsvReader(types: ['Int', 'Any', 'Any', 'Any'],\n", " url: 'https://github.com/cozodb/cozo/raw/main/tests/air-routes-latest-nodes.csv', \n", " # url: 'file://./tests/air-routes-latest-nodes.csv', \n", " has_headers: true)\n", "?[idx, code] :=\n", " res[idx, label, typ, code],\n", "\n", ":replace idx2code { idx => code }" ] }, { "cell_type": "markdown", "id": "af261df6-ce89-47a0-b7d9-2efe794767fb", "metadata": {}, "source": [ "The `contain` relation contains information on the geographical inclusion of entities:" ] }, { "cell_type": "code", "execution_count": 59, "id": "2ec122ba-a72b-4ee8-bd57-e6c9db7f6e15", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 99ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 status
0OK
\n" ], "text/plain": [ "" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res[] <~\n", " CsvReader(types: ['Int', 'Int', 'Int', 'String'],\n", " url: 'https://github.com/cozodb/cozo/raw/main/tests/air-routes-latest-edges.csv', \n", " # url: 'file://./tests/air-routes-latest-edges.csv', \n", " has_headers: true)\n", "?[entity, contained] :=\n", " res[idx, fr_i, to_i, typ],\n", " typ == 'contains',\n", " *idx2code[fr_i, entity],\n", " *idx2code[to_i, contained]\n", "\n", "\n", ":replace contain { entity: String, contained: String }" ] }, { "cell_type": "markdown", "id": "ddc41b91-021a-4ab4-b92f-63013dd655bf", "metadata": {}, "source": [ "Finally, the `route`s between the airports. This relation is much larger than the rest and contains about 60k rows, which may take a few seconds to download and process:" ] }, { "cell_type": "code", "execution_count": 60, "id": "a916cb80-7463-44eb-8f15-c12cbe8091d4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 368ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 status
0OK
\n" ], "text/plain": [ "" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res[] <~\n", " CsvReader(types: ['Int', 'Int', 'Int', 'String', 'Float?'],\n", " url: 'https://github.com/cozodb/cozo/raw/main/tests/air-routes-latest-edges.csv', \n", " # url: 'file://./tests/air-routes-latest-edges.csv', \n", " has_headers: true)\n", "?[fr, to, dist] :=\n", " res[idx, fr_i, to_i, typ, dist],\n", " typ == 'route',\n", " *idx2code[fr_i, fr],\n", " *idx2code[to_i, to]\n", "\n", ":replace route { fr: String, to: String => dist: Float }" ] }, { "cell_type": "markdown", "id": "c1ce4c36-d196-414b-b7f4-295708849e19", "metadata": {}, "source": [ "We no longer need the `idx2code` relation:" ] }, { "cell_type": "code", "execution_count": 61, "id": "5938a20c-56d7-41ec-b3b2-4ca2f5bd835b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 status
0OK
\n" ], "text/plain": [ "" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "::remove idx2code" ] }, { "cell_type": "markdown", "id": "65cb9b9d-79ce-4814-8f75-e6c22fe1d98d", "metadata": {}, "source": [ "Now let's verify all the relations we want are there:" ] }, { "cell_type": "code", "execution_count": 62, "id": "5c067dd0-7052-46c6-823b-ddf6cbb1b516", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 namearityaccess_leveln_keysn_non_keysn_put_triggersn_rm_triggersn_replace_triggers
0airport11normal110000
1contain2normal20000
2continent2normal11000
3country2normal11000
4route3normal21000
\n" ], "text/plain": [ "" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "::relations" ] }, { "cell_type": "markdown", "id": "0d33e9f4-0876-46cd-bc6e-7ae42ce587dd", "metadata": {}, "source": [ "Now let's just look at some data. Start with airports:" ] }, { "cell_type": "code", "execution_count": 63, "id": "fefbae3c-3b06-43ca-82b6-a10ee9174edf", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 codecitydescregionrunwayslatlon
0AAAAnaaAnaa AirportPF-U-A1-17.352600-145.509995
1AAEAnnabahAnnaba AirportDZ-36236.8222017.809170
2AALAalborgAalborg AirportDK-81257.0927599.849243
3AANAl AinAl Ain International AirportAE-AZ124.26170055.609200
4AAQAnapaAnapa AirportRU-KDA145.00210237.347301
\n" ], "text/plain": [ "" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[code, city, desc, region, runways, lat, lon] := *airport{code, city, desc, region, runways, lat, lon}\n", " \n", ":limit 5" ] }, { "cell_type": "markdown", "id": "b645ed53-c80a-4b50-8e4d-f5080c48de2d", "metadata": {}, "source": [ "Airports with the most runways:" ] }, { "cell_type": "code", "execution_count": 64, "id": "b733fffd-e292-4af9-9810-7010b87c2d09", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 15ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 codecitydescregionrunwayslatlon
0DFWDallasDallas/Fort Worth International AirportUS-TX732.896801-97.038002
1ORDChicagoChicago O'Hare International AirportUS-IL741.978600-87.904800
2AMSAmsterdamAmsterdam Airport SchipholNL-NH652.3086014.763890
3BOSBostonBoston LoganUS-MA642.364300-71.005203
4DENDenverDenver International AirportUS-CO639.861698-104.672997
5DTWDetroitDetroit Metropolitan, Wayne CountyUS-MI642.212399-83.353401
6ATLAtlantaHartsfield - Jackson Atlanta International AirportUS-GA533.636700-84.428101
7GISGisborneGisborne AirportNZ-GIS5-38.663300177.977997
8HLZHamiltonHamilton International AirportNZ-WKO5-37.866699175.332001
9IAHHoustonGeorge Bush IntercontinentalUS-TX529.984400-95.341400
\n" ], "text/plain": [ "" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[code, city, desc, region, runways, lat, lon] := *airport{code, city, desc, region, runways, lat, lon}\n", "\n", ":order -runways\n", ":limit 10" ] }, { "cell_type": "markdown", "id": "db45f1d4-e89a-449e-beb8-a9301acfa86c", "metadata": {}, "source": [ "How many airports are there in total?" ] }, { "cell_type": "code", "execution_count": 65, "id": "9cd0a6ba-4808-4812-b760-ac389c435610", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 15ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 count(code)
03504
\n" ], "text/plain": [ "" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[count(code)] := *airport{code}" ] }, { "cell_type": "markdown", "id": "e7bf2676-4e79-49f0-9401-4bf60f2dc313", "metadata": {}, "source": [ "Let's get a distribution of the initials of the airport codes:" ] }, { "cell_type": "code", "execution_count": 66, "id": "d9878f9c-1b43-426b-a8f5-5060f2476d86", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 19ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 count(initial)initial
0212A
1235B
2214C
3116D
495E
576F
6135G
7129H
8112I
980J
10197K
11184L
12228M
13111N
1489O
15203P
167Q
17121R
18245S
19205T
2077U
2186V
2259W
2328X
24211Y
2549Z
\n" ], "text/plain": [ "" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[count(initial), initial] := *airport{code}, initial = first(chars(code))\n", "\n", ":order initial" ] }, { "cell_type": "markdown", "id": "34283880-46bb-4759-965e-5e3de5128c15", "metadata": {}, "source": [ "More useful are the statistics of runways:" ] }, { "cell_type": "code", "execution_count": 67, "id": "4d509508-02fa-4ec3-94a9-905dde2dda41", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 17ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 count(r)count_unique(r)sum(r)min(r)max(r)mean(r)std_dev(r)
0350474980.000000171.4212330.743083
\n" ], "text/plain": [ "" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[count(r), count_unique(r), sum(r), min(r), max(r), mean(r), std_dev(r)] := \n", " *airport{runways: r}" ] }, { "cell_type": "markdown", "id": "c62269c6-9c23-4766-918d-aedb5cfe4bc3", "metadata": {}, "source": [ "Using `country`, we can find countries with no airports:" ] }, { "cell_type": "code", "execution_count": 68, "id": "739902f6-902f-4261-878f-826291596399", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 10ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 desc
0Andorra
1Liechtenstein
2Monaco
3Pitcairn
4San Marino
\n" ], "text/plain": [ "" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[desc] := *country{code, desc}, not *airport{country: code}" ] }, { "cell_type": "markdown", "id": "6f65e01a-b43f-48d2-9afa-0021262e6ada", "metadata": {}, "source": [ "The `route` relation by itself is rather boring:" ] }, { "cell_type": "code", "execution_count": 69, "id": "9d297cc5-ba50-44d7-b834-46ffaa6c748d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 frtodist
0AAAFAC48.000000
1AAAMKP133.000000
2AAAPPT270.000000
3AAARAR968.000000
4AAEALG254.000000
5AAECDG882.000000
6AAEIST1161.000000
7AAELYS631.000000
8AAEMRS477.000000
9AAEORN477.000000
\n" ], "text/plain": [ "" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[fr, to, dist] := *route{fr, to, dist}\n", "\n", ":limit 10" ] }, { "cell_type": "markdown", "id": "9f037a30-6f67-4fd9-ba3a-968ebfe811ff", "metadata": {}, "source": [ "It just records the starting and ending airports of each route, together with the distance. This relation only becomes useful when used as a graph." ] }, { "cell_type": "markdown", "id": "44135ecf-045e-41c2-9c76-fb8eaac8b15b", "metadata": {}, "source": [ "Airports with no routes:" ] }, { "cell_type": "code", "execution_count": 70, "id": "e58de060-2f75-46fb-9ce0-98068e338c09", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 54ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 codedesc
0AFWFort Worth Alliance Airport
1APACentennial Airport
2APKApataki Airport
3BIDBlock Island State Airport
4BVSBreves Airport
5BWUSydney Bankstown Airport
6CRCSanta Ana Airport
7CVTCoventry Airport
8EKAMurray Field
9GYZGruyere Airport
10HFNHornafjordur Airport
11HZKHusavik Airport
12ILGNew Castle Airport
13INTSmith Reynolds Airport
14ISLAtaturk International Airport
15KGGKédougou Airport
16NBWLeeward Point Field
17NFOMata'aho Airport
18PSYStanley Airport
19RIGRio Grande Airport
20SFDSan Fernando De Apure Airport
21SFHSan Felipe International Airport
22SXFBerlin-Schönefeld International Airport *Closed*
23TUATeniente Coronel Luis a Mantilla Airport
24TWBToowoomba Airport
25TXLBerlin, Tegel International Airport *Closed*
26VCVSouthern California Logistics Airport
27YEIBursa Yenişehir Airport
\n" ], "text/plain": [ "" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[code, desc] := *airport{code, desc}, not *route{fr: code}, not *route{to: code}" ] }, { "cell_type": "markdown", "id": "e26b06c3-3919-4a72-861d-e63318708496", "metadata": {}, "source": [ "Airports with the most outgoing routes:" ] }, { "cell_type": "code", "execution_count": 71, "id": "1b0a075d-5968-41e6-90df-27ee2034dc84", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 98ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 coden
0FRA310
1IST309
2CDG293
3AMS283
4MUC270
\n" ], "text/plain": [ "" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "route_count[fr, count(fr)] := *route{fr}\n", "?[code, n] := route_count[code, n]\n", "\n", ":sort -n\n", ":limit 5" ] }, { "cell_type": "markdown", "id": "b4154c1e-78f8-4621-9a32-05363d1b13cf", "metadata": {}, "source": [ "How many routes are there from the European Union to the US?" ] }, { "cell_type": "code", "execution_count": 72, "id": "7de94407-206b-42d9-ac3b-cd9a3277bfe3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 46ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 n
0435
\n" ], "text/plain": [ "" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "routes[unique(r)] := *contain['EU', fr],\n", " *route{fr, to},\n", " *airport{code: to, country: 'US'},\n", " r = [fr, to]\n", "?[n] := routes[rs], n = length(rs)" ] }, { "cell_type": "markdown", "id": "528bc6eb-064d-418f-8aa1-98aeedd64ace", "metadata": {}, "source": [ "How many airports are there in the US with routes from the EU?" ] }, { "cell_type": "code", "execution_count": 73, "id": "7ecfd10d-f05f-443e-a0e9-73480d67dcf1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 40ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 count_unique(to)
045
\n" ], "text/plain": [ "" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[count_unique(to)] := *contain['EU', fr],\n", " *route{fr, to},\n", " *airport{code: to, country: 'US'}\n" ] }, { "cell_type": "markdown", "id": "3a13fa07-aaa4-47c1-aefb-0a93c80e3c18", "metadata": {}, "source": [ "How many routes are there for each airport in London, UK?" ] }, { "cell_type": "code", "execution_count": 74, "id": "52f6c021-1ac7-42a6-910a-7686eb2c653d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 15ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 codecount(code)
0LCY51
1LGW232
2LHR221
3LTN130
4STN211
\n" ], "text/plain": [ "" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[code, count(code)] := *airport{code, city: 'London', region: 'GB-ENG'}, *route{fr: code}" ] }, { "cell_type": "markdown", "id": "19ecd6fa-04e0-4a03-9834-f0dfea861e9a", "metadata": {}, "source": [ "We need to specify the region, because there is another city called London, not in the UK." ] }, { "cell_type": "markdown", "id": "d393dfbb-5ceb-412d-bc71-24db4bcf3261", "metadata": {}, "source": [ "How many airports are reachable from London, UK in two hops?" ] }, { "cell_type": "code", "execution_count": 75, "id": "0689fddc-e02d-4b4c-a621-b17f0431ccf4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 73ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 count_unique(a3)
02353
\n" ], "text/plain": [ "" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lon_uk_airports[code] := *airport{code, city: 'London', region: 'GB-ENG'}\n", "one_hop[to] := lon_uk_airports[fr], *route{fr, to}, not lon_uk_airports[to];\n", "?[count_unique(a3)] := one_hop[a2], *route{fr: a2, to: a3}, not lon_uk_airports[a3];" ] }, { "cell_type": "markdown", "id": "4be5b2d5-8aa9-4109-94f4-ccb75e109935", "metadata": {}, "source": [ "Which cities directly reachable from LGW are the farthest away?" ] }, { "cell_type": "code", "execution_count": 76, "id": "9ac99669-84dc-4a7a-b985-d1e333a41065", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 3ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 citydist
0Buenos Aires6908.000000
1Singapore6751.000000
2Langkawi6299.000000
3Duong Dong6264.000000
4Taipei6080.000000
5Port Louis6053.000000
6Rayong6008.000000
7Cape Town5987.000000
8Hong Kong5982.000000
9Shanghai5745.000000
\n" ], "text/plain": [ "" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[city, dist] := *route{fr: 'LGW', to, dist},\n", " *airport{code: to, city}\n", ":order -dist\n", ":limit 10" ] }, { "cell_type": "markdown", "id": "121df2dd-1926-47eb-80d3-c90fe56b253d", "metadata": {}, "source": [ "What airports are within 0.1 degrees of the Greenwich meridian?" ] }, { "cell_type": "code", "execution_count": 77, "id": "5ac6dee6-c475-4db9-824d-2cbcc774aa23", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 9ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 codedesclonlat
0CDTCastellon De La Plana Airport0.02611139.999199
1LCYLondon City Airport0.05527851.505278
2LDETarbes-Lourdes-Pyrénées Airport-0.00643943.178699
3LEHLe Havre Octeville Airport0.08805649.533901
\n" ], "text/plain": [ "" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[code, desc, lon, lat] := *airport{lon, lat, code, desc}, lon > -0.1, lon < 0.1" ] }, { "cell_type": "markdown", "id": "015a1743-edd4-490d-b02f-462fa2506bed", "metadata": {}, "source": [ "Airports in a box drawn around London Heathrow, UK:" ] }, { "cell_type": "code", "execution_count": 78, "id": "9ced92e0-ced4-4e55-a547-c8b06fa50756", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 16ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 codedesc
0LCYLondon City Airport
1LGWLondon Gatwick
2LHRLondon Heathrow
3LTNLondon Luton Airport
4SOUSouthampton Airport
5STNLondon Stansted Airport
\n" ], "text/plain": [ "" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h_box[lon, lat] := *airport{code: 'LHR', lon, lat}\n", "?[code, desc] := h_box[lhr_lon, lhr_lat], *airport{code, lon, lat, desc},\n", " abs(lhr_lon - lon) < 1, abs(lhr_lat - lat) < 1" ] }, { "cell_type": "markdown", "id": "20046b0e-cd8f-4de6-8f87-a49dcc8b8f7f", "metadata": {}, "source": [ "For some spherical geometry: what is the angle subtended at the center of the earth by SFO and NRT?" ] }, { "cell_type": "code", "execution_count": 79, "id": "6d133cc5-23cb-47c6-9cc0-2c6bebf049eb", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 0ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 deg_diff
073.992112
\n" ], "text/plain": [ "" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[deg_diff] := *airport{code: 'SFO', lat: a_lat, lon: a_lon},\n", " *airport{code: 'NRT', lat: b_lat, lon: b_lon},\n", " deg_diff = rad_to_deg(haversine_deg_input(a_lat, a_lon, b_lat, b_lon))" ] }, { "cell_type": "markdown", "id": "184bc863-b2a7-4dd3-9d27-648315098ec8", "metadata": {}, "source": [ "We mentioned before that aggregations in Cozo are powerful, more so than in traditional SQL databases. The power comes from the fact that aggregations can be used in recursions (some restrictions apply).\n", "\n", "Let's say we want to find the distance of the _shortest route_ between two airports. One way to calculate it is to enumerate all the routes between the two airports, and then apply the `min` aggregation to the results. This cannot be implemented as stated, since the routes may contain cycles and hence there can be an infinite number of routes between two airports.\n", "\n", "Instead, let's think recursively. If we already have all the shortest routes between all nodes, can we derive an _equation_ satisfied by the shortest route? Yes: the shortest distance between `a` and `b` is either the distance of a direct route, or the sum of the shortest distance from `a` to some airport `c` and the distance of a direct route from `c` to `b`, whichever is smallest. We apply our `min` aggregation to this recursive set instead. \n", "\n", "Let's write it out and try to find the shortest route between the airports `LHR` and `YPO`:" ] }, { "cell_type": "code", "execution_count": 80, "id": "207a6e99-1e4a-466d-9492-859a680d9207", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 100ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|   | dist |
|---|---|
| 0 | 4147.000000 |
\n" ], "text/plain": [ "" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "shortest[b, min(dist)] := *route{fr: 'LHR', to: b, dist} \n", " # Start with the airport 'LHR', retrieve a direct route from 'LHR' to b\n", "\n", "shortest[b, min(dist)] := shortest[c, d1], # Start with an existing shortest route from 'LHR' to c\n", " *route{fr: c, to: b, dist: d2}, # Retrieve a direct route from c to b\n", " dist = d1 + d2 # Add the distances\n", "\n", "?[dist] := shortest['YPO', dist] # Extract the answer for 'YPO'. \n", " # We chose it since it is the hardest airport to get to from 'LHR'." ] }, { "cell_type": "markdown", "id": "73d28120-ee91-4893-a796-6cddec3426ad", "metadata": {}, "source": [ "It works. Since path-finding is such a common operation on graphs, Cozo has several fixed rules for that:" ] }, { "cell_type": "code", "execution_count": 81, "id": "e1d8328f-87e5-4884-9f77-f6379c46465b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 57ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|   | starting | goal | distance | path |
|---|---|---|---|---|
| 0 | LHR | YPO | 4147.000000 | ['LHR', 'YUL', 'YVO', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
\n" ], "text/plain": [ "" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "starting[] <- [['LHR']]\n", "goal[] <- [['YPO']]\n", "?[starting, goal, distance, path] <~ ShortestPathDijkstra(*route[], starting[], goal[])" ] }, { "cell_type": "markdown", "id": "af89a4d7-2260-466c-a119-bb0ae0a84ebf", "metadata": {}, "source": [ "Not only is it more efficient, but we also get a path for the shortest route.\n", "\n", "If the single shortest path is not enough, the following calculates the ten shortest paths:" ] }, { "cell_type": "code", "execution_count": 82, "id": "050a7fe1-9b6f-4cc8-96b7-9cebac55bdbb", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 87ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|   | starting | goal | distance | path |
|---|---|---|---|---|
| 0 | LHR | YPO | 4147.000000 | ['LHR', 'YUL', 'YVO', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
| 1 | LHR | YPO | 4150.000000 | ['LHR', 'DUB', 'YUL', 'YVO', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
| 2 | LHR | YPO | 4164.000000 | ['LHR', 'YUL', 'YMT', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
| 3 | LHR | YPO | 4167.000000 | ['LHR', 'DUB', 'YUL', 'YMT', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
| 4 | LHR | YPO | 4187.000000 | ['LHR', 'MAN', 'DUB', 'YUL', 'YVO', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
| 5 | LHR | YPO | 4202.000000 | ['LHR', 'IOM', 'DUB', 'YUL', 'YVO', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
| 6 | LHR | YPO | 4204.000000 | ['LHR', 'MAN', 'DUB', 'YUL', 'YMT', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
| 7 | LHR | YPO | 4209.000000 | ['LHR', 'YUL', 'YMT', 'YNS', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
| 8 | LHR | YPO | 4211.000000 | ['LHR', 'MAN', 'IOM', 'DUB', 'YUL', 'YVO', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
| 9 | LHR | YPO | 4212.000000 | ['LHR', 'DUB', 'YUL', 'YMT', 'YNS', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
\n" ], "text/plain": [ "" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "starting[] <- [['LHR']]\n", "goal[] <- [['YPO']]\n", "?[starting, goal, distance, path] <~ KShortestPathYen(*route[], starting[], goal[], k: 10)" ] }, { "cell_type": "markdown", "id": "5b3a398d-95ee-4e4f-91a7-5c95f82c8bd3", "metadata": {}, "source": [ "On the other hand, if efficiency is really important to you, you can use the A* algorithm with a really good heuristic function:" ] }, { "cell_type": "code", "execution_count": 83, "id": "311a46cf-ba33-483a-ae60-c9a4ff2a071b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 28ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | LHR | YPO | 4147.000000 | ['LHR', 'YUL', 'YVO', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
\n" ], "text/plain": [ "" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "code_lat_lon[code, lat, lon] := *airport{code, lat, lon}\n", "starting[code, lat, lon] := code = 'LHR', *airport{code, lat, lon};\n", "goal[code, lat, lon] := code = 'YPO', *airport{code, lat, lon};\n", "?[] <~ ShortestPathAStar(*route[], \n", " code_lat_lon[node, lat1, lon1], \n", " starting[], \n", " goal[goal, lat2, lon2], \n", " heuristic: haversine_deg_input(lat1, lon1, lat2, lon2) * 3963);" ] }, { "cell_type": "markdown", "id": "18a35f6d-7d31-412d-9ab0-abb79655b0d9", "metadata": {}, "source": [ "There's a lot more setup required in this case: we need to retrieve the latitudes and longitudes of the airports and process them first.\n", "The number `3963` above is the approximate radius of the earth in miles; multiplying it by the angle (in radians) returned by `haversine_deg_input` gives a distance in miles. \n", "See [the manual](https://cozodb.github.io/current/manual/algorithms.html#Algo.ShortestPathAStar) for an explanation of the arguments." ] }, { "cell_type": "markdown", "id": "c522cee1-d4ad-4c75-aa91-9c5e2852eecd", "metadata": {}, "source": [ "The most important airports, by PageRank:" ] }, { "cell_type": "code", "execution_count": 84, "id": "a42d018b-ba09-4e27-9b1a-e780ca0bcc1e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 81ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|   | code | desc | score |
|---|---|---|---|
| 0 | FRA | Frankfurt am Main | 1.265292 |
| 1 | IST | Istanbul International Airport | 1.260846 |
| 2 | CDG | Paris Charles de Gaulle | 1.251049 |
| 3 | AMS | Amsterdam Airport Schiphol | 1.243261 |
| 4 | MUC | Munich International Airport | 1.230537 |
| 5 | ORD | Chicago O'Hare International Airport | 1.220283 |
| 6 | DFW | Dallas/Fort Worth International Airport | 1.208827 |
| 7 | DXB | Dubai International Airport | 1.208430 |
| 8 | PEK | Beijing Capital International Airport | 1.208074 |
| 9 | ATL | Hartsfield - Jackson Atlanta International Airport | 1.199858 |
\n" ], "text/plain": [ "" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rank[code, score] <~ PageRank(*route[a, b])\n", "?[code, desc, score] := rank[code, score], *airport{code, desc}\n", "\n", ":limit 10;\n", ":order -score" ] }, { "cell_type": "markdown", "id": "c2be4f97-5173-456e-a0d9-b7cf99bf9786", "metadata": {}, "source": [ "The following example takes a long time to run since it calculates the betweenness centrality.\n", "Algorithms for calculating the betweenness centrality have high complexity." ] }, { "cell_type": "code", "execution_count": 85, "id": "aaad3658-f2c2-4dde-bfe4-9b52e2b3f589", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Completed in 2888ms

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|   | code | desc | score |
|---|---|---|---|
| 0 | ANC | Anchorage Ted Stevens | 1074869.260952 |
| 1 | KEF | Reykjavik, Keflavik International Airport | 928449.975037 |
| 2 | HEL | Helsinki Ventaa | 581588.490562 |
| 3 | PEK | Beijing Capital International Airport | 532020.425300 |
| 4 | DEL | Indira Gandhi International Airport | 472979.963291 |
| 5 | IST | Istanbul International Airport | 457882.076744 |
| 6 | PKC | Yelizovo Airport | 408571.027619 |
| 7 | MSP | Minneapolis-St.Paul International Airport | 396433.049206 |
| 8 | LAX | Los Angeles International Airport | 393310.114286 |
| 9 | DEN | Denver International Airport | 374339.835975 |
\n" ], "text/plain": [ "" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "centrality[code, score] <~ BetweennessCentrality(*route[a, b])\n", "?[code, desc, score] := centrality[code, score], *airport{code, desc}\n", "\n", ":limit 10;\n", ":order -score" ] }, { "cell_type": "markdown", "id": "02ef7f3e-50c5-41bd-8879-17127a0369cc", "metadata": {}, "source": [ "These are the airports that, if disconnected from the network, cause the most disruption." ] }, { "cell_type": "markdown", "id": "27763722-f643-42cb-af4e-1f56277e194b", "metadata": {}, "source": [ "That's it for the tutorial. Continue with the [Manual](https://cozodb.github.io/current/manual/index.html) if you want more details." ] }, { "cell_type": "code", "execution_count": null, "id": "4f2c6f0c-e6a4-4eeb-b33b-288bdfc8863d", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 5 }