{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# The distillation" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%reload_ext pycozo.ipyext_direct\n", "%cozo_auth tutorial *******" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Welcome back! You already know how to use simple Datalog queries and stored relations in Cozo, and you have learned the intricacies of schema-based triple stores. Today we are going to learn about aggregations and algorithms.\n", "\n", "Before we start, we need to get some data into the database so that we can play with it. Instead of the sesame-seed-sized inline data we used the last few times, today we are moving towards peanut-sized data. The data we are going to use, and many of the examples that we will present, are adapted from the book [Practical Gremlin](https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html), which teaches the Gremlin graph query language, a very different, imperative take on graphs (Datalog, by contrast, is declarative). It is always a good idea to explore different options for your problem and to decide for yourself which tool is best for you." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We start by defining the schema we need:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
|  | attr_id | op |
|---|---|---|
| 0 | 10000011 | assert |
| 1 | 10000012 | assert |
| 2 | 10000013 | assert |
| 3 | 10000014 | assert |
| 4 | 10000015 | assert |
| 5 | 10000016 | assert |
| 6 | 10000017 | assert |
| 7 | 10000018 | assert |
| 8 | 10000019 | assert |
| 9 | 10000020 | assert |
| 10 | 10000021 | assert |
| 11 | 10000022 | assert |
| 12 | 10000023 | assert |
| 13 | 10000024 | assert |
| 14 | 10000025 | assert |
| 15 | 10000026 | assert |
| 16 | 10000027 | assert |
| 17 | 10000028 | assert |
| 18 | 10000029 | assert |

|  | asserts | retracts |
|---|---|---|
| 0 | 197646 | 0 |

|  | iata | city | desc | region | runways | lat | lon |
|---|---|---|---|---|---|---|---|
| 0 | ANC | Anchorage | Anchorage Ted Stevens | US-AK | 3 | 61.174400 | -149.996002 |
| 1 | ATL | Atlanta | Hartsfield - Jackson Atlanta International Airport | US-GA | 5 | 33.636700 | -84.428101 |
| 2 | AUS | Austin | Austin Bergstrom International Airport | US-TX | 2 | 30.194500 | -97.669899 |
| 3 | BNA | Nashville | Nashville International Airport | US-TN | 4 | 36.124500 | -86.678200 |
| 4 | BOS | Boston | Boston Logan | US-MA | 6 | 42.364300 | -71.005203 |

|  | iata | city | desc | region | runways | lat | lon |
|---|---|---|---|---|---|---|---|
| 0 | BNA | Nashville | Nashville International Airport | US-TN | 4 | 36.124500 | -86.678200 |
| 1 | BOS | Boston | Boston Logan | US-MA | 6 | 42.364300 | -71.005203 |

|  | iata | city | desc | region | runways | lat | lon |
|---|---|---|---|---|---|---|---|
| 0 | AAA | Anaa | Anaa Airport | PF-U-A | 1 | -17.352600 | -145.509995 |
| 1 | AAE | Annabah | Annaba Airport | DZ-36 | 2 | 36.822201 | 7.809170 |
| 2 | AAL | Aalborg | Aalborg Airport | DK-81 | 2 | 57.092759 | 9.849243 |
| 3 | AAN | Al Ain | Al Ain International Airport | AE-AZ | 1 | 24.261700 | 55.609200 |
| 4 | AAQ | Anapa | Anapa Airport | RU-KDA | 1 | 45.002102 | 37.347301 |

|  | iata | city | desc | region | runways | lat | lon |
|---|---|---|---|---|---|---|---|
| 0 | DFW | Dallas | Dallas/Fort Worth International Airport | US-TX | 7 | 32.896801 | -97.038002 |
| 1 | ORD | Chicago | Chicago O'Hare International Airport | US-IL | 7 | 41.978600 | -87.904800 |
| 2 | DTW | Detroit | Detroit Metropolitan, Wayne County | US-MI | 6 | 42.212399 | -83.353401 |
| 3 | DEN | Denver | Denver International Airport | US-CO | 6 | 39.861698 | -104.672997 |
| 4 | BOS | Boston | Boston Logan | US-MA | 6 | 42.364300 | -71.005203 |
| 5 | AMS | Amsterdam | Amsterdam Airport Schiphol | NL-NH | 6 | 52.308601 | 4.763890 |
| 6 | UFA | Ufa | Ufa International Airport | RU-BA | 5 | 54.557499 | 55.874401 |
| 7 | YYZ | Toronto | Toronto Pearson International Airport | CA-ON | 5 | 43.677200 | -79.630600 |
| 8 | TRG | Tauranga | Tauranga Airport | NZ-BOP | 5 | -37.671902 | 176.195999 |
| 9 | SNN | Shannon | Shannon Airport | IE-CE | 5 | 52.702000 | -8.924820 |

|  | count(a) |
|---|---|
| 0 | 3504 |

|  | count(initial) | initial |
|---|---|---|
| 0 | 212 | A |
| 1 | 235 | B |
| 2 | 214 | C |
| 3 | 116 | D |
| 4 | 95 | E |
| 5 | 76 | F |
| 6 | 135 | G |
| 7 | 129 | H |
| 8 | 112 | I |
| 9 | 80 | J |
| 10 | 197 | K |
| 11 | 184 | L |
| 12 | 228 | M |
| 13 | 111 | N |
| 14 | 89 | O |
| 15 | 203 | P |
| 16 | 7 | Q |
| 17 | 121 | R |
| 18 | 245 | S |
| 19 | 205 | T |
| 20 | 77 | U |
| 21 | 86 | V |
| 22 | 59 | W |
| 23 | 28 | X |
| 24 | 211 | Y |
| 25 | 49 | Z |

|  | count(initial) | initial |
|---|---|---|
| 0 | 1 | A |
| 1 | 1 | B |
| 2 | 1 | C |
| 3 | 1 | D |
| 4 | 1 | E |
| 5 | 1 | F |
| 6 | 1 | G |
| 7 | 1 | H |
| 8 | 1 | I |
| 9 | 1 | J |
| 10 | 1 | K |
| 11 | 1 | L |
| 12 | 1 | M |
| 13 | 1 | N |
| 14 | 1 | O |
| 15 | 1 | P |
| 16 | 1 | Q |
| 17 | 1 | R |
| 18 | 1 | S |
| 19 | 1 | T |
| 20 | 1 | U |
| 21 | 1 | V |
| 22 | 1 | W |
| 23 | 1 | X |
| 24 | 1 | Y |
| 25 | 1 | Z |

|  | count(initial) | initial |
|---|---|---|
| 0 | 212 | A |
| 1 | 235 | B |
| 2 | 214 | C |
| 3 | 116 | D |
| 4 | 95 | E |
| 5 | 76 | F |
| 6 | 135 | G |
| 7 | 129 | H |
| 8 | 112 | I |
| 9 | 80 | J |
| 10 | 197 | K |
| 11 | 184 | L |
| 12 | 228 | M |
| 13 | 111 | N |
| 14 | 89 | O |
| 15 | 203 | P |
| 16 | 7 | Q |
| 17 | 121 | R |
| 18 | 245 | S |
| 19 | 205 | T |
| 20 | 77 | U |
| 21 | 86 | V |
| 22 | 59 | W |
| 23 | 28 | X |
| 24 | 211 | Y |
| 25 | 49 | Z |

|  | count(r) | count_unique(r) | sum(r) | min(r) | max(r) | mean(r) | std_dev(r) |
|---|---|---|---|---|---|---|---|
| 0 | 3504 | 7 | 4980.000000 | 1 | 7 | 1.421233 | 0.743083 |

|  | dist |
|---|---|
| 0 | 4147 |

|  | starting | goal | distance | path |
|---|---|---|---|---|
| 0 | LHR | YPO | 4147.000000 | ['LHR', 'YUL', 'YVO', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |

|  | starting | goal | distance | path |
|---|---|---|---|---|
| 0 | LHR | YPO | 4147.000000 | ['LHR', 'YUL', 'YVO', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
| 1 | LHR | YPO | 4150.000000 | ['LHR', 'DUB', 'YUL', 'YVO', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
| 2 | LHR | YPO | 4164.000000 | ['LHR', 'YUL', 'YMT', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
| 3 | LHR | YPO | 4167.000000 | ['LHR', 'DUB', 'YUL', 'YMT', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
| 4 | LHR | YPO | 4187.000000 | ['LHR', 'MAN', 'DUB', 'YUL', 'YVO', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
| 5 | LHR | YPO | 4202.000000 | ['LHR', 'IOM', 'DUB', 'YUL', 'YVO', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
| 6 | LHR | YPO | 4204.000000 | ['LHR', 'MAN', 'DUB', 'YUL', 'YMT', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
| 7 | LHR | YPO | 4209.000000 | ['LHR', 'YUL', 'YMT', 'YNS', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
| 8 | LHR | YPO | 4211.000000 | ['LHR', 'MAN', 'IOM', 'DUB', 'YUL', 'YVO', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |
| 9 | LHR | YPO | 4212.000000 | ['LHR', 'DUB', 'YUL', 'YMT', 'YNS', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO'] |