{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The distillation"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%reload_ext pycozo.ipyext_direct\n",
"%cozo_auth tutorial *******"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Welcome back! You already know how to use simple Datalog queries and stored relations in Cozo, and you have learned the intricacies of schema-based triple stores. Today we are going to learn about aggregations and algorithms.\n",
"\n",
"Before we start, we need to get some data into the database so that we can play with them. Instead of sesame-seed-sized inline data we used the last few times, today we are moving towards peanut-sized data. The data we are going to use, and many examples that we will present, are adapted from the book [Practical Gremlin](https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html), which teaches the Gremlin graph query language, a very different, imperative take on graphs (Datalog, by constrast, is declarative). It is always a good idea to explore different options for your problem and to decide for yourself which tool is best for you."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We start by defining the schema we need:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_16dfa_row0_col0, #T_16dfa_row1_col0, #T_16dfa_row2_col0, #T_16dfa_row3_col0, #T_16dfa_row4_col0, #T_16dfa_row5_col0, #T_16dfa_row6_col0, #T_16dfa_row7_col0, #T_16dfa_row8_col0, #T_16dfa_row9_col0, #T_16dfa_row10_col0, #T_16dfa_row11_col0, #T_16dfa_row12_col0, #T_16dfa_row13_col0, #T_16dfa_row14_col0, #T_16dfa_row15_col0, #T_16dfa_row16_col0, #T_16dfa_row17_col0, #T_16dfa_row18_col0 {\n",
" color: #307fc1;\n",
"}\n",
"#T_16dfa_row0_col1, #T_16dfa_row1_col1, #T_16dfa_row2_col1, #T_16dfa_row3_col1, #T_16dfa_row4_col1, #T_16dfa_row5_col1, #T_16dfa_row6_col1, #T_16dfa_row7_col1, #T_16dfa_row8_col1, #T_16dfa_row9_col1, #T_16dfa_row10_col1, #T_16dfa_row11_col1, #T_16dfa_row12_col1, #T_16dfa_row13_col1, #T_16dfa_row14_col1, #T_16dfa_row15_col1, #T_16dfa_row16_col1, #T_16dfa_row17_col1, #T_16dfa_row18_col1 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_16dfa\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_16dfa_level0_col0\" class=\"col_heading level0 col0\" >attr_id</th>\n",
" <th id=\"T_16dfa_level0_col1\" class=\"col_heading level0 col1\" >op</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_16dfa_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_16dfa_row0_col0\" class=\"data row0 col0\" >10000011</td>\n",
" <td id=\"T_16dfa_row0_col1\" class=\"data row0 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_16dfa_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_16dfa_row1_col0\" class=\"data row1 col0\" >10000012</td>\n",
" <td id=\"T_16dfa_row1_col1\" class=\"data row1 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_16dfa_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_16dfa_row2_col0\" class=\"data row2 col0\" >10000013</td>\n",
" <td id=\"T_16dfa_row2_col1\" class=\"data row2 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_16dfa_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_16dfa_row3_col0\" class=\"data row3 col0\" >10000014</td>\n",
" <td id=\"T_16dfa_row3_col1\" class=\"data row3 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_16dfa_level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
" <td id=\"T_16dfa_row4_col0\" class=\"data row4 col0\" >10000015</td>\n",
" <td id=\"T_16dfa_row4_col1\" class=\"data row4 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_16dfa_level0_row5\" class=\"row_heading level0 row5\" >5</th>\n",
" <td id=\"T_16dfa_row5_col0\" class=\"data row5 col0\" >10000016</td>\n",
" <td id=\"T_16dfa_row5_col1\" class=\"data row5 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_16dfa_level0_row6\" class=\"row_heading level0 row6\" >6</th>\n",
" <td id=\"T_16dfa_row6_col0\" class=\"data row6 col0\" >10000017</td>\n",
" <td id=\"T_16dfa_row6_col1\" class=\"data row6 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_16dfa_level0_row7\" class=\"row_heading level0 row7\" >7</th>\n",
" <td id=\"T_16dfa_row7_col0\" class=\"data row7 col0\" >10000018</td>\n",
" <td id=\"T_16dfa_row7_col1\" class=\"data row7 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_16dfa_level0_row8\" class=\"row_heading level0 row8\" >8</th>\n",
" <td id=\"T_16dfa_row8_col0\" class=\"data row8 col0\" >10000019</td>\n",
" <td id=\"T_16dfa_row8_col1\" class=\"data row8 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_16dfa_level0_row9\" class=\"row_heading level0 row9\" >9</th>\n",
" <td id=\"T_16dfa_row9_col0\" class=\"data row9 col0\" >10000020</td>\n",
" <td id=\"T_16dfa_row9_col1\" class=\"data row9 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_16dfa_level0_row10\" class=\"row_heading level0 row10\" >10</th>\n",
" <td id=\"T_16dfa_row10_col0\" class=\"data row10 col0\" >10000021</td>\n",
" <td id=\"T_16dfa_row10_col1\" class=\"data row10 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_16dfa_level0_row11\" class=\"row_heading level0 row11\" >11</th>\n",
" <td id=\"T_16dfa_row11_col0\" class=\"data row11 col0\" >10000022</td>\n",
" <td id=\"T_16dfa_row11_col1\" class=\"data row11 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_16dfa_level0_row12\" class=\"row_heading level0 row12\" >12</th>\n",
" <td id=\"T_16dfa_row12_col0\" class=\"data row12 col0\" >10000023</td>\n",
" <td id=\"T_16dfa_row12_col1\" class=\"data row12 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_16dfa_level0_row13\" class=\"row_heading level0 row13\" >13</th>\n",
" <td id=\"T_16dfa_row13_col0\" class=\"data row13 col0\" >10000024</td>\n",
" <td id=\"T_16dfa_row13_col1\" class=\"data row13 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_16dfa_level0_row14\" class=\"row_heading level0 row14\" >14</th>\n",
" <td id=\"T_16dfa_row14_col0\" class=\"data row14 col0\" >10000025</td>\n",
" <td id=\"T_16dfa_row14_col1\" class=\"data row14 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_16dfa_level0_row15\" class=\"row_heading level0 row15\" >15</th>\n",
" <td id=\"T_16dfa_row15_col0\" class=\"data row15 col0\" >10000026</td>\n",
" <td id=\"T_16dfa_row15_col1\" class=\"data row15 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_16dfa_level0_row16\" class=\"row_heading level0 row16\" >16</th>\n",
" <td id=\"T_16dfa_row16_col0\" class=\"data row16 col0\" >10000027</td>\n",
" <td id=\"T_16dfa_row16_col1\" class=\"data row16 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_16dfa_level0_row17\" class=\"row_heading level0 row17\" >17</th>\n",
" <td id=\"T_16dfa_row17_col0\" class=\"data row17 col0\" >10000028</td>\n",
" <td id=\"T_16dfa_row17_col1\" class=\"data row17 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_16dfa_level0_row18\" class=\"row_heading level0 row18\" >18</th>\n",
" <td id=\"T_16dfa_row18_col0\" class=\"data row18 col0\" >10000029</td>\n",
" <td id=\"T_16dfa_row18_col1\" class=\"data row18 col1\" >assert</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x107e10190>"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":schema\n",
"\n",
":put country {\n",
" code: string unique,\n",
" desc: string\n",
"}\n",
"\n",
":put continent {\n",
" code: string unique,\n",
" desc: string\n",
"}\n",
"\n",
":put airport {\n",
" iata: string unique,\n",
" icao: string index,\n",
" city: string index,\n",
" desc: string,\n",
" region: string index,\n",
" country: ref,\n",
" runways: int,\n",
" longest: int,\n",
" altitude: int,\n",
" lat: float,\n",
" lon: float\n",
"}\n",
"\n",
":put route {\n",
" src: ref,\n",
" dst: ref,\n",
" distance: int\n",
"}\n",
"\n",
":put geo {\n",
" contains: ref many,\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We intend the entities to be countries, continents, airports and routes. The attribute `geo.contains` denotes geographical inclusion. In our case, the `src` and `dst` of a `route` are always airport entities. Airports are uniquely identified by their `iata` code, and contain a slew of other attributes including latitudes and longitudes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now download the data, look over it to see what it contains, put it somewhere on your hard drive (we recommend next to the `cozoserver` executable so that the following script works verbatim) and run:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_c0ae1_row0_col0, #T_c0ae1_row0_col1 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_c0ae1\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_c0ae1_level0_col0\" class=\"col_heading level0 col0\" >asserts</th>\n",
" <th id=\"T_c0ae1_level0_col1\" class=\"col_heading level0 col1\" >retracts</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_c0ae1_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_c0ae1_row0_col0\" class=\"data row0 col0\" >197646</td>\n",
" <td id=\"T_c0ae1_row0_col1\" class=\"data row0 col1\" >0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x107e106d0>"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":db execute '../tests/air-routes-data.json'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The execution should not take to long. When it's done, we are set."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
 Though peanut-sized">
"> Though peanut-sized by today's standards, the data still contains over 61k lines of JSON objects, some of which are quite long (yes, each line in the tx script is a valid JSON object), and it seems that the Python libraries we used to write the extension can't quite handle it. If you use the IPython magic `%%cozo_run_file` to run it, your Python process will likely hang."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploratory data analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data is new to us. First we need to see what it looks like. Let's start with airports."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_ad2bb_row0_col0, #T_ad2bb_row0_col1, #T_ad2bb_row0_col2, #T_ad2bb_row0_col3, #T_ad2bb_row1_col0, #T_ad2bb_row1_col1, #T_ad2bb_row1_col2, #T_ad2bb_row1_col3, #T_ad2bb_row2_col0, #T_ad2bb_row2_col1, #T_ad2bb_row2_col2, #T_ad2bb_row2_col3, #T_ad2bb_row3_col0, #T_ad2bb_row3_col1, #T_ad2bb_row3_col2, #T_ad2bb_row3_col3, #T_ad2bb_row4_col0, #T_ad2bb_row4_col1, #T_ad2bb_row4_col2, #T_ad2bb_row4_col3 {\n",
" color: black;\n",
"}\n",
"#T_ad2bb_row0_col4, #T_ad2bb_row0_col5, #T_ad2bb_row0_col6, #T_ad2bb_row1_col4, #T_ad2bb_row1_col5, #T_ad2bb_row1_col6, #T_ad2bb_row2_col4, #T_ad2bb_row2_col5, #T_ad2bb_row2_col6, #T_ad2bb_row3_col4, #T_ad2bb_row3_col5, #T_ad2bb_row3_col6, #T_ad2bb_row4_col4, #T_ad2bb_row4_col5, #T_ad2bb_row4_col6 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_ad2bb\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_ad2bb_level0_col0\" class=\"col_heading level0 col0\" >iata</th>\n",
" <th id=\"T_ad2bb_level0_col1\" class=\"col_heading level0 col1\" >city</th>\n",
" <th id=\"T_ad2bb_level0_col2\" class=\"col_heading level0 col2\" >desc</th>\n",
" <th id=\"T_ad2bb_level0_col3\" class=\"col_heading level0 col3\" >region</th>\n",
" <th id=\"T_ad2bb_level0_col4\" class=\"col_heading level0 col4\" >runways</th>\n",
" <th id=\"T_ad2bb_level0_col5\" class=\"col_heading level0 col5\" >lat</th>\n",
" <th id=\"T_ad2bb_level0_col6\" class=\"col_heading level0 col6\" >lon</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_ad2bb_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_ad2bb_row0_col0\" class=\"data row0 col0\" >ANC</td>\n",
" <td id=\"T_ad2bb_row0_col1\" class=\"data row0 col1\" >Anchorage</td>\n",
" <td id=\"T_ad2bb_row0_col2\" class=\"data row0 col2\" >Anchorage Ted Stevens</td>\n",
" <td id=\"T_ad2bb_row0_col3\" class=\"data row0 col3\" >US-AK</td>\n",
" <td id=\"T_ad2bb_row0_col4\" class=\"data row0 col4\" >3</td>\n",
" <td id=\"T_ad2bb_row0_col5\" class=\"data row0 col5\" >61.174400</td>\n",
" <td id=\"T_ad2bb_row0_col6\" class=\"data row0 col6\" >-149.996002</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_ad2bb_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_ad2bb_row1_col0\" class=\"data row1 col0\" >ATL</td>\n",
" <td id=\"T_ad2bb_row1_col1\" class=\"data row1 col1\" >Atlanta</td>\n",
" <td id=\"T_ad2bb_row1_col2\" class=\"data row1 col2\" >Hartsfield - Jackson Atlanta International Airport</td>\n",
" <td id=\"T_ad2bb_row1_col3\" class=\"data row1 col3\" >US-GA</td>\n",
" <td id=\"T_ad2bb_row1_col4\" class=\"data row1 col4\" >5</td>\n",
" <td id=\"T_ad2bb_row1_col5\" class=\"data row1 col5\" >33.636700</td>\n",
" <td id=\"T_ad2bb_row1_col6\" class=\"data row1 col6\" >-84.428101</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_ad2bb_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_ad2bb_row2_col0\" class=\"data row2 col0\" >AUS</td>\n",
" <td id=\"T_ad2bb_row2_col1\" class=\"data row2 col1\" >Austin</td>\n",
" <td id=\"T_ad2bb_row2_col2\" class=\"data row2 col2\" >Austin Bergstrom International Airport</td>\n",
" <td id=\"T_ad2bb_row2_col3\" class=\"data row2 col3\" >US-TX</td>\n",
" <td id=\"T_ad2bb_row2_col4\" class=\"data row2 col4\" >2</td>\n",
" <td id=\"T_ad2bb_row2_col5\" class=\"data row2 col5\" >30.194500</td>\n",
" <td id=\"T_ad2bb_row2_col6\" class=\"data row2 col6\" >-97.669899</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_ad2bb_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_ad2bb_row3_col0\" class=\"data row3 col0\" >BNA</td>\n",
" <td id=\"T_ad2bb_row3_col1\" class=\"data row3 col1\" >Nashville</td>\n",
" <td id=\"T_ad2bb_row3_col2\" class=\"data row3 col2\" >Nashville International Airport</td>\n",
" <td id=\"T_ad2bb_row3_col3\" class=\"data row3 col3\" >US-TN</td>\n",
" <td id=\"T_ad2bb_row3_col4\" class=\"data row3 col4\" >4</td>\n",
" <td id=\"T_ad2bb_row3_col5\" class=\"data row3 col5\" >36.124500</td>\n",
" <td id=\"T_ad2bb_row3_col6\" class=\"data row3 col6\" >-86.678200</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_ad2bb_level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
" <td id=\"T_ad2bb_row4_col0\" class=\"data row4 col0\" >BOS</td>\n",
" <td id=\"T_ad2bb_row4_col1\" class=\"data row4 col1\" >Boston</td>\n",
" <td id=\"T_ad2bb_row4_col2\" class=\"data row4 col2\" >Boston Logan</td>\n",
" <td id=\"T_ad2bb_row4_col3\" class=\"data row4 col3\" >US-MA</td>\n",
" <td id=\"T_ad2bb_row4_col4\" class=\"data row4 col4\" >6</td>\n",
" <td id=\"T_ad2bb_row4_col5\" class=\"data row4 col5\" >42.364300</td>\n",
" <td id=\"T_ad2bb_row4_col6\" class=\"data row4 col6\" >-71.005203</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x104573100>"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[iata, city, desc, region, runways, lat, lon] := \n",
" [a airport.iata iata],\n",
" [a airport.city city],\n",
" [a airport.desc desc],\n",
" [a airport.region region],\n",
" [a airport.runways runways],\n",
" [a airport.lat lat],\n",
" [a airport.lon lon]\n",
" \n",
":limit 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The only notable thing about this query is that we used the `:limit` option to limit the number of output rows. If we did not put it in, thousands of rows will be returned and your browser may not like it. The `:offset` option is also available:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_e6af6_row0_col0, #T_e6af6_row0_col1, #T_e6af6_row0_col2, #T_e6af6_row0_col3, #T_e6af6_row1_col0, #T_e6af6_row1_col1, #T_e6af6_row1_col2, #T_e6af6_row1_col3 {\n",
" color: black;\n",
"}\n",
"#T_e6af6_row0_col4, #T_e6af6_row0_col5, #T_e6af6_row0_col6, #T_e6af6_row1_col4, #T_e6af6_row1_col5, #T_e6af6_row1_col6 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_e6af6\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_e6af6_level0_col0\" class=\"col_heading level0 col0\" >iata</th>\n",
" <th id=\"T_e6af6_level0_col1\" class=\"col_heading level0 col1\" >city</th>\n",
" <th id=\"T_e6af6_level0_col2\" class=\"col_heading level0 col2\" >desc</th>\n",
" <th id=\"T_e6af6_level0_col3\" class=\"col_heading level0 col3\" >region</th>\n",
" <th id=\"T_e6af6_level0_col4\" class=\"col_heading level0 col4\" >runways</th>\n",
" <th id=\"T_e6af6_level0_col5\" class=\"col_heading level0 col5\" >lat</th>\n",
" <th id=\"T_e6af6_level0_col6\" class=\"col_heading level0 col6\" >lon</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_e6af6_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_e6af6_row0_col0\" class=\"data row0 col0\" >BNA</td>\n",
" <td id=\"T_e6af6_row0_col1\" class=\"data row0 col1\" >Nashville</td>\n",
" <td id=\"T_e6af6_row0_col2\" class=\"data row0 col2\" >Nashville International Airport</td>\n",
" <td id=\"T_e6af6_row0_col3\" class=\"data row0 col3\" >US-TN</td>\n",
" <td id=\"T_e6af6_row0_col4\" class=\"data row0 col4\" >4</td>\n",
" <td id=\"T_e6af6_row0_col5\" class=\"data row0 col5\" >36.124500</td>\n",
" <td id=\"T_e6af6_row0_col6\" class=\"data row0 col6\" >-86.678200</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_e6af6_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_e6af6_row1_col0\" class=\"data row1 col0\" >BOS</td>\n",
" <td id=\"T_e6af6_row1_col1\" class=\"data row1 col1\" >Boston</td>\n",
" <td id=\"T_e6af6_row1_col2\" class=\"data row1 col2\" >Boston Logan</td>\n",
" <td id=\"T_e6af6_row1_col3\" class=\"data row1 col3\" >US-MA</td>\n",
" <td id=\"T_e6af6_row1_col4\" class=\"data row1 col4\" >6</td>\n",
" <td id=\"T_e6af6_row1_col5\" class=\"data row1 col5\" >42.364300</td>\n",
" <td id=\"T_e6af6_row1_col6\" class=\"data row1 col6\" >-71.005203</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10459e950>"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[iata, city, desc, region, runways, lat, lon] := \n",
" [a airport.iata iata],\n",
" [a airport.city city],\n",
" [a airport.desc desc],\n",
" [a airport.region region],\n",
" [a airport.runways runways],\n",
" [a airport.lat lat],\n",
" [a airport.lon lon]\n",
"\n",
":offset 3\n",
":limit 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There is a subtle point here: when you specify `:limit`, the database is constrained to return only that many rows to you. But _which_ rows it gives you is not specified (for performance reasons). In our case, even though the first returned IATA is ANC, that doesn't mean the smallest IATA is ANC (the output is sorted, yes, but only among the rows themselves). In fact, the query didn't even look at all the rows, since it can already satisfy what you ask it for by looking only at five rows!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want \"global\" sorting for your results before applying `:limit`, you have to ask for it and the database will be forced to look at all the data:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_22bd9_row0_col0, #T_22bd9_row0_col1, #T_22bd9_row0_col2, #T_22bd9_row0_col3, #T_22bd9_row1_col0, #T_22bd9_row1_col1, #T_22bd9_row1_col2, #T_22bd9_row1_col3, #T_22bd9_row2_col0, #T_22bd9_row2_col1, #T_22bd9_row2_col2, #T_22bd9_row2_col3, #T_22bd9_row3_col0, #T_22bd9_row3_col1, #T_22bd9_row3_col2, #T_22bd9_row3_col3, #T_22bd9_row4_col0, #T_22bd9_row4_col1, #T_22bd9_row4_col2, #T_22bd9_row4_col3 {\n",
" color: black;\n",
"}\n",
"#T_22bd9_row0_col4, #T_22bd9_row0_col5, #T_22bd9_row0_col6, #T_22bd9_row1_col4, #T_22bd9_row1_col5, #T_22bd9_row1_col6, #T_22bd9_row2_col4, #T_22bd9_row2_col5, #T_22bd9_row2_col6, #T_22bd9_row3_col4, #T_22bd9_row3_col5, #T_22bd9_row3_col6, #T_22bd9_row4_col4, #T_22bd9_row4_col5, #T_22bd9_row4_col6 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_22bd9\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_22bd9_level0_col0\" class=\"col_heading level0 col0\" >iata</th>\n",
" <th id=\"T_22bd9_level0_col1\" class=\"col_heading level0 col1\" >city</th>\n",
" <th id=\"T_22bd9_level0_col2\" class=\"col_heading level0 col2\" >desc</th>\n",
" <th id=\"T_22bd9_level0_col3\" class=\"col_heading level0 col3\" >region</th>\n",
" <th id=\"T_22bd9_level0_col4\" class=\"col_heading level0 col4\" >runways</th>\n",
" <th id=\"T_22bd9_level0_col5\" class=\"col_heading level0 col5\" >lat</th>\n",
" <th id=\"T_22bd9_level0_col6\" class=\"col_heading level0 col6\" >lon</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_22bd9_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_22bd9_row0_col0\" class=\"data row0 col0\" >AAA</td>\n",
" <td id=\"T_22bd9_row0_col1\" class=\"data row0 col1\" >Anaa</td>\n",
" <td id=\"T_22bd9_row0_col2\" class=\"data row0 col2\" >Anaa Airport</td>\n",
" <td id=\"T_22bd9_row0_col3\" class=\"data row0 col3\" >PF-U-A</td>\n",
" <td id=\"T_22bd9_row0_col4\" class=\"data row0 col4\" >1</td>\n",
" <td id=\"T_22bd9_row0_col5\" class=\"data row0 col5\" >-17.352600</td>\n",
" <td id=\"T_22bd9_row0_col6\" class=\"data row0 col6\" >-145.509995</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_22bd9_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_22bd9_row1_col0\" class=\"data row1 col0\" >AAE</td>\n",
" <td id=\"T_22bd9_row1_col1\" class=\"data row1 col1\" >Annabah</td>\n",
" <td id=\"T_22bd9_row1_col2\" class=\"data row1 col2\" >Annaba Airport</td>\n",
" <td id=\"T_22bd9_row1_col3\" class=\"data row1 col3\" >DZ-36</td>\n",
" <td id=\"T_22bd9_row1_col4\" class=\"data row1 col4\" >2</td>\n",
" <td id=\"T_22bd9_row1_col5\" class=\"data row1 col5\" >36.822201</td>\n",
" <td id=\"T_22bd9_row1_col6\" class=\"data row1 col6\" >7.809170</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_22bd9_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_22bd9_row2_col0\" class=\"data row2 col0\" >AAL</td>\n",
" <td id=\"T_22bd9_row2_col1\" class=\"data row2 col1\" >Aalborg</td>\n",
" <td id=\"T_22bd9_row2_col2\" class=\"data row2 col2\" >Aalborg Airport</td>\n",
" <td id=\"T_22bd9_row2_col3\" class=\"data row2 col3\" >DK-81</td>\n",
" <td id=\"T_22bd9_row2_col4\" class=\"data row2 col4\" >2</td>\n",
" <td id=\"T_22bd9_row2_col5\" class=\"data row2 col5\" >57.092759</td>\n",
" <td id=\"T_22bd9_row2_col6\" class=\"data row2 col6\" >9.849243</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_22bd9_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_22bd9_row3_col0\" class=\"data row3 col0\" >AAN</td>\n",
" <td id=\"T_22bd9_row3_col1\" class=\"data row3 col1\" >Al Ain</td>\n",
" <td id=\"T_22bd9_row3_col2\" class=\"data row3 col2\" >Al Ain International Airport</td>\n",
" <td id=\"T_22bd9_row3_col3\" class=\"data row3 col3\" >AE-AZ</td>\n",
" <td id=\"T_22bd9_row3_col4\" class=\"data row3 col4\" >1</td>\n",
" <td id=\"T_22bd9_row3_col5\" class=\"data row3 col5\" >24.261700</td>\n",
" <td id=\"T_22bd9_row3_col6\" class=\"data row3 col6\" >55.609200</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_22bd9_level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
" <td id=\"T_22bd9_row4_col0\" class=\"data row4 col0\" >AAQ</td>\n",
" <td id=\"T_22bd9_row4_col1\" class=\"data row4 col1\" >Anapa</td>\n",
" <td id=\"T_22bd9_row4_col2\" class=\"data row4 col2\" >Anapa Airport</td>\n",
" <td id=\"T_22bd9_row4_col3\" class=\"data row4 col3\" >RU-KDA</td>\n",
" <td id=\"T_22bd9_row4_col4\" class=\"data row4 col4\" >1</td>\n",
" <td id=\"T_22bd9_row4_col5\" class=\"data row4 col5\" >45.002102</td>\n",
" <td id=\"T_22bd9_row4_col6\" class=\"data row4 col6\" >37.347301</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x107e10340>"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[iata, city, desc, region, runways, lat, lon] := \n",
" [a airport.iata iata],\n",
" [a airport.city city],\n",
" [a airport.desc desc],\n",
" [a airport.region region],\n",
" [a airport.runways runways],\n",
" [a airport.lat lat],\n",
" [a airport.lon lon]\n",
" \n",
":limit 5\n",
":order iata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also sort in descending order (by prefixing the sorted column name by the minus sign), or sort by multiple columns:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_0ba5a_row0_col0, #T_0ba5a_row0_col1, #T_0ba5a_row0_col2, #T_0ba5a_row0_col3, #T_0ba5a_row1_col0, #T_0ba5a_row1_col1, #T_0ba5a_row1_col2, #T_0ba5a_row1_col3, #T_0ba5a_row2_col0, #T_0ba5a_row2_col1, #T_0ba5a_row2_col2, #T_0ba5a_row2_col3, #T_0ba5a_row3_col0, #T_0ba5a_row3_col1, #T_0ba5a_row3_col2, #T_0ba5a_row3_col3, #T_0ba5a_row4_col0, #T_0ba5a_row4_col1, #T_0ba5a_row4_col2, #T_0ba5a_row4_col3, #T_0ba5a_row5_col0, #T_0ba5a_row5_col1, #T_0ba5a_row5_col2, #T_0ba5a_row5_col3, #T_0ba5a_row6_col0, #T_0ba5a_row6_col1, #T_0ba5a_row6_col2, #T_0ba5a_row6_col3, #T_0ba5a_row7_col0, #T_0ba5a_row7_col1, #T_0ba5a_row7_col2, #T_0ba5a_row7_col3, #T_0ba5a_row8_col0, #T_0ba5a_row8_col1, #T_0ba5a_row8_col2, #T_0ba5a_row8_col3, #T_0ba5a_row9_col0, #T_0ba5a_row9_col1, #T_0ba5a_row9_col2, #T_0ba5a_row9_col3 {\n",
" color: black;\n",
"}\n",
"#T_0ba5a_row0_col4, #T_0ba5a_row0_col5, #T_0ba5a_row0_col6, #T_0ba5a_row1_col4, #T_0ba5a_row1_col5, #T_0ba5a_row1_col6, #T_0ba5a_row2_col4, #T_0ba5a_row2_col5, #T_0ba5a_row2_col6, #T_0ba5a_row3_col4, #T_0ba5a_row3_col5, #T_0ba5a_row3_col6, #T_0ba5a_row4_col4, #T_0ba5a_row4_col5, #T_0ba5a_row4_col6, #T_0ba5a_row5_col4, #T_0ba5a_row5_col5, #T_0ba5a_row5_col6, #T_0ba5a_row6_col4, #T_0ba5a_row6_col5, #T_0ba5a_row6_col6, #T_0ba5a_row7_col4, #T_0ba5a_row7_col5, #T_0ba5a_row7_col6, #T_0ba5a_row8_col4, #T_0ba5a_row8_col5, #T_0ba5a_row8_col6, #T_0ba5a_row9_col4, #T_0ba5a_row9_col5, #T_0ba5a_row9_col6 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_0ba5a\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_0ba5a_level0_col0\" class=\"col_heading level0 col0\" >iata</th>\n",
" <th id=\"T_0ba5a_level0_col1\" class=\"col_heading level0 col1\" >city</th>\n",
" <th id=\"T_0ba5a_level0_col2\" class=\"col_heading level0 col2\" >desc</th>\n",
" <th id=\"T_0ba5a_level0_col3\" class=\"col_heading level0 col3\" >region</th>\n",
" <th id=\"T_0ba5a_level0_col4\" class=\"col_heading level0 col4\" >runways</th>\n",
" <th id=\"T_0ba5a_level0_col5\" class=\"col_heading level0 col5\" >lat</th>\n",
" <th id=\"T_0ba5a_level0_col6\" class=\"col_heading level0 col6\" >lon</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_0ba5a_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_0ba5a_row0_col0\" class=\"data row0 col0\" >DFW</td>\n",
" <td id=\"T_0ba5a_row0_col1\" class=\"data row0 col1\" >Dallas</td>\n",
" <td id=\"T_0ba5a_row0_col2\" class=\"data row0 col2\" >Dallas/Fort Worth International Airport</td>\n",
" <td id=\"T_0ba5a_row0_col3\" class=\"data row0 col3\" >US-TX</td>\n",
" <td id=\"T_0ba5a_row0_col4\" class=\"data row0 col4\" >7</td>\n",
" <td id=\"T_0ba5a_row0_col5\" class=\"data row0 col5\" >32.896801</td>\n",
" <td id=\"T_0ba5a_row0_col6\" class=\"data row0 col6\" >-97.038002</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_0ba5a_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_0ba5a_row1_col0\" class=\"data row1 col0\" >ORD</td>\n",
" <td id=\"T_0ba5a_row1_col1\" class=\"data row1 col1\" >Chicago</td>\n",
" <td id=\"T_0ba5a_row1_col2\" class=\"data row1 col2\" >Chicago O'Hare International Airport</td>\n",
" <td id=\"T_0ba5a_row1_col3\" class=\"data row1 col3\" >US-IL</td>\n",
" <td id=\"T_0ba5a_row1_col4\" class=\"data row1 col4\" >7</td>\n",
" <td id=\"T_0ba5a_row1_col5\" class=\"data row1 col5\" >41.978600</td>\n",
" <td id=\"T_0ba5a_row1_col6\" class=\"data row1 col6\" >-87.904800</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_0ba5a_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_0ba5a_row2_col0\" class=\"data row2 col0\" >DTW</td>\n",
" <td id=\"T_0ba5a_row2_col1\" class=\"data row2 col1\" >Detroit</td>\n",
" <td id=\"T_0ba5a_row2_col2\" class=\"data row2 col2\" >Detroit Metropolitan, Wayne County</td>\n",
" <td id=\"T_0ba5a_row2_col3\" class=\"data row2 col3\" >US-MI</td>\n",
" <td id=\"T_0ba5a_row2_col4\" class=\"data row2 col4\" >6</td>\n",
" <td id=\"T_0ba5a_row2_col5\" class=\"data row2 col5\" >42.212399</td>\n",
" <td id=\"T_0ba5a_row2_col6\" class=\"data row2 col6\" >-83.353401</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_0ba5a_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_0ba5a_row3_col0\" class=\"data row3 col0\" >DEN</td>\n",
" <td id=\"T_0ba5a_row3_col1\" class=\"data row3 col1\" >Denver</td>\n",
" <td id=\"T_0ba5a_row3_col2\" class=\"data row3 col2\" >Denver International Airport</td>\n",
" <td id=\"T_0ba5a_row3_col3\" class=\"data row3 col3\" >US-CO</td>\n",
" <td id=\"T_0ba5a_row3_col4\" class=\"data row3 col4\" >6</td>\n",
" <td id=\"T_0ba5a_row3_col5\" class=\"data row3 col5\" >39.861698</td>\n",
" <td id=\"T_0ba5a_row3_col6\" class=\"data row3 col6\" >-104.672997</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_0ba5a_level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
" <td id=\"T_0ba5a_row4_col0\" class=\"data row4 col0\" >BOS</td>\n",
" <td id=\"T_0ba5a_row4_col1\" class=\"data row4 col1\" >Boston</td>\n",
" <td id=\"T_0ba5a_row4_col2\" class=\"data row4 col2\" >Boston Logan</td>\n",
" <td id=\"T_0ba5a_row4_col3\" class=\"data row4 col3\" >US-MA</td>\n",
" <td id=\"T_0ba5a_row4_col4\" class=\"data row4 col4\" >6</td>\n",
" <td id=\"T_0ba5a_row4_col5\" class=\"data row4 col5\" >42.364300</td>\n",
" <td id=\"T_0ba5a_row4_col6\" class=\"data row4 col6\" >-71.005203</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_0ba5a_level0_row5\" class=\"row_heading level0 row5\" >5</th>\n",
" <td id=\"T_0ba5a_row5_col0\" class=\"data row5 col0\" >AMS</td>\n",
" <td id=\"T_0ba5a_row5_col1\" class=\"data row5 col1\" >Amsterdam</td>\n",
" <td id=\"T_0ba5a_row5_col2\" class=\"data row5 col2\" >Amsterdam Airport Schiphol</td>\n",
" <td id=\"T_0ba5a_row5_col3\" class=\"data row5 col3\" >NL-NH</td>\n",
" <td id=\"T_0ba5a_row5_col4\" class=\"data row5 col4\" >6</td>\n",
" <td id=\"T_0ba5a_row5_col5\" class=\"data row5 col5\" >52.308601</td>\n",
" <td id=\"T_0ba5a_row5_col6\" class=\"data row5 col6\" >4.763890</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_0ba5a_level0_row6\" class=\"row_heading level0 row6\" >6</th>\n",
" <td id=\"T_0ba5a_row6_col0\" class=\"data row6 col0\" >UFA</td>\n",
" <td id=\"T_0ba5a_row6_col1\" class=\"data row6 col1\" >Ufa</td>\n",
" <td id=\"T_0ba5a_row6_col2\" class=\"data row6 col2\" >Ufa International Airport</td>\n",
" <td id=\"T_0ba5a_row6_col3\" class=\"data row6 col3\" >RU-BA</td>\n",
" <td id=\"T_0ba5a_row6_col4\" class=\"data row6 col4\" >5</td>\n",
" <td id=\"T_0ba5a_row6_col5\" class=\"data row6 col5\" >54.557499</td>\n",
" <td id=\"T_0ba5a_row6_col6\" class=\"data row6 col6\" >55.874401</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_0ba5a_level0_row7\" class=\"row_heading level0 row7\" >7</th>\n",
" <td id=\"T_0ba5a_row7_col0\" class=\"data row7 col0\" >YYZ</td>\n",
" <td id=\"T_0ba5a_row7_col1\" class=\"data row7 col1\" >Toronto</td>\n",
" <td id=\"T_0ba5a_row7_col2\" class=\"data row7 col2\" >Toronto Pearson International Airport</td>\n",
" <td id=\"T_0ba5a_row7_col3\" class=\"data row7 col3\" >CA-ON</td>\n",
" <td id=\"T_0ba5a_row7_col4\" class=\"data row7 col4\" >5</td>\n",
" <td id=\"T_0ba5a_row7_col5\" class=\"data row7 col5\" >43.677200</td>\n",
" <td id=\"T_0ba5a_row7_col6\" class=\"data row7 col6\" >-79.630600</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_0ba5a_level0_row8\" class=\"row_heading level0 row8\" >8</th>\n",
" <td id=\"T_0ba5a_row8_col0\" class=\"data row8 col0\" >TRG</td>\n",
" <td id=\"T_0ba5a_row8_col1\" class=\"data row8 col1\" >Tauranga</td>\n",
" <td id=\"T_0ba5a_row8_col2\" class=\"data row8 col2\" >Tauranga Airport</td>\n",
" <td id=\"T_0ba5a_row8_col3\" class=\"data row8 col3\" >NZ-BOP</td>\n",
" <td id=\"T_0ba5a_row8_col4\" class=\"data row8 col4\" >5</td>\n",
" <td id=\"T_0ba5a_row8_col5\" class=\"data row8 col5\" >-37.671902</td>\n",
" <td id=\"T_0ba5a_row8_col6\" class=\"data row8 col6\" >176.195999</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_0ba5a_level0_row9\" class=\"row_heading level0 row9\" >9</th>\n",
" <td id=\"T_0ba5a_row9_col0\" class=\"data row9 col0\" >SNN</td>\n",
" <td id=\"T_0ba5a_row9_col1\" class=\"data row9 col1\" >Shannon</td>\n",
" <td id=\"T_0ba5a_row9_col2\" class=\"data row9 col2\" >Shannon Airport</td>\n",
" <td id=\"T_0ba5a_row9_col3\" class=\"data row9 col3\" >IE-CE</td>\n",
" <td id=\"T_0ba5a_row9_col4\" class=\"data row9 col4\" >5</td>\n",
" <td id=\"T_0ba5a_row9_col5\" class=\"data row9 col5\" >52.702000</td>\n",
" <td id=\"T_0ba5a_row9_col6\" class=\"data row9 col6\" >-8.924820</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x104571fc0>"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[iata, city, desc, region, runways, lat, lon] := \n",
" [a airport.iata iata],\n",
" [a airport.city city],\n",
" [a airport.desc desc],\n",
" [a airport.region region],\n",
" [a airport.runways runways],\n",
" [a airport.lat lat],\n",
" [a airport.lon lon]\n",
" \n",
":limit 10\n",
":order -runways, -city"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above query finds the airports with the most runways, sorted by their city in reverse alphabetical order."
]
},
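{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a variation, here is a sketch (not run against the tutorial database) of the opposite question: which airports have the fewest runways? It reuses only the triple patterns and the `:order` / `:limit` options from the query above, and it assumes that a sort key without a leading `-` sorts in ascending order.\n",
"\n",
"```\n",
"?[iata, city, runways] := \n",
"    [a airport.iata iata],\n",
"    [a airport.city city],\n",
"    [a airport.runways runways]\n",
"\n",
":limit 10\n",
":order runways, city\n",
"```\n",
"\n",
"Many airports are likely to tie on `runways`, so the secondary key `city` decides which ten rows are kept."
]
},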
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Of course, the first question when we have new data is \"how many rows\". We delayed answering this question since it requires aggregation (technically you can do it with aggregation since the query language we learned in the first tutorial is already Turing complete. But you need to get back lots of irrelevant stuff together with the count if you do it that way. Turing machines are not efficient). Here it is, how to count:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_08319_row0_col0 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_08319\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_08319_level0_col0\" class=\"col_heading level0 col0\" >count(a)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_08319_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_08319_row0_col0\" class=\"data row0 col0\" >3504</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x112008040>"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[count(a)] := [a airport.iata iata]\n",
":order count(a)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The body of the rule is simple: we asked for all triples with the unique attribute `airport.iata`. But the aggregation `count` is applied to the _head_ of the rule instead of within the rule body."
]
},
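{
"cell_type": "markdown",
"metadata": {},
"source": [
"Other aggregations are written in the head in the same way. As a minimal, unexecuted sketch, assuming the `count_unique` aggregation (used later in this tutorial) and the `airport.region` attribute from the schema, the following should count the distinct regions rather than the airports:\n",
"\n",
"```\n",
"?[count_unique(region)] := [a airport.region region]\n",
"```\n",
"\n",
"The body still produces one row per airport. Only the aggregation in the head decides how those rows are collapsed."
]
},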
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can mix aggregated head symbols with non-aggregates:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_a7188_row0_col0, #T_a7188_row1_col0, #T_a7188_row2_col0, #T_a7188_row3_col0, #T_a7188_row4_col0, #T_a7188_row5_col0, #T_a7188_row6_col0, #T_a7188_row7_col0, #T_a7188_row8_col0, #T_a7188_row9_col0, #T_a7188_row10_col0, #T_a7188_row11_col0, #T_a7188_row12_col0, #T_a7188_row13_col0, #T_a7188_row14_col0, #T_a7188_row15_col0, #T_a7188_row16_col0, #T_a7188_row17_col0, #T_a7188_row18_col0, #T_a7188_row19_col0, #T_a7188_row20_col0, #T_a7188_row21_col0, #T_a7188_row22_col0, #T_a7188_row23_col0, #T_a7188_row24_col0, #T_a7188_row25_col0 {\n",
" color: #307fc1;\n",
"}\n",
"#T_a7188_row0_col1, #T_a7188_row1_col1, #T_a7188_row2_col1, #T_a7188_row3_col1, #T_a7188_row4_col1, #T_a7188_row5_col1, #T_a7188_row6_col1, #T_a7188_row7_col1, #T_a7188_row8_col1, #T_a7188_row9_col1, #T_a7188_row10_col1, #T_a7188_row11_col1, #T_a7188_row12_col1, #T_a7188_row13_col1, #T_a7188_row14_col1, #T_a7188_row15_col1, #T_a7188_row16_col1, #T_a7188_row17_col1, #T_a7188_row18_col1, #T_a7188_row19_col1, #T_a7188_row20_col1, #T_a7188_row21_col1, #T_a7188_row22_col1, #T_a7188_row23_col1, #T_a7188_row24_col1, #T_a7188_row25_col1 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_a7188\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_a7188_level0_col0\" class=\"col_heading level0 col0\" >count(initial)</th>\n",
" <th id=\"T_a7188_level0_col1\" class=\"col_heading level0 col1\" >initial</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_a7188_row0_col0\" class=\"data row0 col0\" >212</td>\n",
" <td id=\"T_a7188_row0_col1\" class=\"data row0 col1\" >A</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_a7188_row1_col0\" class=\"data row1 col0\" >235</td>\n",
" <td id=\"T_a7188_row1_col1\" class=\"data row1 col1\" >B</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_a7188_row2_col0\" class=\"data row2 col0\" >214</td>\n",
" <td id=\"T_a7188_row2_col1\" class=\"data row2 col1\" >C</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_a7188_row3_col0\" class=\"data row3 col0\" >116</td>\n",
" <td id=\"T_a7188_row3_col1\" class=\"data row3 col1\" >D</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
" <td id=\"T_a7188_row4_col0\" class=\"data row4 col0\" >95</td>\n",
" <td id=\"T_a7188_row4_col1\" class=\"data row4 col1\" >E</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row5\" class=\"row_heading level0 row5\" >5</th>\n",
" <td id=\"T_a7188_row5_col0\" class=\"data row5 col0\" >76</td>\n",
" <td id=\"T_a7188_row5_col1\" class=\"data row5 col1\" >F</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row6\" class=\"row_heading level0 row6\" >6</th>\n",
" <td id=\"T_a7188_row6_col0\" class=\"data row6 col0\" >135</td>\n",
" <td id=\"T_a7188_row6_col1\" class=\"data row6 col1\" >G</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row7\" class=\"row_heading level0 row7\" >7</th>\n",
" <td id=\"T_a7188_row7_col0\" class=\"data row7 col0\" >129</td>\n",
" <td id=\"T_a7188_row7_col1\" class=\"data row7 col1\" >H</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row8\" class=\"row_heading level0 row8\" >8</th>\n",
" <td id=\"T_a7188_row8_col0\" class=\"data row8 col0\" >112</td>\n",
" <td id=\"T_a7188_row8_col1\" class=\"data row8 col1\" >I</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row9\" class=\"row_heading level0 row9\" >9</th>\n",
" <td id=\"T_a7188_row9_col0\" class=\"data row9 col0\" >80</td>\n",
" <td id=\"T_a7188_row9_col1\" class=\"data row9 col1\" >J</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row10\" class=\"row_heading level0 row10\" >10</th>\n",
" <td id=\"T_a7188_row10_col0\" class=\"data row10 col0\" >197</td>\n",
" <td id=\"T_a7188_row10_col1\" class=\"data row10 col1\" >K</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row11\" class=\"row_heading level0 row11\" >11</th>\n",
" <td id=\"T_a7188_row11_col0\" class=\"data row11 col0\" >184</td>\n",
" <td id=\"T_a7188_row11_col1\" class=\"data row11 col1\" >L</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row12\" class=\"row_heading level0 row12\" >12</th>\n",
" <td id=\"T_a7188_row12_col0\" class=\"data row12 col0\" >228</td>\n",
" <td id=\"T_a7188_row12_col1\" class=\"data row12 col1\" >M</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row13\" class=\"row_heading level0 row13\" >13</th>\n",
" <td id=\"T_a7188_row13_col0\" class=\"data row13 col0\" >111</td>\n",
" <td id=\"T_a7188_row13_col1\" class=\"data row13 col1\" >N</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row14\" class=\"row_heading level0 row14\" >14</th>\n",
" <td id=\"T_a7188_row14_col0\" class=\"data row14 col0\" >89</td>\n",
" <td id=\"T_a7188_row14_col1\" class=\"data row14 col1\" >O</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row15\" class=\"row_heading level0 row15\" >15</th>\n",
" <td id=\"T_a7188_row15_col0\" class=\"data row15 col0\" >203</td>\n",
" <td id=\"T_a7188_row15_col1\" class=\"data row15 col1\" >P</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row16\" class=\"row_heading level0 row16\" >16</th>\n",
" <td id=\"T_a7188_row16_col0\" class=\"data row16 col0\" >7</td>\n",
" <td id=\"T_a7188_row16_col1\" class=\"data row16 col1\" >Q</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row17\" class=\"row_heading level0 row17\" >17</th>\n",
" <td id=\"T_a7188_row17_col0\" class=\"data row17 col0\" >121</td>\n",
" <td id=\"T_a7188_row17_col1\" class=\"data row17 col1\" >R</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row18\" class=\"row_heading level0 row18\" >18</th>\n",
" <td id=\"T_a7188_row18_col0\" class=\"data row18 col0\" >245</td>\n",
" <td id=\"T_a7188_row18_col1\" class=\"data row18 col1\" >S</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row19\" class=\"row_heading level0 row19\" >19</th>\n",
" <td id=\"T_a7188_row19_col0\" class=\"data row19 col0\" >205</td>\n",
" <td id=\"T_a7188_row19_col1\" class=\"data row19 col1\" >T</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row20\" class=\"row_heading level0 row20\" >20</th>\n",
" <td id=\"T_a7188_row20_col0\" class=\"data row20 col0\" >77</td>\n",
" <td id=\"T_a7188_row20_col1\" class=\"data row20 col1\" >U</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row21\" class=\"row_heading level0 row21\" >21</th>\n",
" <td id=\"T_a7188_row21_col0\" class=\"data row21 col0\" >86</td>\n",
" <td id=\"T_a7188_row21_col1\" class=\"data row21 col1\" >V</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row22\" class=\"row_heading level0 row22\" >22</th>\n",
" <td id=\"T_a7188_row22_col0\" class=\"data row22 col0\" >59</td>\n",
" <td id=\"T_a7188_row22_col1\" class=\"data row22 col1\" >W</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row23\" class=\"row_heading level0 row23\" >23</th>\n",
" <td id=\"T_a7188_row23_col0\" class=\"data row23 col0\" >28</td>\n",
" <td id=\"T_a7188_row23_col1\" class=\"data row23 col1\" >X</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row24\" class=\"row_heading level0 row24\" >24</th>\n",
" <td id=\"T_a7188_row24_col0\" class=\"data row24 col0\" >211</td>\n",
" <td id=\"T_a7188_row24_col1\" class=\"data row24 col1\" >Y</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_a7188_level0_row25\" class=\"row_heading level0 row25\" >25</th>\n",
" <td id=\"T_a7188_row25_col0\" class=\"data row25 col0\" >49</td>\n",
" <td id=\"T_a7188_row25_col1\" class=\"data row25 col1\" >Z</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x107e10100>"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[count(initial), initial] := [ct airport.iata iata], initial = first(chars(iata))\n",
"\n",
":order initial"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This gives you the number of airports with different initials. Any non-aggregated symbols in the head acts as grouping variables (similar to `group by` in SQL)."
]
},
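{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate grouping with more than one aggregation, here is an unexecuted sketch that groups airports by region. It assumes the `airport.region` and `airport.runways` attributes from the schema and the `mean` aggregation (used later in this tutorial). The names `a` and `r` are just illustrative variable names.\n",
"\n",
"```\n",
"?[count(a), mean(r), region] := \n",
"    [a airport.region region],\n",
"    [a airport.runways r]\n",
"\n",
":order region\n",
":limit 10\n",
"```\n",
"\n",
"Here `region` is the only non-aggregated symbol in the head, so each distinct region forms one group, and both aggregations are computed within that group."
]
},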
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another caveat lies here. Usually you can break a rule body into smaller parts by introducing other rules. But if we naively try to \"refactor\" the above query, we get nonsensical results:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_720d8_row0_col0, #T_720d8_row1_col0, #T_720d8_row2_col0, #T_720d8_row3_col0, #T_720d8_row4_col0, #T_720d8_row5_col0, #T_720d8_row6_col0, #T_720d8_row7_col0, #T_720d8_row8_col0, #T_720d8_row9_col0, #T_720d8_row10_col0, #T_720d8_row11_col0, #T_720d8_row12_col0, #T_720d8_row13_col0, #T_720d8_row14_col0, #T_720d8_row15_col0, #T_720d8_row16_col0, #T_720d8_row17_col0, #T_720d8_row18_col0, #T_720d8_row19_col0, #T_720d8_row20_col0, #T_720d8_row21_col0, #T_720d8_row22_col0, #T_720d8_row23_col0, #T_720d8_row24_col0, #T_720d8_row25_col0 {\n",
" color: #307fc1;\n",
"}\n",
"#T_720d8_row0_col1, #T_720d8_row1_col1, #T_720d8_row2_col1, #T_720d8_row3_col1, #T_720d8_row4_col1, #T_720d8_row5_col1, #T_720d8_row6_col1, #T_720d8_row7_col1, #T_720d8_row8_col1, #T_720d8_row9_col1, #T_720d8_row10_col1, #T_720d8_row11_col1, #T_720d8_row12_col1, #T_720d8_row13_col1, #T_720d8_row14_col1, #T_720d8_row15_col1, #T_720d8_row16_col1, #T_720d8_row17_col1, #T_720d8_row18_col1, #T_720d8_row19_col1, #T_720d8_row20_col1, #T_720d8_row21_col1, #T_720d8_row22_col1, #T_720d8_row23_col1, #T_720d8_row24_col1, #T_720d8_row25_col1 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_720d8\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_720d8_level0_col0\" class=\"col_heading level0 col0\" >count(initial)</th>\n",
" <th id=\"T_720d8_level0_col1\" class=\"col_heading level0 col1\" >initial</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_720d8_row0_col0\" class=\"data row0 col0\" >1</td>\n",
" <td id=\"T_720d8_row0_col1\" class=\"data row0 col1\" >A</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_720d8_row1_col0\" class=\"data row1 col0\" >1</td>\n",
" <td id=\"T_720d8_row1_col1\" class=\"data row1 col1\" >B</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_720d8_row2_col0\" class=\"data row2 col0\" >1</td>\n",
" <td id=\"T_720d8_row2_col1\" class=\"data row2 col1\" >C</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_720d8_row3_col0\" class=\"data row3 col0\" >1</td>\n",
" <td id=\"T_720d8_row3_col1\" class=\"data row3 col1\" >D</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
" <td id=\"T_720d8_row4_col0\" class=\"data row4 col0\" >1</td>\n",
" <td id=\"T_720d8_row4_col1\" class=\"data row4 col1\" >E</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row5\" class=\"row_heading level0 row5\" >5</th>\n",
" <td id=\"T_720d8_row5_col0\" class=\"data row5 col0\" >1</td>\n",
" <td id=\"T_720d8_row5_col1\" class=\"data row5 col1\" >F</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row6\" class=\"row_heading level0 row6\" >6</th>\n",
" <td id=\"T_720d8_row6_col0\" class=\"data row6 col0\" >1</td>\n",
" <td id=\"T_720d8_row6_col1\" class=\"data row6 col1\" >G</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row7\" class=\"row_heading level0 row7\" >7</th>\n",
" <td id=\"T_720d8_row7_col0\" class=\"data row7 col0\" >1</td>\n",
" <td id=\"T_720d8_row7_col1\" class=\"data row7 col1\" >H</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row8\" class=\"row_heading level0 row8\" >8</th>\n",
" <td id=\"T_720d8_row8_col0\" class=\"data row8 col0\" >1</td>\n",
" <td id=\"T_720d8_row8_col1\" class=\"data row8 col1\" >I</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row9\" class=\"row_heading level0 row9\" >9</th>\n",
" <td id=\"T_720d8_row9_col0\" class=\"data row9 col0\" >1</td>\n",
" <td id=\"T_720d8_row9_col1\" class=\"data row9 col1\" >J</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row10\" class=\"row_heading level0 row10\" >10</th>\n",
" <td id=\"T_720d8_row10_col0\" class=\"data row10 col0\" >1</td>\n",
" <td id=\"T_720d8_row10_col1\" class=\"data row10 col1\" >K</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row11\" class=\"row_heading level0 row11\" >11</th>\n",
" <td id=\"T_720d8_row11_col0\" class=\"data row11 col0\" >1</td>\n",
" <td id=\"T_720d8_row11_col1\" class=\"data row11 col1\" >L</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row12\" class=\"row_heading level0 row12\" >12</th>\n",
" <td id=\"T_720d8_row12_col0\" class=\"data row12 col0\" >1</td>\n",
" <td id=\"T_720d8_row12_col1\" class=\"data row12 col1\" >M</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row13\" class=\"row_heading level0 row13\" >13</th>\n",
" <td id=\"T_720d8_row13_col0\" class=\"data row13 col0\" >1</td>\n",
" <td id=\"T_720d8_row13_col1\" class=\"data row13 col1\" >N</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row14\" class=\"row_heading level0 row14\" >14</th>\n",
" <td id=\"T_720d8_row14_col0\" class=\"data row14 col0\" >1</td>\n",
" <td id=\"T_720d8_row14_col1\" class=\"data row14 col1\" >O</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row15\" class=\"row_heading level0 row15\" >15</th>\n",
" <td id=\"T_720d8_row15_col0\" class=\"data row15 col0\" >1</td>\n",
" <td id=\"T_720d8_row15_col1\" class=\"data row15 col1\" >P</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row16\" class=\"row_heading level0 row16\" >16</th>\n",
" <td id=\"T_720d8_row16_col0\" class=\"data row16 col0\" >1</td>\n",
" <td id=\"T_720d8_row16_col1\" class=\"data row16 col1\" >Q</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row17\" class=\"row_heading level0 row17\" >17</th>\n",
" <td id=\"T_720d8_row17_col0\" class=\"data row17 col0\" >1</td>\n",
" <td id=\"T_720d8_row17_col1\" class=\"data row17 col1\" >R</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row18\" class=\"row_heading level0 row18\" >18</th>\n",
" <td id=\"T_720d8_row18_col0\" class=\"data row18 col0\" >1</td>\n",
" <td id=\"T_720d8_row18_col1\" class=\"data row18 col1\" >S</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row19\" class=\"row_heading level0 row19\" >19</th>\n",
" <td id=\"T_720d8_row19_col0\" class=\"data row19 col0\" >1</td>\n",
" <td id=\"T_720d8_row19_col1\" class=\"data row19 col1\" >T</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row20\" class=\"row_heading level0 row20\" >20</th>\n",
" <td id=\"T_720d8_row20_col0\" class=\"data row20 col0\" >1</td>\n",
" <td id=\"T_720d8_row20_col1\" class=\"data row20 col1\" >U</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row21\" class=\"row_heading level0 row21\" >21</th>\n",
" <td id=\"T_720d8_row21_col0\" class=\"data row21 col0\" >1</td>\n",
" <td id=\"T_720d8_row21_col1\" class=\"data row21 col1\" >V</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row22\" class=\"row_heading level0 row22\" >22</th>\n",
" <td id=\"T_720d8_row22_col0\" class=\"data row22 col0\" >1</td>\n",
" <td id=\"T_720d8_row22_col1\" class=\"data row22 col1\" >W</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row23\" class=\"row_heading level0 row23\" >23</th>\n",
" <td id=\"T_720d8_row23_col0\" class=\"data row23 col0\" >1</td>\n",
" <td id=\"T_720d8_row23_col1\" class=\"data row23 col1\" >X</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row24\" class=\"row_heading level0 row24\" >24</th>\n",
" <td id=\"T_720d8_row24_col0\" class=\"data row24 col0\" >1</td>\n",
" <td id=\"T_720d8_row24_col1\" class=\"data row24 col1\" >Y</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_720d8_level0_row25\" class=\"row_heading level0 row25\" >25</th>\n",
" <td id=\"T_720d8_row25_col0\" class=\"data row25 col0\" >1</td>\n",
" <td id=\"T_720d8_row25_col1\" class=\"data row25 col1\" >Z</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x11200a3b0>"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"initials[i] := [_ airport.iata iata], i = first(chars(iata))\n",
"?[count(initial), initial] := initials[initial]\n",
"\n",
":order initial"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What's happening? Remember that Cozo Datalog operates with set semantics instead of bag semantics. So in the first rule, the results are already de-duplicated. But for aggregations like `count`, counting must be done with bag semantics. In fact, if the first rule can _disambiguate_ the duplicates, you get the old results:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_33ffa_row0_col0, #T_33ffa_row1_col0, #T_33ffa_row2_col0, #T_33ffa_row3_col0, #T_33ffa_row4_col0, #T_33ffa_row5_col0, #T_33ffa_row6_col0, #T_33ffa_row7_col0, #T_33ffa_row8_col0, #T_33ffa_row9_col0, #T_33ffa_row10_col0, #T_33ffa_row11_col0, #T_33ffa_row12_col0, #T_33ffa_row13_col0, #T_33ffa_row14_col0, #T_33ffa_row15_col0, #T_33ffa_row16_col0, #T_33ffa_row17_col0, #T_33ffa_row18_col0, #T_33ffa_row19_col0, #T_33ffa_row20_col0, #T_33ffa_row21_col0, #T_33ffa_row22_col0, #T_33ffa_row23_col0, #T_33ffa_row24_col0, #T_33ffa_row25_col0 {\n",
" color: #307fc1;\n",
"}\n",
"#T_33ffa_row0_col1, #T_33ffa_row1_col1, #T_33ffa_row2_col1, #T_33ffa_row3_col1, #T_33ffa_row4_col1, #T_33ffa_row5_col1, #T_33ffa_row6_col1, #T_33ffa_row7_col1, #T_33ffa_row8_col1, #T_33ffa_row9_col1, #T_33ffa_row10_col1, #T_33ffa_row11_col1, #T_33ffa_row12_col1, #T_33ffa_row13_col1, #T_33ffa_row14_col1, #T_33ffa_row15_col1, #T_33ffa_row16_col1, #T_33ffa_row17_col1, #T_33ffa_row18_col1, #T_33ffa_row19_col1, #T_33ffa_row20_col1, #T_33ffa_row21_col1, #T_33ffa_row22_col1, #T_33ffa_row23_col1, #T_33ffa_row24_col1, #T_33ffa_row25_col1 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_33ffa\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_33ffa_level0_col0\" class=\"col_heading level0 col0\" >count(initial)</th>\n",
" <th id=\"T_33ffa_level0_col1\" class=\"col_heading level0 col1\" >initial</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_33ffa_row0_col0\" class=\"data row0 col0\" >212</td>\n",
" <td id=\"T_33ffa_row0_col1\" class=\"data row0 col1\" >A</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_33ffa_row1_col0\" class=\"data row1 col0\" >235</td>\n",
" <td id=\"T_33ffa_row1_col1\" class=\"data row1 col1\" >B</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_33ffa_row2_col0\" class=\"data row2 col0\" >214</td>\n",
" <td id=\"T_33ffa_row2_col1\" class=\"data row2 col1\" >C</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_33ffa_row3_col0\" class=\"data row3 col0\" >116</td>\n",
" <td id=\"T_33ffa_row3_col1\" class=\"data row3 col1\" >D</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
" <td id=\"T_33ffa_row4_col0\" class=\"data row4 col0\" >95</td>\n",
" <td id=\"T_33ffa_row4_col1\" class=\"data row4 col1\" >E</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row5\" class=\"row_heading level0 row5\" >5</th>\n",
" <td id=\"T_33ffa_row5_col0\" class=\"data row5 col0\" >76</td>\n",
" <td id=\"T_33ffa_row5_col1\" class=\"data row5 col1\" >F</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row6\" class=\"row_heading level0 row6\" >6</th>\n",
" <td id=\"T_33ffa_row6_col0\" class=\"data row6 col0\" >135</td>\n",
" <td id=\"T_33ffa_row6_col1\" class=\"data row6 col1\" >G</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row7\" class=\"row_heading level0 row7\" >7</th>\n",
" <td id=\"T_33ffa_row7_col0\" class=\"data row7 col0\" >129</td>\n",
" <td id=\"T_33ffa_row7_col1\" class=\"data row7 col1\" >H</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row8\" class=\"row_heading level0 row8\" >8</th>\n",
" <td id=\"T_33ffa_row8_col0\" class=\"data row8 col0\" >112</td>\n",
" <td id=\"T_33ffa_row8_col1\" class=\"data row8 col1\" >I</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row9\" class=\"row_heading level0 row9\" >9</th>\n",
" <td id=\"T_33ffa_row9_col0\" class=\"data row9 col0\" >80</td>\n",
" <td id=\"T_33ffa_row9_col1\" class=\"data row9 col1\" >J</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row10\" class=\"row_heading level0 row10\" >10</th>\n",
" <td id=\"T_33ffa_row10_col0\" class=\"data row10 col0\" >197</td>\n",
" <td id=\"T_33ffa_row10_col1\" class=\"data row10 col1\" >K</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row11\" class=\"row_heading level0 row11\" >11</th>\n",
" <td id=\"T_33ffa_row11_col0\" class=\"data row11 col0\" >184</td>\n",
" <td id=\"T_33ffa_row11_col1\" class=\"data row11 col1\" >L</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row12\" class=\"row_heading level0 row12\" >12</th>\n",
" <td id=\"T_33ffa_row12_col0\" class=\"data row12 col0\" >228</td>\n",
" <td id=\"T_33ffa_row12_col1\" class=\"data row12 col1\" >M</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row13\" class=\"row_heading level0 row13\" >13</th>\n",
" <td id=\"T_33ffa_row13_col0\" class=\"data row13 col0\" >111</td>\n",
" <td id=\"T_33ffa_row13_col1\" class=\"data row13 col1\" >N</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row14\" class=\"row_heading level0 row14\" >14</th>\n",
" <td id=\"T_33ffa_row14_col0\" class=\"data row14 col0\" >89</td>\n",
" <td id=\"T_33ffa_row14_col1\" class=\"data row14 col1\" >O</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row15\" class=\"row_heading level0 row15\" >15</th>\n",
" <td id=\"T_33ffa_row15_col0\" class=\"data row15 col0\" >203</td>\n",
" <td id=\"T_33ffa_row15_col1\" class=\"data row15 col1\" >P</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row16\" class=\"row_heading level0 row16\" >16</th>\n",
" <td id=\"T_33ffa_row16_col0\" class=\"data row16 col0\" >7</td>\n",
" <td id=\"T_33ffa_row16_col1\" class=\"data row16 col1\" >Q</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row17\" class=\"row_heading level0 row17\" >17</th>\n",
" <td id=\"T_33ffa_row17_col0\" class=\"data row17 col0\" >121</td>\n",
" <td id=\"T_33ffa_row17_col1\" class=\"data row17 col1\" >R</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row18\" class=\"row_heading level0 row18\" >18</th>\n",
" <td id=\"T_33ffa_row18_col0\" class=\"data row18 col0\" >245</td>\n",
" <td id=\"T_33ffa_row18_col1\" class=\"data row18 col1\" >S</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row19\" class=\"row_heading level0 row19\" >19</th>\n",
" <td id=\"T_33ffa_row19_col0\" class=\"data row19 col0\" >205</td>\n",
" <td id=\"T_33ffa_row19_col1\" class=\"data row19 col1\" >T</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row20\" class=\"row_heading level0 row20\" >20</th>\n",
" <td id=\"T_33ffa_row20_col0\" class=\"data row20 col0\" >77</td>\n",
" <td id=\"T_33ffa_row20_col1\" class=\"data row20 col1\" >U</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row21\" class=\"row_heading level0 row21\" >21</th>\n",
" <td id=\"T_33ffa_row21_col0\" class=\"data row21 col0\" >86</td>\n",
" <td id=\"T_33ffa_row21_col1\" class=\"data row21 col1\" >V</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row22\" class=\"row_heading level0 row22\" >22</th>\n",
" <td id=\"T_33ffa_row22_col0\" class=\"data row22 col0\" >59</td>\n",
" <td id=\"T_33ffa_row22_col1\" class=\"data row22 col1\" >W</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row23\" class=\"row_heading level0 row23\" >23</th>\n",
" <td id=\"T_33ffa_row23_col0\" class=\"data row23 col0\" >28</td>\n",
" <td id=\"T_33ffa_row23_col1\" class=\"data row23 col1\" >X</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row24\" class=\"row_heading level0 row24\" >24</th>\n",
" <td id=\"T_33ffa_row24_col0\" class=\"data row24 col0\" >211</td>\n",
" <td id=\"T_33ffa_row24_col1\" class=\"data row24 col1\" >Y</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_33ffa_level0_row25\" class=\"row_heading level0 row25\" >25</th>\n",
" <td id=\"T_33ffa_row25_col0\" class=\"data row25 col0\" >49</td>\n",
" <td id=\"T_33ffa_row25_col1\" class=\"data row25 col1\" >Z</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x11206a440>"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"initials[i, iata] := [_ airport.iata iata], i = first(chars(iata))\n",
"?[count(initial), initial] := initials[initial, _]\n",
"\n",
":order initial"
]
},
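{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same pitfall applies to any aggregation that is sensitive to duplicates, not only `count`. As an unexecuted sketch, assuming the `airport.runways` attribute and the `mean` aggregation, the first version below would average only the distinct runway counts, because the intermediate rule `runways[r]` is a set:\n",
"\n",
"```\n",
"runways[r] := [_ airport.runways r]\n",
"?[count(r), mean(r)] := runways[r]\n",
"```\n",
"\n",
"Keeping the entity `a` in the intermediate rule preserves one row per airport, so the aggregations see every airport:\n",
"\n",
"```\n",
"runways[a, r] := [a airport.runways r]\n",
"?[count(r), mean(r)] := runways[_, r]\n",
"```\n",
"\n",
"A rule of thumb: when an intermediate rule feeds an aggregation, keep a column in it that identifies each row you want to be counted."
]
},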
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are many aggregate functions in Cozo, most of them should be quite familiar for anyone fluent in SQL. For example, the following calculates the statistics for runways:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_d9eb7_row0_col0, #T_d9eb7_row0_col1, #T_d9eb7_row0_col2, #T_d9eb7_row0_col3, #T_d9eb7_row0_col4, #T_d9eb7_row0_col5, #T_d9eb7_row0_col6 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_d9eb7\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_d9eb7_level0_col0\" class=\"col_heading level0 col0\" >count(r)</th>\n",
" <th id=\"T_d9eb7_level0_col1\" class=\"col_heading level0 col1\" >count_unique(r)</th>\n",
" <th id=\"T_d9eb7_level0_col2\" class=\"col_heading level0 col2\" >sum(r)</th>\n",
" <th id=\"T_d9eb7_level0_col3\" class=\"col_heading level0 col3\" >min(r)</th>\n",
" <th id=\"T_d9eb7_level0_col4\" class=\"col_heading level0 col4\" >max(r)</th>\n",
" <th id=\"T_d9eb7_level0_col5\" class=\"col_heading level0 col5\" >mean(r)</th>\n",
" <th id=\"T_d9eb7_level0_col6\" class=\"col_heading level0 col6\" >std_dev(r)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_d9eb7_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_d9eb7_row0_col0\" class=\"data row0 col0\" >3504</td>\n",
" <td id=\"T_d9eb7_row0_col1\" class=\"data row0 col1\" >7</td>\n",
" <td id=\"T_d9eb7_row0_col2\" class=\"data row0 col2\" >4980.000000</td>\n",
" <td id=\"T_d9eb7_row0_col3\" class=\"data row0 col3\" >1</td>\n",
" <td id=\"T_d9eb7_row0_col4\" class=\"data row0 col4\" >7</td>\n",
" <td id=\"T_d9eb7_row0_col5\" class=\"data row0 col5\" >1.421233</td>\n",
" <td id=\"T_d9eb7_row0_col6\" class=\"data row0 col6\" >0.743083</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x11206a710>"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[count(r), count_unique(r), sum(r), min(r), max(r), mean(r), std_dev(r)] := \n",
" [a airport.runways r]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Recursive aggregations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Much of the power of Datalog comes from its recursive rules. But with aggregations, recursion can be disallowed even without negation:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\u001b[31meval::unstratifiable\u001b[0m\n",
"\n",
" \u001b[31m×\u001b[0m Query is unstratifiable\n",
"\u001b[36m help: \u001b[0mThe rule 'what' is in the strongly connected component [\"what\"],\n",
" and is involved in at least one forbidden dependency\n",
" (negation, non-meet aggregation, or algorithm-application).\n"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"what[sum(r)] := [a airport.runways r]\n",
"what[sum(r)] := what[r]\n",
"?[r] := what[r]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The compiler is right to reject the query since there is no meaningful interpretation for it. But sometimes there is. Let's see an example.\n",
"\n",
"We want to find the distance of the _shortest route_ between two airports. One way to calculate is to enumerate all the routes between the two airports, and then apply `min` aggregation to the results. This cannot be implemented as stated, since the routes may contain cycles and hence there can be an infinite number of routes between two airports.\n",
"\n",
"Instead, let's think recursively. If we already have all the shortest routes between all nodes, can we derive an _equation_ satisfied by the shortest route? Yes, A shortest route between `a` and `b` is either the distance of a direct route, or the sum of the shortest distance from `a` to `c` and the distance of a direct route from `c` to `d`. We apply our `min` aggregation to this recursive set instead. Let's write it out and try to find the shortest route between `LHR` and `YPO`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_83ea8_row0_col0 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_83ea8\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_83ea8_level0_col0\" class=\"col_heading level0 col0\" >dist</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_83ea8_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_83ea8_row0_col0\" class=\"data row0 col0\" >4147</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x1122bdcc0>"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"shortest[b, min(dist)] := [a airport.iata 'LHR'], # Start with the airport 'LHR'\n",
" [r route.src a], [r route.dst b], [r route.distance dist] # Retrive a direct route from 'LHR' to b\n",
"\n",
"shortest[b, min(dist)] := shortest[c, d1], # Start with an existing shortest route from 'LHR' to c\n",
" [r route.src c], [r route.dst b], [r route.distance d2], # Retrieve a direct route from c to b\n",
" dist <- d1 + d2 # Add the distances\n",
"\n",
"?[dist] := [a airport.iata 'YPO'], shortest[a, dist] # Extract the answer for 'YPO'. \n",
" # We chose it since it is the hardest airport to get to from 'LHR'."
]
},
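{
"cell_type": "markdown",
"metadata": {},
"source": [
"Read as mathematics, the two `shortest` rules above amount to a single equation. What follows is only a sketch in our own notation, not Cozo syntax: $d(b)$ stands for the shortest distance from `LHR` to airport $b$, and $w(c, b)$ for the distance of a direct route from $c$ to $b$.\n",
"\n",
"$$\n",
"d(b) = \\min\\Bigl( \\{ w(\\mathrm{LHR}, b) \\} \\cup \\{ d(c) + w(c, b) : c \\to b \\} \\Bigr)\n",
"$$\n",
"\n",
"Here the first set is empty if there is no direct route from `LHR` to $b$, and $c \\to b$ ranges over the airports $c$ that are reachable from `LHR` and have a direct route to $b$. The first rule contributes the first set, the second rule contributes the second set, and the query reads off the value at $b = \\mathrm{YPO}$."
]
},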
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The surprise is that the compiler actually accepts this program and gives the correct answer for it! So there must be a fundamental difference between the `min` and `count` aggregations.\n",
"\n",
"What is it then? We actually gave a hint above when we discussed the importance of applying set instead of bag semantics for `count`. For `min`, it doesn't matter which semantics you apply. The final result is the same either way.\n",
"\n",
"Mathematically, we say that `min` is a _meet operation_ satisfying commutativity, distributivity and idempotency. In Cozo, recursion through meet aggregations is allowed since the minimum fixed-point semantics can be extended to meet operations (if a rule contains several aggregations, all must be meet operations for it to be eligible for recursion).\n",
"\n",
"By the way, there are much better and much faster ways to look for shortest routes. We will learn these later. The point of this example is that recursive aggregation is a very general construct that is enormously powerful. Tricky problems that in other databases require pulling all the data to the client and processing them in a general programming language can usually be solved by apt applications of recursive aggregations."
]
},
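{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the terminology concrete, here is a sketch of the laws a meet operation obeys, writing $\\wedge$ for the binary operation, for example $x \\wedge y = \\min(x, y)$. The symbol is our own notation and does not appear in Cozo.\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"x \\wedge y &= y \\wedge x && \\text{(commutativity)} \\\\\n",
"(x \\wedge y) \\wedge z &= x \\wedge (y \\wedge z) && \\text{(associativity)} \\\\\n",
"x \\wedge x &= x && \\text{(idempotency)}\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Idempotency is the law that separates `min` from `sum`. Since $\\min(x, x) = x$, deriving the same value a second time during recursion changes nothing. By contrast, $x + x \\neq x$ for any nonzero $x$, so the result of a sum depends on how many times each value is counted."
]
},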
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Algorithms"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Cozo's version of Datalog is already Turing-complete, yet we need aggregations for things like counting to be practically feasible and useful. In the same vein, any conceivable algorithm can be implemented with what we already have, but the implementation may be too complicated and inefficient to be of practical use.\n",
"\n",
"Cozo claimed to be a graph-focused database. There are common operations we want to do on graphs that are just awkward to do with Datalog (or any general purpose query language, such as SQL). The code for the shortest path example we gave above is actually not too bad. For algorithms like PageRank it can get much worse."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In Cozo we take a pragmatic approach and introduce _algorithms_. They can be thought of as black-box rules that take in existing relations and produce a new relation according to its specification. For the shortest path, the appropriate algorithm to use is Dijkstra's algorithm:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_b0bc2_row0_col0, #T_b0bc2_row0_col1 {\n",
" color: black;\n",
"}\n",
"#T_b0bc2_row0_col2 {\n",
" color: #307fc1;\n",
"}\n",
"#T_b0bc2_row0_col3 {\n",
" color: #bf5b3d;\n",
"}\n",
"</style>\n",
"<table id=\"T_b0bc2\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_b0bc2_level0_col0\" class=\"col_heading level0 col0\" >starting</th>\n",
" <th id=\"T_b0bc2_level0_col1\" class=\"col_heading level0 col1\" >goal</th>\n",
" <th id=\"T_b0bc2_level0_col2\" class=\"col_heading level0 col2\" >distance</th>\n",
" <th id=\"T_b0bc2_level0_col3\" class=\"col_heading level0 col3\" >path</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_b0bc2_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_b0bc2_row0_col0\" class=\"data row0 col0\" >LHR</td>\n",
" <td id=\"T_b0bc2_row0_col1\" class=\"data row0 col1\" >YPO</td>\n",
" <td id=\"T_b0bc2_row0_col2\" class=\"data row0 col2\" >4147.000000</td>\n",
" <td id=\"T_b0bc2_row0_col3\" class=\"data row0 col3\" >['LHR', 'YUL', 'YVO', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO']</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x11206b7f0>"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"paths[fr, to, dist] := [r route.src fr_a], [r route.dst to_a], [r route.distance dist], [fr_a airport.iata fr], [to_a airport.iata to]\n",
"starting[] <- [['LHR']]\n",
"goal[] <- [['YPO']]\n",
"?[starting, goal, distance, path] <~ ShortestPathDijkstra(paths[], starting[], goal[])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Algorithm application is indicated by the `<~` symbol separating the rule head and rule body. As for constant rules, rule head bindings can be omitted. The algorithm is then called like a function, but taking in relations as arguments. Above we have used three relations we defined inline. For stored relations, the notation is `:stored_relation`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Some algorithms take in additional arguments. The following example calculates the shortest path to the same problem, but returns the ten shortest paths instead:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_b7575_row0_col0, #T_b7575_row0_col1, #T_b7575_row1_col0, #T_b7575_row1_col1, #T_b7575_row2_col0, #T_b7575_row2_col1, #T_b7575_row3_col0, #T_b7575_row3_col1, #T_b7575_row4_col0, #T_b7575_row4_col1, #T_b7575_row5_col0, #T_b7575_row5_col1, #T_b7575_row6_col0, #T_b7575_row6_col1, #T_b7575_row7_col0, #T_b7575_row7_col1, #T_b7575_row8_col0, #T_b7575_row8_col1, #T_b7575_row9_col0, #T_b7575_row9_col1 {\n",
" color: black;\n",
"}\n",
"#T_b7575_row0_col2, #T_b7575_row1_col2, #T_b7575_row2_col2, #T_b7575_row3_col2, #T_b7575_row4_col2, #T_b7575_row5_col2, #T_b7575_row6_col2, #T_b7575_row7_col2, #T_b7575_row8_col2, #T_b7575_row9_col2 {\n",
" color: #307fc1;\n",
"}\n",
"#T_b7575_row0_col3, #T_b7575_row1_col3, #T_b7575_row2_col3, #T_b7575_row3_col3, #T_b7575_row4_col3, #T_b7575_row5_col3, #T_b7575_row6_col3, #T_b7575_row7_col3, #T_b7575_row8_col3, #T_b7575_row9_col3 {\n",
" color: #bf5b3d;\n",
"}\n",
"</style>\n",
"<table id=\"T_b7575\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_b7575_level0_col0\" class=\"col_heading level0 col0\" >starting</th>\n",
" <th id=\"T_b7575_level0_col1\" class=\"col_heading level0 col1\" >goal</th>\n",
" <th id=\"T_b7575_level0_col2\" class=\"col_heading level0 col2\" >distance</th>\n",
" <th id=\"T_b7575_level0_col3\" class=\"col_heading level0 col3\" >path</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_b7575_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_b7575_row0_col0\" class=\"data row0 col0\" >LHR</td>\n",
" <td id=\"T_b7575_row0_col1\" class=\"data row0 col1\" >YPO</td>\n",
" <td id=\"T_b7575_row0_col2\" class=\"data row0 col2\" >4147.000000</td>\n",
" <td id=\"T_b7575_row0_col3\" class=\"data row0 col3\" >['LHR', 'YUL', 'YVO', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO']</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_b7575_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_b7575_row1_col0\" class=\"data row1 col0\" >LHR</td>\n",
" <td id=\"T_b7575_row1_col1\" class=\"data row1 col1\" >YPO</td>\n",
" <td id=\"T_b7575_row1_col2\" class=\"data row1 col2\" >4150.000000</td>\n",
" <td id=\"T_b7575_row1_col3\" class=\"data row1 col3\" >['LHR', 'DUB', 'YUL', 'YVO', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO']</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_b7575_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_b7575_row2_col0\" class=\"data row2 col0\" >LHR</td>\n",
" <td id=\"T_b7575_row2_col1\" class=\"data row2 col1\" >YPO</td>\n",
" <td id=\"T_b7575_row2_col2\" class=\"data row2 col2\" >4164.000000</td>\n",
" <td id=\"T_b7575_row2_col3\" class=\"data row2 col3\" >['LHR', 'YUL', 'YMT', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO']</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_b7575_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_b7575_row3_col0\" class=\"data row3 col0\" >LHR</td>\n",
" <td id=\"T_b7575_row3_col1\" class=\"data row3 col1\" >YPO</td>\n",
" <td id=\"T_b7575_row3_col2\" class=\"data row3 col2\" >4167.000000</td>\n",
" <td id=\"T_b7575_row3_col3\" class=\"data row3 col3\" >['LHR', 'DUB', 'YUL', 'YMT', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO']</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_b7575_level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
" <td id=\"T_b7575_row4_col0\" class=\"data row4 col0\" >LHR</td>\n",
" <td id=\"T_b7575_row4_col1\" class=\"data row4 col1\" >YPO</td>\n",
" <td id=\"T_b7575_row4_col2\" class=\"data row4 col2\" >4187.000000</td>\n",
" <td id=\"T_b7575_row4_col3\" class=\"data row4 col3\" >['LHR', 'MAN', 'DUB', 'YUL', 'YVO', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO']</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_b7575_level0_row5\" class=\"row_heading level0 row5\" >5</th>\n",
" <td id=\"T_b7575_row5_col0\" class=\"data row5 col0\" >LHR</td>\n",
" <td id=\"T_b7575_row5_col1\" class=\"data row5 col1\" >YPO</td>\n",
" <td id=\"T_b7575_row5_col2\" class=\"data row5 col2\" >4202.000000</td>\n",
" <td id=\"T_b7575_row5_col3\" class=\"data row5 col3\" >['LHR', 'IOM', 'DUB', 'YUL', 'YVO', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO']</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_b7575_level0_row6\" class=\"row_heading level0 row6\" >6</th>\n",
" <td id=\"T_b7575_row6_col0\" class=\"data row6 col0\" >LHR</td>\n",
" <td id=\"T_b7575_row6_col1\" class=\"data row6 col1\" >YPO</td>\n",
" <td id=\"T_b7575_row6_col2\" class=\"data row6 col2\" >4204.000000</td>\n",
" <td id=\"T_b7575_row6_col3\" class=\"data row6 col3\" >['LHR', 'MAN', 'DUB', 'YUL', 'YMT', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO']</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_b7575_level0_row7\" class=\"row_heading level0 row7\" >7</th>\n",
" <td id=\"T_b7575_row7_col0\" class=\"data row7 col0\" >LHR</td>\n",
" <td id=\"T_b7575_row7_col1\" class=\"data row7 col1\" >YPO</td>\n",
" <td id=\"T_b7575_row7_col2\" class=\"data row7 col2\" >4209.000000</td>\n",
" <td id=\"T_b7575_row7_col3\" class=\"data row7 col3\" >['LHR', 'YUL', 'YMT', 'YNS', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO']</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_b7575_level0_row8\" class=\"row_heading level0 row8\" >8</th>\n",
" <td id=\"T_b7575_row8_col0\" class=\"data row8 col0\" >LHR</td>\n",
" <td id=\"T_b7575_row8_col1\" class=\"data row8 col1\" >YPO</td>\n",
" <td id=\"T_b7575_row8_col2\" class=\"data row8 col2\" >4211.000000</td>\n",
" <td id=\"T_b7575_row8_col3\" class=\"data row8 col3\" >['LHR', 'MAN', 'IOM', 'DUB', 'YUL', 'YVO', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO']</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_b7575_level0_row9\" class=\"row_heading level0 row9\" >9</th>\n",
" <td id=\"T_b7575_row9_col0\" class=\"data row9 col0\" >LHR</td>\n",
" <td id=\"T_b7575_row9_col1\" class=\"data row9 col1\" >YPO</td>\n",
" <td id=\"T_b7575_row9_col2\" class=\"data row9 col2\" >4212.000000</td>\n",
" <td id=\"T_b7575_row9_col3\" class=\"data row9 col3\" >['LHR', 'DUB', 'YUL', 'YMT', 'YNS', 'YKQ', 'YMO', 'YFA', 'ZKE', 'YAT', 'YPO']</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x11206b9a0>"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"paths[fr, to, dist] := [r route.src fr_a], [r route.dst to_a], [r route.distance dist], [fr_a airport.iata fr], [to_a airport.iata to]\n",
"starting[] <- [['LHR']]\n",
"goal[] <- [['YPO']]\n",
"?[starting, goal, distance, path] <~ KShortestPathYen(paths[], starting[], goal[], k: 10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see, in addition to relations as arguments, an algorithm can also take _parameters_, `k` in this case."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}