{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The pilgrim to Mount Acid"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%reload_ext pycozo.ipyext_direct\n",
"%cozo_auth tutorial *******"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A schema for data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At this age of BigDataⒸ, your business data are enormous and change fast. You may have one billion active users on your platform carrying out all sorts of activities, concurrently of course. You don't want these activities to step on each other. You don't want to store the wrong thing into your user's accounts. You _especially_ don't want any money in transit to disappear in midair. To make things worse, hundreds of new activities pop up each day. \n",
"\n",
"Storing any of these in a stored relation is infeasible. With a traditional RDBMS, [data migrations](https://en.wikipedia.org/wiki/Data_migration) would have already killed you. And with Cozo, stored relations don't even try to support schema change (in fact, the only 'schema' for a stored relation is its arity).\n",
"\n",
"To store such data and meet its query and mutation requirements, a database needs:\n",
"\n",
"* high concurrency;\n",
"* fine-grained transactions;\n",
"* checks for data integrity;\n",
"* ability to rapidly adapt to new data shapes and requirements.\n",
"\n",
"To support these, we need to pay some prices. With Cozo, we pay by:\n",
"\n",
"* we demand that most transactions only apply _local changes_ that only touch on a tiny fraction of the data (otherwise the database cannot satisfy the high concurrency requirements);\n",
"* we tolerate indirections (since \"all problems in computer science can be solved by another level of indirection\")."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With these tradeoffs, the solution is the [triple store](https://en.wikipedia.org/wiki/Triplestore)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A _triple_ is a sentence consisting of a subject, a verb, and an object. In the Cozo flavour, the subject is always an opaque identity, such as _entity42_, so it is actually an _entity-attribute-value_ triple. Examples:\n",
"\n",
"* _entity42_ has first name `'Alice'`.\n",
"* _entity42_ has last name `'Liddell'`.\n",
"* _entity42_ loves _entity81_.\n",
"* _entity81_ is aged `20` years old."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We schematize triples by schematizing the verbs (attributes). In our example, the schema for first name and last name should have type strings, the schema for age should have type integers, and the schema for the \"loves\" relationship should be other entities. Here the types refer to the objects in the triple, since the subject is always an entity."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So let's put this into code:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_61fc6_row0_col0, #T_61fc6_row1_col0, #T_61fc6_row2_col0, #T_61fc6_row3_col0 {\n",
" color: #307fc1;\n",
"}\n",
"#T_61fc6_row0_col1, #T_61fc6_row1_col1, #T_61fc6_row2_col1, #T_61fc6_row3_col1 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_61fc6\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_61fc6_level0_col0\" class=\"col_heading level0 col0\" >attr_id</th>\n",
" <th id=\"T_61fc6_level0_col1\" class=\"col_heading level0 col1\" >op</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_61fc6_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_61fc6_row0_col0\" class=\"data row0 col0\" >10000001</td>\n",
" <td id=\"T_61fc6_row0_col1\" class=\"data row0 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_61fc6_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_61fc6_row1_col0\" class=\"data row1 col0\" >10000002</td>\n",
" <td id=\"T_61fc6_row1_col1\" class=\"data row1 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_61fc6_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_61fc6_row2_col0\" class=\"data row2 col0\" >10000003</td>\n",
" <td id=\"T_61fc6_row2_col1\" class=\"data row2 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_61fc6_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_61fc6_row3_col0\" class=\"data row3 col0\" >10000004</td>\n",
" <td id=\"T_61fc6_row3_col1\" class=\"data row3 col1\" >assert</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bb802e0>"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":schema\n",
"\n",
":put person {\n",
" first_name: string index,\n",
" nick_name: string many index,\n",
" loves: ref many,\n",
" age: int\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `:schema` at the top indicates that we want to manage the schema instead of run normal queries. We then `put` a _group_ of related schema. Now even though they are declared together similarly to a table definition in SQL, we need to stress that this actually defines four separate, independent attributes named `person.first_name`, `person.last_name`, `person.loves`, `person.age`. An entity can have whatever attributes associated with it, even those with different prefixes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The allowed types for attributes are:\n",
"\n",
"* `ref`\n",
"* `bool`\n",
"* `int`\n",
"* `float`\n",
"* `string`\n",
"* `bytes`\n",
"* `list`\n",
"\n",
"The list type is heterogeneous in its elements. There is no concept of a nullable type and you can't put `null` into values of triples (other than wrapping them in lists first). To indicate missing values, you simply omit the attribute.\n",
"\n",
"The `ref` type has the special meaning of refering to other entities.\n",
"\n",
"After the type comes one or more _modifiers_. The `many` modifier indicates that `loves` is a to-many relationship. If we omit it, any person can love at most one other person, which is not very realistic.\n",
"\n",
"The modifier `index` indicates that we want values of this attribute to be _indexed_. Only indexed attributes support efficient value lookups and range scans. `ref` types are always implicitly indexed since the database wants to be able to traverse the graph in both directions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead of `index`, we can mark attributes with the modifier `unique`, indicating there cannot be two entities with the same value for the attribute. The value then acts as an _unique identifier_ for the entity, which can be convenient when retrieving the entities since the entity ID is assigned by the database automatically and you cannot choose how it is assigned. So let's add an explicit `person.id` attribute, this time using the non-grouped syntax:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_3387c_row0_col0 {\n",
" color: #307fc1;\n",
"}\n",
"#T_3387c_row0_col1 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_3387c\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_3387c_level0_col0\" class=\"col_heading level0 col0\" >attr_id</th>\n",
" <th id=\"T_3387c_level0_col1\" class=\"col_heading level0 col1\" >op</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_3387c_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_3387c_row0_col0\" class=\"data row0 col0\" >10000005</td>\n",
" <td id=\"T_3387c_row0_col1\" class=\"data row0 col1\" >assert</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x1036aab60>"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":schema\n",
"\n",
":put person.id: string unique;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see what schema are there in the database now by running a system directive:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_fd676_row0_col0, #T_fd676_row0_col5, #T_fd676_row1_col0, #T_fd676_row1_col5, #T_fd676_row2_col0, #T_fd676_row2_col5, #T_fd676_row3_col0, #T_fd676_row3_col5, #T_fd676_row4_col0, #T_fd676_row4_col5 {\n",
" color: #307fc1;\n",
"}\n",
"#T_fd676_row0_col1, #T_fd676_row0_col2, #T_fd676_row0_col3, #T_fd676_row0_col4, #T_fd676_row1_col1, #T_fd676_row1_col2, #T_fd676_row1_col3, #T_fd676_row1_col4, #T_fd676_row2_col1, #T_fd676_row2_col2, #T_fd676_row2_col3, #T_fd676_row2_col4, #T_fd676_row3_col1, #T_fd676_row3_col2, #T_fd676_row3_col3, #T_fd676_row3_col4, #T_fd676_row4_col1, #T_fd676_row4_col2, #T_fd676_row4_col3, #T_fd676_row4_col4 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_fd676\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_fd676_level0_col0\" class=\"col_heading level0 col0\" >attr_id</th>\n",
" <th id=\"T_fd676_level0_col1\" class=\"col_heading level0 col1\" >name</th>\n",
" <th id=\"T_fd676_level0_col2\" class=\"col_heading level0 col2\" >type</th>\n",
" <th id=\"T_fd676_level0_col3\" class=\"col_heading level0 col3\" >cardinality</th>\n",
" <th id=\"T_fd676_level0_col4\" class=\"col_heading level0 col4\" >index</th>\n",
" <th id=\"T_fd676_level0_col5\" class=\"col_heading level0 col5\" >history</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_fd676_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_fd676_row0_col0\" class=\"data row0 col0\" >10000001</td>\n",
" <td id=\"T_fd676_row0_col1\" class=\"data row0 col1\" >person.first_name</td>\n",
" <td id=\"T_fd676_row0_col2\" class=\"data row0 col2\" >string</td>\n",
" <td id=\"T_fd676_row0_col3\" class=\"data row0 col3\" >one</td>\n",
" <td id=\"T_fd676_row0_col4\" class=\"data row0 col4\" >index</td>\n",
" <td id=\"T_fd676_row0_col5\" class=\"data row0 col5\" >False</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_fd676_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_fd676_row1_col0\" class=\"data row1 col0\" >10000002</td>\n",
" <td id=\"T_fd676_row1_col1\" class=\"data row1 col1\" >person.nick_name</td>\n",
" <td id=\"T_fd676_row1_col2\" class=\"data row1 col2\" >string</td>\n",
" <td id=\"T_fd676_row1_col3\" class=\"data row1 col3\" >many</td>\n",
" <td id=\"T_fd676_row1_col4\" class=\"data row1 col4\" >index</td>\n",
" <td id=\"T_fd676_row1_col5\" class=\"data row1 col5\" >False</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_fd676_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_fd676_row2_col0\" class=\"data row2 col0\" >10000003</td>\n",
" <td id=\"T_fd676_row2_col1\" class=\"data row2 col1\" >person.loves</td>\n",
" <td id=\"T_fd676_row2_col2\" class=\"data row2 col2\" >ref</td>\n",
" <td id=\"T_fd676_row2_col3\" class=\"data row2 col3\" >many</td>\n",
" <td id=\"T_fd676_row2_col4\" class=\"data row2 col4\" >none</td>\n",
" <td id=\"T_fd676_row2_col5\" class=\"data row2 col5\" >False</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_fd676_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_fd676_row3_col0\" class=\"data row3 col0\" >10000004</td>\n",
" <td id=\"T_fd676_row3_col1\" class=\"data row3 col1\" >person.age</td>\n",
" <td id=\"T_fd676_row3_col2\" class=\"data row3 col2\" >int</td>\n",
" <td id=\"T_fd676_row3_col3\" class=\"data row3 col3\" >one</td>\n",
" <td id=\"T_fd676_row3_col4\" class=\"data row3 col4\" >none</td>\n",
" <td id=\"T_fd676_row3_col5\" class=\"data row3 col5\" >False</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_fd676_level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
" <td id=\"T_fd676_row4_col0\" class=\"data row4 col0\" >10000005</td>\n",
" <td id=\"T_fd676_row4_col1\" class=\"data row4 col1\" >person.id</td>\n",
" <td id=\"T_fd676_row4_col2\" class=\"data row4 col2\" >string</td>\n",
" <td id=\"T_fd676_row4_col3\" class=\"data row4 col3\" >one</td>\n",
" <td id=\"T_fd676_row4_col4\" class=\"data row4 col4\" >unique</td>\n",
" <td id=\"T_fd676_row4_col5\" class=\"data row4 col5\" >False</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x1036aad40>"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":db schema"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can rename the attribute:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_a12bc_row0_col0 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_a12bc\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_a12bc_level0_col0\" class=\"col_heading level0 col0\" >status</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_a12bc_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_a12bc_row0_col0\" class=\"data row0 col0\" >OK</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bd086d0>"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":db rename attr person.id person.pid"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_429c2_row0_col0, #T_429c2_row0_col5, #T_429c2_row1_col0, #T_429c2_row1_col5, #T_429c2_row2_col0, #T_429c2_row2_col5, #T_429c2_row3_col0, #T_429c2_row3_col5, #T_429c2_row4_col0, #T_429c2_row4_col5 {\n",
" color: #307fc1;\n",
"}\n",
"#T_429c2_row0_col1, #T_429c2_row0_col2, #T_429c2_row0_col3, #T_429c2_row0_col4, #T_429c2_row1_col1, #T_429c2_row1_col2, #T_429c2_row1_col3, #T_429c2_row1_col4, #T_429c2_row2_col1, #T_429c2_row2_col2, #T_429c2_row2_col3, #T_429c2_row2_col4, #T_429c2_row3_col1, #T_429c2_row3_col2, #T_429c2_row3_col3, #T_429c2_row3_col4, #T_429c2_row4_col1, #T_429c2_row4_col2, #T_429c2_row4_col3, #T_429c2_row4_col4 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_429c2\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_429c2_level0_col0\" class=\"col_heading level0 col0\" >attr_id</th>\n",
" <th id=\"T_429c2_level0_col1\" class=\"col_heading level0 col1\" >name</th>\n",
" <th id=\"T_429c2_level0_col2\" class=\"col_heading level0 col2\" >type</th>\n",
" <th id=\"T_429c2_level0_col3\" class=\"col_heading level0 col3\" >cardinality</th>\n",
" <th id=\"T_429c2_level0_col4\" class=\"col_heading level0 col4\" >index</th>\n",
" <th id=\"T_429c2_level0_col5\" class=\"col_heading level0 col5\" >history</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_429c2_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_429c2_row0_col0\" class=\"data row0 col0\" >10000001</td>\n",
" <td id=\"T_429c2_row0_col1\" class=\"data row0 col1\" >person.first_name</td>\n",
" <td id=\"T_429c2_row0_col2\" class=\"data row0 col2\" >string</td>\n",
" <td id=\"T_429c2_row0_col3\" class=\"data row0 col3\" >one</td>\n",
" <td id=\"T_429c2_row0_col4\" class=\"data row0 col4\" >index</td>\n",
" <td id=\"T_429c2_row0_col5\" class=\"data row0 col5\" >False</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_429c2_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_429c2_row1_col0\" class=\"data row1 col0\" >10000002</td>\n",
" <td id=\"T_429c2_row1_col1\" class=\"data row1 col1\" >person.nick_name</td>\n",
" <td id=\"T_429c2_row1_col2\" class=\"data row1 col2\" >string</td>\n",
" <td id=\"T_429c2_row1_col3\" class=\"data row1 col3\" >many</td>\n",
" <td id=\"T_429c2_row1_col4\" class=\"data row1 col4\" >index</td>\n",
" <td id=\"T_429c2_row1_col5\" class=\"data row1 col5\" >False</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_429c2_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_429c2_row2_col0\" class=\"data row2 col0\" >10000003</td>\n",
" <td id=\"T_429c2_row2_col1\" class=\"data row2 col1\" >person.loves</td>\n",
" <td id=\"T_429c2_row2_col2\" class=\"data row2 col2\" >ref</td>\n",
" <td id=\"T_429c2_row2_col3\" class=\"data row2 col3\" >many</td>\n",
" <td id=\"T_429c2_row2_col4\" class=\"data row2 col4\" >none</td>\n",
" <td id=\"T_429c2_row2_col5\" class=\"data row2 col5\" >False</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_429c2_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_429c2_row3_col0\" class=\"data row3 col0\" >10000004</td>\n",
" <td id=\"T_429c2_row3_col1\" class=\"data row3 col1\" >person.age</td>\n",
" <td id=\"T_429c2_row3_col2\" class=\"data row3 col2\" >int</td>\n",
" <td id=\"T_429c2_row3_col3\" class=\"data row3 col3\" >one</td>\n",
" <td id=\"T_429c2_row3_col4\" class=\"data row3 col4\" >none</td>\n",
" <td id=\"T_429c2_row3_col5\" class=\"data row3 col5\" >False</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_429c2_level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
" <td id=\"T_429c2_row4_col0\" class=\"data row4 col0\" >10000005</td>\n",
" <td id=\"T_429c2_row4_col1\" class=\"data row4 col1\" >person.pid</td>\n",
" <td id=\"T_429c2_row4_col2\" class=\"data row4 col2\" >string</td>\n",
" <td id=\"T_429c2_row4_col3\" class=\"data row4 col3\" >one</td>\n",
" <td id=\"T_429c2_row4_col4\" class=\"data row4 col4\" >unique</td>\n",
" <td id=\"T_429c2_row4_col5\" class=\"data row4 col5\" >False</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bb806d0>"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":db schema"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As well as getting rid of it (this will remove all the data associated with the attribute as well):"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_1b12f_row0_col0 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_1b12f\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_1b12f_level0_col0\" class=\"col_heading level0 col0\" >status</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_1b12f_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_1b12f_row0_col0\" class=\"data row0 col0\" >OK</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x1036ab040>"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":db remove attr person.pid"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_0326c_row0_col0, #T_0326c_row0_col5, #T_0326c_row1_col0, #T_0326c_row1_col5, #T_0326c_row2_col0, #T_0326c_row2_col5, #T_0326c_row3_col0, #T_0326c_row3_col5 {\n",
" color: #307fc1;\n",
"}\n",
"#T_0326c_row0_col1, #T_0326c_row0_col2, #T_0326c_row0_col3, #T_0326c_row0_col4, #T_0326c_row1_col1, #T_0326c_row1_col2, #T_0326c_row1_col3, #T_0326c_row1_col4, #T_0326c_row2_col1, #T_0326c_row2_col2, #T_0326c_row2_col3, #T_0326c_row2_col4, #T_0326c_row3_col1, #T_0326c_row3_col2, #T_0326c_row3_col3, #T_0326c_row3_col4 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_0326c\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_0326c_level0_col0\" class=\"col_heading level0 col0\" >attr_id</th>\n",
" <th id=\"T_0326c_level0_col1\" class=\"col_heading level0 col1\" >name</th>\n",
" <th id=\"T_0326c_level0_col2\" class=\"col_heading level0 col2\" >type</th>\n",
" <th id=\"T_0326c_level0_col3\" class=\"col_heading level0 col3\" >cardinality</th>\n",
" <th id=\"T_0326c_level0_col4\" class=\"col_heading level0 col4\" >index</th>\n",
" <th id=\"T_0326c_level0_col5\" class=\"col_heading level0 col5\" >history</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_0326c_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_0326c_row0_col0\" class=\"data row0 col0\" >10000001</td>\n",
" <td id=\"T_0326c_row0_col1\" class=\"data row0 col1\" >person.first_name</td>\n",
" <td id=\"T_0326c_row0_col2\" class=\"data row0 col2\" >string</td>\n",
" <td id=\"T_0326c_row0_col3\" class=\"data row0 col3\" >one</td>\n",
" <td id=\"T_0326c_row0_col4\" class=\"data row0 col4\" >index</td>\n",
" <td id=\"T_0326c_row0_col5\" class=\"data row0 col5\" >False</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_0326c_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_0326c_row1_col0\" class=\"data row1 col0\" >10000002</td>\n",
" <td id=\"T_0326c_row1_col1\" class=\"data row1 col1\" >person.nick_name</td>\n",
" <td id=\"T_0326c_row1_col2\" class=\"data row1 col2\" >string</td>\n",
" <td id=\"T_0326c_row1_col3\" class=\"data row1 col3\" >many</td>\n",
" <td id=\"T_0326c_row1_col4\" class=\"data row1 col4\" >index</td>\n",
" <td id=\"T_0326c_row1_col5\" class=\"data row1 col5\" >False</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_0326c_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_0326c_row2_col0\" class=\"data row2 col0\" >10000003</td>\n",
" <td id=\"T_0326c_row2_col1\" class=\"data row2 col1\" >person.loves</td>\n",
" <td id=\"T_0326c_row2_col2\" class=\"data row2 col2\" >ref</td>\n",
" <td id=\"T_0326c_row2_col3\" class=\"data row2 col3\" >many</td>\n",
" <td id=\"T_0326c_row2_col4\" class=\"data row2 col4\" >none</td>\n",
" <td id=\"T_0326c_row2_col5\" class=\"data row2 col5\" >False</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_0326c_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_0326c_row3_col0\" class=\"data row3 col0\" >10000004</td>\n",
" <td id=\"T_0326c_row3_col1\" class=\"data row3 col1\" >person.age</td>\n",
" <td id=\"T_0326c_row3_col2\" class=\"data row3 col2\" >int</td>\n",
" <td id=\"T_0326c_row3_col3\" class=\"data row3 col3\" >one</td>\n",
" <td id=\"T_0326c_row3_col4\" class=\"data row3 col4\" >none</td>\n",
" <td id=\"T_0326c_row3_col5\" class=\"data row3 col5\" >False</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x1036aadd0>"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":db schema"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But that's about it. Except its name, an attribute is _immutable_ and you cannot change a `string` attribute to a `ref` attribute, nor can you decide that your `one` attribute should really be `many`.\n",
"\n",
"So what do we mean when we said that this kind of structure can deal with new requirements? Say you initially made the `person.loves` attribute one-to-one and made `person.last_name` a unique index, and now you need to change them. But you need to change them not because the requirements have changed. You need to change them because you have made _mistakes_ at the beginning. These mistakes are fixed by, for example, first rename the offending attributes, then create a new attribute with the old name, next copy the data from the old attribute to the new attribute, and finally delete the old, wrong attribute. Fixing mistakes should be explicit, and this is procedure is very explicit."
]
},
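{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch, repairing a `person.loves` attribute that was wrongly declared to-one might look like the following, using only the directives shown earlier. The name `person.loves_old` is hypothetical, and each snippet is assumed to be run as its own cell. First move the wrong attribute out of the way:\n",
"\n",
"```\n",
":db rename attr person.loves person.loves_old\n",
"```\n",
"\n",
"Then declare the attribute again under the old name, this time with the intended cardinality:\n",
"\n",
"```\n",
":schema\n",
"\n",
":put person.loves: ref many;\n",
"```\n",
"\n",
"After copying the data across in a transaction (not shown here), drop the old attribute, which also removes its data:\n",
"\n",
"```\n",
":db remove attr person.loves_old\n",
"```"
]
},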
{
"cell_type": "markdown",
"metadata": {},
"source": [
"New requirements are not mistakes, and they do not invalidate your old data or schema. Examples of changing requirements: you now need to record the passport number and the parent-child relationships of the people in your graph. Very easy:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_3b816_row0_col0, #T_3b816_row1_col0 {\n",
" color: #307fc1;\n",
"}\n",
"#T_3b816_row0_col1, #T_3b816_row1_col1 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_3b816\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_3b816_level0_col0\" class=\"col_heading level0 col0\" >attr_id</th>\n",
" <th id=\"T_3b816_level0_col1\" class=\"col_heading level0 col1\" >op</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_3b816_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_3b816_row0_col0\" class=\"data row0 col0\" >10000006</td>\n",
" <td id=\"T_3b816_row0_col1\" class=\"data row0 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_3b816_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_3b816_row1_col0\" class=\"data row1 col0\" >10000007</td>\n",
" <td id=\"T_3b816_row1_col1\" class=\"data row1 col1\" >assert</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bd09000>"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":schema\n",
"\n",
":put person.passport_no: string many index;\n",
":put person.parent_of: ref many;"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_d093e_row0_col0, #T_d093e_row0_col5, #T_d093e_row1_col0, #T_d093e_row1_col5, #T_d093e_row2_col0, #T_d093e_row2_col5, #T_d093e_row3_col0, #T_d093e_row3_col5, #T_d093e_row4_col0, #T_d093e_row4_col5, #T_d093e_row5_col0, #T_d093e_row5_col5 {\n",
" color: #307fc1;\n",
"}\n",
"#T_d093e_row0_col1, #T_d093e_row0_col2, #T_d093e_row0_col3, #T_d093e_row0_col4, #T_d093e_row1_col1, #T_d093e_row1_col2, #T_d093e_row1_col3, #T_d093e_row1_col4, #T_d093e_row2_col1, #T_d093e_row2_col2, #T_d093e_row2_col3, #T_d093e_row2_col4, #T_d093e_row3_col1, #T_d093e_row3_col2, #T_d093e_row3_col3, #T_d093e_row3_col4, #T_d093e_row4_col1, #T_d093e_row4_col2, #T_d093e_row4_col3, #T_d093e_row4_col4, #T_d093e_row5_col1, #T_d093e_row5_col2, #T_d093e_row5_col3, #T_d093e_row5_col4 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_d093e\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_d093e_level0_col0\" class=\"col_heading level0 col0\" >attr_id</th>\n",
" <th id=\"T_d093e_level0_col1\" class=\"col_heading level0 col1\" >name</th>\n",
" <th id=\"T_d093e_level0_col2\" class=\"col_heading level0 col2\" >type</th>\n",
" <th id=\"T_d093e_level0_col3\" class=\"col_heading level0 col3\" >cardinality</th>\n",
" <th id=\"T_d093e_level0_col4\" class=\"col_heading level0 col4\" >index</th>\n",
" <th id=\"T_d093e_level0_col5\" class=\"col_heading level0 col5\" >history</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_d093e_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_d093e_row0_col0\" class=\"data row0 col0\" >10000001</td>\n",
" <td id=\"T_d093e_row0_col1\" class=\"data row0 col1\" >person.first_name</td>\n",
" <td id=\"T_d093e_row0_col2\" class=\"data row0 col2\" >string</td>\n",
" <td id=\"T_d093e_row0_col3\" class=\"data row0 col3\" >one</td>\n",
" <td id=\"T_d093e_row0_col4\" class=\"data row0 col4\" >index</td>\n",
" <td id=\"T_d093e_row0_col5\" class=\"data row0 col5\" >False</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_d093e_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_d093e_row1_col0\" class=\"data row1 col0\" >10000002</td>\n",
" <td id=\"T_d093e_row1_col1\" class=\"data row1 col1\" >person.nick_name</td>\n",
" <td id=\"T_d093e_row1_col2\" class=\"data row1 col2\" >string</td>\n",
" <td id=\"T_d093e_row1_col3\" class=\"data row1 col3\" >many</td>\n",
" <td id=\"T_d093e_row1_col4\" class=\"data row1 col4\" >index</td>\n",
" <td id=\"T_d093e_row1_col5\" class=\"data row1 col5\" >False</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_d093e_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_d093e_row2_col0\" class=\"data row2 col0\" >10000003</td>\n",
" <td id=\"T_d093e_row2_col1\" class=\"data row2 col1\" >person.loves</td>\n",
" <td id=\"T_d093e_row2_col2\" class=\"data row2 col2\" >ref</td>\n",
" <td id=\"T_d093e_row2_col3\" class=\"data row2 col3\" >many</td>\n",
" <td id=\"T_d093e_row2_col4\" class=\"data row2 col4\" >none</td>\n",
" <td id=\"T_d093e_row2_col5\" class=\"data row2 col5\" >False</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_d093e_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_d093e_row3_col0\" class=\"data row3 col0\" >10000004</td>\n",
" <td id=\"T_d093e_row3_col1\" class=\"data row3 col1\" >person.age</td>\n",
" <td id=\"T_d093e_row3_col2\" class=\"data row3 col2\" >int</td>\n",
" <td id=\"T_d093e_row3_col3\" class=\"data row3 col3\" >one</td>\n",
" <td id=\"T_d093e_row3_col4\" class=\"data row3 col4\" >none</td>\n",
" <td id=\"T_d093e_row3_col5\" class=\"data row3 col5\" >False</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_d093e_level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
" <td id=\"T_d093e_row4_col0\" class=\"data row4 col0\" >10000006</td>\n",
" <td id=\"T_d093e_row4_col1\" class=\"data row4 col1\" >person.passport_no</td>\n",
" <td id=\"T_d093e_row4_col2\" class=\"data row4 col2\" >string</td>\n",
" <td id=\"T_d093e_row4_col3\" class=\"data row4 col3\" >many</td>\n",
" <td id=\"T_d093e_row4_col4\" class=\"data row4 col4\" >index</td>\n",
" <td id=\"T_d093e_row4_col5\" class=\"data row4 col5\" >False</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_d093e_level0_row5\" class=\"row_heading level0 row5\" >5</th>\n",
" <td id=\"T_d093e_row5_col0\" class=\"data row5 col0\" >10000007</td>\n",
" <td id=\"T_d093e_row5_col1\" class=\"data row5 col1\" >person.parent_of</td>\n",
" <td id=\"T_d093e_row5_col2\" class=\"data row5 col2\" >ref</td>\n",
" <td id=\"T_d093e_row5_col3\" class=\"data row5 col3\" >many</td>\n",
" <td id=\"T_d093e_row5_col4\" class=\"data row5 col4\" >none</td>\n",
" <td id=\"T_d093e_row5_col5\" class=\"data row5 col5\" >False</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bd0a320>"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":db schema"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data with schema"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's reinstate the `person.id` attribute first:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_8aaad_row0_col0 {\n",
" color: #307fc1;\n",
"}\n",
"#T_8aaad_row0_col1 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_8aaad\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_8aaad_level0_col0\" class=\"col_heading level0 col0\" >attr_id</th>\n",
" <th id=\"T_8aaad_level0_col1\" class=\"col_heading level0 col1\" >op</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_8aaad_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_8aaad_row0_col0\" class=\"data row0 col0\" >10000008</td>\n",
" <td id=\"T_8aaad_row0_col1\" class=\"data row0 col1\" >assert</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bd0ba30>"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":schema\n",
"\n",
":put person.id: string one unique;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"and now we add data to our database. First we add a person called Peter. Besides the `:tx` at the top indicating that we want to execute a transaction, it is just a map:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_1c6a1_row0_col0, #T_1c6a1_row0_col1 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_1c6a1\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_1c6a1_level0_col0\" class=\"col_heading level0 col0\" >asserts</th>\n",
" <th id=\"T_1c6a1_level0_col1\" class=\"col_heading level0 col1\" >retracts</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_1c6a1_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_1c6a1_row0_col0\" class=\"data row0 col0\" >3</td>\n",
" <td id=\"T_1c6a1_row0_col1\" class=\"data row0 col1\" >0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bd08c10>"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":tx\n",
"\n",
"{ person.first_name: 'Peter', person.nick_name: 'Pan', person.id: 'p' }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can insert multiple 'rows' at the same time, and the maps also allow some stylistic variations:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_342f5_row0_col0, #T_342f5_row0_col1 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_342f5\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_342f5_level0_col0\" class=\"col_heading level0 col0\" >asserts</th>\n",
" <th id=\"T_342f5_level0_col1\" class=\"col_heading level0 col1\" >retracts</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_342f5_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_342f5_row0_col0\" class=\"data row0 col0\" >6</td>\n",
" <td id=\"T_342f5_row0_col1\" class=\"data row0 col1\" >0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bd0b940>"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":tx\n",
"\n",
"{\"person.first_name\": \"Quin\", \"*person.nick_name\": [\"Q\", \"The Quick\"], \"person.id\": \"q\"}\n",
"{\"person.first_name\": \"Rich\", \"person.id\": \"r\"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Every entity is free to have any combination of attributes suitable for it. Note how we specified several nicknames for Quin at the same time, and Rich does not have a nickname."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To query the triples, use _triple rules_: these look like a list of three items, except there is no comma inside. The first slot contains the _entity id_ assigned by the system, the middle symbol is the attribute name and must be explicit (can't be a variable), and the last slot contains the value for the attribute. In fact, you should interpret the attribute name in the middle as an _operator_, that's why there are no commas around it:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_594db_row0_col0, #T_594db_row0_col1, #T_594db_row0_col2, #T_594db_row1_col0, #T_594db_row1_col1, #T_594db_row1_col2, #T_594db_row2_col0, #T_594db_row2_col1, #T_594db_row2_col2 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_594db\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_594db_level0_col0\" class=\"col_heading level0 col0\" >eid</th>\n",
" <th id=\"T_594db_level0_col1\" class=\"col_heading level0 col1\" >first_name</th>\n",
" <th id=\"T_594db_level0_col2\" class=\"col_heading level0 col2\" >nick_name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_594db_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_594db_row0_col0\" class=\"data row0 col0\" >f26fc8c4-388e-11ed-8b86-b7091d48cdc7</td>\n",
" <td id=\"T_594db_row0_col1\" class=\"data row0 col1\" >Peter</td>\n",
" <td id=\"T_594db_row0_col2\" class=\"data row0 col2\" >Pan</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_594db_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_594db_row1_col0\" class=\"data row1 col0\" >f3478ab6-388e-11ed-9737-b3eeb128adfc</td>\n",
" <td id=\"T_594db_row1_col1\" class=\"data row1 col1\" >Quin</td>\n",
" <td id=\"T_594db_row1_col2\" class=\"data row1 col2\" >Q</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_594db_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_594db_row2_col0\" class=\"data row2 col0\" >f3478ab6-388e-11ed-9737-b3eeb128adfc</td>\n",
" <td id=\"T_594db_row2_col1\" class=\"data row2 col1\" >Quin</td>\n",
" <td id=\"T_594db_row2_col2\" class=\"data row2 col2\" >The Quick</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bd0a5c0>"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[eid, first_name, nick_name] := [eid person.nick_name nick_name], \n",
" [eid person.first_name first_name]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Besides the above _explicit querying_, there is another way to get attributes associated with an entity: you may specify an _pull directive_ which will expand an integer (interpreted as an entity ID) into a map containing its specified attributes. Observe:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_aff6d_row0_col0, #T_aff6d_row1_col0, #T_aff6d_row2_col0 {\n",
" color: black;\n",
"}\n",
"#T_aff6d_row0_col1, #T_aff6d_row1_col1, #T_aff6d_row2_col1 {\n",
" color: #bf5b3d;\n",
"}\n",
"</style>\n",
"<table id=\"T_aff6d\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_aff6d_level0_col0\" class=\"col_heading level0 col0\" >pid</th>\n",
" <th id=\"T_aff6d_level0_col1\" class=\"col_heading level0 col1\" >eid</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_aff6d_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_aff6d_row0_col0\" class=\"data row0 col0\" >p</td>\n",
" <td id=\"T_aff6d_row0_col1\" class=\"data row0 col1\" >{'_id': 'f26fc8c4-388e-11ed-8b86-b7091d48cdc7', 'person.age': None, 'person.first_name': 'Peter', 'person.nick_name': ['Pan']}</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_aff6d_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_aff6d_row1_col0\" class=\"data row1 col0\" >q</td>\n",
" <td id=\"T_aff6d_row1_col1\" class=\"data row1 col1\" >{'_id': 'f3478ab6-388e-11ed-9737-b3eeb128adfc', 'person.age': None, 'person.first_name': 'Quin', 'person.nick_name': ['Q', 'The Quick']}</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_aff6d_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_aff6d_row2_col0\" class=\"data row2 col0\" >r</td>\n",
" <td id=\"T_aff6d_row2_col1\" class=\"data row2 col1\" >{'_id': 'f3478b10-388e-11ed-8ca1-3ab031344a45', 'person.age': None, 'person.first_name': 'Rich', 'person.nick_name': []}</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bd09900>"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[pid, eid] := [eid person.id pid]\n",
"\n",
":pull eid {person.first_name, person.nick_name, person.age}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you have several entry bindings that are entities, you can specify several `:pull` directives one after another, but each output binding can have at most one pull directive associated with it."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another notable thing is that pulls always return a map, even if some of the requested attributes are missing for the entity (they are filled with `null` instead). In constrast, observe that the query not using pull directive did not return Rich, but returned Quin twice. As can be seen above, the pull also deals with to-many relationships automatically."
]
},
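{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the difference concrete, here is a minimal Python sketch that models the triples as a plain set of tuples. It is not how Cozo is implemented: the helper names `join_query` and `pull` and the shortened entity IDs `e1`, `e2`, `e3` are made up for illustration.\n",
"\n",
"```python\n",
"# Toy model: a triple store is a set of (entity, attribute, value) tuples.\n",
"triples = {\n",
"    ('e1', 'person.id', 'p'), ('e1', 'person.first_name', 'Peter'),\n",
"    ('e1', 'person.nick_name', 'Pan'),\n",
"    ('e2', 'person.id', 'q'), ('e2', 'person.first_name', 'Quin'),\n",
"    ('e2', 'person.nick_name', 'Q'), ('e2', 'person.nick_name', 'The Quick'),\n",
"    ('e3', 'person.id', 'r'), ('e3', 'person.first_name', 'Rich'),\n",
"}\n",
"\n",
"def join_query(triples):\n",
"    # Explicit querying: one row per matching (first_name, nick_name) pair,\n",
"    # so an entity without a nickname produces no row at all.\n",
"    return sorted(\n",
"        (e, first, nick)\n",
"        for (e, a1, nick) in triples if a1 == 'person.nick_name'\n",
"        for (e2, a2, first) in triples\n",
"        if a2 == 'person.first_name' and e2 == e\n",
"    )\n",
"\n",
"def pull(triples, eid, single, multi):\n",
"    # Pull: always one map per entity. Missing to-one attributes become None,\n",
"    # to-many attributes become (possibly empty) lists.\n",
"    out = {'_id': eid}\n",
"    for attr in single:\n",
"        vals = [v for (e, a, v) in triples if e == eid and a == attr]\n",
"        out[attr] = vals[0] if vals else None\n",
"    for attr in multi:\n",
"        out[attr] = sorted(v for (e, a, v) in triples if e == eid and a == attr)\n",
"    return out\n",
"\n",
"rows = join_query(triples)\n",
"pulled = [pull(triples, e, ['person.first_name', 'person.age'], ['person.nick_name'])\n",
"          for e in ('e1', 'e2', 'e3')]\n",
"```\n",
"\n",
"In this toy model `rows` has one entry for Peter and two for Quin but none for Rich, while `pulled` contains exactly one map per entity, with an empty nickname list for Rich."
]
},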
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pulls can have nested directives (see the manual for details) and can traverse `ref` triples in the reverse direction. But otherwise pull directives are kept deliberately simple. They are only intended for output processing. If you want recursions, non-trivial filters and the like, do it in the Datalog query instead."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Insertions in the triple store actually amounts to _assertions_ of facts. If two conflicting facts are asserted, the last one wins:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_db1aa_row0_col0, #T_db1aa_row0_col1 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_db1aa\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_db1aa_level0_col0\" class=\"col_heading level0 col0\" >asserts</th>\n",
" <th id=\"T_db1aa_level0_col1\" class=\"col_heading level0 col1\" >retracts</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_db1aa_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_db1aa_row0_col0\" class=\"data row0 col0\" >1</td>\n",
" <td id=\"T_db1aa_row0_col1\" class=\"data row0 col1\" >0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bd0a4a0>"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":tx\n",
"\n",
"{_key: ['person.id', 'p'], person.first_name: \"Pete\"}"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_d5943_row0_col0 {\n",
" color: black;\n",
"}\n",
"#T_d5943_row0_col1 {\n",
" color: #bf5b3d;\n",
"}\n",
"</style>\n",
"<table id=\"T_d5943\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_d5943_level0_col0\" class=\"col_heading level0 col0\" >pid</th>\n",
" <th id=\"T_d5943_level0_col1\" class=\"col_heading level0 col1\" >eid</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_d5943_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_d5943_row0_col0\" class=\"data row0 col0\" >p</td>\n",
" <td id=\"T_d5943_row0_col1\" class=\"data row0 col1\" >{'_id': 'f26fc8c4-388e-11ed-8b86-b7091d48cdc7', 'person.first_name': 'Pete', 'person.nick_name': ['Pan']}</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bd0ad10>"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[pid, eid] := [eid person.id pid], pid == 'p'\n",
"\n",
":pull eid {person.first_name, person.nick_name}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we specified an existing entity by providing `_key` with an attribute name and a unique value for the attribute. You can only refer to entities this way if the attribute is uniquely indexed. You can also specify an entity by providing its `_id`, but if you have a unique key to use, it is often much clearer."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next transaction is superficially similar to the last one. But in this case, `person.nick_name` has cardinality `many` instead of `one`:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_b58bf_row0_col0, #T_b58bf_row0_col1 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_b58bf\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_b58bf_level0_col0\" class=\"col_heading level0 col0\" >asserts</th>\n",
" <th id=\"T_b58bf_level0_col1\" class=\"col_heading level0 col1\" >retracts</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_b58bf_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_b58bf_row0_col0\" class=\"data row0 col0\" >1</td>\n",
" <td id=\"T_b58bf_row0_col1\" class=\"data row0 col1\" >0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x1036ab2b0>"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":tx\n",
"\n",
"{_key: ['person.id', 'p'], person.nick_name: \"Ping\"}"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_c0d7e_row0_col0 {\n",
" color: black;\n",
"}\n",
"#T_c0d7e_row0_col1 {\n",
" color: #bf5b3d;\n",
"}\n",
"</style>\n",
"<table id=\"T_c0d7e\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_c0d7e_level0_col0\" class=\"col_heading level0 col0\" >pid</th>\n",
" <th id=\"T_c0d7e_level0_col1\" class=\"col_heading level0 col1\" >eid</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_c0d7e_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_c0d7e_row0_col0\" class=\"data row0 col0\" >p</td>\n",
" <td id=\"T_c0d7e_row0_col1\" class=\"data row0 col1\" >{'_id': 'f26fc8c4-388e-11ed-8b86-b7091d48cdc7', 'person.first_name': 'Pete', 'person.nick_name': ['Pan', 'Ping']}</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bd095d0>"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[pid, eid] := [eid person.id pid], pid == 'p'\n",
"\n",
":pull eid {person.first_name, person.nick_name}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now the new nick name is simply recorded together with the last one. Note that if you try to add the same nickname for the same person again, you still get only one copy instead of two:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_ad5e7_row0_col0, #T_ad5e7_row0_col1 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_ad5e7\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_ad5e7_level0_col0\" class=\"col_heading level0 col0\" >asserts</th>\n",
" <th id=\"T_ad5e7_level0_col1\" class=\"col_heading level0 col1\" >retracts</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_ad5e7_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_ad5e7_row0_col0\" class=\"data row0 col0\" >1</td>\n",
" <td id=\"T_ad5e7_row0_col1\" class=\"data row0 col1\" >0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bd0aa70>"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":tx\n",
"\n",
"{_key: ['person.id', 'p'], person.nick_name: \"Ping\"}"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_04b51_row0_col0 {\n",
" color: black;\n",
"}\n",
"#T_04b51_row0_col1 {\n",
" color: #bf5b3d;\n",
"}\n",
"</style>\n",
"<table id=\"T_04b51\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_04b51_level0_col0\" class=\"col_heading level0 col0\" >pid</th>\n",
" <th id=\"T_04b51_level0_col1\" class=\"col_heading level0 col1\" >eid</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_04b51_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_04b51_row0_col0\" class=\"data row0 col0\" >p</td>\n",
" <td id=\"T_04b51_row0_col1\" class=\"data row0 col1\" >{'_id': 'f26fc8c4-388e-11ed-8b86-b7091d48cdc7', 'person.first_name': 'Pete', 'person.nick_name': ['Pan', 'Ping']}</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bce6ef0>"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[pid, eid] := [eid person.id pid], pid == 'p'\n",
"\n",
":pull eid {person.first_name, person.nick_name}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we have seen, triples abide by set semantics instead of bag semantics as well. If you really want to have duplicates, you need to disambiguate them at the level of values, by for example wrapping them in lists."
]
},
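{
"cell_type": "markdown",
"metadata": {},
"source": [
"The assertion behaviour seen so far can be summarised in a small Python sketch. This is a toy model under stated assumptions, not Cozo's implementation: the `CARDINALITY` table, the `assert_fact` helper and the entity ID `e1` are hypothetical names chosen for the example.\n",
"\n",
"```python\n",
"# Toy model: facts live in a set of (entity, attribute, value) triples.\n",
"CARDINALITY = {'person.first_name': 'one', 'person.nick_name': 'many'}\n",
"\n",
"def assert_fact(store, eid, attr, value):\n",
"    # For a cardinality-one attribute the last assertion wins,\n",
"    # so drop any earlier value before adding the new one.\n",
"    if CARDINALITY[attr] == 'one':\n",
"        store.difference_update({t for t in store if t[0] == eid and t[1] == attr})\n",
"    # Adding to a set is idempotent: a repeated fact is stored once.\n",
"    store.add((eid, attr, value))\n",
"\n",
"store = set()\n",
"assert_fact(store, 'e1', 'person.first_name', 'Peter')\n",
"assert_fact(store, 'e1', 'person.nick_name', 'Pan')\n",
"assert_fact(store, 'e1', 'person.first_name', 'Pete')  # replaces 'Peter'\n",
"assert_fact(store, 'e1', 'person.nick_name', 'Ping')   # kept next to 'Pan'\n",
"assert_fact(store, 'e1', 'person.nick_name', 'Ping')   # no second copy\n",
"```\n",
"\n",
"After these calls the toy `store` holds one first name and two nicknames, mirroring the pull results above."
]
},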
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To get rid of data, you perform _retractions_:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_f7736_row0_col0, #T_f7736_row0_col1 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_f7736\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_f7736_level0_col0\" class=\"col_heading level0 col0\" >asserts</th>\n",
" <th id=\"T_f7736_level0_col1\" class=\"col_heading level0 col1\" >retracts</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_f7736_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_f7736_row0_col0\" class=\"data row0 col0\" >0</td>\n",
" <td id=\"T_f7736_row0_col1\" class=\"data row0 col1\" >2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bce6800>"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":tx\n",
"\n",
":retract {_key: ['person.id', 'p'], person.nick_name: \"Ping\", person.first_name: 'Peter'}"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_6df5c_row0_col0 {\n",
" color: black;\n",
"}\n",
"#T_6df5c_row0_col1 {\n",
" color: #bf5b3d;\n",
"}\n",
"</style>\n",
"<table id=\"T_6df5c\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_6df5c_level0_col0\" class=\"col_heading level0 col0\" >pid</th>\n",
" <th id=\"T_6df5c_level0_col1\" class=\"col_heading level0 col1\" >eid</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_6df5c_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_6df5c_row0_col0\" class=\"data row0 col0\" >p</td>\n",
" <td id=\"T_6df5c_row0_col1\" class=\"data row0 col1\" >{'_id': 'f26fc8c4-388e-11ed-8b86-b7091d48cdc7', 'person.first_name': None, 'person.id': 'p', 'person.nick_name': ['Pan']}</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bd0a4d0>"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[pid, eid] := [eid person.id pid], pid == 'p'\n",
"\n",
":pull eid {person.first_name, person.nick_name, person.id}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is OK to retract facts that do not exist, in which case this is just a no-op. Notice that the entity still has its `person.id` attribute: the `_key` specification only indicates what entity to transact. If you want to get rid of the keyed attribute, you have to include it in the transaction map explicitly."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that when retracting facts above, we have to provide the database of values for existing triples. This can be cumbersome, especially in the case of to-many attributes --- if you someone miss one value, it will remain. Therefore another form of retraction `retract_all` is provided:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_9a5e5_row0_col0, #T_9a5e5_row0_col1 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_9a5e5\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_9a5e5_level0_col0\" class=\"col_heading level0 col0\" >asserts</th>\n",
" <th id=\"T_9a5e5_level0_col1\" class=\"col_heading level0 col1\" >retracts</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_9a5e5_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_9a5e5_row0_col0\" class=\"data row0 col0\" >0</td>\n",
" <td id=\"T_9a5e5_row0_col1\" class=\"data row0 col1\" >2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bce7fd0>"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":tx\n",
"\n",
":retract_all {_key: ['person.id', 'p'], person.nick_name: 0, person.first_name: 0, person.id: 0}"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"</style>\n",
"<table id=\"T_719dc\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_719dc_level0_col0\" class=\"col_heading level0 col0\" >pid</th>\n",
" <th id=\"T_719dc_level0_col1\" class=\"col_heading level0 col1\" >eid</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bce6530>"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[pid, eid] := [eid person.id pid], pid == 'p'\n",
"\n",
":pull eid {person.first_name, person.nick_name, person.id}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this form, you can provide any value for the attributes, the database does not care and just removes all values associated with the attributes. Above we have used `0` since it is simple to type."
]
},
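{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a to-many attribute, the contrast between the two forms of retraction can be sketched in Python as follows. This is only an illustration on a plain set of tuples: the helpers `retract` and `retract_all` and the entity ID `e1` are hypothetical, and the sketch deliberately does not model how Cozo treats cardinality-one attributes.\n",
"\n",
"```python\n",
"def retract(store, eid, attr, value):\n",
"    # Value-based retraction: only the exact triple is removed,\n",
"    # and retracting a fact that does not exist is a no-op.\n",
"    store.discard((eid, attr, value))\n",
"\n",
"def retract_all(store, eid, attr):\n",
"    # Attribute-based retraction: every value of the attribute goes,\n",
"    # whatever the values currently are.\n",
"    store.difference_update({t for t in store if t[0] == eid and t[1] == attr})\n",
"\n",
"store = {\n",
"    ('e1', 'person.id', 'p'),\n",
"    ('e1', 'person.nick_name', 'Pan'),\n",
"    ('e1', 'person.nick_name', 'Ping'),\n",
"}\n",
"\n",
"retract(store, 'e1', 'person.nick_name', 'Ping')\n",
"retract(store, 'e1', 'person.nick_name', 'Nobody')  # not present: no-op\n",
"after_retract = set(store)                          # 'Pan' was missed and remains\n",
"\n",
"retract_all(store, 'e1', 'person.nick_name')\n",
"after_retract_all = set(store)                      # no nicknames left\n",
"```\n",
"\n",
"The value-based form leaves behind any nickname you forgot to list, whereas the attribute-based form clears them all without needing to know the values."
]
},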
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Nested data mutations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have so far inserted data in units of entities. This is fine for simple cases, but can become awkward for tree or graph shaped data which are linked together in non-trivial ways. We would need to insert some triples first, get ids of some entities (or use their unique keys), and use these to insert other triples."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead, Cozo supports nested data insertion. Let's insert our whole love triangle graph all at once."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Recall that our love triangles are:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_ca0b9_row0_col0, #T_ca0b9_row0_col1, #T_ca0b9_row1_col0, #T_ca0b9_row1_col1, #T_ca0b9_row2_col0, #T_ca0b9_row2_col1, #T_ca0b9_row3_col0, #T_ca0b9_row3_col1, #T_ca0b9_row4_col0, #T_ca0b9_row4_col1, #T_ca0b9_row5_col0, #T_ca0b9_row5_col1, #T_ca0b9_row6_col0, #T_ca0b9_row6_col1, #T_ca0b9_row7_col0, #T_ca0b9_row7_col1 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_ca0b9\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_ca0b9_level0_col0\" class=\"col_heading level0 col0\" >0</th>\n",
" <th id=\"T_ca0b9_level0_col1\" class=\"col_heading level0 col1\" >1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_ca0b9_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_ca0b9_row0_col0\" class=\"data row0 col0\" >alice</td>\n",
" <td id=\"T_ca0b9_row0_col1\" class=\"data row0 col1\" >eve</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_ca0b9_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_ca0b9_row1_col0\" class=\"data row1 col0\" >bob</td>\n",
" <td id=\"T_ca0b9_row1_col1\" class=\"data row1 col1\" >alice</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_ca0b9_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_ca0b9_row2_col0\" class=\"data row2 col0\" >charlie</td>\n",
" <td id=\"T_ca0b9_row2_col1\" class=\"data row2 col1\" >eve</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_ca0b9_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_ca0b9_row3_col0\" class=\"data row3 col0\" >david</td>\n",
" <td id=\"T_ca0b9_row3_col1\" class=\"data row3 col1\" >george</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_ca0b9_level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
" <td id=\"T_ca0b9_row4_col0\" class=\"data row4 col0\" >eve</td>\n",
" <td id=\"T_ca0b9_row4_col1\" class=\"data row4 col1\" >alice</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_ca0b9_level0_row5\" class=\"row_heading level0 row5\" >5</th>\n",
" <td id=\"T_ca0b9_row5_col0\" class=\"data row5 col0\" >eve</td>\n",
" <td id=\"T_ca0b9_row5_col1\" class=\"data row5 col1\" >bob</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_ca0b9_level0_row6\" class=\"row_heading level0 row6\" >6</th>\n",
" <td id=\"T_ca0b9_row6_col0\" class=\"data row6 col0\" >eve</td>\n",
" <td id=\"T_ca0b9_row6_col1\" class=\"data row6 col1\" >charlie</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_ca0b9_level0_row7\" class=\"row_heading level0 row7\" >7</th>\n",
" <td id=\"T_ca0b9_row7_col0\" class=\"data row7 col0\" >george</td>\n",
" <td id=\"T_ca0b9_row7_col1\" class=\"data row7 col1\" >george</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bce5c90>"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[] <- [['alice', 'eve'],\n",
" ['bob', 'alice'],\n",
" ['eve', 'alice'],\n",
" ['eve', 'bob'],\n",
" ['eve', 'charlie'],\n",
" ['charlie', 'eve'],\n",
" ['david', 'george'],\n",
" ['george', 'george']]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We insert them into the triple store thus:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_ada78_row0_col0, #T_ada78_row0_col1 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_ada78\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_ada78_level0_col0\" class=\"col_heading level0 col0\" >asserts</th>\n",
" <th id=\"T_ada78_level0_col1\" class=\"col_heading level0 col1\" >retracts</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_ada78_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_ada78_row0_col0\" class=\"data row0 col0\" >20</td>\n",
" <td id=\"T_ada78_row0_col1\" class=\"data row0 col1\" >0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bce7460>"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":tx\n",
"\n",
"{\n",
" _tid: 'a', \n",
" person.id: 'a', \n",
" person.first_name: 'Alice',\n",
" person.loves: {\n",
" _tid: 'e',\n",
" person.id: 'e',\n",
" person.first_name: 'Eve',\n",
" *person.loves: [\n",
" 'a',\n",
" {\n",
" _tid: 'b',\n",
" person.id: 'b',\n",
" person.first_name: 'Bob',\n",
" person.loves: 'a'\n",
" },\n",
" {\n",
" _tid: 'c',\n",
" person.id: 'c',\n",
" person.first_name: 'Charlie',\n",
" person.loves: 'e'\n",
" }\n",
" ]\n",
" }\n",
"}\n",
"\n",
"{person.id: 'd', person.first_name: 'David', person.loves: 'g'}\n",
"{_tid: 'g', person.id: 'g', person.first_name: 'George', person.loves: 'g'}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nested mutations are done simply by using maps for `ref` attribute values. We identified entities that do not yet exist in the database by their `_tid` given inline. `_tid`s can be any string you like _except_ strings that can be interpreted as UUIDs. As before, an asterisk `*` before the attribute name denotes that we are transacting multiple triples into an attribute. As the last two maps in the example shows, you do not need `_tid` if you do not need to refer to an entity, and you can use `_tid` to refer to an entity itself."
]
},
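{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to picture what a nested transaction does is to flatten the maps into individual triples. The Python sketch below does this for the data above. It is an assumption-laden illustration rather than Cozo's algorithm: the `flatten` helper and the `REF_ATTRS` set are hypothetical, entity IDs are generated with `uuid.uuid4`, and the rule that a `_tid` must not look like a UUID is not enforced.\n",
"\n",
"```python\n",
"import uuid\n",
"\n",
"REF_ATTRS = {'person.loves'}  # attributes whose values point at other entities\n",
"\n",
"def flatten(doc, tids, out):\n",
"    # Resolve this entity's ID: reuse the one bound to its _tid, or mint one.\n",
"    tid = doc.get('_tid')\n",
"    eid = tids.setdefault(tid, str(uuid.uuid4())) if tid else str(uuid.uuid4())\n",
"    for key, value in doc.items():\n",
"        if key == '_tid':\n",
"            continue\n",
"        many = key.startswith('*')       # '*attr' carries a list of values\n",
"        attr = key.lstrip('*')\n",
"        for v in (value if many else [value]):\n",
"            if isinstance(v, dict):      # nested map: flatten the child first\n",
"                v = flatten(v, tids, out)\n",
"            elif attr in REF_ATTRS:      # a _tid string, possibly defined later\n",
"                v = tids.setdefault(v, str(uuid.uuid4()))\n",
"            out.append((eid, attr, v))\n",
"    return eid\n",
"\n",
"docs = [\n",
"    {'_tid': 'a', 'person.id': 'a', 'person.first_name': 'Alice',\n",
"     'person.loves': {\n",
"         '_tid': 'e', 'person.id': 'e', 'person.first_name': 'Eve',\n",
"         '*person.loves': [\n",
"             'a',\n",
"             {'_tid': 'b', 'person.id': 'b', 'person.first_name': 'Bob',\n",
"              'person.loves': 'a'},\n",
"             {'_tid': 'c', 'person.id': 'c', 'person.first_name': 'Charlie',\n",
"              'person.loves': 'e'},\n",
"         ]}},\n",
"    {'person.id': 'd', 'person.first_name': 'David', 'person.loves': 'g'},\n",
"    {'_tid': 'g', 'person.id': 'g', 'person.first_name': 'George', 'person.loves': 'g'},\n",
"]\n",
"\n",
"tids, triples = {}, []\n",
"for doc in docs:\n",
"    flatten(doc, tids, triples)\n",
"\n",
"# Join first names across the person.loves edges, as a query would.\n",
"names = {e: v for (e, a, v) in triples if a == 'person.first_name'}\n",
"pairs = sorted((names[e], names[v]) for (e, a, v) in triples if a == 'person.loves')\n",
"```\n",
"\n",
"Counting the flattened triples gives 20, the same number of asserts reported by the transaction above. Because `tids` is shared across the top-level maps, David can refer to `'g'` before George's own map has been processed."
]
},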
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's see if we get the same results querying the triple store:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_9ebce_row0_col0, #T_9ebce_row0_col1, #T_9ebce_row1_col0, #T_9ebce_row1_col1, #T_9ebce_row2_col0, #T_9ebce_row2_col1, #T_9ebce_row3_col0, #T_9ebce_row3_col1, #T_9ebce_row4_col0, #T_9ebce_row4_col1, #T_9ebce_row5_col0, #T_9ebce_row5_col1, #T_9ebce_row6_col0, #T_9ebce_row6_col1, #T_9ebce_row7_col0, #T_9ebce_row7_col1 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_9ebce\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_9ebce_level0_col0\" class=\"col_heading level0 col0\" >loving</th>\n",
" <th id=\"T_9ebce_level0_col1\" class=\"col_heading level0 col1\" >loved</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_9ebce_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_9ebce_row0_col0\" class=\"data row0 col0\" >Alice</td>\n",
" <td id=\"T_9ebce_row0_col1\" class=\"data row0 col1\" >Eve</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_9ebce_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_9ebce_row1_col0\" class=\"data row1 col0\" >Bob</td>\n",
" <td id=\"T_9ebce_row1_col1\" class=\"data row1 col1\" >Alice</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_9ebce_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_9ebce_row2_col0\" class=\"data row2 col0\" >Charlie</td>\n",
" <td id=\"T_9ebce_row2_col1\" class=\"data row2 col1\" >Eve</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_9ebce_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_9ebce_row3_col0\" class=\"data row3 col0\" >David</td>\n",
" <td id=\"T_9ebce_row3_col1\" class=\"data row3 col1\" >George</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_9ebce_level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
" <td id=\"T_9ebce_row4_col0\" class=\"data row4 col0\" >Eve</td>\n",
" <td id=\"T_9ebce_row4_col1\" class=\"data row4 col1\" >Alice</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_9ebce_level0_row5\" class=\"row_heading level0 row5\" >5</th>\n",
" <td id=\"T_9ebce_row5_col0\" class=\"data row5 col0\" >Eve</td>\n",
" <td id=\"T_9ebce_row5_col1\" class=\"data row5 col1\" >Bob</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_9ebce_level0_row6\" class=\"row_heading level0 row6\" >6</th>\n",
" <td id=\"T_9ebce_row6_col0\" class=\"data row6 col0\" >Eve</td>\n",
" <td id=\"T_9ebce_row6_col1\" class=\"data row6 col1\" >Charlie</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_9ebce_level0_row7\" class=\"row_heading level0 row7\" >7</th>\n",
" <td id=\"T_9ebce_row7_col0\" class=\"data row7 col0\" >George</td>\n",
" <td id=\"T_9ebce_row7_col1\" class=\"data row7 col1\" >George</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bce5a20>"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[loving, loved] := [a person.first_name loving], \n",
" [a person.loves b], \n",
" [b person.first_name loved]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nice!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A note on the entity ID"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you have probably already noticed, the database assigns UUIDs as entity IDs automatically when we created the entities. You can also create the IDs yourself when doing the creation for more control:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_09968_row0_col0, #T_09968_row0_col1 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_09968\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_09968_level0_col0\" class=\"col_heading level0 col0\" >asserts</th>\n",
" <th id=\"T_09968_level0_col1\" class=\"col_heading level0 col1\" >retracts</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_09968_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_09968_row0_col0\" class=\"data row0 col0\" >2</td>\n",
" <td id=\"T_09968_row0_col1\" class=\"data row0 col1\" >0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bce7730>"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":tx\n",
"\n",
"{_id: '4e7a35b9-e04d-48a3-9eeb-d8a68ef33c43', person.id: 'u', person.first_name: 'Ursula'}"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_a7904_row0_col0 {\n",
" color: #bf5b3d;\n",
"}\n",
"</style>\n",
"<table id=\"T_a7904\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_a7904_level0_col0\" class=\"col_heading level0 col0\" >p</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_a7904_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_a7904_row0_col0\" class=\"data row0 col0\" >{'_id': '4e7a35b9-e04d-48a3-9eeb-d8a68ef33c43', 'person.first_name': 'Ursula', 'person.id': 'u'}</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bce7ca0>"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[p] <- [['4e7a35b9-e04d-48a3-9eeb-d8a68ef33c43']]\n",
"\n",
":pull p { person.first_name, person.id }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The system-assigned IDs are version 1 UUIDs, which contain a timestamp. You can extract the timestamp with the function `uuid_timestamp`:"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_6d00c_row0_col0, #T_6d00c_row1_col0, #T_6d00c_row2_col0, #T_6d00c_row3_col0, #T_6d00c_row4_col0, #T_6d00c_row5_col0, #T_6d00c_row6_col0, #T_6d00c_row7_col0, #T_6d00c_row8_col0 {\n",
" color: black;\n",
"}\n",
"#T_6d00c_row0_col1, #T_6d00c_row1_col1, #T_6d00c_row2_col1, #T_6d00c_row3_col1, #T_6d00c_row4_col1, #T_6d00c_row5_col1, #T_6d00c_row6_col1, #T_6d00c_row7_col1, #T_6d00c_row8_col1 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_6d00c\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_6d00c_level0_col0\" class=\"col_heading level0 col0\" >pid</th>\n",
" <th id=\"T_6d00c_level0_col1\" class=\"col_heading level0 col1\" >ts</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_6d00c_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_6d00c_row0_col0\" class=\"data row0 col0\" >a</td>\n",
" <td id=\"T_6d00c_row0_col1\" class=\"data row0 col1\" >1663642232.365364</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_6d00c_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_6d00c_row1_col0\" class=\"data row1 col0\" >b</td>\n",
" <td id=\"T_6d00c_row1_col1\" class=\"data row1 col1\" >1663642232.365371</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_6d00c_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_6d00c_row2_col0\" class=\"data row2 col0\" >c</td>\n",
" <td id=\"T_6d00c_row2_col1\" class=\"data row2 col1\" >1663642232.365372</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_6d00c_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_6d00c_row3_col0\" class=\"data row3 col0\" >d</td>\n",
" <td id=\"T_6d00c_row3_col1\" class=\"data row3 col1\" >1663642232.365372</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_6d00c_level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
" <td id=\"T_6d00c_row4_col0\" class=\"data row4 col0\" >e</td>\n",
" <td id=\"T_6d00c_row4_col1\" class=\"data row4 col1\" >1663642232.365370</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_6d00c_level0_row5\" class=\"row_heading level0 row5\" >5</th>\n",
" <td id=\"T_6d00c_row5_col0\" class=\"data row5 col0\" >g</td>\n",
" <td id=\"T_6d00c_row5_col1\" class=\"data row5 col1\" >1663642232.365373</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_6d00c_level0_row6\" class=\"row_heading level0 row6\" >6</th>\n",
" <td id=\"T_6d00c_row6_col0\" class=\"data row6 col0\" >q</td>\n",
" <td id=\"T_6d00c_row6_col1\" class=\"data row6 col1\" >1663642213.641695</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_6d00c_level0_row7\" class=\"row_heading level0 row7\" >7</th>\n",
" <td id=\"T_6d00c_row7_col0\" class=\"data row7 col0\" >r</td>\n",
" <td id=\"T_6d00c_row7_col1\" class=\"data row7 col1\" >1663642213.641704</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_6d00c_level0_row8\" class=\"row_heading level0 row8\" >8</th>\n",
" <td id=\"T_6d00c_row8_col0\" class=\"data row8 col0\" >u</td>\n",
" <td id=\"T_6d00c_row8_col1\" class=\"data row8 col1\" >nan</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bce73a0>"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[pid, ts] := [p person.id pid], ts = uuid_timestamp(p)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The returned numbers are seconds since the UNIX epoch. The UUID we made ourselves is of version 4 and contains no timestamp, hence the `nan`. You can provide any valid UUID as an entity ID except the 'nil ID' `00000000-0000-0000-0000-000000000000`:"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\u001b[31meval::amend_triple_with_reserved_id\u001b[0m\n",
"\n",
" \u001b[31m×\u001b[0m Attempting to amend triple person.id via reserved ID 00000000-0000-0000-0000-000000000000\n"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":tx\n",
"\n",
"{_id: '00000000-0000-0000-0000-000000000000', person.id: '0', person.first_name: 'I am ZERO'}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using the timestamped version has performance benefits: the database sorts UUIDs so that those with similar timestamps are near each other. This provides data locality similar to that of an auto-incrementing integer key in an RDBMS, while mitigating the risk of malicious users iterating over your data sequentially or estimating its cardinality. Besides the timestamp, the UUIDs generated by the system contain only random bits. In particular, no node information is encoded in them (the UUID specification allows but does not require it), so users cannot tell on which machine the IDs were generated either. Still, if you want your keys to be completely opaque, provide your own UUIDv4 backed by a good random number generator."
]
},
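{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check outside the database, the following Python sketch reads the timestamp out of a version 1 UUID using the standard `uuid` module. The helper name `uuid_v1_unix_seconds` is ours and not part of Cozo, and it returns `None` where Cozo's `uuid_timestamp` showed `nan`.\n",
"\n",
"```python\n",
"import uuid\n",
"\n",
"# Offset between the UUID epoch (1582-10-15) and the UNIX epoch (1970-01-01),\n",
"# counted in 100-nanosecond intervals.\n",
"UUID_EPOCH_OFFSET = 0x01B21DD213814000\n",
"\n",
"\n",
"def uuid_v1_unix_seconds(u):\n",
"    # Only version 1 UUIDs carry a timestamp; return None for anything else.\n",
"    if u.version != 1:\n",
"        return None\n",
"    return (u.time - UUID_EPOCH_OFFSET) / 1e7\n",
"\n",
"\n",
"# The ID we supplied ourselves for Ursula.\n",
"ours = uuid.UUID('4e7a35b9-e04d-48a3-9eeb-d8a68ef33c43')\n",
"# A fresh version 1 UUID generated by Python (not by Cozo), for comparison.\n",
"generated = uuid.uuid1()\n",
"\n",
"print(ours.version, uuid_v1_unix_seconds(ours))\n",
"print(generated.version, uuid_v1_unix_seconds(generated))\n",
"```\n",
"\n",
"Note that Python's `uuid.uuid1()` may embed node information, unlike the IDs described above, so treat it only as a stand-in for a timestamped UUID."
]
},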
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"## The time machine"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Your data is changing fast. For administrative or regulatory reasons, you may also need records of _how_ your data changes. Or you may be presented with historical data in the first place, and you want your queries to reflect facts _at a particular instant of time_."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Someone used to say that 'more columns in an RDBMS solves anything'. In our case, maybe adding more attributes helps? Let's add to each entity the attribute `valid_at` indicating when the entity is considered valid."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In fact, this is doable, but the resulting system is a total pain to use. First, you will need to _reify_ most of your values. Instead of saying `[bob person.name 'Bob']`, you need something like `[bob person.used_name name]`, where `[name name.is_spelled 'Bob']` and `[name name.is_valid_at '2020-03-04']`, etc. Next, how are you going to find out what everything was at a particular moment? You cannot use equality conditions to filter entities based on `is_valid_at`, since something that was introduced in 1999 is still valid in 2020, _unless_ some other fact supersedes it or it was retracted _after_ 1999. And we are only after the latest valid fact, not all historical facts at a point in time. Fulfilling these requirements _is_ possible in Cozo with aggregations, but doing so adds a huge amount of complexity to even the simplest queries."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To solve this particular problem, which occurs more commonly than you might think, Cozo has built-in support for historical facts. This functionality carries a non-trivial performance penalty, so you have to request it explicitly for each attribute. And like other properties of attributes, whether it has history support is immutable. If you later change your mind, you need to define a new attribute and copy data over, as usual.\n",
"\n",
"If you are already worried about performance, rest assured that Cozo's historical facts implementation is MUCH MORE performant than the hand-rolled solution indicated above. In fact, querying a history-enabled attribute is on average $c \\log n$ times slower than the corresponding query for a non-history-enabled attribute, where $c$ is a small constant and $n$ is the number of historical facts a given entity-attribute pair has. The logarithmic complexity beats any simple-minded implementation, especially when the number of historical records is enormous."
]
},
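{
"cell_type": "markdown",
"metadata": {},
"source": [
"To get a feel for where a logarithmic factor can come from, here is a minimal Python sketch of an 'as of' lookup over a sorted history using binary search. It is an illustration only and not Cozo's actual implementation; the data and the names `DATES`, `VALUES` and `value_as_of` are hypothetical.\n",
"\n",
"```python\n",
"from bisect import bisect_right\n",
"\n",
"# Hypothetical history of one entity-attribute pair, sorted by validity date.\n",
"# ISO 8601 dates with four-digit years sort correctly as plain strings.\n",
"# A value of None marks a retraction.\n",
"DATES = ['2001-03-01', '2010-06-15', '2018-09-30', '2030-01-01']\n",
"VALUES = ['Oslo', 'Lima', 'Kyoto', None]\n",
"\n",
"\n",
"def value_as_of(dates, values, date):\n",
"    # Binary search for the last entry whose validity date is <= date.\n",
"    i = bisect_right(dates, date)\n",
"    if i == 0:\n",
"        return None  # nothing had been asserted yet\n",
"    return values[i - 1]\n",
"\n",
"\n",
"for date in ('2000-01-01', '2005-01-01', '2018-09-30', '2040-01-01'):\n",
"    print(date, value_as_of(DATES, VALUES, date))\n",
"```\n",
"\n",
"Each lookup takes $O(\\log n)$ comparisons for $n$ historical facts, whereas filtering and aggregating over every fact by hand would touch all $n$ of them."
]
},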
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at some examples. We want to store countries and their heads of state. The schema:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_251ff_row0_col0, #T_251ff_row1_col0 {\n",
" color: #307fc1;\n",
"}\n",
"#T_251ff_row0_col1, #T_251ff_row1_col1 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_251ff\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_251ff_level0_col0\" class=\"col_heading level0 col0\" >attr_id</th>\n",
" <th id=\"T_251ff_level0_col1\" class=\"col_heading level0 col1\" >op</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_251ff_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_251ff_row0_col0\" class=\"data row0 col0\" >10000009</td>\n",
" <td id=\"T_251ff_row0_col1\" class=\"data row0 col1\" >assert</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_251ff_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_251ff_row1_col0\" class=\"data row1 col0\" >10000010</td>\n",
" <td id=\"T_251ff_row1_col1\" class=\"data row1 col1\" >assert</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bd26b30>"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":schema\n",
"\n",
":put country {\n",
" name: string unique,\n",
" head: string index history,\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For simplicity we assume that a country's name does not change. Its head of state, however, changes every few years, so we mark that attribute with the modifier `history`. That's all you need for the schema."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You insert data as you did before:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_0fd4b_row0_col0, #T_0fd4b_row0_col1 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_0fd4b\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_0fd4b_level0_col0\" class=\"col_heading level0 col0\" >asserts</th>\n",
" <th id=\"T_0fd4b_level0_col1\" class=\"col_heading level0 col1\" >retracts</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_0fd4b_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_0fd4b_row0_col0\" class=\"data row0 col0\" >4</td>\n",
" <td id=\"T_0fd4b_row0_col1\" class=\"data row0 col1\" >0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10be47c40>"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":tx\n",
"\n",
":put {country.name: 'US', country.head: 'Biden'}\n",
"{country.name: 'UK', country.head: 'Truss'}"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_7a3c4_row0_col0, #T_7a3c4_row0_col1, #T_7a3c4_row1_col0, #T_7a3c4_row1_col1 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_7a3c4\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_7a3c4_level0_col0\" class=\"col_heading level0 col0\" >country</th>\n",
" <th id=\"T_7a3c4_level0_col1\" class=\"col_heading level0 col1\" >head</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_7a3c4_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_7a3c4_row0_col0\" class=\"data row0 col0\" >UK</td>\n",
" <td id=\"T_7a3c4_row0_col1\" class=\"data row0 col1\" >Truss</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_7a3c4_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_7a3c4_row1_col0\" class=\"data row1 col0\" >US</td>\n",
" <td id=\"T_7a3c4_row1_col1\" class=\"data row1 col1\" >Biden</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10be473a0>"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[country, head] := [c country.name country], [c country.head head]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By the way, the transaction above shows that you can explicitly tell the system that you are doing a `put`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's add in the historical data:"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_cedc6_row0_col0, #T_cedc6_row0_col1 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_cedc6\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_cedc6_level0_col0\" class=\"col_heading level0 col0\" >asserts</th>\n",
" <th id=\"T_cedc6_level0_col1\" class=\"col_heading level0 col1\" >retracts</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_cedc6_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_cedc6_row0_col0\" class=\"data row0 col0\" >2</td>\n",
" <td id=\"T_cedc6_row0_col1\" class=\"data row0 col1\" >0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10be47ee0>"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":tx\n",
"\n",
"@'2019-07-24' {_key: ['country.name', 'UK'], country.head: 'Johnson'}\n",
":put @ '2017-01-20' {_key: ['country.name', 'US'], country.head: 'Trump'}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The syntax should be self-explanatory. You can specify the date in ISO 8601 format, in which case it is interpreted as a timestamp at midnight UTC on the stated date, or in RFC 3339 format such as `'1996-12-19T16:39:57-08:00'`, or as an integer indicating the number of _microseconds_ since the UNIX epoch (negative numbers for times before the epoch). The validity marker only affects attributes that were defined with the `history` modifier.\n",
"\n",
"Let's see who the heads of state are _now_:"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_5f514_row0_col0, #T_5f514_row0_col1, #T_5f514_row1_col0, #T_5f514_row1_col1 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_5f514\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_5f514_level0_col0\" class=\"col_heading level0 col0\" >country</th>\n",
" <th id=\"T_5f514_level0_col1\" class=\"col_heading level0 col1\" >head</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_5f514_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_5f514_row0_col0\" class=\"data row0 col0\" >UK</td>\n",
" <td id=\"T_5f514_row0_col1\" class=\"data row0 col1\" >Truss</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_5f514_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_5f514_row1_col0\" class=\"data row1 col0\" >US</td>\n",
" <td id=\"T_5f514_row1_col1\" class=\"data row1 col1\" >Biden</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10be46ec0>"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[country, head] := [c country.name country], [c country.head head]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As expected."
]
},
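{
"cell_type": "markdown",
"metadata": {},
"source": [
"The date strings used in validity markers correspond to integer microseconds since the UNIX epoch. The Python sketch below shows that correspondence for the formats described earlier. The helper `to_micros` is hypothetical and says nothing about how Cozo parses its input.\n",
"\n",
"```python\n",
"from datetime import datetime, timedelta, timezone\n",
"\n",
"EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)\n",
"\n",
"\n",
"def to_micros(text):\n",
"    # Accepts an ISO 8601 date or an RFC 3339 timestamp with a numeric offset.\n",
"    dt = datetime.fromisoformat(text)\n",
"    if dt.tzinfo is None:\n",
"        # A bare date: interpret it as midnight UTC on that date.\n",
"        dt = dt.replace(tzinfo=timezone.utc)\n",
"    # Exact integer division, so dates before the epoch come out negative.\n",
"    return (dt - EPOCH) // timedelta(microseconds=1)\n",
"\n",
"\n",
"for text in ('2019-07-24', '1996-12-19T16:39:57-08:00', '1969-12-31'):\n",
"    print(text, to_micros(text))\n",
"```\n",
"\n",
"Integer arithmetic on `timedelta` avoids the rounding that floating-point seconds could introduce at microsecond precision."
]
},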
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's explicitly request historical facts:"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_5e9cb_row0_col0, #T_5e9cb_row0_col1, #T_5e9cb_row1_col0, #T_5e9cb_row1_col1 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_5e9cb\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_5e9cb_level0_col0\" class=\"col_heading level0 col0\" >country</th>\n",
" <th id=\"T_5e9cb_level0_col1\" class=\"col_heading level0 col1\" >head</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_5e9cb_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_5e9cb_row0_col0\" class=\"data row0 col0\" >UK</td>\n",
" <td id=\"T_5e9cb_row0_col1\" class=\"data row0 col1\" >Johnson</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_5e9cb_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_5e9cb_row1_col0\" class=\"data row1 col0\" >US</td>\n",
" <td id=\"T_5e9cb_row1_col1\" class=\"data row1 col1\" >Trump</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10be47f40>"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[country, head] @ '2020-01-01' := [c country.name country], [c country.head head]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Right. Try another one:"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_dbb64_row0_col0, #T_dbb64_row0_col1, #T_dbb64_row1_col0, #T_dbb64_row1_col1 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_dbb64\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_dbb64_level0_col0\" class=\"col_heading level0 col0\" >country</th>\n",
" <th id=\"T_dbb64_level0_col1\" class=\"col_heading level0 col1\" >head</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_dbb64_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_dbb64_row0_col0\" class=\"data row0 col0\" >UK</td>\n",
" <td id=\"T_dbb64_row0_col1\" class=\"data row0 col1\" >Johnson</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_dbb64_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_dbb64_row1_col0\" class=\"data row1 col0\" >US</td>\n",
" <td id=\"T_dbb64_row1_col1\" class=\"data row1 col1\" >Trump</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10be47820>"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[country, head] @ '2022-01-01' := [c country.name country], [c country.head head]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Umm ... that doesn't look right. The problem is that when we inserted the facts about Biden and Truss, we did not tell the system when those facts start being valid, so the system assumed the current timestamp. If you are inserting facts in real time, this is what you want. But if you are inserting historical facts as we are doing here, or are doing catch-ups, this causes problems. In our case the fix is easy:"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_38c14_row0_col0, #T_38c14_row0_col1 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_38c14\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_38c14_level0_col0\" class=\"col_heading level0 col0\" >asserts</th>\n",
" <th id=\"T_38c14_level0_col1\" class=\"col_heading level0 col1\" >retracts</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_38c14_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_38c14_row0_col0\" class=\"data row0 col0\" >2</td>\n",
" <td id=\"T_38c14_row0_col1\" class=\"data row0 col1\" >0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10be45bd0>"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":tx\n",
"\n",
"@'2022-09-05' {_key: ['country.name', 'UK'], country.head: 'Truss'}\n",
"@'2021-01-20' {_key: ['country.name', 'US'], country.head: 'Biden'}"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_2553a_row0_col0, #T_2553a_row0_col1, #T_2553a_row1_col0, #T_2553a_row1_col1 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_2553a\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_2553a_level0_col0\" class=\"col_heading level0 col0\" >country</th>\n",
" <th id=\"T_2553a_level0_col1\" class=\"col_heading level0 col1\" >head</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_2553a_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_2553a_row0_col0\" class=\"data row0 col0\" >UK</td>\n",
" <td id=\"T_2553a_row0_col1\" class=\"data row0 col1\" >Johnson</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_2553a_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_2553a_row1_col0\" class=\"data row1 col0\" >US</td>\n",
" <td id=\"T_2553a_row1_col1\" class=\"data row1 col1\" >Biden</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10be47910>"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[country, head] @ '2022-01-01' := [c country.name country], [c country.head head]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's more accurate. What about the future?"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_55a83_row0_col0, #T_55a83_row0_col1, #T_55a83_row1_col0, #T_55a83_row1_col1 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_55a83\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_55a83_level0_col0\" class=\"col_heading level0 col0\" >country</th>\n",
" <th id=\"T_55a83_level0_col1\" class=\"col_heading level0 col1\" >head</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_55a83_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_55a83_row0_col0\" class=\"data row0 col0\" >UK</td>\n",
" <td id=\"T_55a83_row0_col1\" class=\"data row0 col1\" >Truss</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_55a83_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_55a83_row1_col0\" class=\"data row1 col0\" >US</td>\n",
" <td id=\"T_55a83_row1_col1\" class=\"data row1 col1\" >Biden</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10bd24190>"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[country, head] @ '9999-01-01' := [c country.name country], [c country.head head]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wow, that can't happen no matter what the world is coming to. We fix that by _retracting_ facts as before, but with a timestamp attached (we will use a _very_ generous timestamp for them):"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_4179b_row0_col0, #T_4179b_row0_col1 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_4179b\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_4179b_level0_col0\" class=\"col_heading level0 col0\" >asserts</th>\n",
" <th id=\"T_4179b_level0_col1\" class=\"col_heading level0 col1\" >retracts</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_4179b_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_4179b_row0_col0\" class=\"data row0 col0\" >0</td>\n",
" <td id=\"T_4179b_row0_col1\" class=\"data row0 col1\" >6</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10be86980>"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":tx\n",
"\n",
":retract_all @ '2099-01-01' {_key: ['country.name', 'UK'], country.head: 0}\n",
":retract_all @ '2099-01-01' {_key: ['country.name', 'US'], country.head: 0}"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"</style>\n",
"<table id=\"T_80207\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_80207_level0_col0\" class=\"col_heading level0 col0\" >country</th>\n",
" <th id=\"T_80207_level0_col1\" class=\"col_heading level0 col1\" >head</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10be86fb0>"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[country, head] @ '9999-01-01' := [c country.name country], [c country.head head]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Good. What about now, again?"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_e01be_row0_col0, #T_e01be_row0_col1, #T_e01be_row1_col0, #T_e01be_row1_col1 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_e01be\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_e01be_level0_col0\" class=\"col_heading level0 col0\" >country</th>\n",
" <th id=\"T_e01be_level0_col1\" class=\"col_heading level0 col1\" >head</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_e01be_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_e01be_row0_col0\" class=\"data row0 col0\" >UK</td>\n",
" <td id=\"T_e01be_row0_col1\" class=\"data row0 col1\" >Truss</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_e01be_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_e01be_row1_col0\" class=\"data row1 col0\" >US</td>\n",
" <td id=\"T_e01be_row1_col1\" class=\"data row1 col1\" >Biden</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10be87640>"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[country, head] := [c country.name country], [c country.head head]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And history?"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_9d5c3_row0_col0, #T_9d5c3_row0_col1 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_9d5c3\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_9d5c3_level0_col0\" class=\"col_heading level0 col0\" >country</th>\n",
" <th id=\"T_9d5c3_level0_col1\" class=\"col_heading level0 col1\" >head</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_9d5c3_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_9d5c3_row0_col0\" class=\"data row0 col0\" >US</td>\n",
" <td id=\"T_9d5c3_row0_col1\" class=\"data row0 col1\" >Trump</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10be866b0>"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[country, head] @ '2018-01-01' := [c country.name country], [c country.head head]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The UK is missing because we have not yet entered its head of state for this period into the database. Fix:"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_01644_row0_col0, #T_01644_row0_col1 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_01644\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_01644_level0_col0\" class=\"col_heading level0 col0\" >asserts</th>\n",
" <th id=\"T_01644_level0_col1\" class=\"col_heading level0 col1\" >retracts</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_01644_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_01644_row0_col0\" class=\"data row0 col0\" >1</td>\n",
" <td id=\"T_01644_row0_col1\" class=\"data row0 col1\" >0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10be84040>"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":tx\n",
"\n",
"@'2016-07-11' {_key: ['country.name', 'UK'], country.head: 'May'}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One more thing, in case it is not already obvious: timestamps apply at the level of rules, not queries, so you can have a different timestamp for each rule:"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_c82a6_row0_col0, #T_c82a6_row1_col0, #T_c82a6_row2_col0, #T_c82a6_row3_col0 {\n",
" color: #307fc1;\n",
"}\n",
"#T_c82a6_row0_col1, #T_c82a6_row0_col2, #T_c82a6_row1_col1, #T_c82a6_row1_col2, #T_c82a6_row2_col1, #T_c82a6_row2_col2, #T_c82a6_row3_col1, #T_c82a6_row3_col2, #T_c82a6_row4_col0, #T_c82a6_row4_col1, #T_c82a6_row4_col2, #T_c82a6_row5_col0, #T_c82a6_row5_col1, #T_c82a6_row5_col2 {\n",
" color: black;\n",
"}\n",
"</style>\n",
"<table id=\"T_c82a6\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_c82a6_level0_col0\" class=\"col_heading level0 col0\" >year</th>\n",
" <th id=\"T_c82a6_level0_col1\" class=\"col_heading level0 col1\" >country</th>\n",
" <th id=\"T_c82a6_level0_col2\" class=\"col_heading level0 col2\" >head</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_c82a6_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_c82a6_row0_col0\" class=\"data row0 col0\" >2019</td>\n",
" <td id=\"T_c82a6_row0_col1\" class=\"data row0 col1\" >UK</td>\n",
" <td id=\"T_c82a6_row0_col2\" class=\"data row0 col2\" >May</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_c82a6_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_c82a6_row1_col0\" class=\"data row1 col0\" >2019</td>\n",
" <td id=\"T_c82a6_row1_col1\" class=\"data row1 col1\" >US</td>\n",
" <td id=\"T_c82a6_row1_col2\" class=\"data row1 col2\" >Trump</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_c82a6_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_c82a6_row2_col0\" class=\"data row2 col0\" >2022</td>\n",
" <td id=\"T_c82a6_row2_col1\" class=\"data row2 col1\" >UK</td>\n",
" <td id=\"T_c82a6_row2_col2\" class=\"data row2 col2\" >Johnson</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_c82a6_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_c82a6_row3_col0\" class=\"data row3 col0\" >2022</td>\n",
" <td id=\"T_c82a6_row3_col1\" class=\"data row3 col1\" >US</td>\n",
" <td id=\"T_c82a6_row3_col2\" class=\"data row3 col2\" >Biden</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_c82a6_level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
" <td id=\"T_c82a6_row4_col0\" class=\"data row4 col0\" >now</td>\n",
" <td id=\"T_c82a6_row4_col1\" class=\"data row4 col1\" >UK</td>\n",
" <td id=\"T_c82a6_row4_col2\" class=\"data row4 col2\" >Truss</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_c82a6_level0_row5\" class=\"row_heading level0 row5\" >5</th>\n",
" <td id=\"T_c82a6_row5_col0\" class=\"data row5 col0\" >now</td>\n",
" <td id=\"T_c82a6_row5_col1\" class=\"data row5 col1\" >US</td>\n",
" <td id=\"T_c82a6_row5_col2\" class=\"data row5 col2\" >Biden</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10be85f90>"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"?[year, country, head] @ '2019-01-01' := year = 2019, [c country.name country], [c country.head head]\n",
"?[year, country, head] @ '2022-01-01' := year = 2022, [c country.name country], [c country.head head]\n",
"?[year, country, head] /* ~~NoW!~~ */ := year = 'now', [c country.name country], [c country.head head]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The timestamp is also not required to represent actual time. You can `put` data by giving them integer timestamps with custom interpretation, and query them using the same interpretation. Just don't mix your fictional time and real time."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A final API before we are done with this time-travelling thing. If you want a record of the actual history of attributes for a certain entity instead of its time slices, use this system op:"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_9dccf_row0_col0, #T_9dccf_row0_col1, #T_9dccf_row0_col3, #T_9dccf_row0_col4, #T_9dccf_row0_col5, #T_9dccf_row1_col0, #T_9dccf_row1_col1, #T_9dccf_row1_col3, #T_9dccf_row1_col4, #T_9dccf_row2_col0, #T_9dccf_row2_col1, #T_9dccf_row2_col3, #T_9dccf_row2_col4, #T_9dccf_row2_col5, #T_9dccf_row3_col0, #T_9dccf_row3_col1, #T_9dccf_row3_col3, #T_9dccf_row3_col4, #T_9dccf_row3_col5, #T_9dccf_row4_col0, #T_9dccf_row4_col1, #T_9dccf_row4_col3, #T_9dccf_row4_col4, #T_9dccf_row4_col5, #T_9dccf_row5_col0, #T_9dccf_row5_col1, #T_9dccf_row5_col3, #T_9dccf_row5_col4, #T_9dccf_row5_col5, #T_9dccf_row6_col0, #T_9dccf_row6_col1, #T_9dccf_row6_col3, #T_9dccf_row6_col4, #T_9dccf_row6_col5, #T_9dccf_row7_col0, #T_9dccf_row7_col1, #T_9dccf_row7_col3, #T_9dccf_row7_col4, #T_9dccf_row8_col0, #T_9dccf_row8_col1, #T_9dccf_row8_col3, #T_9dccf_row8_col4, #T_9dccf_row8_col5, #T_9dccf_row9_col0, #T_9dccf_row9_col1, #T_9dccf_row9_col3, #T_9dccf_row9_col4, #T_9dccf_row9_col5, #T_9dccf_row10_col0, #T_9dccf_row10_col1, #T_9dccf_row10_col3, #T_9dccf_row10_col4, #T_9dccf_row10_col5 {\n",
" color: black;\n",
"}\n",
"#T_9dccf_row0_col2, #T_9dccf_row1_col2, #T_9dccf_row2_col2, #T_9dccf_row3_col2, #T_9dccf_row4_col2, #T_9dccf_row5_col2, #T_9dccf_row6_col2, #T_9dccf_row7_col2, #T_9dccf_row8_col2, #T_9dccf_row9_col2, #T_9dccf_row10_col2 {\n",
" color: #307fc1;\n",
"}\n",
"#T_9dccf_row1_col5, #T_9dccf_row7_col5 {\n",
" color: #bf5b3d;\n",
"}\n",
"</style>\n",
"<table id=\"T_9dccf\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_9dccf_level0_col0\" class=\"col_heading level0 col0\" >entity_id</th>\n",
" <th id=\"T_9dccf_level0_col1\" class=\"col_heading level0 col1\" >attr</th>\n",
" <th id=\"T_9dccf_level0_col2\" class=\"col_heading level0 col2\" >timestamp</th>\n",
" <th id=\"T_9dccf_level0_col3\" class=\"col_heading level0 col3\" >timestamp_str</th>\n",
" <th id=\"T_9dccf_level0_col4\" class=\"col_heading level0 col4\" >op</th>\n",
" <th id=\"T_9dccf_level0_col5\" class=\"col_heading level0 col5\" >value</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_9dccf_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_9dccf_row0_col0\" class=\"data row0 col0\" >0639834a-388f-11ed-9d48-bdd20af27054</td>\n",
" <td id=\"T_9dccf_row0_col1\" class=\"data row0 col1\" >country.name</td>\n",
" <td id=\"T_9dccf_row0_col2\" class=\"data row0 col2\" >nan</td>\n",
" <td id=\"T_9dccf_row0_col3\" class=\"data row0 col3\" >NO_HISTORY</td>\n",
" <td id=\"T_9dccf_row0_col4\" class=\"data row0 col4\" >assert</td>\n",
" <td id=\"T_9dccf_row0_col5\" class=\"data row0 col5\" >UK</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_9dccf_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_9dccf_row1_col0\" class=\"data row1 col0\" >0639834a-388f-11ed-9d48-bdd20af27054</td>\n",
" <td id=\"T_9dccf_row1_col1\" class=\"data row1 col1\" >country.head</td>\n",
" <td id=\"T_9dccf_row1_col2\" class=\"data row1 col2\" >4070908800000000.000000</td>\n",
" <td id=\"T_9dccf_row1_col3\" class=\"data row1 col3\" >2099-01-01T00:00:00+00:00</td>\n",
" <td id=\"T_9dccf_row1_col4\" class=\"data row1 col4\" >retract</td>\n",
" <td id=\"T_9dccf_row1_col5\" class=\"data row1 col5\" >None</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_9dccf_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_9dccf_row2_col0\" class=\"data row2 col0\" >0639834a-388f-11ed-9d48-bdd20af27054</td>\n",
" <td id=\"T_9dccf_row2_col1\" class=\"data row2 col1\" >country.head</td>\n",
" <td id=\"T_9dccf_row2_col2\" class=\"data row2 col2\" >1663642245426446.000000</td>\n",
" <td id=\"T_9dccf_row2_col3\" class=\"data row2 col3\" >2022-09-20T02:50:45.426446+00:00</td>\n",
" <td id=\"T_9dccf_row2_col4\" class=\"data row2 col4\" >assert</td>\n",
" <td id=\"T_9dccf_row2_col5\" class=\"data row2 col5\" >Truss</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_9dccf_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
" <td id=\"T_9dccf_row3_col0\" class=\"data row3 col0\" >0639834a-388f-11ed-9d48-bdd20af27054</td>\n",
" <td id=\"T_9dccf_row3_col1\" class=\"data row3 col1\" >country.head</td>\n",
" <td id=\"T_9dccf_row3_col2\" class=\"data row3 col2\" >1662336000000000.000000</td>\n",
" <td id=\"T_9dccf_row3_col3\" class=\"data row3 col3\" >2022-09-05T00:00:00+00:00</td>\n",
" <td id=\"T_9dccf_row3_col4\" class=\"data row3 col4\" >assert</td>\n",
" <td id=\"T_9dccf_row3_col5\" class=\"data row3 col5\" >Truss</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_9dccf_level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
" <td id=\"T_9dccf_row4_col0\" class=\"data row4 col0\" >0639834a-388f-11ed-9d48-bdd20af27054</td>\n",
" <td id=\"T_9dccf_row4_col1\" class=\"data row4 col1\" >country.head</td>\n",
" <td id=\"T_9dccf_row4_col2\" class=\"data row4 col2\" >1563926400000000.000000</td>\n",
" <td id=\"T_9dccf_row4_col3\" class=\"data row4 col3\" >2019-07-24T00:00:00+00:00</td>\n",
" <td id=\"T_9dccf_row4_col4\" class=\"data row4 col4\" >assert</td>\n",
" <td id=\"T_9dccf_row4_col5\" class=\"data row4 col5\" >Johnson</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_9dccf_level0_row5\" class=\"row_heading level0 row5\" >5</th>\n",
" <td id=\"T_9dccf_row5_col0\" class=\"data row5 col0\" >0639834a-388f-11ed-9d48-bdd20af27054</td>\n",
" <td id=\"T_9dccf_row5_col1\" class=\"data row5 col1\" >country.head</td>\n",
" <td id=\"T_9dccf_row5_col2\" class=\"data row5 col2\" >1468195200000000.000000</td>\n",
" <td id=\"T_9dccf_row5_col3\" class=\"data row5 col3\" >2016-07-11T00:00:00+00:00</td>\n",
" <td id=\"T_9dccf_row5_col4\" class=\"data row5 col4\" >assert</td>\n",
" <td id=\"T_9dccf_row5_col5\" class=\"data row5 col5\" >May</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_9dccf_level0_row6\" class=\"row_heading level0 row6\" >6</th>\n",
" <td id=\"T_9dccf_row6_col0\" class=\"data row6 col0\" >063982e6-388f-11ed-90d5-354957e3b083</td>\n",
" <td id=\"T_9dccf_row6_col1\" class=\"data row6 col1\" >country.name</td>\n",
" <td id=\"T_9dccf_row6_col2\" class=\"data row6 col2\" >nan</td>\n",
" <td id=\"T_9dccf_row6_col3\" class=\"data row6 col3\" >NO_HISTORY</td>\n",
" <td id=\"T_9dccf_row6_col4\" class=\"data row6 col4\" >assert</td>\n",
" <td id=\"T_9dccf_row6_col5\" class=\"data row6 col5\" >US</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_9dccf_level0_row7\" class=\"row_heading level0 row7\" >7</th>\n",
" <td id=\"T_9dccf_row7_col0\" class=\"data row7 col0\" >063982e6-388f-11ed-90d5-354957e3b083</td>\n",
" <td id=\"T_9dccf_row7_col1\" class=\"data row7 col1\" >country.head</td>\n",
" <td id=\"T_9dccf_row7_col2\" class=\"data row7 col2\" >4070908800000000.000000</td>\n",
" <td id=\"T_9dccf_row7_col3\" class=\"data row7 col3\" >2099-01-01T00:00:00+00:00</td>\n",
" <td id=\"T_9dccf_row7_col4\" class=\"data row7 col4\" >retract</td>\n",
" <td id=\"T_9dccf_row7_col5\" class=\"data row7 col5\" >None</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_9dccf_level0_row8\" class=\"row_heading level0 row8\" >8</th>\n",
" <td id=\"T_9dccf_row8_col0\" class=\"data row8 col0\" >063982e6-388f-11ed-90d5-354957e3b083</td>\n",
" <td id=\"T_9dccf_row8_col1\" class=\"data row8 col1\" >country.head</td>\n",
" <td id=\"T_9dccf_row8_col2\" class=\"data row8 col2\" >1663642245426446.000000</td>\n",
" <td id=\"T_9dccf_row8_col3\" class=\"data row8 col3\" >2022-09-20T02:50:45.426446+00:00</td>\n",
" <td id=\"T_9dccf_row8_col4\" class=\"data row8 col4\" >assert</td>\n",
" <td id=\"T_9dccf_row8_col5\" class=\"data row8 col5\" >Biden</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_9dccf_level0_row9\" class=\"row_heading level0 row9\" >9</th>\n",
" <td id=\"T_9dccf_row9_col0\" class=\"data row9 col0\" >063982e6-388f-11ed-90d5-354957e3b083</td>\n",
" <td id=\"T_9dccf_row9_col1\" class=\"data row9 col1\" >country.head</td>\n",
" <td id=\"T_9dccf_row9_col2\" class=\"data row9 col2\" >1611100800000000.000000</td>\n",
" <td id=\"T_9dccf_row9_col3\" class=\"data row9 col3\" >2021-01-20T00:00:00+00:00</td>\n",
" <td id=\"T_9dccf_row9_col4\" class=\"data row9 col4\" >assert</td>\n",
" <td id=\"T_9dccf_row9_col5\" class=\"data row9 col5\" >Biden</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_9dccf_level0_row10\" class=\"row_heading level0 row10\" >10</th>\n",
" <td id=\"T_9dccf_row10_col0\" class=\"data row10 col0\" >063982e6-388f-11ed-90d5-354957e3b083</td>\n",
" <td id=\"T_9dccf_row10_col1\" class=\"data row10 col1\" >country.head</td>\n",
" <td id=\"T_9dccf_row10_col2\" class=\"data row10 col2\" >1484870400000000.000000</td>\n",
" <td id=\"T_9dccf_row10_col3\" class=\"data row10 col3\" >2017-01-20T00:00:00+00:00</td>\n",
" <td id=\"T_9dccf_row10_col4\" class=\"data row10 col4\" >assert</td>\n",
" <td id=\"T_9dccf_row10_col5\" class=\"data row10 col5\" >Trump</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10be873a0>"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":db history for ['country.name', 'UK'], ['country.name', 'US'] : country.name, country.head"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have used a unique key to identify the entity. You can of course use the entity ID itself. The time ordering within each entity-attribute pair is reverse chronological."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Restricting the range of time for the query is also possible:"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style type=\"text/css\">\n",
"#T_5fe59_row0_col0, #T_5fe59_row0_col1, #T_5fe59_row0_col3, #T_5fe59_row0_col4, #T_5fe59_row0_col5, #T_5fe59_row1_col0, #T_5fe59_row1_col1, #T_5fe59_row1_col3, #T_5fe59_row1_col4, #T_5fe59_row1_col5, #T_5fe59_row2_col0, #T_5fe59_row2_col1, #T_5fe59_row2_col3, #T_5fe59_row2_col4, #T_5fe59_row2_col5 {\n",
" color: black;\n",
"}\n",
"#T_5fe59_row0_col2, #T_5fe59_row1_col2, #T_5fe59_row2_col2 {\n",
" color: #307fc1;\n",
"}\n",
"</style>\n",
"<table id=\"T_5fe59\">\n",
" <thead>\n",
" <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_5fe59_level0_col0\" class=\"col_heading level0 col0\" >entity_id</th>\n",
" <th id=\"T_5fe59_level0_col1\" class=\"col_heading level0 col1\" >attr</th>\n",
" <th id=\"T_5fe59_level0_col2\" class=\"col_heading level0 col2\" >timestamp</th>\n",
" <th id=\"T_5fe59_level0_col3\" class=\"col_heading level0 col3\" >timestamp_str</th>\n",
" <th id=\"T_5fe59_level0_col4\" class=\"col_heading level0 col4\" >op</th>\n",
" <th id=\"T_5fe59_level0_col5\" class=\"col_heading level0 col5\" >value</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th id=\"T_5fe59_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
" <td id=\"T_5fe59_row0_col0\" class=\"data row0 col0\" >0639834a-388f-11ed-9d48-bdd20af27054</td>\n",
" <td id=\"T_5fe59_row0_col1\" class=\"data row0 col1\" >country.name</td>\n",
" <td id=\"T_5fe59_row0_col2\" class=\"data row0 col2\" >nan</td>\n",
" <td id=\"T_5fe59_row0_col3\" class=\"data row0 col3\" >NO_HISTORY</td>\n",
" <td id=\"T_5fe59_row0_col4\" class=\"data row0 col4\" >assert</td>\n",
" <td id=\"T_5fe59_row0_col5\" class=\"data row0 col5\" >UK</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_5fe59_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
" <td id=\"T_5fe59_row1_col0\" class=\"data row1 col0\" >063982e6-388f-11ed-90d5-354957e3b083</td>\n",
" <td id=\"T_5fe59_row1_col1\" class=\"data row1 col1\" >country.name</td>\n",
" <td id=\"T_5fe59_row1_col2\" class=\"data row1 col2\" >nan</td>\n",
" <td id=\"T_5fe59_row1_col3\" class=\"data row1 col3\" >NO_HISTORY</td>\n",
" <td id=\"T_5fe59_row1_col4\" class=\"data row1 col4\" >assert</td>\n",
" <td id=\"T_5fe59_row1_col5\" class=\"data row1 col5\" >US</td>\n",
" </tr>\n",
" <tr>\n",
" <th id=\"T_5fe59_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
" <td id=\"T_5fe59_row2_col0\" class=\"data row2 col0\" >063982e6-388f-11ed-90d5-354957e3b083</td>\n",
" <td id=\"T_5fe59_row2_col1\" class=\"data row2 col1\" >country.head</td>\n",
" <td id=\"T_5fe59_row2_col2\" class=\"data row2 col2\" >1611100800000000.000000</td>\n",
" <td id=\"T_5fe59_row2_col3\" class=\"data row2 col3\" >2021-01-20T00:00:00+00:00</td>\n",
" <td id=\"T_5fe59_row2_col4\" class=\"data row2 col4\" >assert</td>\n",
" <td id=\"T_5fe59_row2_col5\" class=\"data row2 col5\" >Biden</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
],
"text/plain": [
"<pandas.io.formats.style.Styler at 0x10be86320>"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
":db history from '2020-01-01' to '2022-01-01' for ['country.name', 'UK'], ['country.name', 'US'] : country.name, country.head"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that even though the UK had a head of state in this period, it is not included in the output since its _assertions_ lies outside the time range. This API is only meant for administrative purposes. For general queries, use Datalog queries instead."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we have seen, for attributes with history, retraction does not really remove the data from the database. If you are e.g. legally required to make sure a piece of data is physically gone, retract with exactly the same timestamp as the piece of data originally had. In this case it is recommended to use the integer form of the timestamp. You won't be able to retrieve the data with the public API after the retraction, but some traces of the data may still persist in write ahead logs and other places. Complete eradication may take an unspecified amount of time. That is, if you did not have any backups set up yourself (GASP). Yes, absolute elimination of data is difficult and uncertain."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}