{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# The pilgrim to Mount Acid" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%reload_ext pycozo.ipyext_direct\n", "%cozo_auth tutorial *******" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A schema for data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this age of BigDataⒸ, your business data are enormous and change fast. You may have one billion active users on your platform carrying out all sorts of activities, concurrently of course. You don't want these activities to step on each other. You don't want to store the wrong thing in your users' accounts. You _especially_ don't want any money in transit to disappear in midair. To make things worse, hundreds of new activities pop up each day. \n", "\n", "Storing such data in a stored relation is infeasible. With a traditional RDBMS, [data migrations](https://en.wikipedia.org/wiki/Data_migration) would have already killed you. And with Cozo, stored relations don't even try to support schema changes (in fact, the only 'schema' of a stored relation is its arity).\n", "\n", "To store such data and meet its query and mutation requirements, a database needs:\n", "\n", "* high concurrency;\n", "* fine-grained transactions;\n", "* checks for data integrity;\n", "* the ability to adapt rapidly to new data shapes and requirements.\n", "\n", "All of this comes at a price. With Cozo, we pay it in two ways:\n", "\n", "* most transactions must apply only _local changes_ that touch a tiny fraction of the data (otherwise the database cannot meet the high-concurrency requirement);\n", "* we tolerate indirection (since \"all problems in computer science can be solved by another level of indirection\")." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given these tradeoffs, the solution is the [triple store](https://en.wikipedia.org/wiki/Triplestore)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A _triple_ is a sentence consisting of a subject, a verb, and an object. In the Cozo flavour, the subject is always an opaque identity, such as _entity42_, so it is really an _entity-attribute-value_ triple. Examples:\n", "\n", "* _entity42_ has first name `'Alice'`.\n", "* _entity42_ has last name `'Liddell'`.\n", "* _entity42_ loves _entity81_.\n", "* _entity81_ is `20` years old." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We schematize triples by schematizing their verbs (the attributes). In our example, the first-name and last-name attributes should have type string, the age attribute should have type integer, and the \"loves\" attribute should refer to other entities. These types constrain the object of each triple; the subject needs no type, since it is always an entity." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So let's put this into code:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   attr_id   op
0  10000001  assert
1  10000002  assert
2  10000003  assert
3  10000004  assert
\n" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":schema\n", "\n", ":put person {\n", " first_name: string index,\n", " nick_name: string many index,\n", " loves: ref many,\n", " age: int\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `:schema` at the top indicates that we want to manage the schema instead of running normal queries. We then `put` a _group_ of related attributes. Even though they are declared together, much like a table definition in SQL, we must stress that this actually defines four separate, independent attributes named `person.first_name`, `person.nick_name`, `person.loves` and `person.age`. An entity can have any attributes associated with it, even attributes with different prefixes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The allowed types for attributes are:\n", "\n", "* `ref`\n", "* `bool`\n", "* `int`\n", "* `float`\n", "* `string`\n", "* `bytes`\n", "* `list`\n", "\n", "The list type is heterogeneous in its elements. There is no concept of a nullable type, and you can't put `null` into the values of triples (except by wrapping it in a list first). To indicate a missing value, simply omit the attribute.\n", "\n", "The `ref` type has the special meaning of referring to other entities.\n", "\n", "After the type come zero or more _modifiers_. The `many` modifier indicates that `loves` is a to-many relationship. If we omitted it, any person could love at most one other person, which is not very realistic.\n", "\n", "The modifier `index` indicates that we want values of this attribute to be _indexed_. Only indexed attributes support efficient value lookups and range scans. `ref` attributes are always implicitly indexed, since the database wants to be able to traverse the graph in both directions." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead of `index`, we can mark attributes with the modifier `unique`, indicating that no two entities can have the same value for the attribute. The value then acts as a _unique identifier_ for the entity. This is convenient when retrieving entities, since entity IDs are normally assigned by the database automatically. So let's add an explicit `person.id` attribute, this time using the non-grouped syntax:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   attr_id   op
0  10000005  assert
\n" ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":schema\n", "\n", ":put person.id: string unique;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see what schema are there in the database now by running a system directive:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   attr_id   name                type    cardinality  index   history
0  10000001  person.first_name   string  one          index   False
1  10000002  person.nick_name    string  many         index   False
2  10000003  person.loves        ref     many         none    False
3  10000004  person.age          int     one          none    False
4  10000005  person.id           string  one          unique  False
\n" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":db schema" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can rename the attribute:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   status
0  OK
\n" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":db rename attr person.id person.pid" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   attr_id   name                type    cardinality  index   history
0  10000001  person.first_name   string  one          index   False
1  10000002  person.nick_name    string  many         index   False
2  10000003  person.loves        ref     many         none    False
3  10000004  person.age          int     one          none    False
4  10000005  person.pid          string  one          unique  False
\n" ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":db schema" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also remove it (this removes all the data associated with the attribute too):" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   status
0  OK
\n" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":db remove attr person.pid" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   attr_id   name                type    cardinality  index   history
0  10000001  person.first_name   string  one          index   False
1  10000002  person.nick_name    string  many         index   False
2  10000003  person.loves        ref     many         none    False
3  10000004  person.age          int     one          none    False
\n" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":db schema" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But that's about it. Apart from its name, an attribute is _immutable_: you cannot change a `string` attribute into a `ref` attribute, nor can you decide that your `one` attribute should really be `many`.\n", "\n", "So what did we mean when we said that this kind of structure can deal with new requirements? Say you initially gave the `person.loves` attribute cardinality `one` and made `person.last_name` a unique index, and now you need to change them. You need to change them not because the requirements have changed, but because you made _mistakes_ at the beginning. You fix such mistakes by, for example, first renaming the offending attributes, then creating a new attribute with the old name, next copying the data from the old attribute to the new one, and finally deleting the old, wrong attribute. Fixing mistakes should be explicit, and this procedure is very explicit." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "New requirements are not mistakes, and they do not invalidate your old data or schema. For example, suppose you now need to record the passport numbers of the people in your graph and the parent-child relationships between them. Very easy:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   attr_id   op
0  10000006  assert
1  10000007  assert
\n" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":schema\n", "\n", ":put person.passport_no: string many index;\n", ":put person.parent_of: ref many;" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   attr_id   name                type    cardinality  index   history
0  10000001  person.first_name   string  one          index   False
1  10000002  person.nick_name    string  many         index   False
2  10000003  person.loves        ref     many         none    False
3  10000004  person.age          int     one          none    False
4  10000006  person.passport_no  string  many         index   False
5  10000007  person.parent_of    ref     many         none    False
\n" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":db schema" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data with schema" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's reinstate the `person.id` attribute first:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   attr_id   op
0  10000008  assert
\n" ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":schema\n", "\n", ":put person.id: string one unique;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can add data to our database. First we add a person called Peter. Apart from the `:tx` at the top, which indicates that we want to execute a transaction, the query is just a map:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   asserts  retracts
0  3        0
\n" ], "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":tx\n", "\n", "{ person.first_name: 'Peter', person.nick_name: 'Pan', person.id: 'p' }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can insert multiple 'rows' at the same time, and the maps also allow some stylistic variations:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   asserts  retracts
0  6        0
\n" ], "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":tx\n", "\n", "{\"person.first_name\": \"Quin\", \"*person.nick_name\": [\"Q\", \"The Quick\"], \"person.id\": \"q\"}\n", "{\"person.first_name\": \"Rich\", \"person.id\": \"r\"}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Every entity is free to have any combination of attributes that suits it. Note how we specified several nicknames for Quin at once by prefixing the attribute name with an asterisk `*`, and how Rich has no nickname at all." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To query the triples, use _triple rules_: these look like lists of three items, except that there are no commas inside. The first slot contains the _entity ID_ assigned by the system, the middle symbol is the attribute name and must be explicit (it can't be a variable), and the last slot contains the value of the attribute. In fact, you should interpret the attribute name in the middle as an _operator_, which is why there are no commas around it:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   eid                                   first_name  nick_name
0  f26fc8c4-388e-11ed-8b86-b7091d48cdc7  Peter       Pan
1  f3478ab6-388e-11ed-9737-b3eeb128adfc  Quin        Q
2  f3478ab6-388e-11ed-9737-b3eeb128adfc  Quin        The Quick
\n" ], "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[eid, first_name, nick_name] := [eid person.nick_name nick_name], \n", " [eid person.first_name first_name]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Besides the _explicit querying_ above, there is another way to get the attributes associated with an entity: you may specify a _pull directive_, which expands an entity ID into a map containing the specified attributes of that entity. Observe:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   pid  eid
0  p    {'_id': 'f26fc8c4-388e-11ed-8b86-b7091d48cdc7', 'person.age': None, 'person.first_name': 'Peter', 'person.nick_name': ['Pan']}
1  q    {'_id': 'f3478ab6-388e-11ed-9737-b3eeb128adfc', 'person.age': None, 'person.first_name': 'Quin', 'person.nick_name': ['Q', 'The Quick']}
2  r    {'_id': 'f3478b10-388e-11ed-8ca1-3ab031344a45', 'person.age': None, 'person.first_name': 'Rich', 'person.nick_name': []}
\n" ], "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[pid, eid] := [eid person.id pid]\n", "\n", ":pull eid {person.first_name, person.nick_name, person.age}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If several output bindings are entities, you can specify several `:pull` directives one after another, but each output binding can have at most one pull directive associated with it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another notable thing is that pulls always return a map, even if some of the requested attributes are missing for the entity: a missing to-one attribute is filled with `null`, and a missing to-many attribute with an empty list. In contrast, observe that the query without a pull directive did not return Rich at all, but returned Quin twice. As can be seen above, pulls also handle to-many relationships automatically." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pulls can have nested directives (see the manual for details) and can traverse `ref` triples in the reverse direction. Otherwise, pull directives are deliberately kept simple: they are only intended for output processing. If you need recursion, non-trivial filters and the like, do it in the Datalog query instead." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Insertions into the triple store actually amount to _assertions_ of facts. If two conflicting facts are asserted, the last one wins:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   asserts  retracts
0  1        0
\n" ], "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":tx\n", "\n", "{_key: ['person.id', 'p'], person.first_name: \"Pete\"}" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   pid  eid
0  p    {'_id': 'f26fc8c4-388e-11ed-8b86-b7091d48cdc7', 'person.first_name': 'Pete', 'person.nick_name': ['Pan']}
\n" ], "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[pid, eid] := [eid person.id pid], pid == 'p'\n", "\n", ":pull eid {person.first_name, person.nick_name}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we specified an existing entity by providing `_key` with an attribute name and a value of that attribute. You can only refer to entities this way if the attribute is uniquely indexed. You can also specify an entity by providing its `_id`, but using a unique key, when you have one, is often much clearer." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next transaction is superficially similar to the last one. But this time the attribute involved, `person.nick_name`, has cardinality `many` instead of `one`:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   asserts  retracts
0  1        0
\n" ], "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":tx\n", "\n", "{_key: ['person.id', 'p'], person.nick_name: \"Ping\"}" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   pid  eid
0  p    {'_id': 'f26fc8c4-388e-11ed-8b86-b7091d48cdc7', 'person.first_name': 'Pete', 'person.nick_name': ['Pan', 'Ping']}
\n" ], "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[pid, eid] := [eid person.id pid], pid == 'p'\n", "\n", ":pull eid {person.first_name, person.nick_name}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now the new nickname is simply recorded alongside the existing one. Note that if you add the same nickname for the same person again, you still get only one copy instead of two:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   asserts  retracts
0  1        0
\n" ], "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":tx\n", "\n", "{_key: ['person.id', 'p'], person.nick_name: \"Ping\"}" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   pid  eid
0  p    {'_id': 'f26fc8c4-388e-11ed-8b86-b7091d48cdc7', 'person.first_name': 'Pete', 'person.nick_name': ['Pan', 'Ping']}
\n" ], "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[pid, eid] := [eid person.id pid], pid == 'p'\n", "\n", ":pull eid {person.first_name, person.nick_name}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we have seen, triples, too, follow set semantics rather than bag semantics. If you really want duplicates, you need to disambiguate them at the level of values, for example by wrapping them in lists." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get rid of data, you perform _retractions_:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   asserts  retracts
0  0        2
\n" ], "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":tx\n", "\n", ":retract {_key: ['person.id', 'p'], person.nick_name: \"Ping\", person.first_name: 'Peter'}" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   pid  eid
0  p    {'_id': 'f26fc8c4-388e-11ed-8b86-b7091d48cdc7', 'person.first_name': None, 'person.id': 'p', 'person.nick_name': ['Pan']}
\n" ], "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[pid, eid] := [eid person.id pid], pid == 'p'\n", "\n", ":pull eid {person.first_name, person.nick_name, person.id}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is OK to retract facts that do not exist: doing so is simply a no-op. Notice that the entity still has its `person.id` attribute: the `_key` specification only indicates which entity to transact. If you want to get rid of the keyed attribute, you have to include it in the transaction map explicitly." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that when retracting facts above, we had to provide the database with the values of the existing triples. This can be cumbersome, especially for to-many attributes: if you miss even one value, it will remain. Therefore another form of retraction, `retract_all`, is provided:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   asserts  retracts
0  0        2
\n" ], "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":tx\n", "\n", ":retract_all {_key: ['person.id', 'p'], person.nick_name: 0, person.first_name: 0, person.id: 0}" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   pid  eid
\n" ], "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[pid, eid] := [eid person.id pid], pid == 'p'\n", "\n", ":pull eid {person.first_name, person.nick_name, person.id}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this form, you can provide any value for the attributes, the database does not care and just removes all values associated with the attributes. Above we have used `0` since it is simple to type." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Nested data mutations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have so far inserted data in units of entities. This is fine for simple cases, but can become awkward for tree or graph shaped data which are linked together in non-trivial ways. We would need to insert some triples first, get ids of some entities (or use their unique keys), and use these to insert other triples." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead, Cozo supports nested data insertion. Let's insert our whole love triangle graph all at once." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recall that our love triangles are:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   0        1
0  alice    eve
1  bob      alice
2  charlie  eve
3  david    george
4  eve      alice
5  eve      bob
6  eve      charlie
7  george   george
\n" ], "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[] <- [['alice', 'eve'],\n", " ['bob', 'alice'],\n", " ['eve', 'alice'],\n", " ['eve', 'bob'],\n", " ['eve', 'charlie'],\n", " ['charlie', 'eve'],\n", " ['david', 'george'],\n", " ['george', 'george']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We insert them into the triple store thus:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   asserts  retracts
0  20       0
\n" ], "text/plain": [ "" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":tx\n", "\n", "{\n", " _tid: 'a', \n", " person.id: 'a', \n", " person.first_name: 'Alice',\n", " person.loves: {\n", " _tid: 'e',\n", " person.id: 'e',\n", " person.first_name: 'Eve',\n", " *person.loves: [\n", " 'a',\n", " {\n", " _tid: 'b',\n", " person.id: 'b',\n", " person.first_name: 'Bob',\n", " person.loves: 'a'\n", " },\n", " {\n", " _tid: 'c',\n", " person.id: 'c',\n", " person.first_name: 'Charlie',\n", " person.loves: 'e'\n", " }\n", " ]\n", " }\n", "}\n", "\n", "{person.id: 'd', person.first_name: 'David', person.loves: 'g'}\n", "{_tid: 'g', person.id: 'g', person.first_name: 'George', person.loves: 'g'}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nested mutations are done simply by using maps for `ref` attribute values. We identified entities that do not yet exist in the database by their `_tid` given inline. `_tid`s can be any string you like _except_ strings that can be interpreted as UUIDs. As before, an asterisk `*` before the attribute name denotes that we are transacting multiple triples into an attribute. As the last two maps in the example shows, you do not need `_tid` if you do not need to refer to an entity, and you can use `_tid` to refer to an entity itself." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see if we get the same results querying the triple store:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   loving   loved
0  Alice    Eve
1  Bob      Alice
2  Charlie  Eve
3  David    George
4  Eve      Alice
5  Eve      Bob
6  Eve      Charlie
7  George   George
\n" ], "text/plain": [ "" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[loving, loved] := [a person.first_name loving], \n", " [a person.loves b], \n", " [b person.first_name loved]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nice!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A note on the entity ID" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you have probably noticed, the database automatically assigned UUIDs as entity IDs when we created the entities. For more control, you can also supply the IDs yourself when creating entities:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   asserts  retracts
0  2        0
\n" ], "text/plain": [ "" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":tx\n", "\n", "{_id: '4e7a35b9-e04d-48a3-9eeb-d8a68ef33c43', person.id: 'u', person.first_name: 'Ursula'}" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   p
0  {'_id': '4e7a35b9-e04d-48a3-9eeb-d8a68ef33c43', 'person.first_name': 'Ursula', 'person.id': 'u'}
\n" ], "text/plain": [ "" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[p] <- [['4e7a35b9-e04d-48a3-9eeb-d8a68ef33c43']]\n", "\n", ":pull p { person.first_name, person.id }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The system-assigned IDs are UUID version 1 and is contains a timestamp. You can extract the timestamp by using the function `uuid_timestamp`:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 pidts
0a1663642232.365364
1b1663642232.365371
2c1663642232.365372
3d1663642232.365372
4e1663642232.365370
5g1663642232.365373
6q1663642213.641695
7r1663642213.641704
8unan
\n" ], "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[pid, ts] := [p person.id pid], ts = uuid_timestamp(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The returned numbers are seconds since the UNIX epoch. The UUID we made ourselves contains no timestamp because it is a version 4 UUID. You can use any valid UUID as an entity ID except the 'nil ID' `00000000-0000-0000-0000-000000000000`:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[31meval::amend_triple_with_reserved_id\u001b[0m\n", "\n", " \u001b[31m×\u001b[0m Attempting to amend triple person.id via reserved ID 00000000-0000-0000-0000-000000000000\n" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":tx\n", "\n", "{_id: '00000000-0000-0000-0000-000000000000', person.id: '0', person.first_name: 'I am ZERO'}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the timestamped version has performance benefits: the database sorts UUIDs so that those with similar timestamps are stored near each other. This gives data locality similar to that of an auto-incrementing integer key in an RDBMS, while mitigating the risk of malicious users iterating over your data sequentially or estimating its cardinality. Apart from the timestamp, the UUIDs generated by the system contain only random bits; in particular, they encode no node information (which the UUID specification allows but does not require), so users cannot tell which machine generated the IDs either. Still, if you want your keys to be completely opaque, provide your own UUIDv4s backed by a good random number generator." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## The time machine" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Your data is changing fast. 
For administrative or regulatory reasons, you may also need records of _how_ your data change. Or you may be handed historical data in the first place, and want your queries to reflect the facts _at a particular instant of time_." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Someone used to say that 'more columns in an RDBMS solve anything'. In our case, maybe adding more attributes helps? Let's add to each entity an attribute `valid_at` indicating when the entity is considered valid." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is indeed doable, but the resulting system is a total pain to use. First, you need to _reify_ most of your values. Instead of saying `[bob person.name 'Bob']`, you need something like `[bob person.used_name name]`, where `[name name.is_spelled 'Bob']` and `[name name.is_valid_at '2020-03-04']`, etc. Next, how are you going to find out what everything was at a particular moment? You cannot filter entities with equality conditions on `is_valid_at`, since something introduced in 1999 is still valid in 2020, _unless_ some other fact supersedes it or it is retracted _after_ 1999 but before 2020. Moreover, we only want the latest valid fact at that moment, not every historical fact up to it. Fulfilling these requirements _is_ possible in Cozo with aggregations, but it makes even the simplest queries enormously complex." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To solve this particular problem, which occurs more often than you might think, Cozo has built-in support for historical facts. This functionality carries a non-trivial performance penalty, so you have to request it explicitly for each attribute. And like other properties of an attribute, whether it supports history is immutable. 
If you later change your mind, you need to define a new attribute and copy the data over, as usual.\n", "\n", "If you are already worried about performance, rest assured that Cozo's implementation of historical facts is MUCH MORE performant than the hand-rolled solution sketched above. On average, querying a history-enabled attribute is $c \\log n$ times slower than the corresponding query on a non-history-enabled attribute, where $c$ is a small constant and $n$ is the number of historical facts for the given entity-attribute pair. This logarithmic complexity beats any simple-minded implementation, especially when the number of historical records is enormous." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at some examples. We want to store countries and their heads of state. The schema:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 attr_idop
010000009assert
110000010assert
\n" ], "text/plain": [ "" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":schema\n", "\n", ":put country {\n", " name: string unique,\n", " head: string index history,\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For simplicity, we assume that a country's name never changes, but its head of state obviously changes every few years, which is what the `history` modifier indicates. That's all you need for the schema." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You insert data as before:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 assertsretracts
040
\n" ], "text/plain": [ "" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":tx\n", "\n", ":put {country.name: 'US', country.head: 'Biden'}\n", "{country.name: 'UK', country.head: 'Truss'}" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 countryhead
0UKTruss
1USBiden
\n" ], "text/plain": [ "" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[country, head] := [c country.name country], [c country.head head]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By the way, the transaction above also showed that you can explicitly tell the system you are doing a `:put`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's add the historical data:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 assertsretracts
020
\n" ], "text/plain": [ "" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":tx\n", "\n", "@'2019-07-24' {_key: ['country.name', 'UK'], country.head: 'Johnson'}\n", ":put @ '2017-01-20' {_key: ['country.name', 'US'], country.head: 'Trump'}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The syntax should be self-explanatory. You can specify the timestamp as an ISO 8601 date, which is interpreted as midnight UTC on that date; as an RFC 3339 string such as `'1996-12-19T16:39:57-08:00'`; or as an integer counting _microseconds_ since the UNIX epoch (negative for times before the epoch). The validity marker only affects attributes defined with the `history` modifier.\n", "\n", "Let's see who the heads of state are _now_:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 countryhead
0UKTruss
1USBiden
\n" ], "text/plain": [ "" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[country, head] := [c country.name country], [c country.head head]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As expected." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's explicitly request historical facts:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 countryhead
0UKJohnson
1USTrump
\n" ], "text/plain": [ "" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[country, head] @ '2020-01-01' := [c country.name country], [c country.head head]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Right. Try another one:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 countryhead
0UKJohnson
1USTrump
\n" ], "text/plain": [ "" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[country, head] @ '2022-01-01' := [c country.name country], [c country.head head]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Umm ... that doesn't look right. The problem is that when we inserted the facts about Biden and Truss, we did not tell the system when they became valid, so it used the current timestamp. If you are inserting facts in real time, this is what you want. But if you are inserting historical facts, as we are doing here, or catching up on a backlog, it causes problems. In our case the fix is easy:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 assertsretracts
020
\n" ], "text/plain": [ "" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":tx\n", "\n", "@'2022-09-05' {_key: ['country.name', 'UK'], country.head: 'Truss'}\n", "@'2021-01-20' {_key: ['country.name', 'US'], country.head: 'Biden'}" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 countryhead
0UKJohnson
1USBiden
\n" ], "text/plain": [ "" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[country, head] @ '2022-01-01' := [c country.name country], [c country.head head]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's more accurate. What about the future?" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 countryhead
0UKTruss
1USBiden
\n" ], "text/plain": [ "" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[country, head] @ '9999-01-01' := [c country.name country], [c country.head head]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wow, that can't happen no matter what the world is coming to. We fix that by _retracting_ facts as before, but with a timestamp attached (we will use a _very_ generous timestamp for them):" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 assertsretracts
006
\n" ], "text/plain": [ "" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":tx\n", "\n", ":retract_all @ '2099-01-01' {_key: ['country.name', 'UK'], country.head: 0}\n", ":retract_all @ '2099-01-01' {_key: ['country.name', 'US'], country.head: 0}" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 countryhead
\n" ], "text/plain": [ "" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[country, head] @ '9999-01-01' := [c country.name country], [c country.head head]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Good. What about now, again?" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 countryhead
0UKTruss
1USBiden
\n" ], "text/plain": [ "" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[country, head] := [c country.name country], [c country.head head]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And history?" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 countryhead
0USTrump
\n" ], "text/plain": [ "" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[country, head] @ '2018-01-01' := [c country.name country], [c country.head head]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The UK is missing because we have not yet entered its head of state for this period. The fix:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 assertsretracts
010
\n" ], "text/plain": [ "" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":tx\n", "\n", "@'2016-07-11' {_key: ['country.name', 'UK'], country.head: 'May'}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One thing if it is not already obvious: timestamps apply at the level of rules, not queries, so you can have a different timestamp for each rule:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 yearcountryhead
02019UKMay
12019USTrump
22022UKJohnson
32022USBiden
4nowUKTruss
5nowUSBiden
\n" ], "text/plain": [ "" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?[year, country, head] @ '2019-01-01' := year = 2019, [c country.name country], [c country.head head]\n", "?[year, country, head] @ '2022-01-01' := year = 2022, [c country.name country], [c country.head head]\n", "?[year, country, head] /* ~~NoW!~~ */ := year = 'now', [c country.name country], [c country.head head]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The timestamp is also not required to represent actual time. You can `put` data by giving them integer timestamps with custom interpretation, and query them using the same interpretation. Just don't mix your fictional time and real time." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A final API before we are done with this time-travelling thing. If you want a record of the actual history of attributes for a certain entity instead of its time slices, use this system op:" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 entity_idattrtimestamptimestamp_stropvalue
00639834a-388f-11ed-9d48-bdd20af27054country.namenanNO_HISTORYassertUK
10639834a-388f-11ed-9d48-bdd20af27054country.head4070908800000000.0000002099-01-01T00:00:00+00:00retractNone
20639834a-388f-11ed-9d48-bdd20af27054country.head1663642245426446.0000002022-09-20T02:50:45.426446+00:00assertTruss
30639834a-388f-11ed-9d48-bdd20af27054country.head1662336000000000.0000002022-09-05T00:00:00+00:00assertTruss
40639834a-388f-11ed-9d48-bdd20af27054country.head1563926400000000.0000002019-07-24T00:00:00+00:00assertJohnson
50639834a-388f-11ed-9d48-bdd20af27054country.head1468195200000000.0000002016-07-11T00:00:00+00:00assertMay
6063982e6-388f-11ed-90d5-354957e3b083country.namenanNO_HISTORYassertUS
7063982e6-388f-11ed-90d5-354957e3b083country.head4070908800000000.0000002099-01-01T00:00:00+00:00retractNone
8063982e6-388f-11ed-90d5-354957e3b083country.head1663642245426446.0000002022-09-20T02:50:45.426446+00:00assertBiden
9063982e6-388f-11ed-90d5-354957e3b083country.head1611100800000000.0000002021-01-20T00:00:00+00:00assertBiden
10063982e6-388f-11ed-90d5-354957e3b083country.head1484870400000000.0000002017-01-20T00:00:00+00:00assertTrump
\n" ], "text/plain": [ "" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":db history for ['country.name', 'UK'], ['country.name', 'US'] : country.name, country.head" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have used a unique key to identify the entity. You can of course use the entity ID itself. The time ordering within each entity-attribute pair is reverse chronological." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Restricting the range of time for the query is also possible:" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 entity_idattrtimestamptimestamp_stropvalue
00639834a-388f-11ed-9d48-bdd20af27054country.namenanNO_HISTORYassertUK
1063982e6-388f-11ed-90d5-354957e3b083country.namenanNO_HISTORYassertUS
2063982e6-388f-11ed-90d5-354957e3b083country.head1611100800000000.0000002021-01-20T00:00:00+00:00assertBiden
\n" ], "text/plain": [ "" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ ":db history from '2020-01-01' to '2022-01-01' for ['country.name', 'UK'], ['country.name', 'US'] : country.name, country.head" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that even though the UK had a head of state during this period, no `country.head` entry for the UK appears in the output, because its _assertions_ lie outside the time range. This API is meant only for administrative purposes; for general queries, use Datalog queries instead." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we have seen, for attributes with history, retraction does not actually remove data from the database. If, for example, you are legally required to make sure a piece of data is physically gone, retract it with exactly the same timestamp it was originally asserted with; the integer form of the timestamp is recommended here. After the retraction you won't be able to retrieve the data through the public API, but some traces may persist in write-ahead logs and elsewhere, and complete eradication may take an unspecified amount of time. And that assumes you did not set up any backups yourself (GASP). Yes, absolute elimination of data is difficult and uncertain." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 4 }