# The pilgrim to Mount Acid

## Stored relations

An obvious shortcoming of our previous acrobatics is that we have to carry around our love triangles network and enter it anew for every query, which leads to rapid deterioration of the `CTRL`, `C` and `V` keys. So let's fix that:

In [1]:
?[] <- [['alice', 'eve'],
        ['bob', 'alice'],
        ['eve', 'alice'],
        ['eve', 'bob'],
        ['eve', 'charlie'],
        ['charlie', 'eve'],
        ['david', 'george'],
        ['george', 'george']]
        
:relation create triangles

status
OK


We have the _query directive_ `:relation create` together with a normal query. The results will then be stored on your disk under the name `triangles` instead of being returned to you.

You will receive an error if you try to run this script twice. If that happens, don't worry and continue.

Stored relations are safe from restarts and power failures. Let's query against it:

In [2]:
?[a, b] := :triangles[a, b]

a,b
alice,eve
bob,alice
charlie,eve
david,george
eve,alice
eve,bob
eve,charlie
george,george


The colon `:` in front of the name tells the database that we want a _stored_ relation instead of a relation defined within the query itself.

Now, Fred finally comes to the party and Fred loves Alice and Eve. We add these facts in the following way:

In [3]:
?[] <- [['fred', 'alice'],
        ['fred', 'eve']]

:relation put triangles

status
OK


In [4]:
?[a, b] := :triangles[a, b]

a,b
alice,eve
bob,alice
charlie,eve
david,george
eve,alice
eve,bob
eve,charlie
fred,alice
fred,eve
george,george


Notice that we used `:relation put` instead of `:relation create`. In fact, you can use `:relation put` before any call to `:relation create`. The `create` op just ensures that the insertion is into a new stored relation.

Now Eve no longer loves Alice and Charlie! Let's reflect this fact by using `retract`:

In [5]:
?[] <- [['eve', 'charlie'],
        ['eve', 'alice']]

:relation retract triangles

status
OK


In [6]:
?[a, b] := :triangles[a, b]

a,b
alice,eve
bob,alice
charlie,eve
david,george
eve,bob
fred,alice
fred,eve
george,george


It is OK to retract non-existent facts, in which case the operation does nothing.

You can also reset the whole relation with `rederive`:

In [7]:
?[] <- [['eve', 'charlie'],
        ['eve', 'alice']]

:relation rederive triangles

status
OK


In [8]:
?[a, b] := :triangles[a, b]

a,b
eve,alice
eve,charlie


Only the `rederive`ed tuples remain.
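
To make the four operations concrete, here is a minimal Python sketch that treats a stored relation as a plain set of tuples. It is only a mental model, not how Cozo stores data, and the function names and the `db` dict are made up for illustration. It replays the same steps as the cells above.

```python
# A toy in-memory model of a stored relation: a set of equal-length tuples.
# Not Cozo's implementation; every name below is illustrative only.

def create(db, name, rows):
    """Store rows under a new name; fail if the name is already taken."""
    if name in db:
        raise KeyError(f"relation {name!r} already exists")
    db[name] = set(map(tuple, rows))

def put(db, name, rows):
    """Add rows; rows that are already present are left unchanged."""
    db.setdefault(name, set()).update(map(tuple, rows))

def retract(db, name, rows):
    """Remove rows; retracting an absent row does nothing."""
    db[name] -= set(map(tuple, rows))

def rederive(db, name, rows):
    """Replace the whole content of the relation with rows."""
    db[name] = set(map(tuple, rows))

db = {}
create(db, "triangles", [
    ("alice", "eve"), ("bob", "alice"), ("eve", "alice"), ("eve", "bob"),
    ("eve", "charlie"), ("charlie", "eve"), ("david", "george"),
    ("george", "george"),
])
put(db, "triangles", [("fred", "alice"), ("fred", "eve")])
retract(db, "triangles", [("eve", "charlie"), ("eve", "alice")])
after_retract = sorted(db["triangles"])
rederive(db, "triangles", [("eve", "charlie"), ("eve", "alice")])
after_rederive = sorted(db["triangles"])

print(after_retract)
print(after_rederive)
```

In this model `put` is a set union, `retract` a set difference and `rederive` a wholesale replacement, which is why retracting a missing fact is harmless.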

You can see what stored relations you currently have in your database by running the following _system directive_:

In [9]:
:db relations

name,arity
triangles,2


Relations can be renamed:

In [10]:
:db rename relation triangles love_triangles

status
OK


In [11]:
:db relations

name,arity
love_triangles,2


In [12]:
?[a, b] := :love_triangles[a, b]

a,b
eve,alice
eve,charlie


Now this triangles business is becoming tiring. Let's get rid of it:

In [13]:
:db remove relation love_triangles

status
OK


Since we do not have any queries to run when nuking relations, we use a system directive instead of a query directive. Now you can no longer query the triangles:

In [14]:
?[a, b] := :love_triangles[a, b]

This completes all the operations on stored relations: `create`, `put`, `retract`, `rederive`. The syntax for `remove` is different from the rest for technical reasons.

All these operations are _atomic_, meaning that for all the tuples they affect, either all are affected at the same time, or the operation completely fails. There is no in-between, corrupted state.
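
One way to picture the all-or-nothing behaviour is the following Python sketch. It stages the changes on a private copy and hands back the new state only if every row was accepted. The arity check is an assumed example of something that could make a batch fail, and `atomic_put` is a hypothetical helper; a real database relies on its storage engine's transactions rather than copying data.

```python
# Toy model of an all-or-nothing batch: work on a copy, swap only on success.
# Illustrative only; `atomic_put` and the arity check are assumptions.

def atomic_put(relation, rows, arity):
    """Return a new set with all rows added, or raise and leave `relation` as is."""
    staged = set(relation)          # private working copy
    for row in rows:
        if len(row) != arity:       # any bad row aborts the whole batch
            raise ValueError(f"expected arity {arity}, got {row!r}")
        staged.add(tuple(row))
    return staged                   # the caller swaps this in as the new state

triangles = {("eve", "alice"), ("eve", "charlie")}
try:
    # The second row is malformed, so the first row must not be stored either.
    triangles = atomic_put(triangles, [("fred", "alice"), ("fred",)], arity=2)
except ValueError:
    pass                            # the batch failed as a whole

print(sorted(triangles))
```

After the failed batch the relation still holds exactly its two original tuples; there is no state in which only `('fred', 'alice')` has been added.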

## A schema for data

The stored relation operations introduced above are simple, fast, and very raw. They can be used in exactly the same way as rules defined inline with the query. The way you use them is also not very different from how you would use tables in a traditional SQL database.

Stored relations are suitable for data that has a well-defined structure at the outset, and which is loaded and updated in bulk. For example, you may have obtained from domain experts an [ontology](https://www.wikiwand.com/en/Ontology_\(information_science\)) in the form of a network of metadata. The ontology comes in nice tables with clear, detailed documentation. You store this ontology as a group of stored relations, and use them to extract insights from your business data. The ontology is updated periodically, and when an update comes you just use the `rederive` operation to replace the old version. Very simple and efficient.

But your _business_ data is most likely not as simple as that. In this age of BigDataⒸ, you must have one billion active users on your platforms carrying out all sorts of activities, concurrently of course. You don't want these activities to step on each other. You don't want to store the wrong thing into your users' accounts. You _especially_ don't want any money in transit to disappear in midair. To make things worse, hundreds of new activities pop up each day.

Storing any of these in a stored relation is infeasible. With a traditional RDBMS, [data migrations](https://en.wikipedia.org/wiki/Data_migration) would have already killed you. And with Cozo, stored relations don't even try to support schema change (in fact, the only 'schema' for a stored relation is its arity).

To store such data and meet its query and mutation requirements, a database needs:

* high concurrency;
* fine-grained transactions;
* checks for data integrity;
* ability to rapidly adapt to new data shapes and requirements.

But in turn, we have to give up something. So with Cozo, we are willing to pay the following prices:

* we demand that most transactions only apply _local changes_ that only touch on a tiny fraction of the data (otherwise the database cannot satisfy the high concurrency requirements);
* we tolerate indirections (since "all problems in computer science can be solved by another level of indirection").

The solution is the [triple store](https://en.wikipedia.org/wiki/Triplestore).

A _triple_ is a sentence consisting of a subject, a verb, and an object. In the Cozo flavour, the subject is always an opaque identity, such as _entity42_, so it is actually an _entity-attribute-value_ triple. Examples:

* _entity42_ has first name `'Alice'`.
* _entity42_ has last name `'Liddell'`.
* _entity42_ loves _entity81_.
* _entity81_ is aged `20` years old.

We schematize triples by schematizing the verbs (attributes). In our example, the attributes for first name and last name should have type string, the attribute for age should have type integer, and the values of the "loves" relationship should be other entities. Here the types refer to the objects in the triple, since the subject is always an entity.

So let's put this into code:

In [15]:
:schema

put person {
    first_name: string index,
    nick_name: string many index,
    loves: ref many,
    age: int
}

attr_id,op
10000001,assert
10000002,assert
10000003,assert
10000004,assert


The `:schema` at the top indicates that we want to manage the schema instead of running normal queries. We then `put` a _group_ of related attributes. Even though they are declared together, much like a table definition in SQL, we need to stress that this actually defines four separate, independent attributes named `person.first_name`, `person.nick_name`, `person.loves` and `person.age`. An entity can have any attributes associated with it, even those with different prefixes.

The allowed types for attributes are:

* `ref`
* `bool`
* `int`
* `float`
* `string`
* `bytes`
* `list`

The list type is heterogeneous in its elements. There is no concept of a nullable type and you can't put `null` into values of triples (other than wrapping them in lists first). To indicate missing values, you simply omit the attribute.

The `ref` type has the special meaning of referring to other entities.
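
The idea that types belong to attributes rather than to tables can be sketched in Python as follows. This is a toy model under stated assumptions: `SCHEMA` and `check_triple` are invented names, only three of the types are covered, and the rule that a `ref` must point at an already known entity is an assumption made for the example, not something the text above states.

```python
# Toy model of attribute-level typing for entity-attribute-value triples.
# SCHEMA and check_triple are illustrative names, not Cozo's API.

SCHEMA = {
    "person.first_name": str,
    "person.nick_name": str,
    "person.loves": "ref",      # the value must be another entity's ID
    "person.age": int,
}

def check_triple(entity, attr, value, known_entities):
    """Raise if the value does not fit the attribute's declared type."""
    expected = SCHEMA[attr]                 # KeyError: attribute not declared
    if value is None:
        raise TypeError("null is not a value; omit the attribute instead")
    if expected == "ref":
        # Assumption for this sketch: refs must point at a known entity.
        if value not in known_entities:
            raise TypeError(f"{attr} must refer to an existing entity")
    elif type(value) is not expected:       # strict: bool is not accepted as int
        raise TypeError(f"{attr} expects {expected.__name__}")
    return (entity, attr, value)

entities = {"entity42", "entity81"}
triples = [
    check_triple("entity42", "person.first_name", "Alice", entities),
    check_triple("entity42", "person.loves", "entity81", entities),
    check_triple("entity81", "person.age", 20, entities),
]
print(triples)
```

Note that the check never looks at the subject: every triple is validated purely against the schema of its attribute.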

After the type come zero or more _modifiers_. The `many` modifier indicates that `loves` is a to-many relationship. If we omit it, any person can love at most one other person, which is not very realistic.

The modifier `index` indicates that we want values of this attribute to be _indexed_. Only indexed attributes support efficient value lookups and range scans. `ref` types are always implicitly indexed since the database wants to be able to traverse the graph in both directions.
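
Why an index helps with both lookups and range scans can be seen in a small Python sketch that keeps `(value, entity)` pairs sorted and uses binary search to find the starting point. This says nothing about Cozo's actual storage layout; `ValueIndex` and the integer entity IDs are hypothetical.

```python
import bisect

# Toy value index for one attribute: a sorted list of (value, entity) pairs.
# Illustrative only; it is not a description of Cozo's storage layout.

class ValueIndex:
    def __init__(self):
        self._pairs = []

    def add(self, value, entity):
        pair = (value, entity)
        pos = bisect.bisect_left(self._pairs, pair)
        if pos == len(self._pairs) or self._pairs[pos] != pair:
            self._pairs.insert(pos, pair)      # keep sorted, no duplicates

    def range(self, low, high):
        """(value, entity) pairs with low <= value <= high, in value order."""
        # (low,) sorts before every (low, entity), so this finds the first match.
        start = bisect.bisect_left(self._pairs, (low,))
        out = []
        for value, entity in self._pairs[start:]:
            if value > high:
                break                          # sorted, so nothing further matches
            out.append((value, entity))
        return out

    def lookup(self, value):
        """Entities whose attribute equals value."""
        return [entity for _, entity in self.range(value, value)]

index = ValueIndex()
for name, eid in [("Peter", 1), ("Quin", 2), ("Rich", 3), ("Quin", 4)]:
    index.add(name, eid)

print(index.lookup("Quin"))
print(index.range("P", "Qz"))
```

Without such a structure, answering either question would require scanning every triple of the attribute.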

Instead of `index`, we can mark attributes with the modifier `unique`, indicating that no two entities can have the same value for the attribute. The value then acts as a _unique identifier_ for the entity. This is convenient when retrieving entities, since entity IDs are assigned automatically by the database and you cannot choose them. So let's add an explicit `person.id` attribute, this time using the non-grouped syntax:

In [16]:
:schema

put person.id: string unique;

attr_id,op
10000005,assert


We can see which attributes are currently defined in the database by running a system directive:

In [17]:
:db schema

attr_id,name,type,cardinality,index,history
10000001,person.first_name,string,one,index,False
10000002,person.nick_name,string,many,index,False
10000003,person.loves,ref,many,none,False
10000004,person.age,int,one,none,False
10000005,person.id,string,one,unique,False


We can rename the attribute:

In [18]:
:db rename attr person.id person.pid

status
OK


In [19]:
:db schema

attr_id,name,type,cardinality,index,history
10000001,person.first_name,string,one,index,False
10000002,person.nick_name,string,many,index,False
10000003,person.loves,ref,many,none,False
10000004,person.age,int,one,none,False
10000005,person.pid,string,one,unique,False


We can also get rid of it (this removes all the data associated with the attribute as well):

In [20]:
:db remove attr person.pid

status
OK


In [21]:
:db schema

attr_id,name,type,cardinality,index,history
10000001,person.first_name,string,one,index,False
10000002,person.nick_name,string,many,index,False
10000003,person.loves,ref,many,none,False
10000004,person.age,int,one,none,False


But that's about it. Except for its name, an attribute is _immutable_: you cannot change a `string` attribute to a `ref` attribute, nor can you decide that your `one` attribute should really be `many`.

So what did we mean when we said that this kind of structure can deal with new requirements? Say you initially made the `person.loves` attribute one-to-one and made `person.last_name` a unique index, and now you need to change them. You need to change them not because the requirements have changed, but because you made _mistakes_ at the beginning. Such mistakes are fixed by, for example, first renaming the offending attribute, then creating a new attribute with the old name, next copying the data from the old attribute to the new one, and finally deleting the old, wrong attribute. Fixing mistakes should be explicit, and this procedure is very explicit.
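
The rename, create, copy and delete procedure can be sketched on a toy store in Python. Here `store` maps an attribute name to an `{entity: value}` dict, and `fix_attribute` is a hypothetical helper written for this illustration; in a real database each step would be a separate, explicit schema or data operation.

```python
# Toy model of the four-step fix on a dict-based triple store.
# `store` maps attribute name -> {entity: value}; all names are illustrative.

def fix_attribute(store, name, convert):
    """Rebuild attribute `name` with converted values, step by step."""
    old = name + "_old"
    store[old] = store.pop(name)                # 1. rename the offending attribute
    store[name] = {}                            # 2. create a new one under the old name
    for entity, value in store[old].items():
        store[name][entity] = convert(value)    # 3. copy the data across
    del store[old]                              # 4. delete the old, wrong attribute

# person.loves was mistakenly to-one; rebuild it with set values (to-many).
store = {"person.loves": {"e1": "e2", "e2": "e3"}}
fix_attribute(store, "person.loves", lambda target: {target})
print(store)
```

Every step is visible and deliberate, which is the point: nothing about the old attribute is silently reinterpreted.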

New requirements are not mistakes, and they do not invalidate your old data or schema. Examples of changing requirements: you now need to record the passport number and the parent-child relationships of the people in your graph. Very easy:

In [22]:
:schema

put person.passport_no: string many index;
put person.parent_of: ref many;

attr_id,op
10000006,assert
10000007,assert


In [23]:
:db schema

attr_id,name,type,cardinality,index,history
10000001,person.first_name,string,one,index,False
10000002,person.nick_name,string,many,index,False
10000003,person.loves,ref,many,none,False
10000004,person.age,int,one,none,False
10000006,person.passport_no,string,many,index,False
10000007,person.parent_of,ref,many,none,False


## Data with schema

Let's reinstate the `person.id` attribute first:

In [24]:
:schema

put person.id: string one unique;

attr_id,op
10000008,assert


Now we add data to our database. First we add a person called Peter. Besides the `:tx` at the top indicating that we want to execute a transaction, it is just a map:

In [25]:
:tx

{ person.first_name: 'Peter', person.nick_name: 'Pan', person.id: 'p' }

entity_id,asserts,retracts
550dad48-3501-11ed-8dc9-9aefd164fdd2,3,0


You can insert multiple 'rows' at the same time, and the maps also allow some stylistic variations:

In [26]:
:tx

{"person.first_name": "Quin", "*person.nick_name": ["Q", "The Quick"], "person.id": "q"}
{"person.first_name": "Rich", "person.id": "r"}

entity_id,asserts,retracts
567c963a-3501-11ed-9c95-82ccbc12f696,4,0
567c9806-3501-11ed-8e7b-d4c62ea1da21,2,0


Every entity is free to have any combination of attributes suitable for it. Note how we specified several nicknames for Quin at the same time, and Rich does not have a nickname.

To query the triples, use _triple rules_: these look like a list of three items, except there is no comma inside. The first slot contains the _entity id_ assigned by the system, the middle symbol is the attribute name and must be explicit (can't be a variable), and the last slot contains the value for the attribute:

In [27]:
?[eid, first_name, nick_name] := [eid person.nick_name nick_name], [eid person.first_name first_name]

eid,first_name,nick_name
550dad48-3501-11ed-8dc9-9aefd164fdd2,Peter,Pan
567c963a-3501-11ed-9c95-82ccbc12f696,Quin,Q
567c963a-3501-11ed-9c95-82ccbc12f696,Quin,The Quick


Besides the above _explicit querying_, there is another way to get the attributes associated with an entity: you may specify a _pull directive_, which expands a binding (interpreted as an entity ID) into a map containing the specified attributes. Observe:

In [28]:
?[pid, eid] := [eid person.id pid]

:pull eid {person.first_name, person.nick_name, person.age}

pid,eid
p,"{""_id"":""550dad48-3501-11ed-8dc9-9aefd164fdd2"",""person.age"":null,""person.first_name"":""Peter"",""person.nick_name"":[""Pan""]}"
q,"{""_id"":""567c963a-3501-11ed-9c95-82ccbc12f696"",""person.age"":null,""person.first_name"":""Quin"",""person.nick_name"":[""Q"",""The Quick""]}"
r,"{""_id"":""567c9806-3501-11ed-8e7b-d4c62ea1da21"",""person.age"":null,""person.first_name"":""Rich"",""person.nick_name"":[]}"


If you have several entry bindings that are entities, you can specify several `:pull` directives one after another, but each output binding can have at most one pull directive associated with it.

Another notable thing is that pulls always return a map, even if some of the requested attributes are missing for the entity (they are filled with `null` instead). In contrast, observe that the query not using a pull directive did not return Rich, but returned Quin twice. As can be seen above, the pull also deals with to-many relationships automatically.
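
The difference between the two styles can be reproduced with a few lines of Python over a set of triples. This is a sketch of the semantics only: entity IDs are shortened to `e1` to `e3`, and `join_names`, `pull` and `MANY` are invented for the example.

```python
# Toy contrast between an explicit triple join and a pull.
# Entity IDs are shortened to e1..e3; the functions are illustrative only.

triples = {
    ("e1", "person.first_name", "Peter"), ("e1", "person.nick_name", "Pan"),
    ("e2", "person.first_name", "Quin"), ("e2", "person.nick_name", "Q"),
    ("e2", "person.nick_name", "The Quick"),
    ("e3", "person.first_name", "Rich"),
}
MANY = {"person.nick_name"}     # attributes with cardinality many

def join_names(triples):
    """One row per matching pair of triples, like the explicit query."""
    return sorted(
        (e1, first, nick)
        for e1, a1, nick in triples if a1 == "person.nick_name"
        for e2, a2, first in triples if a2 == "person.first_name" and e1 == e2
    )

def pull(triples, entity, attrs):
    """Always one map per entity: None for missing, a list for to-many."""
    out = {"_id": entity}
    for attr in attrs:
        values = sorted(v for e, a, v in triples if e == entity and a == attr)
        out[attr] = values if attr in MANY else (values[0] if values else None)
    return out

rows = join_names(triples)
rich = pull(triples, "e3", ["person.first_name", "person.nick_name", "person.age"])
print(rows)
print(rich)
```

The join drops Rich because no nickname triple exists to match, and it repeats Quin once per nickname. The pull produces exactly one map per entity regardless.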

Pulls can have nested directives (see the manual for details) and can traverse `ref` triples in the reverse direction. But otherwise pull directives are kept deliberately simple. They are only intended for output processing. If you want recursions, non-trivial filters and the like, do it in the Datalog query instead.

Insertions in the triple store actually amount to _assertions_ of facts. If two conflicting facts are asserted, the last one wins:

In [29]:
:tx

{_key: ['person.id', 'p'], person.first_name: "Pete"}

entity_id,asserts,retracts
550dad48-3501-11ed-8dc9-9aefd164fdd2,1,0


In [40]:
?[pid, eid] := [eid person.id pid], pid == 'p'

:pull eid {person.first_name, person.nick_name}

pid,eid
p,"{""_id"":""550dad48-3501-11ed-8dc9-9aefd164fdd2"",""person.first_name"":""Pete"",""person.nick_name"":[""Pan""]}"


Here we specified an existing entity by providing `_key` with an attribute name and a unique value for the attribute. You can only refer to entities this way if the attribute is uniquely indexed. You can also specify an entity by providing its `_id`, but if you have a unique key to use, it is often much clearer.
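
Resolving a `_key` pair to an entity can be pictured as a lookup in a value-to-entity map that refuses duplicates. The Python below is a sketch under that assumption; `UniqueIndex`, `indexes` and `resolve_key` are hypothetical names, and the entity IDs are shortened.

```python
# Toy unique index: each value maps to at most one entity.
# UniqueIndex and resolve_key are illustrative names only.

class UniqueIndex:
    def __init__(self):
        self._by_value = {}

    def put(self, value, entity):
        owner = self._by_value.setdefault(value, entity)
        if owner != entity:
            raise ValueError(f"{value!r} already identifies another entity")

    def resolve(self, value):
        return self._by_value[value]

# Only uniquely indexed attributes get an entry here.
indexes = {"person.id": UniqueIndex()}
indexes["person.id"].put("p", "550dad48")    # shortened entity IDs
indexes["person.id"].put("q", "567c963a")

def resolve_key(key):
    """Turn a pair like ['person.id', 'p'] into an entity ID."""
    attr, value = key
    if attr not in indexes:
        raise KeyError(f"{attr} is not uniquely indexed")
    return indexes[attr].resolve(value)

print(resolve_key(["person.id", "p"]))
```

Because a unique value identifies at most one entity, the lookup is unambiguous, which is why a non-unique attribute cannot serve as a key.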

The next transaction is superficially similar to the last one. But in this case, `person.nick_name` has cardinality `many` instead of `one`:

In [33]:
:tx

{_key: ['person.id', 'p'], person.nick_name: "Ping"}

entity_id,asserts,retracts
550dad48-3501-11ed-8dc9-9aefd164fdd2,1,0


In [41]:
?[pid, eid] := [eid person.id pid], pid == 'p'

:pull eid {person.first_name, person.nick_name}

pid,eid
p,"{""_id"":""550dad48-3501-11ed-8dc9-9aefd164fdd2"",""person.first_name"":""Pete"",""person.nick_name"":[""Pan"",""Ping""]}"


Now the new nickname is simply recorded together with the existing one. Note that if you try to add the same nickname for the same person again, you still get only one copy instead of two:

In [37]:
:tx

{_key: ['person.id', 'p'], person.nick_name: "Ping"}

entity_id,asserts,retracts
550dad48-3501-11ed-8dc9-9aefd164fdd2,1,0


In [38]:
?[pid, eid] := [eid person.id pid]

:pull eid {person.first_name, person.nick_name}

pid,eid
p,"{""_id"":""550dad48-3501-11ed-8dc9-9aefd164fdd2"",""person.first_name"":""Pete"",""person.nick_name"":[""Pan"",""Ping""]}"
q,"{""_id"":""567c963a-3501-11ed-9c95-82ccbc12f696"",""person.first_name"":""Quin"",""person.nick_name"":[""Q"",""The Quick""]}"
r,"{""_id"":""567c9806-3501-11ed-8e7b-d4c62ea1da21"",""person.first_name"":""Rich"",""person.nick_name"":[]}"


As we have seen, triples abide by set semantics instead of bag semantics as well. If you really want to have duplicates, you need to disambiguate them at the level of values, by for example wrapping them in lists.
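
The two assertion behaviours seen in this section, overwrite for cardinality `one` and set insertion for cardinality `many`, fit in a short Python sketch. `CARDINALITY` and `assert_fact` are made-up names, and the store is a plain nested dict rather than anything resembling the real engine.

```python
# Toy model of asserting facts under cardinality one vs. many.
# CARDINALITY and assert_fact are illustrative names, not Cozo's API.

CARDINALITY = {"person.first_name": "one", "person.nick_name": "many"}

def assert_fact(store, entity, attr, value):
    """Record a fact; to-one attributes are overwritten, to-many form a set."""
    facts = store.setdefault(entity, {})
    if CARDINALITY[attr] == "one":
        facts[attr] = value                         # the last assertion wins
    else:
        facts.setdefault(attr, set()).add(value)    # duplicates collapse

store = {}
assert_fact(store, "p", "person.first_name", "Peter")
assert_fact(store, "p", "person.nick_name", "Pan")
assert_fact(store, "p", "person.first_name", "Pete")   # replaces "Peter"
assert_fact(store, "p", "person.nick_name", "Ping")    # added next to "Pan"
assert_fact(store, "p", "person.nick_name", "Ping")    # no second copy
print(store)
```

To keep genuine duplicates in this model, the values themselves would have to differ, for instance by wrapping each in a tuple together with a counter.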

## The time machine

In [2]:
a[] <- [[1]]
?[a] := a[a], a > 100

:assert none

a
