# The pilgrim to Mount Acid

## Stored relations

An obvious shortcoming of our previous acrobatics is that we have to carry around our love triangles network and enter it anew for every query, which leads to rapid deterioration of the `CTRL`, `C` and `V` keys. So let's fix that:

In [1]:
?[] <- [['alice', 'eve'],
        ['bob', 'alice'],
        ['eve', 'alice'],
        ['eve', 'bob'],
        ['eve', 'charlie'],
        ['charlie', 'eve'],
        ['david', 'george'],
        ['george', 'george']]
        
:relation create triangles

status
OK


Here the _query directive_ `:relation create` accompanies a normal query. The results of the query are then stored on your disk under the name `triangles` instead of being returned to you.

You will receive an error if you try to run this script twice. If that happens, don't worry and simply continue.

Stored relations are safe from restarts and power failures. Let's query against it:

In [2]:
?[a, b] := :triangles[a, b]

a,b
alice,eve
bob,alice
charlie,eve
david,george
eve,alice
eve,bob
eve,charlie
george,george


The colon `:` in front of the name tells the database that we want a _stored_ relation instead of a relation defined within the query itself.

Now, Fred finally comes to the party and Fred loves Alice and Eve. We add these facts in the following way:

In [3]:
?[] <- [['fred', 'alice'],
        ['fred', 'eve']]

:relation put triangles

status
OK


In [4]:
?[a, b] := :triangles[a, b]

a,b
alice,eve
bob,alice
charlie,eve
david,george
eve,alice
eve,bob
eve,charlie
fred,alice
fred,eve
george,george


Notice that we used `:relation put` instead of `:relation create`. In fact, you can use `:relation put` before any call to `:relation create`. The `create` op just ensures that the insertion is into a new stored relation.
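
To make the difference between the two operations concrete, here is a minimal Python sketch of the behaviour described above. It is only a mental model: the names `db`, `create` and `put` are invented for illustration and are not part of Cozo's API, and an in-memory dictionary stands in for the on-disk storage.

```python
# Hypothetical in-memory model of stored relations: name -> set of tuples.
db = {}

def create(name, rows):
    """Like `:relation create`: insert into a relation that must not exist yet."""
    if name in db:
        raise ValueError(f"stored relation {name!r} already exists")
    db[name] = {tuple(row) for row in rows}

def put(name, rows):
    """Like `:relation put`: insert rows, creating the relation if it is missing."""
    db.setdefault(name, set()).update(tuple(row) for row in rows)

create("triangles", [("alice", "eve"), ("bob", "alice")])
put("triangles", [("fred", "alice"), ("fred", "eve")])
put("triangles", [("fred", "eve")])   # a repeated fact is absorbed by the set

try:
    create("triangles", [])           # a second `create` is rejected
except ValueError as err:
    print(err)

print(sorted(db["triangles"]))
```

Modelling a relation as a set also explains why putting the same tuple twice leaves the relation unchanged.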

Now Eve no longer loves Alice and Charlie! Let's reflect this fact by using `retract`:

In [5]:
?[] <- [['eve', 'charlie'],
        ['eve', 'alice']]

:relation retract triangles

status
OK


In [6]:
?[a, b] := :triangles[a, b]

a,b
alice,eve
bob,alice
charlie,eve
david,george
eve,bob
fred,alice
fred,eve
george,george


It is OK to retract non-existent facts, in which case the operation does nothing.

You can also reset the whole relation with `rederive`:

In [7]:
?[] <- [['eve', 'charlie'],
        ['eve', 'alice']]

:relation rederive triangles

status
OK


In [8]:
?[a, b] := :triangles[a, b]

a,b
eve,alice
eve,charlie


Only the `rederive`ed tuples remain.
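
In set terms, `retract` behaves like a set difference and `rederive` like a wholesale replacement. The following Python sketch models just that; the function names are illustrative and say nothing about how Cozo implements these operations.

```python
def retract(relation, rows):
    """Like `:relation retract`: remove rows; rows that are absent are ignored."""
    return relation - {tuple(row) for row in rows}

def rederive(relation, rows):
    """Like `:relation rederive`: forget the old contents, keep only `rows`."""
    return {tuple(row) for row in rows}

triangles = {("eve", "alice"), ("eve", "bob"), ("eve", "charlie")}

# Retracting a fact that was never stored is a harmless no-op.
triangles = retract(triangles, [("eve", "charlie"), ("nobody", "noone")])
print(sorted(triangles))   # [('eve', 'alice'), ('eve', 'bob')]

triangles = rederive(triangles, [("eve", "charlie"), ("eve", "alice")])
print(sorted(triangles))   # [('eve', 'alice'), ('eve', 'charlie')]
```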

You can see what stored relations you currently have in your database by running the following _system directive_:

In [9]:
:db relations

name,arity
triangles,2


Now this triangles business is becoming tiring. Let's get rid of it:

In [10]:
:db remove relation triangles

status
OK


Since we do not have any queries to run when nuking relations, we use a system directive instead of a query directive. Now you can no longer query the triangles:

In [11]:
?[a, b] := :triangles[a, b]

This completes all the operations on stored relations: `create`, `put`, `retract`, `rederive`. The syntax for `remove` is different from the rest for technical reasons.

All these operations are _atomic_, meaning that for all the tuples they affect, either all are affected at the same time, or the operation completely fails. There is no in-between, corrupted state.
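
One way to picture atomicity is "stage everything, then swap". The sketch below is an illustration of the all-or-nothing guarantee, not a description of Cozo's internals; `atomic_put` and its arity check are hypothetical.

```python
def atomic_put(relation, rows, arity):
    """Return a new relation with every row added, or raise and change nothing."""
    staged = set(relation)                 # work on a private copy
    for row in rows:
        if len(row) != arity:
            raise ValueError(f"expected arity {arity}, got {row!r}")
        staged.add(tuple(row))
    return staged                          # the caller swaps in the result in one step

triangles = {("alice", "eve")}
try:
    # The second row is malformed, so the whole batch is rejected,
    # including the perfectly valid first row.
    triangles = atomic_put(triangles, [("fred", "alice"), ("oops",)], arity=2)
except ValueError as err:
    print(err)

print(triangles)   # {('alice', 'eve')}
```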

## A schema for data

The stored relation operations introduced above are simple, fast, and very raw. They can be used in exactly the same way as rules defined inline with the query. The way you use them is also not very different from how you would use tables in a traditional SQL database.

Stored relations are suitable for data that has a well-defined structure from the outset, and which is loaded and updated in bulk. For example, you may have obtained from domain experts an [ontology](https://www.wikiwand.com/en/Ontology_\(information_science\)) in the form of a network of metadata. The ontology comes in nice tables with clear, detailed documentation. You store this ontology as a group of stored relations, and use them to extract insights from your business data. The ontology is updated periodically, and when an update comes you just use the `rederive` operation to replace the old version. Very simple and efficient.

But your _business_ data is most likely not as simple as that. In this age of BigDataⒸ, you must have one billion active users on your platforms carrying out all sorts of activities, concurrently of course. You don't want these activities to step on each other. You don't want to store the wrong thing into your users' accounts. You _especially_ don't want any money in transit to disappear in midair. To make things worse, hundreds of new activities pop up each day.

You don't want to store any of those in a stored relation. With a traditional RDBMS, [data migrations](https://en.wikipedia.org/wiki/Data_migration) would have already killed you. And with Cozo, stored relations don't even try to support schema change (in fact, the only 'schema' for a stored relation is its arity).

So let's step back a bit and recap what we want in this case:

* high concurrency;
* fine-grained transactions;
* checks for data integrity;
* ability to rapidly adapt to changing data shapes.

But at what cost? We have to give up something, right? Here are the prices we are willing to pay in this case:

* we restrict most transactions to _local changes_, i.e. they only touch a tiny fraction of the data;
* we can tolerate a few levels of indirection.

Once we have paid the price, we get our solution. It is a very old solution actually: the [triple store](https://en.wikipedia.org/wiki/Triplestore).

The idea is very simple. A _triple_ is a sentence consisting of a subject, a verb, and an object. In the Cozo flavour, the subject is always an opaque identity, such as _entity42_. The following are examples of triples in this sense:

* _entity42_ has first name `'Alice'`.
* _entity42_ has last name `'Liddell'`.
* _entity42_ loves _entity81_.
* _entity81_ is aged `20` years old.

We put schema into triples by schematizing the verbs. In our case, the verbs for first name and last name should take strings as objects, the verb for age should take integers, and the verb for the "loves" relationship should take other entities. Here the types refer to the objects in the triple, since the subject is always an entity.
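
Before turning to the real thing, here is a small Python sketch of what "a schema on the verbs" means. Everything in it (`Entity`, `SCHEMA`, `check_triple`) is made up for illustration; it only mirrors the four example triples above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    """An opaque identity, such as entity42."""
    id: int

# Each verb fixes the type of the *object*; the subject is always an entity.
SCHEMA = {"first_name": str, "last_name": str, "age": int, "loves": Entity}

def check_triple(subject, verb, obj):
    """Return the triple if it conforms to SCHEMA, otherwise raise TypeError."""
    if not isinstance(subject, Entity):
        raise TypeError("the subject must be an entity")
    expected = SCHEMA[verb]                # an unknown verb raises KeyError
    if type(obj) is not expected:
        raise TypeError(f"{verb!r} expects {expected.__name__}, got {obj!r}")
    return (subject, verb, obj)

e42, e81 = Entity(42), Entity(81)
triples = [
    check_triple(e42, "first_name", "Alice"),
    check_triple(e42, "last_name", "Liddell"),
    check_triple(e42, "loves", e81),
    check_triple(e81, "age", 20),
]

try:
    check_triple(e81, "age", "twenty")     # wrong object type for this verb
except TypeError as err:
    print(err)
```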

So let's finally put this into code:

In [12]:
:schema

put person {
    first_name: string index,
    last_name: string index,
    loves: ref many,
    age: int
}

attr_id,op
10000001,assert
10000002,assert
10000003,assert
10000004,assert


Let's explain. The `:schema` at the top indicates that we want to manage the schema instead of running normal queries. We then `put` a _group_ of related attributes. Even though they are declared together, similarly to a table definition in SQL, we need to stress that this actually defines four separate, independent attributes named `person.first_name`, `person.last_name`, `person.loves` and `person.age`. An entity can have any attributes associated with it, even those with different prefixes.

The allowed types for attributes are:

* `ref`
* `bool`
* `int`
* `float`
* `string`
* `bytes`
* `list`

The list type is heterogeneous in its elements. There is no concept of a nullable type and you can't put `null` into values of triples (other than wrapping them in lists first). To indicate missing values, you simply omit the attribute.

The `ref` type has the special meaning of referring to other entities.

After the type comes one or more _modifiers_. The `many` modifier indicates that `loves` is a to-many relationship. If we omit it, any person can love at most one other person, which is not very realistic.
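
The effect of `many` can be sketched in Python as follows. Note the assumption baked into this model: for a to-one attribute, a newly asserted value is taken to replace the previous one. The text above only says that at most one value can be held, so treat the replacement behaviour as a guess; `CARDINALITY`, `store` and `assert_fact` are illustrative names.

```python
from collections import defaultdict

CARDINALITY = {"person.age": "one", "person.loves": "many"}
store = defaultdict(set)   # (entity, attribute) -> set of current values

def assert_fact(entity, attr, value):
    """Record a fact, keeping at most one value for to-one attributes."""
    values = store[(entity, attr)]
    if CARDINALITY[attr] == "one":
        values.clear()     # assumption: the newer value replaces the older one
    values.add(value)

assert_fact(42, "person.loves", 81)
assert_fact(42, "person.loves", 99)   # to-many: both values are kept
assert_fact(42, "person.age", 20)
assert_fact(42, "person.age", 21)     # to-one: only the latest value survives

print(sorted(store[(42, "person.loves")]))   # [81, 99]
print(sorted(store[(42, "person.age")]))     # [21]
```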

The modifier `index` indicates that we want values of this attribute to be _indexed_. Only indexed attributes support efficient value lookups and range scans. `ref` types are always implicitly indexed since the database wants to be able to traverse the graph in both directions.
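
Why an index helps can be shown with a sorted list and binary search. This is a generic illustration of an ordered index, not Cozo's storage layout; the sample triples and the helpers `range_scan` and `full_scan` are hypothetical.

```python
import bisect

# Illustrative triples: (entity, attribute, value).
triples = [
    (42, "person.first_name", "Alice"),
    (81, "person.first_name", "Bob"),
    (99, "person.first_name", "Alice"),
    (42, "person.age", 20),
]
INDEXED = {"person.first_name"}

# The index keeps indexed facts ordered by (attribute, value, entity).
index = sorted((a, v, e) for e, a, v in triples if a in INDEXED)

def range_scan(attr, low, high):
    """Return (value, entity) pairs with low <= value < high via binary search."""
    start = bisect.bisect_left(index, (attr, low))
    stop = bisect.bisect_left(index, (attr, high))
    return [(v, e) for _, v, e in index[start:stop]]

def full_scan(attr, low, high):
    """Same answer without an index: every triple has to be inspected."""
    return sorted((v, e) for e, a, v in triples if a == attr and low <= v < high)

print(range_scan("person.first_name", "A", "B"))   # [('Alice', 42), ('Alice', 99)]
```

Both functions return the same rows, but `range_scan` only touches the matching slice of the index, whereas `full_scan` has to read every triple.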

Instead of `index`, we can mark attributes with the modifier `unique`, indicating that there cannot be two entities with the same value for the attribute. The value then acts as a _unique identifier_ for the entity, which is convenient in certain circumstances, since the entity ID is assigned by the database automatically and you cannot choose it. So let's add an explicit `person.id` attribute, this time using the non-grouped syntax:

In [13]:
:schema

put person.id: string unique;

attr_id,op
10000005,assert
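
As a mental model, a `unique` attribute behaves like a dictionary from values to entities. The Python sketch below uses invented names (`unique_index`, `assert_unique`, `resolve`) and made-up sample values; it does not reflect how Cozo enforces the constraint.

```python
class UniqueViolation(Exception):
    """Raised when two different entities claim the same unique value."""

unique_index = {}   # value of the unique attribute -> owning entity

def assert_unique(entity, value):
    """Record that `entity` owns `value`, rejecting a second, different owner."""
    owner = unique_index.setdefault(value, entity)
    if owner != entity:
        raise UniqueViolation(f"{value!r} already identifies entity {owner}")

def resolve(value):
    """Find the entity from its self-chosen identifier."""
    return unique_index[value]

assert_unique(42, "alice_l")
assert_unique(81, "bob_b")
assert_unique(42, "alice_l")        # re-asserting the same fact is fine

try:
    assert_unique(99, "alice_l")    # a different entity cannot take the value
except UniqueViolation as err:
    print(err)

print(resolve("alice_l"))           # 42
```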


We can see which attributes are now defined in the database by running a system directive:

In [1]:
:db schema

attr_id,name,type,cardinality,index,history
10000001,person.first_name,string,one,index,False
10000002,person.last_name,string,one,index,False
10000003,person.loves,ref,many,none,False
10000004,person.age,int,one,none,False
10000005,person,string,one,unique,False


## The time machine