"source": "In fact, this is doable, but the resulting system is a total pain to use. First, you will need to _reify_ most of your values. Instead of saying that `[bob person.name 'Bob']`, you need something like `[bob person.used_name name]`, where `[name name.is_spelled 'Bob']` and `[name name.is_valid_at '2020-03-04']`, etc. Next, how are you going to find out what everything was at a particular moment? You cannot use equality conditions to filter entities based on `is_valid_at`, since something that was introduced in 1999 is still valid in 2020, _unless_ some other fact supersedes it or it was retracted at some point between then and 2020. And we are only after the latest valid fact, not all historical facts at a point in time. Fulfilling these requirements _is_ possible in Cozo with aggregations, but they add a great deal of complexity to even the simplest queries.",
"metadata": {}
},
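{
"cell_type": "markdown",
"source": "To make the difficulty concrete, here is a minimal Python sketch (deliberately not CozoScript) of what every hand-rolled time-slice query has to compute. For each entity, it looks at the reified facts whose `is_valid_at` is at or before the query time and keeps only the one with the greatest timestamp. If that latest fact is a retraction, the entity is dropped. The tuple layout, the function name `latest_as_of` and the sample facts are hypothetical. The point is that an equality filter cannot express this, so each query needs a group-by-max step over all historical facts.\n\n```python\nfrom typing import Dict, List, Optional, Tuple\n\n# Hypothetical reified facts: (entity, valid_at, value), where a value of\n# None stands for a retraction recorded at that time. ISO dates in the same\n# YYYY-MM-DD format compare correctly as plain strings.\nFact = Tuple[str, str, Optional[str]]\n\n\ndef latest_as_of(facts: List[Fact], at: str) -> Dict[str, str]:\n    # Group-by-max: per entity, keep the fact with the greatest valid_at\n    # that is not later than `at`. This scans every historical fact.\n    best: Dict[str, Tuple[str, Optional[str]]] = {}\n    for entity, valid_at, value in facts:\n        if valid_at <= at and (entity not in best or valid_at > best[entity][0]):\n            best[entity] = (valid_at, value)\n    # If the latest fact is a retraction, the entity has no value at `at`.\n    return {e: v for e, (_, v) in best.items() if v is not None}\n\n\nfacts = [\n    ('bob', '1999-01-01', 'Bob'),\n    ('bob', '2010-06-01', 'Robert'),\n    ('alice', '2005-01-01', 'Alice'),\n    ('alice', '2015-01-01', None),\n]\nprint(latest_as_of(facts, '2020-03-04'))  # {'bob': 'Robert'}\n```",
"metadata": {}
},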
{
"cell_type": "markdown",
"source": "To solve this particular problem, which occurs more commonly than you might think, Cozo has built-in support for historical facts. This functionality carries a non-trivial performance penalty, so you have to request it explicitly for each attribute. And like other properties of attributes, whether it has history support is immutable. If you later change your mind, you need to define a new attribute and copy data over, as usual.\n\nIf you are already worried about performance, let us assure you that Cozo's historical facts implementation is MUCH MORE performant than the hand-rolled solution indicated above. In fact, querying a history-enabled attribute is on average about $c \\log n$ times slower than the corresponding query for a non-history-enabled attribute, where $c$ is a small constant and $n$ is the number of historical facts a given entity-attribute pair has. This logarithmic cost beats any simple-minded implementation, especially when the number of historical records is enormous.",
"source": "For simplicity we assume that a country's name does not change, but obviously its head of state changes every few years, which is indicated by the modifier `history`. That's all you need for the schema.",
"metadata": {}
},
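{
"cell_type": "markdown",
"source": "For intuition about where the $\\log n$ factor comes from, here is a minimal Python sketch. Purely for illustration, it assumes that the history of one entity-attribute pair is kept sorted by validity timestamp; this is not a description of Cozo's storage engine. Finding the fact in force at a given moment is then a binary search instead of a scan over every historical fact. The class `AttrHistory`, its methods and the sample timestamps are hypothetical, and `None` marks a retraction.\n\n```python\nimport bisect\nfrom typing import List, Optional\n\n\nclass AttrHistory:\n    # Hypothetical history of ONE entity-attribute pair, kept sorted by the\n    # validity timestamp (microseconds). A value of None is a retraction.\n    def __init__(self) -> None:\n        self.times: List[int] = []\n        self.values: List[Optional[str]] = []\n\n    def put(self, ts: int, value: Optional[str]) -> None:\n        # Insert in timestamp order; a second put at the same ts overwrites.\n        i = bisect.bisect_left(self.times, ts)\n        if i < len(self.times) and self.times[i] == ts:\n            self.values[i] = value\n        else:\n            self.times.insert(i, ts)\n            self.values.insert(i, value)\n\n    def as_of(self, ts: int) -> Optional[str]:\n        # Binary search for the last fact valid at or before ts: O(log n).\n        i = bisect.bisect_right(self.times, ts)\n        return self.values[i - 1] if i > 0 else None\n\n\nhead = AttrHistory()\nhead.put(100, 'A')\nhead.put(200, 'B')\nhead.put(300, None)  # retracted at 300\nprint(head.as_of(150), head.as_of(250), head.as_of(350))  # A B None\n```\n\nA Python list insert is linear, so this only models the lookup side. A database would typically keep the same ordering in an ordered index so that writes stay cheap as well.",
"metadata": {}
},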
{
"cell_type": "markdown",
"source": "Now let's insert some data. You can insert data exactly as you did before:",
"source": "The syntax should explain itself. You can specify the date in ISO 8601 format, in which case it is interpreted as midnight UTC on the stated date. You can also use RFC 3339 format, such as `'1996-12-19T16:39:57-08:00'`. Finally, you can give an integer counting the _microseconds_ since the UNIX epoch, with negative numbers for times before the epoch. The validity marker only affects attributes that were defined with the `history` modifier.\n\nLet's see who the heads of state are _now_:",
"source": "Umm ... that doesn't look right. The problem is that when we inserted facts about Biden and Truss, we did not tell the system when those facts start being valid, so the system assumed the current timestamp. If you are inserting facts in real time, this is what you want. But it causes problems if you are inserting historical facts, as we are doing here, or catching up on a backlog. In our case the fix is easy:",
"source": "Wow, that can't happen no matter what the world is coming to. We fix that by _retracting_ facts as before, but with a timestamp attached (we will use a _very_ generous timestamp for them):",
"source": "A final API before we are done with this time-travelling thing. If you want a record of the actual history of attributes for a certain entity instead of its time slices, use this system op:",
"metadata": {}
},
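{
"cell_type": "markdown",
"source": "To double-check the three timestamp forms described earlier, this Python sketch converts each of them into microseconds since the UNIX epoch. It follows the rules stated in the text: a bare ISO 8601 date means midnight UTC, an RFC 3339 string carries its own UTC offset, and an integer is already in microseconds. The helper `to_micros` is hypothetical and only mirrors the documented semantics. It is not Cozo's parser, but it can be handy for working out the integer form of a timestamp yourself.\n\n```python\nfrom datetime import date, datetime, timedelta, timezone\nfrom typing import Union\n\nEPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)\n\n\ndef to_micros(ts: Union[int, str]) -> int:\n    # Integers are already microseconds since the epoch (negative = before).\n    if isinstance(ts, int):\n        return ts\n    if len(ts) == 10:\n        # Bare ISO 8601 date (YYYY-MM-DD): midnight UTC on that day.\n        d = date.fromisoformat(ts)\n        dt = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)\n    else:\n        # RFC 3339 timestamp with an explicit offset such as -08:00.\n        dt = datetime.fromisoformat(ts)\n    # Dividing two timedeltas with // gives an exact integer count.\n    return (dt - EPOCH) // timedelta(microseconds=1)\n\n\nprint(to_micros('2020-03-04'))                 # 1583280000000000\nprint(to_micros('1996-12-19T16:39:57-08:00'))  # 851042397000000\nprint(to_micros(-1))                           # -1\n```",
"metadata": {}
},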
{
"cell_type": "code",
"source": ":db history for ['country.name', 'UK'], ['country.name', 'US'] : country.name, country.head",
"source": "We have used a unique key (`country.name`) to identify each entity, but you can of course use the entity ID itself. Within each entity-attribute pair, the results are in reverse chronological order.",
"metadata": {}
},
{
"cell_type": "markdown",
"source": "Restricting the range of time for the query is also possible:",
"metadata": {}
},
{
"cell_type": "code",
"source": ":db history from '2020-01-01' to '2022-01-01' for ['country.name', 'UK'], ['country.name', 'US'] : country.name, country.head",
"source": "Note that even though the UK had a head of state in this period, it is not included in the output, since its _assertions_ lie outside the time range. This API is only meant for administrative purposes; for general queries, use Datalog queries instead.",
"metadata": {}
},
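{
"cell_type": "markdown",
"source": "To spell out why the UK drops out of the ranged listing, here is a minimal Python sketch of the filtering rule as we read it from the output above. The range is applied to the timestamp at which each record was asserted or retracted, not to the period during which a value was in force. Each entity-attribute pair is listed newest first. The record layout, the function name `history_listing`, the half-open `[start, end)` interval and the sample records are all assumptions for illustration. Whether Cozo's `from ... to ...` bounds are inclusive is not covered here.\n\n```python\nfrom typing import List, Optional, Tuple\n\n# Hypothetical history record: (entity, attribute, timestamp, value),\n# where a value of None marks a retraction.\nRecord = Tuple[str, str, int, Optional[str]]\n\n\ndef history_listing(records: List[Record], start: Optional[int] = None,\n                    end: Optional[int] = None) -> List[Record]:\n    # Keep records whose own timestamp lies in [start, end) (assumed bounds).\n    kept = [r for r in records\n            if (start is None or r[2] >= start) and (end is None or r[2] < end)]\n    # Group by entity-attribute pair, newest first within each pair.\n    kept.sort(key=lambda r: (r[0], r[1], -r[2]))\n    return kept\n\n\nrecords = [\n    ('e1', 'head', 10, 'P'),\n    ('e1', 'head', 50, 'Q'),\n    ('e2', 'head', 5, 'R'),  # still in force at 20..60, but asserted at 5\n]\nprint(history_listing(records, start=20, end=60))  # [('e1', 'head', 50, 'Q')]\n```",
"metadata": {}
},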
{
"cell_type": "markdown",
"source": "As we have seen, for attributes with history, retraction does not really remove the data from the database. If you are, for example, legally required to make sure a piece of data is physically gone, retract it with exactly the same timestamp that the piece of data originally had. In this case it is recommended to use the integer form of the timestamp, so that there is no ambiguity about whether the two timestamps match. After the retraction you won't be able to retrieve the data through the public API. However, some traces of it may still persist in write-ahead logs and other places, and complete eradication may take an unspecified amount of time. And all of this assumes you have not set up any backups yourself (GASP). Yes, absolute elimination of data is difficult and uncertain.",