README

2 years ago · bae26ee90e
parent 1feaac46a4
commit bae26ee90e
1 changed files with 15 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -1,6 +1,20 @@
 # The Cozo Database

-Cozo is a graph-focused relational database designed by and for data hackers. It's free and open-source software.
+Cozo is an experimental relational database with a focus on graph data and support for ACID transactions. It aims to implement a Purer, Better relational algebra without the historical baggage of SQL.
+
+## Motivations
+
+The so-called "NoSQL" movement in recent years brought forth a plethora of databases that try to introduce new data paradigms and revolutionize the industry. However, almost all the so-called "new" paradigms, in particular, the document paradigm, the (entity-relationship) graph model paradigm, and the key-value paradigm, actually predate the invention of the relational model. There is nothing wrong _per se_ with recycling old ideas, as changing circumstances can make previously infeasible solutions viable. However, since the historical development is deliberately obscured (with understandable business motivations), many users and even implementers fail to understand why relational databases became the standard in the first place, and do not have a clear picture of the strengths and weaknesses of the new databases. Suboptimal systems result. It is inevitable but still mildly amusing that even the name "NoSQL" was later reinterpreted to become "Not Only SQL".
+
+So what is essential about these relational databases that has earned them such a firm position in the industry? Looking at the history of ideas accompanying the emergence of the relational systems, the answer is obvious: relational algebra. This intuitive, idealized mathematical model of data is powerful and elegant because it is an _algebra_, in particular, because it has the _closure property_ of algebras: operations on relations still produce relations. Thus, relations become a generic interface for data: once stored in the relational form, the data can be subjected to _all_ of the allowed transformations, and these can be nested or even applied recursively. An important consequence of this power and flexibility is that you do not need to foresee every eventual use of the data and only need to store data in a canonical, business-logic-agnostic form (think of the "normal forms" and all the theory behind them). Of course, in real situations it is impossible to uphold this principle in every case, mainly due to performance constraints, but that's the general spirit of relational databases: any data that you care to put into your persistent storage are probably going to outlive your current business logic by a huge margin.
+
+But the NoSQL movement did occur, and with good reasons: relational databases fail in some ways. Everyone perhaps has their own list of perceived shortcomings of relational databases, such as (the old relational systems') inability to deal with the Big Data that comes with the explosion of the Internet. One of them is particularly unfortunate, however: the claim that relational databases are just bad with graph data. This accusation is particularly acute in the age of social networks. However, "graphs", "networks" and "relationships" are more or less synonyms, and "relational" is even in the name of relational algebra! In fact, relational algebra itself is perfectly capable of dealing with graph structures, and with recursion introduced, traditional relational databases can be no less powerful than dedicated graph databases.
+
+If relational algebra itself is not a real obstacle, why are many graph databases "going beyond" it, and in the process throwing away the closure property, which in practice makes the data stored much harder to use beyond the business logic originally envisioned? We think SQL is to blame. The syntax is kind of backward (logically it should be "FROM-WHERE-SELECT" rather than the traditional "SELECT-FROM-WHERE"; as a consequence, both humans and auto-completion tools have to mentally reorder the clauses), inline nesting is hard to read and has corner cases (certain types of "correlated queries" which in fact cannot be expressed in relational algebra), and common table expressions are clunky and quickly become unreadable when recursion is thrown in. And nesting, joins, and recursion are essential for graphs. In this day and age, using SQL for querying graphs feels like using FORTRAN for scripting webpages.
+
+Datalog is a solution ...
+
+Commercial systems are averse to breaking SQL compatibility ...

 ## Another database?!