@tgk
Created May 14, 2014 20:55
An attempt at understanding Quilt
;; These notes describe the snapshots of a shared state between
;; participants in a system co-ordinated using CRDT methods, hopefully
;; matching the Quilt notes from @cemerick at
;; http://writings.quilt.org/2014/05/12/distributed-systems-and-the-end-of-the-api/
;; The ambition is to build systems where network failures, replays
;; etc. can be safely ignored (instead of just being ignored).
;; Maintaining the "timestamps" for the append-only database is the only
;; thing I feel I don't grok. There are some notes at the end of the
;; gist. Hoping for feedback or pointers!
;; In a CRDT co-ordinated system, information is only added. We start
;; out with an empty shared state
{}
;; A participant adds facts to the database at time 1
{:db {1 '(1 2 4)}}
;; Some other participant adds a value to the database. We now have a
;; database with the time 2.
{:db {1 '(1 2 4)
      2 '(1 2 4 5)}}
;; A participant has a copy of the shared state and knows that there is
;; a version available with the time-tag 1. It can therefore add a query
;; (idempotently) to the store
{:db {1 '(1 2 4)
      2 '(1 2 4 5)}
 :qs #{{:db/version 1, :expr "/sum"}}}
;; A process able to fulfil the query picks it up, and adds the result
;; of the query back into the shared store (again, idempotently).
{:db {1 '(1 2 4)
      2 '(1 2 4 5)}
 :qs #{{:db/version 1, :expr "/sum"}}
 :rs {{:db/version 1, :expr "/sum"} 7}}
;; Removing values from the database, again, yields a new value
{:db {1 '(1 2 4)
      2 '(1 2 4 5)
      3 '(2 4 5)}
 :qs #{{:db/version 1, :expr "/sum"}}
 :rs {{:db/version 1, :expr "/sum"} 7}}
;; I'm not entirely sure how database versions are co-ordinated. It
;; could be vector clocks, it could be SHAs of the data in the
;; database, or it could be the events sourced for the db. Any feedback
;; on this aspect is more than welcome.
@tgk

tgk commented May 16, 2014

Thanks for all the feedback. It is extremely informative.

The explicit model of the entire database in my example is a bit misleading. Possibly an ordered list of available database transactions would make more sense (think Datomic). The important bit is that there is a shared knowledge of the data available. Actually responding to queries marked with a given transaction (or "version" of the database) does not need to look in the shared state. I'm pretty sure that's what you are suggesting in your response(?). Adding transactions is an add-only operation. For even more fine-grained queries, the version (sha or other) of the service intended to answer the query can be added to the request, along with the transaction. This should also help in defining idempotent operations.
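
A minimal sketch of the idea above, assuming the shared state holds an add-only map of transactions keyed by id instead of full database snapshots. The function names (`add-tx`, `add-query`, `add-result`) and the `:db/tx` and `:service/version` keys are hypothetical, chosen for illustration only; they are not taken from Quilt or Datomic.

```clojure
;; Hypothetical sketch: shared state with an add-only transaction log.
(def empty-state {:txs (sorted-map) :qs #{} :rs {}})

(defn add-tx
  "Adds a transaction under its id. An id that is already present is left
  untouched, so a replayed transaction is a no-op."
  [state id tx]
  (update state :txs #(if (contains? % id) % (assoc % id tx))))

(defn add-query
  "Adds a query tagged with the transaction it refers to and the version
  of the service expected to answer it. :qs is a set, so re-adding the
  same query changes nothing."
  [state q]
  (update state :qs conj q))

(defn add-result
  "Records a result for a query. The first recorded result is kept, so a
  replayed answer leaves the state unchanged."
  [state q result]
  (update state :rs #(if (contains? % q) % (assoc % q result))))

;; "abc123" stands in for a sha (or other version tag) of the service.
(def q {:db/tx 1, :service/version "abc123", :expr "/sum"})

(def state
  (-> empty-state
      (add-tx 1 [:add [1 2 4]])
      (add-tx 2 [:add [5]])
      (add-query q)
      (add-query q)        ; replayed message, no effect
      (add-result q 7)
      (add-result q 7)))   ; replayed message, no effect
```

Because every operation either adds something new or does nothing, delivering the same message twice yields the same state, which is the property that makes replays safe to ignore.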

I'll definitely look into the "only-once" aspect, but I don't really see it as a major problem for query-only services. Of course, you don't want it for other types of interaction with the world (fire missiles etc) but that part of the system could possibly be handled by some slower co-ordination service outside the core.

Once again, thanks for all the feedback. It's very refreshing to see a different take on co-ordination between processes. I'm very excited to see what shape splice will have.

@cemerick

Yup, pairing CRDTs with various sorts of consensus mechanisms with different guarantees is a thing, and is a must for enabling general-purpose distributed computation (including interacting with the "real world").

One quibble (sorry, can't help it :-P):

Local reads — including all sorts of possible query mechanisms — are a core part of the construction of CRDTs. They're entirely orthogonal to any update or write paths/mechanisms, and so are readily scalable and tolerant of e.g. isolation from peers to which you might otherwise replicate writes. There may very well be query services that you might interact with via a CRDT (especially if you characterize any transformation of the shared data as a query!), but queries over the CRDT's actual state (e.g. the sum example) should always be available locally.
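
To make the local-read point concrete, here is a small sketch assuming each replica holds a grow-only set (a G-Set, one of the simplest CRDTs). That choice is an assumption for illustration: the gist itself uses lists, and `merge-replicas` and `local-sum` are hypothetical names. The read path never touches the network, while merging is a separate path.

```clojure
(require '[clojure.set :as set])

;; Two replicas of a grow-only set that diverged while partitioned.
(def replica-a #{1 2 4})
(def replica-b #{1 2 4 5})

(defn merge-replicas
  "Join of two G-Set states. Set union is commutative, associative and
  idempotent, so merges may arrive in any order or be repeated."
  [a b]
  (set/union a b))

(defn local-sum
  "A query over the replica's own state; no peer needs to be reachable."
  [replica]
  (reduce + replica))

(local-sum replica-a)                             ; => 7, works while isolated
(local-sum (merge-replicas replica-a replica-b))  ; => 12, after replication
```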

@tgk

tgk commented May 16, 2014

Alright, thanks for clearing that up for me. I shan't disturb your quest further :-)

@cemerick

Not a disturbance at all. :-)
