On Distributed Consistency - Part 5 - Many Writer Eventual Consistency

Apr 14 • Posted 4 years ago

See also:

In part 2 we primarily discussed “single writer” eventual consistency.  Here we will discuss many-writer, and define that term more precisely.

By many-writer, we mean a system where different data servers can receive writes concurrently (and asynchronously).  Examples of many-writer eventually consistent systems include

  • Amazon Dynamo
  • CouchDB master-master replication

With multi-writer eventual consistency, we need to address the phenomenon of conflicting writes. Writes to two servers at the same time may be updates for the same object.  We must resolve the conflict in a way that is acceptable for the use case in question.  Some ways to do this are:

  • last write wins
  • programmatic merge
  • commutative operations

Last Write Wins

Last write wins is a popular default in many systems. If we receive an operation that is older, we simply ignore it.  In a distributed system the definition of “last” is hard as clocks can’t be perfectly synchronized.  Thus many systems use vector clocks.

Inserts

Surprisingly, a traditional insert operation is tricky with many writers. Consider these operations performed at about the same time at different servers:

op1: insert( { _id : 'joe', age : 30 } )
op2: insert( { _id : 'joe', age : 33 } )

If we naively apply these two operations in any order, we get an inconsistent result.  insert typically means:

if( !already_exists(x._id) ) then set( x );

However, with eventual consistency we do not have real-time global state.  Checking already_exists() is thus hard.

The best solution is to not support insert, but rather set() - i.e. “set a new value”.  Sometimes this is called an upsert.  Then, if we have last-write-wins semantics, everything is fine.

Deletes

Deletes require special handling in cases of object rebirth.  Consider this sequence:

op1: set( { _id : 'joe', age : 40 } }
op2: delete( { _id : 'joe' } )
op3: set( { _id : 'joe', age : 33 } )

If op2 and op3 are reversed in execution order, we would have a problem.  Thus we need to remember the delete for a while, and apply last-operation-wins semantics.  Some products call the remembrance of the delete a tombstone.

Updates

Updates have a similar issue as insert, so for updates, we use the set() operation we described above instead.

Note that partial object updates can be tricky to replicate efficiently.  Consider a set() operation where we wish to update a single field:

  update users set age=40 where _id=’joe’

This is no problem with eventual consistency if we replicate a full copy of the object.  However, what if the user object was 1MB in size?  It would be really nice to just send the new age field and the _id, rather than the whole object.  However, this is difficult.  Consider:

op1: update users set age=40 where _id='joe'
op2: update users set state='ca' where _id='joe'

Wen can’t simply replicate the partial update and use last-write-wins; the database will need more sophistication to handle this efficiently.

Programmatic Merge

Last-write-wins is great, but is not always sufficient.  Having the client application resolve the conflict via a merge is a fine alternative.  Let’s consider an example mentioned in the Amazon Dynamo paper: manipulations of shopping carts.  With eventual consistency it would not be safe to do something like:

update cart set this[our_sku].qty=1 where _id='joe'

If there are multiple manipulations of the cart, some may get lost using last-write-wins.  Instead, the Dynamo paper talks of storing the operations in the cart object, rather than the actual data state.  We could store something like:

update cart append { time : now(), op : 'addToCart', sku : our_sku, qty : 1 }
  where _id='joe'

When a conflict occurs, cart objects can be merged.  We do not lose any operations.  When it is time to check out, we replay all the operations, which might include quantity adjustments and removes from cart.  After replay we have the final cart state.

Note the above example uses a timestamp field — in a real system a vector clock might be used to order the operations in the cart.

It’s interesting to note that not only have we avoided conflicts, we are also able to do operations where atomicity would be required.

Commutative Operations

If all operations are commutative (more precisely, foldable), we will never have any conflicts.  Operations can simply be applied in any order, and the result is the same.  For example:

// x starts as { }
x.increment('a', 1);
x.increment('a', 3);
x.addToSet('b', 'foo');
x.addToSet('b', 'bar');
result: { a : 4, b : {bar,foo} }

// x starts as { }
x.addToSet('b', 'bar');
x.increment('a', 3);
x.increment('a', 1);
x.addToSet('b', 'foo');
result: { a : 4, b : {bar,foo} }

Note however that composition of addToSet and increment would not be foldable; thus, we have to use only one or the other for a particular field of the object.