On Distributed Consistency - Part 4 - Multi Data Center

Apr 12

Eventual consistency makes multi-data center data storage easier.  There are reasons eventual consistency is helpful for multi-data center deployments that are unrelated to availability and CAP.  And as mentioned in Part 3, some common types of network partitions, such as the loss of an entire data center, are actually trivial network partitions and may not even affect availability anyway.

Here are a few architectures for multi-data center data storage:

  • DR
  • Single Region
  • Local reads, remote writes
  • Intelligent Homing
  • Eventual consistency

DR

By DR we mean a traditional disaster recovery / business continuity architecture.  It’s pretty simple: we serve everything from one data center, with replication to a secondary facility that is offline.  In a failure we cut over.
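
As a rough sketch of the cut-over logic, with hypothetical names rather than any particular product's API: the application always talks to whichever facility is currently primary, and we flip that designation when the primary is lost.

    # Minimal sketch of DR cut-over (hypothetical names, not a real product's API).
    # All traffic goes to the active facility; the standby only receives replication.
    class DRPair:
        def __init__(self, primary: str, standby: str):
            self.primary = primary    # facility currently serving all traffic
            self.standby = standby    # offline facility receiving replicated data

        def endpoint(self) -> str:
            """Every read and write is routed to the single active facility."""
            return self.primary

        def cut_over(self) -> None:
            """On loss of the primary, promote the standby."""
            self.primary, self.standby = self.standby, self.primary

    pair = DRPair("dc-east", "dc-west")
    assert pair.endpoint() == "dc-east"
    pair.cut_over()                      # disaster: dc-east is lost
    assert pair.endpoint() == "dc-west"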

Availability can be quite high in this model: on any issue with the first data center, including internal network partitions, we cut over, and once the whole first data center is disabled, the partition is trivial.

This model works fine with strong consistency.

Multi Data Center, Single Region

This option uses multiple data centers, physically separated but all within a single region (such as the Northwest).  Amazon and DoubleClick have used this scheme in the past.  The latency between data centers is then reasonable: if we stay within a 150 mile radius, we can have round trip times of around 5 ms.  We might have a fiber ring among, say, 3 or 4 data centers.  As the latency is reasonable, a WAN operation here is fine for many problems.  With a ring topology, a non-trivial network partition is unlikely.
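
As a back-of-the-envelope check on that 5 ms figure (assuming signal speed in fiber of roughly two-thirds the speed of light, and a fiber path somewhat longer than the straight-line distance):

    # Back-of-the-envelope propagation delay within a ~150 mile radius region.
    # Assumes fiber signal speed ~2/3 c and a path ~1.5x the straight-line distance.
    straight_line_km = 150 * 1.609          # ~241 km between the farthest data centers
    fiber_path_km = straight_line_km * 1.5  # routing rarely follows the great circle
    speed_km_per_ms = 200.0                 # ~2/3 the speed of light, in km per millisecond

    one_way_ms = fiber_path_km / speed_km_per_ms
    round_trip_ms = 2 * one_way_ms
    print(f"estimated RTT: {round_trip_ms:.1f} ms")  # ~3.6 ms, so ~5 ms with switching overhead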

A single region is useful for both strongly consistent and eventually consistent architectures.  With a Dynamo-style product, this is a good option when W=N or R=N, as otherwise, with data centers spread far apart, we would wait a long time for remote replicas to confirm writes.
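
A small sketch of why this matters, with illustrative (made-up) latencies: with W=N or R=N, every operation waits on its slowest replica, so any replica in a distant region sets the floor on latency.

    # Why W=N (or R=N) favors a single region: the slowest replica sets the latency floor.
    # Latencies below are illustrative assumptions, not measurements.
    def quorum_latency_ms(replica_rtts_ms: list[float], acks_needed: int) -> float:
        """Latency to collect `acks_needed` acknowledgments, assuming parallel requests."""
        return sorted(replica_rtts_ms)[acks_needed - 1]

    single_region = [2.0, 4.0, 5.0]     # N=3 replicas on a metro fiber ring
    multi_region  = [2.0, 40.0, 80.0]   # N=3 replicas spread across distant regions

    print(quorum_latency_ms(single_region, acks_needed=3))  # 5.0 ms  (W=N is still cheap)
    print(quorum_latency_ms(multi_region,  acks_needed=3))  # 80.0 ms (W=N pays for the farthest DC)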

Local Reads, Remote Writes

For read-heavy use cases, this is a good option.  Here we read eventually consistent data locally (easy with most database products, including RDBMSs) but send all writes back to the master facility over the WAN.  A Dynamo-style system spanning multiple data centers with a very high W value and a low R value can be thought of this way as well.
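
As a sketch of the routing rule, with hypothetical data center names: reads go to a replica in the client's own data center, while writes always cross the WAN to the master facility.

    # Sketch of the "local reads, remote writes" routing rule (hypothetical names).
    # Reads may return slightly stale data from the local replica; writes always hit the master.
    LOCAL_REPLICAS = {"dc-east": "replica-east", "dc-west": "replica-west",
                      "dc-central": "master"}

    def route(operation: str, client_dc: str) -> str:
        if operation == "read":
            return LOCAL_REPLICAS[client_dc]   # eventually consistent, low latency
        return "master"                        # all writes travel over the WAN

    print(route("read", "dc-west"))    # replica-west
    print(route("write", "dc-west"))   # master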

This pattern would work great for traditional content management: publishing is infrequent and reading is very frequent.

Using a Content Delivery Network (CDN), with a centralized origin web site serving dynamic content, is another example.

Intelligent Homing

We discussed “Intelligent Homing” a bit in Part 3.  The idea is to store the master copy of a given data entity near its user.

This model works quite well if data correlates with the user, such as the user’s profile, inbox, etc.

We get fast, locally confirmed writes.  If a data center goes completely down, we can still fail over master status to another data center that holds a replica.
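
A minimal sketch of the idea, with hypothetical names and structures: each user's data has a home data center that takes its writes, and that home can be reassigned to a replica site if the original is lost.

    # Sketch of intelligent homing (hypothetical structures): each user's data is mastered
    # in the data center nearest that user, with replicas elsewhere for failover.
    homes = {"alice": "dc-europe", "bob": "dc-us-west"}          # master location per user
    replicas = {"alice": ["dc-us-east"], "bob": ["dc-us-east"]}  # other copies of each user's data

    def write_target(user: str) -> str:
        """Writes for a user go to that user's home data center and confirm locally."""
        return homes[user]

    def fail_over(user: str, lost_dc: str) -> None:
        """If the home data center is lost, promote one of the replica sites to master."""
        if homes[user] == lost_dc:
            homes[user] = replicas[user][0]

    print(write_target("alice"))       # dc-europe
    fail_over("alice", "dc-europe")    # dc-europe goes dark
    print(write_target("alice"))       # dc-us-east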

Eventual consistency

Many-writer eventual consistency gives us two benefits with multiple data centers:

  • higher availability in the face of network outages;
  • fast locally confirmed writes.

Consider a client of a Dynamo-style system that writes the data to four servers (N=4) but awaits confirmation from only the two servers in its local data center, keeping write confirmation latency low.
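
A sketch of that write path, with illustrative (assumed) acknowledgment latencies: the coordinator sends the write to all four replicas but returns to the client as soon as the two local acknowledgments arrive.

    # Sketch of a Dynamo-style write with N=4 replicas but W=2 local acknowledgments.
    # Latencies are illustrative assumptions; remote replicas converge in the background.
    replica_ack_ms = {
        "local-1": 1.0, "local-2": 2.0,      # same data center as the client
        "remote-1": 45.0, "remote-2": 90.0,  # other data centers, reached asynchronously
    }

    W = 2  # acknowledgments required before the write is confirmed to the client
    confirm_latency = sorted(replica_ack_ms.values())[W - 1]
    print(f"write confirmed after {confirm_latency} ms")  # 2.0 ms: only local acks are awaited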

Note, however, that if R+W > N, we can't have both fast local reads and fast local writes at the same time when all the data centers are equal peers.
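
To make that concrete (the numbers are just an example): with N=4 and only two replicas in the client's data center, no choice of W and R satisfying R+W > N keeps both quorums local.

    # R + W > N is what guarantees a read quorum overlaps every write quorum.
    # With N=4 and only 2 replicas in the client's data center, no such choice of
    # W and R lets both the read and the write quorum stay purely local.
    N = 4
    local_replicas = 2  # replicas in the client's own data center

    for W in range(1, N + 1):
        R = N - W + 1  # smallest R with R + W > N
        both_local = W <= local_replicas and R <= local_replicas
        print(f"W={W}  R={R}  fast local reads AND writes: {both_local}")  # always False here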

Combinations

Combinations often make sense.  For example, it's common to mix DR with Local Reads, Remote Writes.