May 1, 2006

Fault Tolerant Transactional Replication

In 2004, hardware was becoming cheap enough to make building high performance databases from commodity parts very attractive. However, the reliability was poor, making the construction of highly available clusters a pressing and challenging problem. Working with friends and colleagues Ben Rousseau, Rick Tompkins, and Gus Bjorklund over three years, we developed an approach for creating a distributed highly available database system.

We built our solution by creating middleware between the application and database that leveraged the Paxos consensus algorithm and total-ordered communications to achieve good performance (better than two-phase commit) and full transactional ACID guarantees. I authored this paper to describe our work.

Filed under Hacking