Using Paxos to Build a Scalable, Consistent, and Highly Available Datastore

Cited by: 65
Authors
Rao, Jun [2]
Shekita, Eugene J. [1]
Tata, Sandeep [1]
Affiliations
[1] IBM Almaden Res Ctr, San Jose, CA 95192 USA
[2] LinkedIn Corp, Mountain View, CA 94035 USA
Source
Proceedings of the VLDB Endowment | 2011 / Vol. 4 / Issue 04
DOI
10.14778/1938545.1938549
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Spinnaker is an experimental datastore that is designed to run on a large cluster of commodity servers in a single data-center. It features key-based range partitioning, 3-way replication, and a transactional get-put API with the option to choose either strong or timeline consistency on reads. This paper describes Spinnaker's Paxos-based replication protocol. The use of Paxos ensures that a data partition in Spinnaker will be available for reads and writes as long as a majority of its replicas are alive. Unlike traditional master-slave replication, this is true regardless of the failure sequence that occurs. We show that Paxos replication can be competitive with alternatives that provide weaker consistency guarantees. Compared to an eventually consistent datastore, we show that Spinnaker can be as fast or even faster on reads and only 5% to 10% slower on writes.
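
The availability condition stated in the abstract (a partition stays available while a majority of its replicas are alive) can be illustrated with a minimal sketch. This is not Spinnaker's replication protocol or code; the function name `has_quorum` and its parameters are hypothetical, and the sketch only encodes the strict-majority rule for the 3-way replication the abstract mentions.

```python
def has_quorum(alive_replicas: int, total_replicas: int = 3) -> bool:
    """Return True when a strict majority of a partition's replicas is alive.

    Hypothetical helper for illustration only; it models the majority
    rule from the abstract, not the Paxos protocol itself.
    """
    if not 0 <= alive_replicas <= total_replicas:
        raise ValueError("alive_replicas must be between 0 and total_replicas")
    # A strict majority means more than half of the replicas.
    return alive_replicas > total_replicas // 2


if __name__ == "__main__":
    # With 3-way replication, losing one replica keeps the partition
    # available; losing two does not.
    for alive in range(4):
        print(f"{alive} of 3 replicas alive -> available: {has_quorum(alive)}")
```

With three replicas, the partition tolerates the loss of any single replica, whichever one fails, which is the property the abstract contrasts with master-slave replication.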
Pages: 243-254
Page count: 12