research-article

Kronos: the design and implementation of an event ordering service

Authors:
Robert Escriva

Cornell University

Cornell University
View Profile

,
Ayush Dubey

Cornell University

Cornell University
View Profile

,
Bernard Wong

University of Waterloo

University of Waterloo
View Profile

,
Emin Gün Sirer

Cornell University

Cornell University
View Profile

EuroSys '14: Proceedings of the Ninth European Conference on Computer SystemsApril 2014Article No.: 3Pages 1–14https://doi.org/10.1145/2592798.2592822

Published:14 April 2014Publication History

EuroSys '14: Proceedings of the Ninth European Conference on Computer Systems

Pages 1–14

ABSTRACT

This paper proposes a new approach to determining the order of interdependent operations in a distributed system. The key idea behind our approach is to factor the task of tracking happens-before relationships out of components that comprise the system, and to centralize them in a separate event ordering service. This not only simplifies implementation of individual components by freeing them from having to propagate dependence information, but also enables dependence relationships to be maintained across multiple independent systems. A novel API enables the system to detect and take advantage of concurrency whenever possible by maintaining fine-grained information and binding events to a time order as late as possible. We demonstrate the benefits of this approach through several example applications, including a transactional key-value store, and an online graph store. Experiments show that our event ordering service scales well and has low overhead in practice.

References

M. Attariyan and J. Flinn. Using Causality To Diagnose Configuration Bugs. In Proc. of USENIX, Boston, MA, June 2008. Google ScholarDigital Library
K. Audenaert. Clock Trees: Logical Clocks For Programs With Nested Parallelism. In IEEE Transactions on Software Engineering, 23(10), 1997. Google ScholarDigital Library
P. Bailis, A. Fekete, A. Ghodsi, J. M. Hellerstein, and I. Stoica. The Potential Dangers Of Causal Consistency And An Explicit Solution. In Proc. of SoCC, San Jose, CA, Oct. 2012. Google ScholarDigital Library
M. Balakrishnan, D. Malkhi, T. Wobber, M. Wu, V. Prabhakaran, M. Wei, J. D. Davis, S. Rao, T. Zou, and A. Zuck. Tango: Distributed Data Structures Over A Shared Log. In Proc. of SOSP, Farmington, PA, Nov. 2013. Google ScholarDigital Library
K. P. Birman, A. Schiper, and P. Stephenson. Fast Causal Multicast. Cornell University, Technical Report TR90-1105, 1990. Google ScholarDigital Library
K. P. Birman, A. Schiper, and P. Stephenson. Lightweight Causal And Atomic Group Multicast. In ACM ToCS, 9(3), 1991. Google ScholarDigital Library
P. Briggs and L. Torczon. An Efficient Representation For Sparse Sets. In ACM LoPLaS, 2(1-4), 1993. Google ScholarDigital Library
N. Bronson, Z. Amsden, G. Cabrera, P. Chakka, P. Dimov, H. Ding, J. Ferris, A. Giardullo, S. Kulkarni, H. Li, M. Marchukov, D. Petrov, L. Puzar, Y. J. Song, and V. Venkataramani. TAO: Facebooks Distributed Data Store For The Social Graph. In Proc. of USENIX, San Jose, CA, June 2013. Google ScholarDigital Library
M. Burrows. The Chubby Lock Service For Loosely-Coupled Distributed Systems. In Proc. of OSDI, Seattle, WA, Nov. 2006. Google ScholarDigital Library
B. Charron-Bost. Concerning The Size Of Logical Clocks In Distributed Systems. In Information Processing Letters, 39(1), 1991. Google ScholarDigital Library
D. R. Cheriton and D. Skeen. Understanding The Limitations Of Causally And Totally Ordered Communication. In Proc. of SOSP, Asheville, NC, Oct. 1993. Google ScholarDigital Library
G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's Highly Available Key-Value Store. In Proc. of SOSP, Stevenson, WA, Oct. 2007. Google ScholarDigital Library
R. A. Elmasri and S. Navathe. Fundamentals Of Database Systems. Addison-Wesley, US, 2010. Google ScholarDigital Library
P. Erdös and A. Rényi. On The Evolution Of Random Graphs. In Mathematical Institute of the Hungarian Academy of Sciences, 5(17--61), 1960.Google Scholar
R. Escriva, B. Wong, and E. G. Sirer. HyperDex: A Distributed, Searchable Key-Value Store. In Proc. of SIGCOMM, Helsinki, Finland, Aug. 2012. Google ScholarDigital Library
A. J. Feldman, W. P. Zeller, M. J. Freedman, and E. W. Felten. SPORC: Group Collaboration Using Untrusted Cloud Resources. In Proc. of OSDI, Vancouver, Canada, Oct. 2010. Google ScholarDigital Library
C. J. Fidge. Logical Time In Distributed Computing Systems. In IEEE Computer, 24(8), 1991. Google ScholarDigital Library
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. In Proc. of OSDI, Los Angeles, CA, Oct. 2012. Google ScholarDigital Library
J. Gray and L. Lamport. Consensus On Transaction Commit. Microsoft Research, Technical Report MSR-TR-2003-96, 2004.Google Scholar
P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: Wait-Free Coordination For Internet-Scale Systems. In Proc. of USENIX, Boston, MA, June 2010. Google ScholarDigital Library
D. A. Khotimsky. Hierarchical Vector Clock: Scalable Plausible Clock For Detecting Causality In Large Distributed Systems. In Proc. of ICATM, Colmar, France, 1999.Google ScholarCross Ref
L. Lamport. The Part-Time Parliament. In ACM ToCS, 16(2), 1998. Google ScholarDigital Library
L. Lamport. Time, Clocks, And The Ordering Of Events In A Distributed System. In CACM, 21(7), 1978. Google ScholarDigital Library
B. Lampson and H. E. Sturgis. Crash Recovery In A Distributed Storage System. Xerox Parc, Palo Alto, CA, Technical Report, 1976.Google Scholar
W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen. Don't Settle For Eventual: Scalable Causal Consistency For Wide-Area Storage With COPS. In Proc. of SOSP, Cascais, Portugal, Oct. 2011. Google ScholarDigital Library
P. Mahajan, S. T. V. Setty, S. Lee, A. Clement, L. Alvisi, M. Dahlin, and M. Walfish. Depot: Cloud Storage With Minimal Trust. In Proc. of OSDI, Vancouver, Canada, Oct. 2010. Google ScholarDigital Library
G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A System For Large-Scale Graph Processing. In Proc. of SIGMOD, Indianapolis, IN, June 2010. Google ScholarDigital Library
F. Mattern. Virtual Time And Global States Of Distributed Systems. In Proc. of PDA Workshop, Chateau de Bonas, France, Oct. 1989.Google Scholar
J. J. McAuley and J. Leskovec. Learning To Discover Social Circles In Ego Networks. In Proc. of NIPS, Lake Tahoe, CA, Dec. 2012.Google Scholar
E. B. Nightingale, K. Veeraraghavan, P. M. Chen, and J. Flinn. Rethink The Sync. In Proc. of OSDI, Seattle, WA, Nov. 2006. Google ScholarDigital Library
B. M. Oki and B. Liskov. Viewstamped Replication: A General Primary Copy. In Proc. of PODC, Toronto, Canada, Aug. 1988.Google ScholarDigital Library
D. Peng and F. Dabek. Large-Scale Incremental Processing Using Distributed Transactions And Notifications. In Proc. of OSDI, Vancouver, Canada, Oct. 2010. Google ScholarDigital Library
A. Roy, I. Mihailovic, and W. Zwaenepoel. X-Stream: Edge-Centric Graph Processing Using Streaming Partitions. In Proc. of SOSP, Farmington, PA, Nov. 2013. Google ScholarDigital Library
D. Skeen and M. Stonebraker. A Formal Model Of Crash Recovery In A Distributed System. In IEEE Transactions on Software Engineering, 9(3), 1983. Google ScholarDigital Library
J. Terrace and M. J. Freedman. Object Storage On CRAQ: High-Throughput Chain Replication For Read-Mostly Workloads. In Proc. of USENIX, San Diego, CA, June 2009. Google ScholarDigital Library
D. B. Terry, M. Theimer, K. Petersen, A. J. Demers, M. Spreitzer, and C. Hauser. Managing Update Conflicts In Bayou, A Weakly Connected Replicated Storage System. In Proc. of SOSP, Copper Mountain, CO, Dec. 1995. Google ScholarDigital Library
Titan Distributed Graph Database. http://thinkaurelius.github.io/titan/.Google Scholar
F. J. Torres-Rojas and M. Ahamad. Plausible Clocks: Constant Size Logical Clocks For Distributed Systems. In Distributed Computing, 12(4), 1999. Google ScholarDigital Library
J. Ugander and L. Backstrom. Balanced Label Propagation For Partitioning Massive Graphs. In Proc. of WSDM, Rome, Italy, Feb. 2013. Google ScholarDigital Library
R. van Renesse and F. B. Schneider. Chain Replication For Supporting High Throughput And Availability. In Proc. of OSDI, San Francisco, CA, Dec. 2004. Google ScholarDigital Library

Recommendations

WFR-TM

Transactional Memory (TM) is a promising concurrent programming paradigm which employs transactions to achieve synchronization in accessing common data known as transactional variables. A transaction may either commit, making its updates to ...
Read More
Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies
ASPLOS '13

Transactional memory (TM) has been proposed to alleviate some key programmability problems in chip multiprocessors. Most TMs optimistically allow concurrent transactions, detecting read-write or write-write conflicts. Upon conflicts, existing hardware ...
Read More
Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies
ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems

Transactional memory (TM) has been proposed to alleviate some key programmability problems in chip multiprocessors. Most TMs optimistically allow concurrent transactions, detecting read-write or write-write conflicts. Upon conflicts, existing hardware ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
EuroSys '14: Proceedings of the Ninth European Conference on Computer Systems
April 2014
388 pages
ISBN:9781450327046
DOI:10.1145/2592798
General Chairs:
Dick Bultermann
CWI
,
Herbert Bos
Vrije Universiteit Amsterdam
,
Program Chairs:
Ant Rowstron
Microsoft Research Cambridge
,
Peter Druschel
MPI SWS
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 14 April 2014
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- research-article
Conference

Acceptance Rates
EuroSys '14 Paper Acceptance Rate27of147submissions,18%Overall Acceptance Rate241of1,308submissions,18%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 14
  Total Citations
  View Citations
- 454
  Total Downloads
- Downloads (Last 12 months)20
- Downloads (Last 6 weeks)5
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Kronos: the design and implementation of an event ordering service

EuroSys '14: Proceedings of the Ninth European Conference on Computer Systems

ABSTRACT

References

Cited By

Recommendations

WFR-TM

Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies

Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Kronos: the design and implementation of an event ordering service

EuroSys '14: Proceedings of the Ninth European Conference on Computer Systems

ABSTRACT

References

Cited By

Recommendations

WFR-TM

Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies

Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media