skip to main content
10.1145/2592798.2592822acmconferencesArticle/Chapter ViewAbstractPublication PageseurosysConference Proceedingsconference-collections
research-article

Kronos: the design and implementation of an event ordering service

Published:14 April 2014Publication History

ABSTRACT

This paper proposes a new approach to determining the order of interdependent operations in a distributed system. The key idea behind our approach is to factor the task of tracking happens-before relationships out of components that comprise the system, and to centralize them in a separate event ordering service. This not only simplifies implementation of individual components by freeing them from having to propagate dependence information, but also enables dependence relationships to be maintained across multiple independent systems. A novel API enables the system to detect and take advantage of concurrency whenever possible by maintaining fine-grained information and binding events to a time order as late as possible. We demonstrate the benefits of this approach through several example applications, including a transactional key-value store, and an online graph store. Experiments show that our event ordering service scales well and has low overhead in practice.

References

  1. M. Attariyan and J. Flinn. Using Causality To Diagnose Configuration Bugs. In Proc. of USENIX, Boston, MA, June 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. K. Audenaert. Clock Trees: Logical Clocks For Programs With Nested Parallelism. In IEEE Transactions on Software Engineering, 23(10), 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P. Bailis, A. Fekete, A. Ghodsi, J. M. Hellerstein, and I. Stoica. The Potential Dangers Of Causal Consistency And An Explicit Solution. In Proc. of SoCC, San Jose, CA, Oct. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. M. Balakrishnan, D. Malkhi, T. Wobber, M. Wu, V. Prabhakaran, M. Wei, J. D. Davis, S. Rao, T. Zou, and A. Zuck. Tango: Distributed Data Structures Over A Shared Log. In Proc. of SOSP, Farmington, PA, Nov. 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. K. P. Birman, A. Schiper, and P. Stephenson. Fast Causal Multicast. Cornell University, Technical Report TR90-1105, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. K. P. Birman, A. Schiper, and P. Stephenson. Lightweight Causal And Atomic Group Multicast. In ACM ToCS, 9(3), 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. P. Briggs and L. Torczon. An Efficient Representation For Sparse Sets. In ACM LoPLaS, 2(1-4), 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. N. Bronson, Z. Amsden, G. Cabrera, P. Chakka, P. Dimov, H. Ding, J. Ferris, A. Giardullo, S. Kulkarni, H. Li, M. Marchukov, D. Petrov, L. Puzar, Y. J. Song, and V. Venkataramani. TAO: Facebooks Distributed Data Store For The Social Graph. In Proc. of USENIX, San Jose, CA, June 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. Burrows. The Chubby Lock Service For Loosely-Coupled Distributed Systems. In Proc. of OSDI, Seattle, WA, Nov. 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. B. Charron-Bost. Concerning The Size Of Logical Clocks In Distributed Systems. In Information Processing Letters, 39(1), 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. R. Cheriton and D. Skeen. Understanding The Limitations Of Causally And Totally Ordered Communication. In Proc. of SOSP, Asheville, NC, Oct. 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's Highly Available Key-Value Store. In Proc. of SOSP, Stevenson, WA, Oct. 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. R. A. Elmasri and S. Navathe. Fundamentals Of Database Systems. Addison-Wesley, US, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. P. Erdös and A. Rényi. On The Evolution Of Random Graphs. In Mathematical Institute of the Hungarian Academy of Sciences, 5(17--61), 1960.Google ScholarGoogle Scholar
  15. R. Escriva, B. Wong, and E. G. Sirer. HyperDex: A Distributed, Searchable Key-Value Store. In Proc. of SIGCOMM, Helsinki, Finland, Aug. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. J. Feldman, W. P. Zeller, M. J. Freedman, and E. W. Felten. SPORC: Group Collaboration Using Untrusted Cloud Resources. In Proc. of OSDI, Vancouver, Canada, Oct. 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. C. J. Fidge. Logical Time In Distributed Computing Systems. In IEEE Computer, 24(8), 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. In Proc. of OSDI, Los Angeles, CA, Oct. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. J. Gray and L. Lamport. Consensus On Transaction Commit. Microsoft Research, Technical Report MSR-TR-2003-96, 2004.Google ScholarGoogle Scholar
  20. P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: Wait-Free Coordination For Internet-Scale Systems. In Proc. of USENIX, Boston, MA, June 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. D. A. Khotimsky. Hierarchical Vector Clock: Scalable Plausible Clock For Detecting Causality In Large Distributed Systems. In Proc. of ICATM, Colmar, France, 1999.Google ScholarGoogle ScholarCross RefCross Ref
  22. L. Lamport. The Part-Time Parliament. In ACM ToCS, 16(2), 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. L. Lamport. Time, Clocks, And The Ordering Of Events In A Distributed System. In CACM, 21(7), 1978. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. B. Lampson and H. E. Sturgis. Crash Recovery In A Distributed Storage System. Xerox Parc, Palo Alto, CA, Technical Report, 1976.Google ScholarGoogle Scholar
  25. W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen. Don't Settle For Eventual: Scalable Causal Consistency For Wide-Area Storage With COPS. In Proc. of SOSP, Cascais, Portugal, Oct. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. P. Mahajan, S. T. V. Setty, S. Lee, A. Clement, L. Alvisi, M. Dahlin, and M. Walfish. Depot: Cloud Storage With Minimal Trust. In Proc. of OSDI, Vancouver, Canada, Oct. 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A System For Large-Scale Graph Processing. In Proc. of SIGMOD, Indianapolis, IN, June 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. F. Mattern. Virtual Time And Global States Of Distributed Systems. In Proc. of PDA Workshop, Chateau de Bonas, France, Oct. 1989.Google ScholarGoogle Scholar
  29. J. J. McAuley and J. Leskovec. Learning To Discover Social Circles In Ego Networks. In Proc. of NIPS, Lake Tahoe, CA, Dec. 2012.Google ScholarGoogle Scholar
  30. E. B. Nightingale, K. Veeraraghavan, P. M. Chen, and J. Flinn. Rethink The Sync. In Proc. of OSDI, Seattle, WA, Nov. 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. B. M. Oki and B. Liskov. Viewstamped Replication: A General Primary Copy. In Proc. of PODC, Toronto, Canada, Aug. 1988.Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. D. Peng and F. Dabek. Large-Scale Incremental Processing Using Distributed Transactions And Notifications. In Proc. of OSDI, Vancouver, Canada, Oct. 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. A. Roy, I. Mihailovic, and W. Zwaenepoel. X-Stream: Edge-Centric Graph Processing Using Streaming Partitions. In Proc. of SOSP, Farmington, PA, Nov. 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. D. Skeen and M. Stonebraker. A Formal Model Of Crash Recovery In A Distributed System. In IEEE Transactions on Software Engineering, 9(3), 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. J. Terrace and M. J. Freedman. Object Storage On CRAQ: High-Throughput Chain Replication For Read-Mostly Workloads. In Proc. of USENIX, San Diego, CA, June 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. D. B. Terry, M. Theimer, K. Petersen, A. J. Demers, M. Spreitzer, and C. Hauser. Managing Update Conflicts In Bayou, A Weakly Connected Replicated Storage System. In Proc. of SOSP, Copper Mountain, CO, Dec. 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Titan Distributed Graph Database. http://thinkaurelius.github.io/titan/.Google ScholarGoogle Scholar
  38. F. J. Torres-Rojas and M. Ahamad. Plausible Clocks: Constant Size Logical Clocks For Distributed Systems. In Distributed Computing, 12(4), 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. J. Ugander and L. Backstrom. Balanced Label Propagation For Partitioning Massive Graphs. In Proc. of WSDM, Rome, Italy, Feb. 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. R. van Renesse and F. B. Schneider. Chain Replication For Supporting High Throughput And Availability. In Proc. of OSDI, San Francisco, CA, Dec. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Conferences
    EuroSys '14: Proceedings of the Ninth European Conference on Computer Systems
    April 2014
    388 pages
    ISBN:9781450327046
    DOI:10.1145/2592798

    Copyright © 2014 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 14 April 2014

    Permissions

    Request permissions about this article.

    Request Permissions

    Check for updates

    Qualifiers

    • research-article

    Acceptance Rates

    EuroSys '14 Paper Acceptance Rate27of147submissions,18%Overall Acceptance Rate241of1,308submissions,18%

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader