
Within large-scale services, durable storage, distributed leases, and coordination primitives such as distributed locks, semaphores, and events should be strongly consistent. At Meta, we have historically used Apache ZooKeeper as a centralized service for these primitives. However, as Meta’s workload has scaled, we’ve found ourselves pushing the limits of ZooKeeper’s capabilities. Modifying and tuning ZooKeeper for performance has become a significant pain point.

ZooKeeper, a single, tightly integrated monolithic system, couples much of the application state with its consensus protocol, ZooKeeper Atomic Broadcast (ZAB). Consequently, extending ZooKeeper to work better at scale has proved extremely difficult despite several ambitious initiatives, including native transparent sharding support, weaker consistency models, a persistent watch protocol, and server-side synchronization primitives. This inability to safely improve and scale compelled us to pose the question: Can we construct a more modular, extensible, and performant variant of ZooKeeper?

This led us to construct ZooKeeper on Delos, aka Zelos, which will eventually replace all ZooKeeper clusters in our fleet.

Delos makes building strongly consistent distributed applications simple by abstracting away many of the challenges in distributed consensus and state management. It also provides a clean log and a database-based abstraction for application development. Furthermore, as Delos cleanly separates application logic from consensus, it naturally allows the system to grow and evolve. However, ZooKeeper doesn’t map naturally onto the Delos abstractions. For instance, ZooKeeper requires session management, a notion not present in Delos. Additionally, ZooKeeper provides stronger-than-linearizable semantics within a session (and weaker semantics outside of a session). Further complicating things, ZooKeeper has many uses at Meta, so we need a feature-compatible implementation that can support all legacy use cases and transparently migrate from legacy ZooKeeper to our new Delos-based implementation.

Building Zelos and migrating a complex legacy distributed system like ZooKeeper onto our state-of-the-art Delos platform meant we had to solve these challenges.

Delos’s goal is to abstract away all the common issues that arise with distributed consensus and provide a simple interface to build distributed applications. This lets Delos applications worry only about their application logic and get Delos’s strong consistency (linearizability) and high availability “for free” when they’re written within the Delos framework. As such, the Delos platform manages numerous problems: state distribution and consensus, failure detection, leader election, distributed state management, ensemble membership management, and recovery from machine faults.

Delos achieves this by abstracting an application into a finite state machine (FSM) replicated across the nodes in the system, often called replicas. A linearizable distributed shared log maintained by the Delos system reflects the state transitions. The replicas of the ensemble then learn these state machine updates in order and apply the updates to their local storage.

As these updates are applied deterministically on all replicas, they guarantee consistency across the replicated state machine. To provide linearizability for reads, a replica first syncs its local storage up to the tail of the shared log and then services the read from its local state. In this way, Delos promises linearizable reads and writes without knowledge of the application’s business logic.

Many applications written on Delos share similar functionality, such as write batching. Delos provides the abstraction of state machine replication engines, or SMREngines. An application selects a stack of these engines based on the features it requires. Each proposal is propagated down the engine stack before it reaches the shared log. The engines may modify the entry as needed for the engine’s logic.
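The Delos design described above (a shared log, replicas that apply entries in order, reads that first sync to the tail, and a stack of engines that proposals pass through) can be illustrated with a small, single-process Python sketch. This is not the Delos API: `SharedLog`, `Replica`, `Engine`, `PrefixEngine`, and `build_stack` are hypothetical names chosen for illustration, the log is a plain in-memory list, and only the downward propose path of the engine stack is modeled.

```python
from typing import Callable, Dict, List, Optional, Tuple

Entry = Tuple[str, str]  # hypothetical state machine update: (key, value)


class SharedLog:
    """In-memory stand-in for the linearizable shared log (assumption)."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def append(self, entry: Entry) -> int:
        self._entries.append(entry)
        return len(self._entries)  # position of the new tail

    def tail(self) -> int:
        return len(self._entries)

    def read(self, start: int, end: int) -> List[Entry]:
        return self._entries[start:end]


class Replica:
    """Applies log entries, in log order, to its local storage."""

    def __init__(self, log: SharedLog) -> None:
        self._log = log
        self._store: Dict[str, str] = {}
        self._applied = 0  # number of log entries applied so far

    def sync(self) -> None:
        tail = self._log.tail()
        for key, value in self._log.read(self._applied, tail):
            self._store[key] = value  # deterministic apply
        self._applied = tail

    def read(self, key: str) -> Optional[str]:
        self.sync()  # catch up to the tail before serving the read
        return self._store.get(key)


class Engine:
    """Hypothetical engine layer: may rewrite an entry, then passes it down."""

    def __init__(self, below: Callable[[Entry], int]) -> None:
        self._below = below

    def transform(self, entry: Entry) -> Entry:
        return entry  # default: pass the entry through unchanged

    def propose(self, entry: Entry) -> int:
        return self._below(self.transform(entry))


class PrefixEngine(Engine):
    """Illustrative engine that namespaces keys before they reach the log."""

    def __init__(self, below: Callable[[Entry], int], prefix: str) -> None:
        super().__init__(below)
        self._prefix = prefix

    def transform(self, entry: Entry) -> Entry:
        key, value = entry
        return (self._prefix + key, value)


def build_stack(log: SharedLog, layers: list) -> Callable[[Entry], int]:
    """Build a propose function; `layers` lists engine factories top to bottom."""
    propose = log.append  # the bottom of the stack is the shared log
    for make_engine in reversed(layers):
        propose = make_engine(propose).propose
    return propose


log = SharedLog()
propose = build_stack(log, [lambda below: PrefixEngine(below, "app/")])
replica_a, replica_b = Replica(log), Replica(log)

propose(("leader", "node-1"))
# Both replicas sync to the tail before reading, so they agree.
assert replica_a.read("app/leader") == replica_b.read("app/leader") == "node-1"
```

Because every write goes through the one log and every read syncs to its tail first, neither replica needs to know what the keys and values mean, which mirrors the separation between consensus and application logic that the article describes. A real engine would also take part when entries are applied, not only when they are proposed; that path is left out here to keep the sketch short.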
