Hydra: Serialization-Free Network Ordering for Strongly Consistent Distributed Applications

20th USENIX Symposium on Networked Systems Design and Implementation (NSDI '23) |

A large class of distributed systems, e.g., state machine replication and fault-tolerant distributed databases, rely on establishing a consistent order of operations on groups of nodes in the system. Traditionally, an application-level distributed protocol such as Paxos and two-phase locking provide the ordering guarantees. To reduce the performance overhead imposed by these protocols, a recent line of work propose to move the responsibility of ensuring operation ordering into the network by sequencing requests through a centralized network sequencer. This network sequencing approach yields significant application-level performance improvements, but routing all requests through a single sequencer comes with several fundamental limitations, including sequencer scalability bottleneck, prolonged system downtime during sequencer failover, worsened network-level load balancing, etc.Our work, Hydra, overcomes these limitations by using a distributed set of network sequencers to provide network ordering. Hydra leverages loosely synchronized clocks on network sequencers to establish message ordering across them, per-sequencer sequence numbers to detect message drops, and periodic timestamp messages to enforce progress when some sequencers are idle. To demonstrate the benefit of Hydra, we co-designed a state machine replication protocol and a distributed transactional system using the Hydra network primitive. Compared to serialization-based network ordering systems, Hydra shows equivalent performance improvement over traditional approaches in both applications, but with significantly higher scalability, shorter sequencer failover time, and better network-level load balancing.