Survivable Replication Systems
Much of our critical infrastructure is controlled by large software systems whose participants are distributed across the Internet. As our dependence on these critical systems continues to grow, it becomes increasingly important that they meet strict availability and performance requirements, even in the face of malicious attacks, including those that succeed in compromising parts of the system.
Since 2004, we have been working to develop scalable intrusion-tolerant replication systems capable of guaranteeing correctness, availability, and good performance even when some of the servers are compromised. Such systems make it possible to build highly available and highly resilient critical infrastructure. This page provides an overview of the four systems we have developed. Please refer to each system's individual page for more details.
Steward
Beginning in 1999, the research community made considerable progress in the design of high-performance intrusion-tolerant replication systems. Castro and Liskov's BFT protocol was the first to show that intrusion-tolerant replication could be made efficient enough to perform well in practice (although with some limitations). BFT and its successors perform well on small-scale systems that are usually confined to local-area networks.
However, many critical infrastructure systems span multiple wide-area sites and consist of tens of servers, rather than being confined to a one-room, local-area network setting. Unfortunately, BFT and similar protocols employ a flat architecture that makes it difficult to scale them to large wide-area deployments. To address this shortcoming, we developed the Steward system from 2004 to 2005. Steward is the first hierarchical intrusion-tolerant replication architecture suitable for systems that span multiple wide-area sites, each consisting of several server replicas. Please see the Steward page for more details.
Customizable Replication Architecture
Coming soon.
Prime
In December 2005, a red team experiment was conducted on our Steward system. The red team's goal was to break Safety (i.e., to cause correct replicas to become inconsistent) or to break Liveness (i.e., to stop the system from making progress). Although the red team was unable to achieve its goal, it did, in one experiment, reduce the performance of the system by about a factor of 10. After analyzing the attack, we found that we could mount a similar attack and reduce performance by a factor of about 100.
This experience opened our eyes to a limitation of existing Byzantine replication systems: while they guarantee Safety and Liveness (when the network is sufficiently stable), they are vulnerable to performance degradation by malicious servers that follow the protocol correctly in the value domain but act just slowly enough to avoid triggering defense mechanisms. To address this limitation, we developed Prime. Prime is the first Byzantine fault-tolerant replication protocol to make a meaningful performance guarantee even when some of the servers are compromised. Please see the Prime page for more details.
Attack-Resilient Architecture for Large-Scale Replication
Coming soon.
Publications
Funding
The above work was partially funded by the Defense Advanced Research Projects Agency (contract FA8750-04-2-0232) and the National Science Foundation (grants 0430271, 0430276, and 0716620).