A DARPA/ITO grant (May 2000 - September 2003) to Johns Hopkins Univesity. A component of the DARPA Tolerant Networking effort.
Principal Investigators: Yair Amir and Baruch Awerbuch. Co-PI: Jonathan Stanton.
- From Total Order to Database Replication, ICDCS 2002
- Spines presentation, Future Directions in Distributed Computing, Bertinoro, Italy, June 2002
- Darpa meeting in San Diego, CA, January 2002
- Darpa meeting in Colorado Springs, CO, July 2001
- Darpa meeting in St. Petersburg, FL, January 2001
- Darpa meeting in Hawaii, July 2000
N-Way Fail-Over Infrastructure for Survivable Servers and Routers.
To appear in the Proceedings of the IEEE International Conference on
Dependable Systems and Networks (DSN03), San Francisco, June 2003.
Yair Amir, Ryan Caudy, Ashima Munjal, Theo Schlossnagle and Ciprian Tutu.
Maintaining the availability of critical servers and routers is an important
concern for many organizations. At the lowest level, IP addresses represent the
global namespace by which services are accessible on the Internet.
We introduce Wackamole, a completely distributed software solution
based on a provably correct algorithm that negotiates the
assignment of IP addresses among the currently available servers upon
detection of faults. This reallocation ensures that at any given time
any public IP address of the server cluster is covered exactly once,
as long as at least one physical server survives the network fault.
The same technique is extended to support highly available routers.
The paper presents the design considerations,
algorithm specification and correctness proof, discusses
the practical usage for server clusters and for routers,
and evaluates the performance of the system.
Reliable Communication in Overlay Networks
In the Proceedings of the IEEE International Conference on
Dependable Systems and Networks (DSN03), San Francisco, June 2003.
Yair Amir and
Reliable point-to-point communication is usually achieved in overlay
networks by applying TCP/IP on the end nodes of a connection.
This paper presents an hop-by-hop reliability approach that
considerably reduces the latency and jitter of reliable connections.
Our approach is feasible and beneficial in overlay networks that
do not have the scalability and interoperability requirements of
the global Internet.
The effects of the hop-by-hop reliability approach are quantified
in simulation as well as in practice using a newly developed
overlay network software that is fair with the external traffic
on the Internet. The experimental results show that
the overhead associated with overlay network processing at the
application level does not play an important factor compared with
the considerable gain of the approach.
On the Performance of Synchronous Wide-Area Database Replication
Technical Report CNDS-2002-4, September 2002.
Yair Amir, Claudiu Danilov, Michal
Stanton and Ciprian
A fundamental challenge in database replication is maintaining
a low cost of updates while assuring global system consistency.
The problem is magnified for wide-area replication due to the high latency
and the increased likelihood of network partitions. As a consequence,
most database replication research moved away from strictly consistent
models to update models with weaker semantics, relying on application
knowledge to resolve conflicts.
This paper explores a synchronous replication architecture for local
and wide-area networks that provides strong consistency and performs
considerably better than previous consistent approaches. As a proof of concept,
we implemented transparanet replication for the Postgres database system.
Our results show that sophisticated algorithms and careful distributed systems
design can make symmetric, synchronous, peer database replication a reality over
both local and wide-area networsk.
An On-Demand Secure Routing Protocol Resilient to Byzantine Failures
In ACM Workshop on Wireless Security (WiSe) , Atlanta, Georgia, September
and Herbert Rubens.
An ad hoc wireless network is an autonomous self-organizing system of
mobile nodes connected by wireless links where nodes not in direct
range can communicate via intermediate nodes. A common technique used
in routing protocols for ad hoc wireless networks is to establish the
routing paths on-demand, as opposed to continually maintaining a
complete routing table. A significant concern in routing is the
ability to function in the presence of byzantine failures which
include nodes that drop, modify, or mis-route packets in an attempt to
disrupt the routing service.
We propose an on-demand routing protocol for ad hoc wireless networks
that provides resilience to byzantine failures caused by individual or
colluding nodes. Our adaptive probing technique detects a malicious
link after log faults have occurred, where n is the length of
the path. These links are then avoided by multiplicatively increasing
their weights and by using an on-demand route discovery protocol that
finds a least weight path to the destination.
- Flow Control for Many-to-Many Multicast: A Cost-Benefit Approach
Technical Report CNDS-2001-1
We present a protocol
that is analytically grounded, yet also achieves real world goals,
such as simplicity, fairness and minimal resource usage. We base our
flow control protocol on the Cost-Benefit algorithmic framework for
resource management. We base decisions on the "opportunity" costs of
network resources, comparing the cost of each individual resource to
the benefit it provides. As opposed to existing window-based flow
control schemes, we avoid end-to-end feedback by basing decisions on
the state of the links between participating nodes. This produces
control traffic proportional only to the number of overlay network
links and independent of the number of groups.
- A Low Latency, Loss Tolerant Architecture and Protocol for Wide Area Group Communication
Published in the International Conference on Dependable Systems and Networks (FTCS-30, DCCA-8),
New York, New York, June 25-28, 2000.
paper presents the design of the transport protocols of the Spread
wide area group communication system. We focus on two aspects of the
system. First, the value of using overlay networks for application
level group communication services. Second, the requirements and
design of effective low latency link protocols used to construct
wide area group communication.
This project develop the theory and algorithms required to overcome extremely strong network
attacks, while providing theoretically provable performance bounds. We are building a system
that incorporates these algorithms, and that exhibits good performance in practice.
- Analysis of strong adversary models: In order to understand how robust our solutions must
be, we need to first understand what is the model of possible attacks and errors. We introduce a collection of adversary models. They range from a simple predictable
slow adversary, to a somewhat limited "stable path" adversary (which allows communication over
a path to be successfully completed), to a totally unpredictable adversary (which can
selectively block traffic based on type, source, destination, etc.). Many of these models have
not been considered in the literature so far.
- New routing and dissemination protocols: We design a suite of novel routing protocols
tailored to the above adversary models and prove that these protocols perform in a
near-optimal manner. Specifically, our solution, in case an operational
path exists, will be able to select such a path with high probability and, in case such a path
does not exist, will store and forward the packets. The performance and correctness properties
of these protocols will be supported by rigorous analysis. Our analysis will not assume
anything about either the topology or the traffic patterns in the network, and will not assume
a known correlation between past and future behavior of the adversary.
- New replication protocols: When there is no information-theoretic possibility of
communication, say in the case of a cut in the network, one can still continue the operation
by making sure that the data is replicated in most areas, or at least in the areas where
disconnection is likely. We design a suite of replication protocols that can handle a range
of adversaries and can gracefully degrade performance and semantics as the network hostility
- A Cost-Benefit decision framework: This framework is used to select the most suitable
protocol as network conditions change, both for network-level protocols such as routing and
dissemination, and data-level protocols such as replication. The main idea is to consider the
marginal benefit obtained by the application when consuming a given resource, versus the
"opportunity cost" of using this resource. The latter is the benefit that may potentially be
lost by other applications if this resource is committed. In the network level, the decision
is based on application tolerance to delay, and the reliability of the network. In the data
level, the decision is based on the cost of inaccessibility of data, the cost of updating
replicas, and the synchronization cost of replication.
- An overlay network architecture: We rely on an overlay network architecture that makes
these protocols practical since they are too complex to have any hope to be implemented in
general Internet routers anytime soon.
Questions or comments to:
webmaster (at) dsn.jhu.edu
TEL: (410) 516-5562
FAX: (410) 516-6134
Distributed Systems and Networks Lab|
Computer Science Department
Johns Hopkins University
3400 N. Charles Street
Baltimore, MD 21218-2686