|
|
Quarterly Technical Report, April 2002
Progress:
This quarter we focused on developing new more advanced protocols to provide replication and
dynamic network routing. We also enhanced the existing software systems by
increasing their modularity and adding new capabilities. The new overlay network
implementation provides a fully application independant platform upon which routing and
reliability protocols can be developed.
- New replication protocol: We continued to work on optimizing and evaluating
the replication architecture.
With the purpose of building a framework that will allow us to
clearly identify the tradeoffs involved when replicating databases on
wide area networks, we developed a more modular version of the
replication algorithm (Maintaining Database Consistency in P2P
Networks). We are investigating a new metric that will allow us to
quantify the opportunity of establishing new replicas into a
replicated system. We are also studying the possibility of enhancing
the current replication schemes in order to increase their fault
tolerance and scalability properties, in the context of dynamic
networks.
- Overlay network infrastructure:
In order to better analyze and understand the overlay networks paradigm
in an environment defined by weaker semantics, we developed a stand
alone prototype using the client-daemon architecture that is able to build
and automatically configure a dynamic overlay network. Our Overlay
Network aims to be very scalable, as it does not have any limitation in
number of nodes or links, other than the what the routing protocol used
can support.
In the current implementation we provide only unreliable, best effort
semantics, similarly with UDP. The overlay networks configures itself
automatically, and dynamically grows or shrinks as nodes decide to join
or leave the network, and supports partitions, merges, crashes and
recoveries, and any such cascading events. Applications that use the
overlay network use a simple API consisting in four calls (that provide
connect, disconnect, send and receive), very similar to UDP socket
functions.
- Archipelago: We continued to develop the modular architecture that enables
plugging in different protocol modules such as routing, and reliable transmission.
Papers:
|
From Total Order to Database Replication
| |
ps,
ps.gz,
pdf.
To appear in the Proceedings of the 22nd IEEE International Conference on Distributed
Computing Systems (ICDCS), Vienna Austria, July 2002
Yair Amir,
and Ciprian Tutu.
This paper presents in detail an efficient and
provably correct algorithm for database replication over
partitionable networks. Our algorithm avoids the need for end-to-end
acknowledgments for each action while supporting network partitions and
merges and allowing dynamic instantiation of new replicas.
One round of end-to-end acknowledgments is required only upon a
membership change event such as a network partition. New actions may
be introduced to the system at any point, not only while in a
primary component. We show how performance can be further improved
for applications that allow relaxation of consistency requirements.
We provide experimental results that demonstrate the superiority of
this approach.
|
|
Maintaining Database Consistency in Peer to Peer Networks
| |
ps,
ps.gz,
pdf.
Technical Report CNDS-2002-2, February 2002.
Baruch Awerbuch, and Ciprian Tutu.
We present an algorithm for persistent consistent distributed commit
(distributed database commit) in a dynamic, asynchronous, peer to peer
network. The algorithm has constant overhead in time and space and
almost constant communication complexity, allowing it to scale to very
large size networks. Previous solutions required linear overhead in
communication and space, making them unscalable. We introduce a
modular solution based on several well defined blocks with clear
formal specifications. These blocks can be implemented in a variety of
ways and we give examples of possible implementations. Most of the
existing solutions require acknowledgments from every participant for
each action. Our algorithm is highly efficient by aggregating these
acknowledgments. Also, in contrast with existing solutions, our
algorithm does not require any membership knowledge. Components are
detected based on local information and the information is
disseminated on an overlay spanning tree. The algorithm may prove to
be more suited for practical implementation than the existing ones,
because of its simplicity.
|
|
Practical Wide Area Database Replication | |
ps, ps.gz,
pdf.
Technical Report CNDS-2002-1, February 2002.
Yair Amir, Claudiu Danilov, Michal
Miskin-Amir, Jonathan
Stanton and Ciprian
Tutu.
This paper explores the architecture, implementation and performance
of a wide and local area database replication system. The architecture
provides synchronous, peer replication, supporting diverse application
semantics, based on a group communication paradigm. Network partitions
and merges, computer crashes and recoveries, and message omissions are
all handled. Using a generic replication engine and the Spread group
communication toolkit, we provide replication services for the
PostgreSQL database system. We define three different environments to
be used as test-beds: a local area cluster, a wide area network that
spans the U.S.A, and an emulated wide area test bed. We conduct an
extensive set of experiments on these environments, varying the number
of replicas and clients, the mix of updates and queries, and the
network latency. Our results show that sophisticated algorithms and
careful distributed systems design can make symmetric, synchronous,
peer database replication a reality for both local and wide area
networks.
|
Plans for Next Quarter:
Questions or comments to: webmaster@cnds.jhu.edu
TEL: (410) 516-5562
FAX: (410) 516-6134
|
Center for Networking and Distributed Systems
Computer Science Department
Johns Hopkins University
3400 N. Charles Street
Baltimore, MD 21218-2686
|
|