Algorithms for Data Migration
Algorithms for Data Migration
Files
Publication or External Link
Date
2005-06-15
Authors
Kim, Yoo-Ah
Advisor
Khuller, Samir
Citation
DRUM DOI
Abstract
This thesis is concerned with the problem related to data storage and management. A large
storage server consists of several hundreds of disks. To balance the load across disks, the system
computes data layouts that are typically adjusted according to the workload. As workloads change
over time, the system recomputes the data layout, and rearranges the data items according to the
new layout. We identify the problem of computing an efficient data migration plan that converts
an initial layout to a target layout.
We define the data migration problem as follows: for each item, there are a set of disks that
have the item (sources) and a set of disks that want to receive the item (destinations). We want to
migrate the data items from the sources to destinations. The crucial constraint is that each disk
can participate in only one transfer at a time. The most common objective has been to minimize
the makespan, which is the time when we finish all the migrations. The problem is NP-hard,
and we develop polynomial time algorithms with constant factor approximation guarantees and
several other heuristic algorithms. We present the performance evaluation of the different methods
through an experimental study.
We also consider the data migration problem to minimize the sum of completion times over
all migration jobs or storage devices. Minimizing the sum of completion times of jobs is one of
the most common objectives in scheduling literature. On the other hand, since a storage device
may run inefficiently while the device is involved in migrations, another interesting objective is to
minimize the sum of completion times over all storage devices. We present hardness results and
constant factor approximation algorithms for these objectives.
In addition, we consider the case when we have a heterogeneous collection of machines. We
assume that heterogeneity is modeled by a non-uniform speed of the sending machine. For the
basic problem of multicasting and broadcasting in the model, we show that Fastest Node First
scheme gives a approximation ratio of 1.5 for minimizing the makespan. We also prove that there
is a polynomial time approximation scheme.