A Performance Study of a Large-scale Data Collection Problem
Abstract
In this paper, we consider the problem of moving a large amount of data from several source hosts to a destination host over a wide-area network, which we call the large-scale data collection problem. This problem is important because data collection time is crucial to the performance of many applications, such as wide-area upload applications, high-performance computing applications, and data mining applications.
Existing approaches to large-scale data collection transfer data either directly (direct methods) or through "best"-path application-level re-routing techniques, which we refer to as non-coordinated methods. However, we believe that in large-scale data collection applications it is important to coordinate data transfers from multiple sources. More specifically, our coordinated method takes into consideration the transfer demands of all source hosts and then schedules all data transfers in parallel, using all possible existing paths between the source hosts and the destination host.
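As a rough illustration of why coordination can help, the following toy calculation compares a direct transfer with an idealized coordinated one. It is only a sketch under strong assumptions and is not the model or scheduler used in this study: the function names, the demands, and the bandwidths are all hypothetical, each source has a single fixed-rate link to the destination, and relaying data between sources is assumed to cost nothing.

```python
def direct_time(demands, direct_bw):
    """Direct method: each source uses only its own link to the destination,
    so the collection finishes when the slowest source finishes."""
    return max(d / b for d, b in zip(demands, direct_bw))


def coordinated_lower_bound(demands, direct_bw):
    """Idealized coordinated bound (assumption: source-to-source relaying is
    free): all data shares all links into the destination, so the finish time
    is total data divided by total incoming bandwidth."""
    return sum(demands) / sum(direct_bw)


# Hypothetical inputs in arbitrary units (e.g. MB and MB/s).
demands = [100.0, 100.0, 100.0]
direct_bw = [10.0, 5.0, 1.0]

print(direct_time(demands, direct_bw))              # 100.0
print(coordinated_lower_bound(demands, direct_bw))  # 18.75
```

In this toy setting the direct transfer is held back by the source with the slowest link, while the idealized coordinated transfer spreads that source's data over the faster links. A real coordinated schedule would also have to account for the capacity of the paths between sources.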
We present a performance and robustness study of different data collection methods. Our results show that coordinated methods can perform significantly better than non-coordinated and direct methods under various degrees and types of network congestion. Moreover, we show that coordinated methods are more robust than non-coordinated methods when network condition information is inaccurate. Therefore, we believe that coordinated methods are a promising approach to large-scale data collection problems.
Also UMIACS-TR-2002-62