A Performance Study of a Large-scale Data Collection Problem

dc.contributor.author: Chou, Cheng-Fu [en_US]
dc.contributor.author: Wan, Yung-Chun (Justin) [en_US]
dc.contributor.author: Cheng, William C. [en_US]
dc.contributor.author: Golubchik, Leana [en_US]
dc.contributor.author: Khuller, Samir [en_US]
dc.date.accessioned: 2004-05-31T23:20:02Z
dc.date.available: 2004-05-31T23:20:02Z
dc.date.created: 2002-07 [en_US]
dc.date.issued: 2002-08-01 [en_US]
dc.description.abstract: In this paper, we consider the problem of moving a large amount of data from several source hosts to a destination host over a wide-area network, i.e., a large-scale data collection problem. This problem is important because improvements in data collection times are crucial to the performance of many applications, such as wide-area upload applications, high-performance computing applications, and data mining applications. Existing approaches to large-scale data collection transfer data either directly, i.e., direct methods, or by using "best"-path types of application-level re-routing techniques, which we refer to as non-coordinated methods. However, we believe that in the case of large-scale data collection applications, it is important to *coordinate* data transfers from multiple sources. More specifically, our coordinated method takes into consideration the transfer demands of all source hosts and then schedules all data transfers in parallel, using all possible existing paths between the source hosts and the destination host. We present a performance and robustness study of different data collection methods. Our results show that coordinated methods can perform significantly better than non-coordinated and direct methods under various degrees and types of network congestion. Moreover, we show that coordinated methods are more robust than non-coordinated methods under inaccuracies in network condition information. Therefore, we believe that coordinated methods are a promising approach to large-scale data collection problems. Also UMIACS-TR-2002-62 [en_US]
dc.format.extent: 387357 bytes
dc.format.mimetype: application/postscript
dc.identifier.uri: http://hdl.handle.net/1903/1214
dc.language.iso: en_US
dc.relation.isAvailableAt: Digital Repository at the University of Maryland [en_US]
dc.relation.isAvailableAt: University of Maryland (College Park, Md.) [en_US]
dc.relation.isAvailableAt: Tech Reports in Computer Science and Engineering [en_US]
dc.relation.isAvailableAt: UMIACS Technical Reports [en_US]
dc.relation.ispartofseries: UM Computer Science Department; CS-TR-4382 [en_US]
dc.relation.ispartofseries: UMIACS; UMIACS-TR-2002-62 [en_US]
dc.title: A Performance Study of a Large-scale Data Collection Problem [en_US]
dc.type: Technical Report [en_US]

Files

Original bundle
Name: CS-TR-4382.ps
Size: 378.28 KB
Format: Postscript Files