Skip to content
University of Maryland LibrariesDigital Repository at the University of Maryland
    • Login
    View Item 
    •   DRUM
    • College of Computer, Mathematical & Natural Sciences
    • Computer Science
    • Technical Reports from UMIACS
    • View Item
    •   DRUM
    • College of Computer, Mathematical & Natural Sciences
    • Computer Science
    • Technical Reports from UMIACS
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    A Performance Study of a Large-scale Data Collection Problem

    Thumbnail
    View/Open
    CS-TR-4382.ps (378.2Kb)
    No. of downloads: 125

    Date
    2002-08-01
    Author
    Chou, Cheng-Fu
    Wan, Yung-Chun (Justin)
    Cheng, William C.
    Golubchik, Leana
    Khuller, Samir
    Metadata
    Show full item record
    Abstract
    In this paper, we consider the problem of moving a large amount of data from several source hosts to a destination host over a wide-area network, i.e., a large-scale data collection problem. This problem is important since improvements in data collection times in many applications such as wide-area upload applications, high-performance computing applications and data mining applications are crucial to performance of those applications. Existing approaches to the large-scale research are transferring data either directly, i.e., direct methods, or using ``best''-path type of application-level re-routing techniques, which we refer as non-coordinated methods. However, we believe that in the case of large-scale data collection applications, it is important to *coordinate* data transfers from multiple sources. More specifically, our coordinated method would take into consideration the transfer demands of all source hosts and then schedule all data transfers in parallel by using all possible existing paths between the source hosts and the destination host. We present a performance and robustness study of different data collection methods. Our results showed that coordinated methods can perform significantly better than non-coordinated and direct methods under various degrees and types of network congestion. Moreover, we also showed that coordinated methods are more robust than non-coordinated methods under inaccuracies in network condition information. Therefore, we believe that coordinated methods are a promising approach to large-scale data collection problems. Also UMIACS-TR-2002-62
    URI
    http://hdl.handle.net/1903/1214
    Collections
    • Technical Reports from UMIACS
    • Technical Reports of the Computer Science Department

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.
    Please send us your comments.
    Web Accessibility
     

     

    Browse

    All of DRUMCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

    My Account

    LoginRegister
    Pages
    About DRUMAbout Download Statistics

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.
    Please send us your comments.
    Web Accessibility