On a Graph-theoretic Approach to Scheduling Large-scale Data Transfers

Thumbnail Image
CS-TR-4322.ps(510.79 KB)
No. of downloads: 297
CS-TR-4322.pdf(252.04 KB)
No. of downloads: 599
Publication or External Link
Cheng, William C.
Chou, Cheng-Fu
Golubchik, Leana
Khuller, Samir
Wan, Yungchun (Justin)
In this paper we consider the problem of moving a large amount of data from several different hosts to a single destination in a wide-area network. Often, due to congestion conditions, the choice of paths by the network may be poor. By choosing an indirect route at the application level, we may be able to obtain substantially higher performance in moving data through the network. We formulate this data transfer (collection) problem as a network flow problem. We show that by using a min-cost flow algorithm on an appropriately defined time-expanded (network) graph, we can obtain a data transfer schedule. We show that such schedules can be an order of magnitude better than schedules obtained by transferring data directly from each host to the destination. In fact, this holds, even though we make no assumptions about knowledge of the topology of the network or the capacity available on individual links of the network. We simply use end-to-end type information and compute a schedule for transferring the data. Finally, we also study the shortcomings of this approach in terms of the gap between the network flow formulation and data transfers in a wide-area network. Also UMIACS-TR-2002-04