Web Archiving: Organizing Web Objects into Web Containers to Optimize Access
Files
Publication or External Link
Date
Authors
Advisor
Citation
DRUM DOI
Abstract
The web is becoming the preferred medium for communicating and storing information pertaining to almost any human activity. However it is an ephemeral medium whose contents are constantly changing, resulting in a permanent loss of part of our cultural and scientific heritage on a regular basis. Archiving important web contents is a very challenging technical problem due to its tremendous scale and complex structure, extremely dynamic nature, and its rich heterogeneous and deep contents. In this paper, we consider the problem of archiving a linked set of web objects into web containers in such a way as to minimize the number of containers accessed during a typical browsing session. We develop a method that makes use of the notion of PageRank and optimized graph partitioning to enable faster browsing of archived web contents. We include simulation results that illustrate the performance of our scheme and compare it to the common scheme currently used to organize web objects into web containers.