COMPUTING APPROXIMATE CUSTOMIZED RANKING

Wu, Yao

COMPUTING APPROXIMATE CUSTOMIZED RANKING

Files

Wu_umd_0117E_10538.pdf (1.62 MB)

No. of downloads: 1202

Date

2009

Authors

Wu, Yao

Advisor

Raschid, Louiqa

Abstract

As the amount of information grows and as users become more

sophisticated, ranking techniques become important building blocks

to meet user needs when answering queries. PageRank is one of the

most successful link-based ranking methods, which iteratively

computes the importance scores for web pages based on the importance scores of incoming pages. Due to its success, PageRank has been applied in a number of applications that require customization.

We address the scalability challenges for two types of customized

ranking. The first challenge is to compute the ranking of a

subgraph. Various Web applications focus on identifying a

subgraph, such as focused crawlers and localized search engines.

The second challenge is to compute online personalized ranking.

Personalized search improves the quality of search results for each

user. The user needs are represented by a personalized set of pages

or personalized link importance in an entity relationship graph.

This requires an efficient online computation.

To solve the subgraph ranking problem efficiently, we estimate the

ranking scores for a subgraph. We propose a framework of an exact

solution (IdealRank) and an approximate solution (ApproxRank) for

computing ranking on a subgraph. Both IdealRank and ApproxRank

represent the set of external pages with an external node

$\Lambda$ and modify the PageRank-style transition matrix with respect to $\Lambda$. The IdealRank algorithm assumes that the scores of external pages are known. We prove that the IdealRank scores for pages in the subgraph converge to the true PageRank scores. Since the PageRank-style scores of external pages may not typically be available, we propose the ApproxRank algorithm to estimate scores for the subgraph. We analyze the $L_1$ distance between IdealRank scores and ApproxRank scores of the subgraph and show that it is within a

constant factor of the $L_1$ distance of the external pages. We demonstrate with real and synthetic data that ApproxRank provides a good approximation to PageRank for a variety of subgraphs.

We consider online personalization using ObjectRank; it is an

authority flow based ranking for entity relationship graphs. We formalize the concept of an aggregate surfer on a data graph; the surfer's behavior is controlled by multiple personalized rankings. We prove a linearity

theorem over these rankings which can be used as a tool to scale

this type of personalization. DataApprox uses a repository of precomputed rankings for a given set of link weights assignments. We define DataApprox as an optimization problem; it selects a subset of the precomputed rankings from the repository and produce a weighted combination of these rankings. We analyze the $L_1$ distance between the DataApprox scores and the real authority flow ranking scores and show that DataApprox has a theoretical bound. Our experiments on the DBLP data graph show that DataApprox performs well in practice and allows fast and accurate personalized authority flow ranking.

URI (handle)

http://hdl.handle.net/1903/11851

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations

Full item page