### Browsing by Author "Wu, Yao"

Now showing 1 - 4 of 4

###### Results Per Page

###### Sort Options

Item Challenges of Navigational Queries: Finding Best Paths in Graphs(2005-10-06T16:30:25Z) Raschid, Louiqa; Vidal, Maria-Esther; Wu, Yao; Cardenas, Marelis; Marquez, NataliaLife science sources are characterized by a complex graph of overlapping sources, and multiple alternate links between sources. A (navigational) query may be answered by traversing multiple alternate paths between an origin and target source. Paths may be characterized by several metrics, including the cardinality of objects of the target source(TOC), the cost of query evaluation of a plan for the path, and the user's preference for specific paths. Our challenge is finding the best paths among the set of all solutions, AllPaths, that meet some user specified ranking criteria. If the user ranking criteria is strict, then the problem is to find the Top K paths. If the user wants a trade-off of several metrics, then the problem is to find the Skyline paths that are not dominated by other paths. {\em NSearch} is a naive solution. {\em BFSrchOpt} is a heuristic best-first search strategy. It uses a metric to rank partial solutions (subpaths) and (local) metrics to guide graph traversal, and produces BFPaths. We compare the precision and recall of BFPaths compared to the Top K\% or Skyline of AllPaths. We study the impact of graph properties on the behavior of {\em BFSrchOpt}. {\em BFSrchOpt} can be orders of magnitude faster than {\em NSearch}.Item COMPUTING APPROXIMATE CUSTOMIZED RANKING(2009) Wu, Yao; Raschid, Louiqa; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)As the amount of information grows and as users become more sophisticated, ranking techniques become important building blocks to meet user needs when answering queries. PageRank is one of the most successful link-based ranking methods, which iteratively computes the importance scores for web pages based on the importance scores of incoming pages. Due to its success, PageRank has been applied in a number of applications that require customization. We address the scalability challenges for two types of customized ranking. The first challenge is to compute the ranking of a subgraph. Various Web applications focus on identifying a subgraph, such as focused crawlers and localized search engines. The second challenge is to compute online personalized ranking. Personalized search improves the quality of search results for each user. The user needs are represented by a personalized set of pages or personalized link importance in an entity relationship graph. This requires an efficient online computation. To solve the subgraph ranking problem efficiently, we estimate the ranking scores for a subgraph. We propose a framework of an exact solution (IdealRank) and an approximate solution (ApproxRank) for computing ranking on a subgraph. Both IdealRank and ApproxRank represent the set of external pages with an external node $\Lambda$ and modify the PageRank-style transition matrix with respect to $\Lambda$. The IdealRank algorithm assumes that the scores of external pages are known. We prove that the IdealRank scores for pages in the subgraph converge to the true PageRank scores. Since the PageRank-style scores of external pages may not typically be available, we propose the ApproxRank algorithm to estimate scores for the subgraph. We analyze the $L_1$ distance between IdealRank scores and ApproxRank scores of the subgraph and show that it is within a constant factor of the $L_1$ distance of the external pages. We demonstrate with real and synthetic data that ApproxRank provides a good approximation to PageRank for a variety of subgraphs. We consider online personalization using ObjectRank; it is an authority flow based ranking for entity relationship graphs. We formalize the concept of an aggregate surfer on a data graph; the surfer's behavior is controlled by multiple personalized rankings. We prove a linearity theorem over these rankings which can be used as a tool to scale this type of personalization. DataApprox uses a repository of precomputed rankings for a given set of link weights assignments. We define DataApprox as an optimization problem; it selects a subset of the precomputed rankings from the repository and produce a weighted combination of these rankings. We analyze the $L_1$ distance between the DataApprox scores and the real authority flow ranking scores and show that DataApprox has a theoretical bound. Our experiments on the DBLP data graph show that DataApprox performs well in practice and allows fast and accurate personalized authority flow ranking.Item LP Randomized Rounding for Maximum Coverage Problem and Minimum Set Cover with Threshold Problem(2005-10-06T16:28:27Z) Khuller, Samir; Raschid, Louiqa; Wu, YaoThere are abundance of web accessible life science sources. Traversal of a particular path can answer a navigational query, returning a target object set (TOS). The cardinality of TOS is considered as the benefit of a path and there is some cost function associated with each path. It is common that multiple alternate paths satisfy the query and we are not allowed to pick all these paths to answer the query, since there could be exponential number of paths in a graph. We are interested in selecting a subset of these paths. We present two problems in this context. The first problem is to select a subset of paths of maximum benefit within a cost budget. This is known as {\em Budgeted Maximum Coverage Problem} in the literature. The second problem is to select a subset of paths of minimum cost with a threshold benefit guarantee. This is the {\em Minimum Set Cover with Threshold Problem}. We develop randomized approximation algorithms based on LP rounding and conduct experiments.Item LP Randomized Rounding for Maximum Coverage Problem and Minimum Set Cover with Threshold Problem(2006-05-26T18:44:56Z) Khuller, Samir; Raschid, Louiqa; Wu, YaoThere are abundance of web accessible life science sources. Traversal of a particular path can answer a navigational query, returning a target object set (TOS). The cardinality of TOS is considered as the benefit of a path and there is some cost function associated with each path. It is common that multiple alternate paths satisfy the query and we are not allowed to pick all these paths to answer the query, since there could be exponential number of paths in a graph. We are interested in selecting a subset of these paths. We present two problems in this context. The first problem is to select a subset of paths of maximum benefit within a cost budget. This is known as Budgeted Maximum Coverage Problem in the literature. The second problem is to select a subset of paths of minimum cost with a threshold benefit guarantee. This is the Minimum Set Cover with Threshold Problem. We develop randomized approximation algorithms based on LP rounding and conduct experiments.