Browsing by Author "Raschid, Louiqa"
Now showing 1 - 9 of 9
Item: Adaptive Pull-Based Data Freshness Policies for Diverse Update Patterns (2004-01-29)
Bright, Laura; Gal, Avigdor; Raschid, Louiqa

An important challenge to effective data delivery in wide area environments is maintaining the freshness of cached objects using solutions that can scale to a large number of clients without incurring significant server overhead. Policies for maintaining data freshness are traditionally either push-based or pull-based. Push-based policies rely on servers pushing updates to clients, and may not scale to a large number of clients. Pull-based policies require clients to contact servers to check for updates; their effectiveness is limited by the difficulty of predicting updates. Models to predict updates generally rely on some knowledge of past updates; their accuracy may vary, and determining the most appropriate model is non-trivial. In this paper, we present an adaptive pull-based solution to this challenge. We first present several techniques that use update history to estimate the freshness of cached objects, and identify the update patterns for which each technique is most effective. We then introduce adaptive policies that can automatically choose a policy for an object based on its observed update patterns. Our proposed policies improve the freshness of cached data and reduce costly contacts with remote servers without incurring the large server overhead of push-based policies, and can scale to a large number of clients. Using trace data from a data-intensive website as well as two email logs, we show that our adaptive policies can adapt to diverse update patterns and provide significant improvement compared to any single policy. (UMIACS-TR-2004-01) (An illustrative sketch of a history-based policy appears after the next entry.)

Item: Adaptive Pull-Based Policies for Wide Area Data Delivery (2005-10-19)
Bright, Laura; Gal, Avigdor; Raschid, Louiqa

Wide area data delivery requires timely propagation of up-to-date information to thousands of clients over a wide area network. Applications include web caching, RSS source monitoring, and email access via a mobile network. Data sources vary widely in their update patterns and may experience different update rates at different times or unexpected changes to update patterns. Traditional data delivery solutions are either push-based, where servers push updates to clients, or pull-based, where clients check for updates at servers. While push-based solutions ensure timely data delivery, they are not always feasible to implement and may not scale to a large number of clients. In this paper we present adaptive pull-based policies that can reduce the overhead of contacting remote servers compared to existing pull-based policies. We model updates to data sources using update histories, and present novel history-based policies to estimate when updates occur. We present a set of architectures to enable rapid deployment of the proposed policies. We develop adaptive policies to handle changes in update patterns, and present two examples of such policies. Extensive experimental evaluation using three data traces from diverse applications shows that history-based policies can reduce contact between clients and servers by up to 60% compared to existing pull-based policies while providing a comparable level of data freshness. Our adaptive policies are further shown to dominate the individual history-based policies.
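Both entries above center on estimating an object's freshness from its update history and polling only when staleness is likely. The sketch below is a minimal illustration of that idea, not the papers' actual algorithms; the class name, the staleness threshold, and the Poisson-style staleness estimate are assumptions made for the example.

import math


class HistoryBasedPolicy:
    """Decide when to re-poll a cached object from its update history."""

    def __init__(self, staleness_threshold=0.5):
        self.update_times = []              # observed update timestamps
        self.threshold = staleness_threshold

    def record_update(self, ts):
        self.update_times.append(ts)

    def mean_interval(self):
        t = self.update_times
        if len(t) < 2:
            return None                     # not enough history yet
        return (t[-1] - t[0]) / (len(t) - 1)

    def should_poll(self, now):
        # Assuming memoryless (Poisson-like) updates, the chance that at
        # least one update occurred since the last known one is
        # 1 - exp(-elapsed / mean_interval).
        mi = self.mean_interval()
        if mi is None:
            return True                     # no history yet: poll to learn
        elapsed = now - self.update_times[-1]
        return 1.0 - math.exp(-elapsed / mi) >= self.threshold

An adaptive layer in the spirit of these papers would maintain several such estimators per object, track how well each predicts the observed updates, and switch to whichever performs best.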
Item: Challenges of Navigational Queries: Finding Best Paths in Graphs (2005-10-06)
Raschid, Louiqa; Vidal, Maria-Esther; Wu, Yao; Cardenas, Marelis; Marquez, Natalia

Life science sources are characterized by a complex graph of overlapping sources and multiple alternate links between sources. A (navigational) query may be answered by traversing multiple alternate paths between an origin and a target source. Paths may be characterized by several metrics, including the cardinality of objects of the target source (TOC), the cost of evaluating a query plan for the path, and the user's preference for specific paths. Our challenge is finding the best paths among the set of all solutions, AllPaths, that meet some user-specified ranking criteria. If the user's ranking criterion is strict, the problem is to find the Top K paths. If the user wants a trade-off of several metrics, the problem is to find the Skyline paths, those not dominated by any other path. NSearch is a naive solution. BFSrchOpt is a heuristic best-first search strategy; it uses a metric to rank partial solutions (subpaths) and local metrics to guide graph traversal, and produces BFPaths. We measure the precision and recall of BFPaths against the Top K% or Skyline of AllPaths, and we study the impact of graph properties on the behavior of BFSrchOpt. BFSrchOpt can be orders of magnitude faster than NSearch. (An illustrative Skyline sketch appears below, after the next two entries.)

Item: A Dual Framework and Algorithms for Targeted Data Delivery (2005-11-03)
Roitman, Haggai; Raschid, Louiqa; Gal, Avigdor; Bright, Laura

A variety of emerging wide area applications challenge existing techniques for data delivery to users and applications accessing data from multiple autonomous servers. In this paper, we develop a framework for comparing pull-based solutions and present dual optimization approaches. Informally, the first approach maximizes the utility of user profiles while satisfying constraints on the usage of system resources. The second approach satisfies the utility of user profiles while minimizing the usage of system resources. We present a static optimal solution (SUP) for the latter approach and formally identify sufficient conditions for SUP to be optimal for both. A shortcoming of static solutions to pull-based delivery is that they cannot adapt to the dynamic behavior of Web source updates. Therefore, we present an adaptive algorithm (fbSUP) and show how it can incorporate feedback to improve user utility with only a moderate increase in probing. Using real and synthetic data traces, we analyze the behavior of SUP and fbSUP under various update models. (An illustrative probing-schedule sketch appears below.)

Item: A Flexible Meta-Wrapper Interface for Autonomous Distributed Information Sources (1998-10-15)
Raschid, Louiqa; Vidal, Maria Esther; Gruser, Jean-Robert

We support flexible query processing with autonomous networked information sources. Flexibility allows a query to be accepted in a dynamic environment where some sources are unavailable, and it provides the ability to identify equivalent sources based on their contents; these equivalences are used to eliminate redundancy and to provide alternate query plans when some source is unavailable. We determine the best plan, i.e., the least-cost non-redundant plan, based on a cost model for autonomous sources. These features are supported by a meta-wrapper component within the mediator. The meta-wrapper interface is defined by a structure and supported operations. WHOQL is a query language for queries and plans; it can represent sequential execution to obtain safe plans, as well as plans with redundancy (alternatives). A language, WHODL, defines the mapping from the meta-wrapper interface to each source; WHODL also describes the contents of a source, and this content definition is used to determine equivalences of autonomous sources. We obtain a least-cost non-redundant plan in a dynamic environment. A meta-wrapper cost model uses three underlying sources of information: a selectivity model; a cost model for operators in the meta-wrapper; and a cost estimator for query response time. The estimator uses a parameterized feedback technique to learn from query feedback and to determine the relevance of various factors that affect response time. The cost model also provides feedback to the plan generator on low-cost plans. (Also cross-referenced as UMIACS-TR-97-07) (An illustrative source-selection sketch appears below.)
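The Skyline criterion in "Challenges of Navigational Queries" keeps every path that no other path beats on all metrics at once. Below is a minimal sketch of that dominance test over two assumed metrics, benefit (target object cardinality) and cost; the path representation is invented for the example.

def dominates(p, q):
    """p dominates q if p is no worse on both metrics (higher benefit,
    lower cost) and strictly better on at least one."""
    return (p["benefit"] >= q["benefit"] and p["cost"] <= q["cost"]
            and (p["benefit"] > q["benefit"] or p["cost"] < q["cost"]))


def skyline(paths):
    # Keep each path that no other path dominates.
    return [p for p in paths
            if not any(dominates(q, p) for q in paths if q is not p)]


paths = [{"id": "A", "benefit": 90, "cost": 10},
         {"id": "B", "benefit": 70, "cost": 5},
         {"id": "C", "benefit": 60, "cost": 12}]   # C is dominated by A
print([p["id"] for p in skyline(paths)])            # ['A', 'B']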
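For "A Dual Framework and Algorithms for Targeted Data Delivery", the second optimization approach fixes a utility requirement and minimizes probing. The sketch below shows one way such a schedule could be derived under an assumed Poisson update model, plus an fbSUP-flavored feedback rule; both are our illustrations, not the paper's derivation.

import math

def probe_interval(update_rate, miss_tolerance):
    """Longest probe interval T that keeps the probability the cached copy
    is stale just before the next probe below miss_tolerance, assuming
    Poisson updates: 1 - exp(-rate*T) <= tol  =>  T <= -ln(1 - tol)/rate."""
    return -math.log(1.0 - miss_tolerance) / update_rate

def feedback_adjust(rate_estimate, probe_found_change, step=1.25):
    """Feedback in the spirit of fbSUP (illustrative): nudge the rate
    estimate up when a probe observes a change, down when it does not."""
    return rate_estimate * step if probe_found_change else rate_estimate / step

# A source updating ~4 times/hour, with a profile tolerating a 10% miss chance:
print(probe_interval(4.0, 0.10))   # ~0.026 hours, i.e. probe every ~95 seconds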
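The meta-wrapper entry above selects a least-cost non-redundant plan using content equivalences among sources. The sketch below shows only that core selection step, under assumptions: the equivalence classes, costs, and availability flags are invented stand-ins for what WHODL content descriptions and the cost model would supply.

equivalence_classes = {
    # content class -> [(source name, access cost, currently available?)]
    "protein_seq": [("srcA", 1.2, True), ("srcB", 0.8, False)],
    "citations":   [("srcC", 2.0, True), ("srcD", 2.5, True)],
}

def least_cost_plan(classes):
    """Pick the cheapest live source per content class, dropping redundant
    equivalent sources; fall back to alternatives when a source is down."""
    plan = {}
    for content, sources in classes.items():
        available = [(cost, name) for name, cost, up in sources if up]
        if not available:
            raise RuntimeError(f"no available source for {content}")
        plan[content] = min(available)[1]   # cheapest live alternative
    return plan

print(least_cost_plan(equivalence_classes))
# {'protein_seq': 'srcA', 'citations': 'srcC'} -- srcB is down, srcD costlier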
Item: Learning Response Time for WebSources using Query Feedback and Application in Query Optimization (1998-11-12)
Gruser, Jean-Robert; Raschid, Louiqa; Zadorozhny, Vladimir

The rapid growth of the Internet and support for interoperability protocols has increased the number of Web-accessible sources, WebSources. Current optimization technology for wrapper-mediator architectures needs to be extended to estimate the response time (delay) of accessing WebSources and to use this delay in query optimization. In this paper, we present the Multi-Dimensional Table (MDT), a tool that learns from query feedback from WebSources. We describe the MDT learning algorithms and report on MDT learning for WebSources. The MDT uses the dimensions Time of day, Day, and Quantity of data to learn response times for a particular WebSource, and to predict the expected response time (delay), with a confidence in this prediction, for some query. Experimental data was collected from several WebSources and analyzed to determine the dimensions that were significant in estimating the response time of particular WebSources. Our research shows that we can improve the quality of learning by tuning the MDT features, e.g., including significant dimensions in the MDT or changing the ordering of dimensions. We then demonstrate how the MDT prediction of delay may be used by a scrambling-enabled optimizer. A scrambling algorithm identifies critical points of delay, where it decides to scramble (modify) a plan, attempting to hide the expected delay by computing some other part of the plan that is unaffected by the delay. We explore the space of real delay at a WebSource versus the MDT prediction of this delay, with respect to critical points of delay in specific plans. We identify the cases where MDT overestimation or underestimation of the real delay results in a penalty in the scrambling-enabled optimizer, and the cases where there is no penalty. Using the experimental data and MDT learning, we test how well the MDT minimizes these penalties. (Also cross-referenced as UMIACS TR #98-64) (An illustrative MDT sketch appears below, after the next entry.)

Item: LP Randomized Rounding for Maximum Coverage Problem and Minimum Set Cover with Threshold Problem (2005-10-06)
Khuller, Samir; Raschid, Louiqa; Wu, Yao

There is an abundance of web-accessible life science sources. Traversal of a particular path can answer a navigational query, returning a target object set (TOS). The cardinality of the TOS is considered the benefit of a path, and there is some cost function associated with each path. Commonly, multiple alternate paths satisfy a query, and we cannot pick all of them to answer the query, since there can be an exponential number of paths in a graph. We are therefore interested in selecting a subset of these paths, and we present two problems in this context. The first is to select a subset of paths of maximum benefit within a cost budget; this is known in the literature as the Budgeted Maximum Coverage Problem. The second is to select a subset of paths of minimum cost with a threshold benefit guarantee; this is the Minimum Set Cover with Threshold Problem. We develop randomized approximation algorithms based on LP rounding and conduct experiments. (An illustrative rounding sketch appears below.)
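The MDT entry above learns response times along the dimensions Time of day, Day, and Quantity of data. The sketch below is a simplified reading of that scheme, not the paper's algorithm: it buckets query feedback and predicts a running mean, using the sample count as a crude confidence signal. The bucket sizes and class interface are assumptions.

from collections import defaultdict

class MDT:
    def __init__(self):
        self.stats = defaultdict(lambda: [0, 0.0])   # bucket -> [n, mean]

    @staticmethod
    def bucket(hour, weekday, n_rows):
        # Coarse buckets: 6-hour blocks, day of week, data size in 1000s.
        return (hour // 6, weekday, min(n_rows // 1000, 9))

    def feedback(self, hour, weekday, n_rows, response_time):
        s = self.stats[self.bucket(hour, weekday, n_rows)]
        s[0] += 1
        s[1] += (response_time - s[1]) / s[0]        # incremental mean

    def predict(self, hour, weekday, n_rows):
        n, mean = self.stats[self.bucket(hour, weekday, n_rows)]
        return (mean, n) if n else (None, 0)         # (estimate, confidence)

A scrambling-enabled optimizer would consult predict() at the critical points of delay in a plan and rearrange work when the predicted delay, weighted by its confidence, crosses a threshold.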
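The entry above (and its 2006 revision below) rounds a fractional LP solution into a concrete path subset. The sketch below shows only the standard independent-rounding step, assuming the LP relaxation of budgeted maximum coverage has already been solved; the budget-repair and repetition details that a full algorithm needs are omitted, and the data is invented.

import random

def round_paths(x_frac):
    """x_frac: {path_id: fractional LP value in [0, 1]} -> chosen path set.
    Include each path independently with probability equal to its value."""
    return {p for p, x in x_frac.items() if random.random() < x}

def covered_objects(chosen, targets_of):
    """Benefit = cardinality of the union of the chosen target object sets."""
    return set().union(*(targets_of[p] for p in chosen)) if chosen else set()

x = {"p1": 0.9, "p2": 0.4, "p3": 0.1}                       # fractional solution
tos = {"p1": {"o1", "o2"}, "p2": {"o2", "o3"}, "p3": {"o4"}}
picked = round_paths(x)
print(picked, len(covered_objects(picked, tos)))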
Item: LP Randomized Rounding for Maximum Coverage Problem and Minimum Set Cover with Threshold Problem (2006-05-26)
Khuller, Samir; Raschid, Louiqa; Wu, Yao

There is an abundance of web-accessible life science sources. Traversal of a particular path can answer a navigational query, returning a target object set (TOS). The cardinality of the TOS is considered the benefit of a path, and there is some cost function associated with each path. Commonly, multiple alternate paths satisfy a query, and we cannot pick all of them to answer the query, since there can be an exponential number of paths in a graph. We are therefore interested in selecting a subset of these paths, and we present two problems in this context. The first is to select a subset of paths of maximum benefit within a cost budget; this is known in the literature as the Budgeted Maximum Coverage Problem. The second is to select a subset of paths of minimum cost with a threshold benefit guarantee; this is the Minimum Set Cover with Threshold Problem. We develop randomized approximation algorithms based on LP rounding and conduct experiments.

Item: Optimized Seamless Integration of Biomolecular Data (2001-11-21)
Eckman, Barbara A.; Lacroix, Zoe; Raschid, Louiqa

Today, scientific data is inevitably digitized, stored in a wide variety of heterogeneous formats, and accessible over the Internet. Scientists need to access an integrated view of multiple remote or local heterogeneous data sources. They then integrate the results of complex queries and apply further analysis and visualization to support the task of scientific discovery. Building such a digital library for scientific discovery requires accessing and manipulating data extracted from flat files or databases, documents retrieved from the Web, as well as data that is locally materialized in warehouses or generated by software. We consider several tasks in providing optimized and seamless integration of biomolecular data. Challenges to be addressed include capturing and representing source capabilities; developing a methodology to acquire and represent semantic knowledge and metadata about source contents, overlap in source contents, and access costs; and decision support for selecting sources and capabilities using cost-based and semantic knowledge, and for generating low-cost query evaluation plans. (Also referenced as UMIACS-TR-2001-51)