Computer Science Theses and Dissertations
Permanent URI for this collection: http://hdl.handle.net/1903/2756
Item: On Numerical Analysis in Residue Number Systems (1964)
Lindamood, George Edward; Rheinboldt, Werner C.; Computer Science Center; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Recent attempts to utilize residue number systems in digital computers have raised numerous questions about adapting the techniques of numerical analysis to residue number systems. Among these questions are the fundamental problems of how to compare the magnitudes of two numbers, how to detect additive and multiplicative overflow, and how to divide in residue number systems. These three problems are treated in separate chapters of this thesis, and methods are developed whereby magnitude comparison, overflow detection, and division can be performed in residue number systems. In an additional chapter, the division method is extended to provide an algorithm for the direct approximation of square roots in residue number systems. Numerous examples illustrate the nature of the problems considered and show the use of the solutions presented in practical computations. A final chapter presents the results of extensive trial calculations in which a conventional digital computer was programmed to simulate the use of the division and square root algorithms in approximating quotients and square roots in residue number systems. These results indicate that, in practice, these division and square root algorithms usually converge to the quotient or square root somewhat faster than the theory suggests.
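The comparison problem the abstract highlights can be made concrete with a small sketch: residue arithmetic is digit-wise and carry-free, but there is no digit-wise way to tell which of two residue-encoded numbers is larger. The code below is an illustration only and is not taken from the thesis; the moduli and function names are assumptions, and it compares magnitudes by full Chinese Remainder Theorem reconstruction, which is precisely the expensive step the thesis's methods aim to avoid.

```python
# Minimal sketch (not from the thesis): integers represented by their residues
# modulo a fixed set of pairwise-coprime moduli, with magnitude comparison done
# by reconstructing the integer via the Chinese Remainder Theorem. A practical
# RNS machine would avoid this full reconstruction; this only illustrates the
# representation and why comparison is non-trivial without it.
from math import prod

MODULI = (7, 11, 13, 15)          # assumed pairwise-coprime moduli; range = prod = 15015

def to_rns(x):
    """Encode a non-negative integer as a tuple of residues."""
    return tuple(x % m for m in MODULI)

def add_rns(a, b):
    """Addition is digit-wise and carry-free -- the appeal of RNS arithmetic."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, MODULI))

def from_rns(r):
    """Chinese Remainder Theorem reconstruction (used here only for comparison)."""
    M = prod(MODULI)
    x = 0
    for ri, mi in zip(r, MODULI):
        Mi = M // mi
        x += ri * Mi * pow(Mi, -1, mi)   # pow(.., -1, mi): modular inverse (Python 3.8+)
    return x % M

def rns_less_than(a, b):
    """Magnitude comparison: not possible digit-wise, so reconstruct first."""
    return from_rns(a) < from_rns(b)

if __name__ == "__main__":
    a, b = to_rns(1234), to_rns(987)
    print(add_rns(a, b) == to_rns(1234 + 987))   # True: addition stays in residues
    print(rns_less_than(b, a))                   # True: comparison needs reconstruction
```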
Item: Restructuring Textual Information for Online Retrieval (1985)
Koved, Lawrence; Shneiderman, Ben; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Two experiments were conducted to evaluate two styles of online documents. The first experiment compared paper manuals to online manuals using two different database structuring techniques - a sequential (linear) structure and a tree structure. People using the paper manuals were faster at solving problems than the people using the computer manuals. No differences were found between the linear and tree structures, or in the accuracy of problem solutions. In a subjective evaluation of user preferences, the computer manuals were rated as better and more organized than the paper manuals. The second experiment compared two methods of retrieving online information that allowed the reader to specify the attributes needed to guide the information retrieval process. The first manual recorded the attributes entered by the reader via menus, and material in the manuals not relevant to the current search was pruned from the search space. The second manual did not record the menu selections, so the readers had to re-enter the attributes several times in order to complete the task. The manual that recorded the attributes allowed the readers to work over twice as fast and was preferred over the other manual. A theoretical foundation is presented for the underlying online documentation used in the experiments. The user's traversal through the database is presented as a graph search process, using a production system. The results of the experiments and their theoretical foundations are evaluated in terms of the impact they might have on future online document storage and retrieval systems.

Item: Adaptive Database Systems Based On Query Feedback and Cached Results (1994)
Chen, Chung-Min; Roussopoulos, Nick; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

This dissertation explores the query optimization technique of using cached results and feedback to improve the performance of database systems. Cached results and experience obtained by running queries are used to save execution time for follow-up queries, adapt data and system parameters, and improve overall system performance. First, we develop a framework which integrates query optimization and cache management. The optimizer is capable of generating efficient query plans using previous query results cached on disk. Alternative methods to access and update the caches are considered by the optimizer based on cost estimation. Different cache management strategies are also included in this framework for comparison. An empirical performance study verifies the advantage and practicality of this framework. To help the optimizer select the best plan, we propose a novel approach for providing accurate but cost-effective selectivity estimation. The distribution of attribute values is regressed in real time, using actual query result sizes obtained as feedback, to make accurate selectivity estimates. This method avoids the expensive off-line database access overhead required by conventional methods and adapts fairly well to updates and query locality, which is verified empirically. To execute a query plan more efficiently, a buffer pool is usually provided for caching data pages in memory to reduce disk accesses. We enhance buffer utilization by devising a buffer allocation scheme for recurring queries using page fault feedback obtained from previous executions. The performance improvement of this scheme is shown by empirical examples and a systematic simulation.
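The feedback-driven selectivity estimation described in the previous abstract, where the attribute-value distribution is fit from the result sizes of queries already executed rather than from an offline scan of the data, can be sketched roughly as follows. This is a simplified illustration, not the dissertation's actual estimator: the polynomial model, the class name, and the sample numbers are all assumptions.

```python
# Rough sketch (assumed, not the dissertation's method): adapt a selectivity
# estimator for range predicates "attr <= v" from query feedback, i.e. the
# observed fraction of rows each past query actually returned.
import numpy as np

class FeedbackSelectivityEstimator:
    def __init__(self, degree=3):
        self.degree = degree
        self.values = []        # predicate constants seen so far
        self.fractions = []     # observed selectivities (result_size / table_size)
        self.coeffs = None

    def record(self, value, result_size, table_size):
        """Feedback from an executed query: no offline table scan is needed."""
        self.values.append(value)
        self.fractions.append(result_size / table_size)
        if len(self.values) > self.degree:
            # Refit the approximate cumulative distribution in real time.
            self.coeffs = np.polyfit(self.values, self.fractions, self.degree)

    def estimate(self, value):
        """Estimated selectivity of 'attr <= value' for the optimizer."""
        if self.coeffs is None:
            return 0.5                      # no feedback yet: fall back to a default
        return float(np.clip(np.polyval(self.coeffs, value), 0.0, 1.0))

# Usage: after each query, feed its true result size back into the estimator.
est = FeedbackSelectivityEstimator()
for v, rows in [(10, 120), (40, 480), (70, 830), (95, 990)]:
    est.record(v, result_size=rows, table_size=1000)
print(round(est.estimate(55), 2))   # about 0.67, interpolated from the feedback points
```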
Item: Treemaps: Visualizing Hierarchical and Categorical Data (1993)
Johnson, Brian Scott; Shneiderman, Ben

Treemaps are a graphical method for the visualization of hierarchical and categorical data sets. Treemap presentations of data shift mental workload from the cognitive to the perceptual systems, taking advantage of the human visual processing system to increase the bandwidth of the human-computer interface. Efficient use of display space allows for the simultaneous presentation of thousands of data records, as well as facilitating the presentation of semantic information. Treemaps let users see the forest and the trees by providing local detail in the context of a global overview, providing a visually engaging environment in which to analyze, search, explore and manipulate large data sets. The treemap method of hierarchical visualization is, at its core, based on the property of containment. This property of containment is a fundamental idea which powerfully encapsulates many of our reasons for constructing information hierarchies. All members of the treemap family of algorithms partition multi-dimensional display spaces based on weighted hierarchical data sets. In addition to generating treemaps and traditional hierarchical diagrams, the treemap algorithms extend non-hierarchical techniques such as bar and pie charts into the domain of hierarchical presentation. Treemap algorithms can be used to generate bar charts, outlines, traditional 2-D node-and-link diagrams, pie charts, cone trees, cam trees, drum trees, etc. Generating existing diagrams via treemap transformations is an exercise meant to show the power, ease, and generality with which alternative presentations can be generated from the basic treemap algorithms. Two controlled experiments with novice treemap users and real data highlight the strengths of treemaps. The first experiment, with 12 subjects, compares the Macintosh TreeViz™ implementation of treemaps with the UNIX command line for questions dealing with a 530-node file hierarchy. Treemaps are shown to significantly reduce user performance times for global file comparison tasks. A second experiment, with 40 subjects, compares treemaps with dynamic outlines for questions dealing with the allocation of funds in the 1992 US Budget (a 357-node budget hierarchy). Treemap users are 50% faster overall and as much as 8 times faster for specific questions.
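The partition-by-containment idea described above can be illustrated with the simplest treemap layout, often called slice-and-dice: each node's rectangle is divided among its children in proportion to their weights, alternating split direction with depth. The sketch below is a generic illustration with invented data, not the TreeViz implementation.

```python
# Minimal slice-and-dice treemap layout, as an illustration of the
# partition-by-containment idea the abstract describes (not the TreeViz code).
# A node is a dict: {"name": str, "size": float, "children": [nodes]}.

def layout(node, x, y, w, h, depth=0, out=None):
    """Assign each node a rectangle (x, y, w, h); children tile the parent."""
    if out is None:
        out = []
    out.append((node["name"], x, y, w, h))
    children = node.get("children", [])
    total = sum(c["size"] for c in children)
    offset = 0.0
    for c in children:
        frac = c["size"] / total if total else 0.0
        if depth % 2 == 0:                      # slice vertically at even depths
            layout(c, x + offset * w, y, w * frac, h, depth + 1, out)
        else:                                   # slice horizontally at odd depths
            layout(c, x, y + offset * h, w, h * frac, depth + 1, out)
        offset += frac
    return out

# Hypothetical two-level hierarchy weighted by file size.
tree = {"name": "root", "size": 100, "children": [
    {"name": "src",  "size": 60, "children": [
        {"name": "a.c", "size": 45}, {"name": "b.c", "size": 15}]},
    {"name": "docs", "size": 40, "children": []},
]}

for name, x, y, w, h in layout(tree, 0, 0, 1000, 600):
    print(f"{name:5s} rect at ({x:6.1f},{y:6.1f}) size {w:6.1f} x {h:6.1f}")
```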
Item: Computing Approximate Customized Ranking (2009)
Wu, Yao; Raschid, Louiqa; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

As the amount of information grows and users become more sophisticated, ranking techniques become important building blocks for meeting user needs when answering queries. PageRank, one of the most successful link-based ranking methods, iteratively computes importance scores for web pages based on the importance scores of incoming pages. Due to its success, PageRank has been applied in a number of applications that require customization. We address the scalability challenges for two types of customized ranking. The first challenge is to compute the ranking of a subgraph. Various Web applications focus on identifying a subgraph, such as focused crawlers and localized search engines. The second challenge is to compute online personalized ranking. Personalized search improves the quality of search results for each user. The user needs are represented by a personalized set of pages or personalized link importance in an entity relationship graph, which requires an efficient online computation. To solve the subgraph ranking problem efficiently, we estimate the ranking scores for a subgraph. We propose a framework with an exact solution (IdealRank) and an approximate solution (ApproxRank) for computing ranking on a subgraph. Both IdealRank and ApproxRank represent the set of external pages with an external node $\Lambda$ and modify the PageRank-style transition matrix with respect to $\Lambda$. The IdealRank algorithm assumes that the scores of external pages are known. We prove that the IdealRank scores for pages in the subgraph converge to the true PageRank scores. Since the PageRank-style scores of external pages are typically not available, we propose the ApproxRank algorithm to estimate scores for the subgraph. We analyze the $L_1$ distance between IdealRank scores and ApproxRank scores of the subgraph and show that it is within a constant factor of the $L_1$ distance of the external pages. We demonstrate with real and synthetic data that ApproxRank provides a good approximation to PageRank for a variety of subgraphs. We consider online personalization using ObjectRank, an authority-flow-based ranking for entity relationship graphs. We formalize the concept of an aggregate surfer on a data graph; the surfer's behavior is controlled by multiple personalized rankings. We prove a linearity theorem over these rankings which can be used as a tool to scale this type of personalization. DataApprox uses a repository of precomputed rankings for a given set of link weight assignments. We define DataApprox as an optimization problem; it selects a subset of the precomputed rankings from the repository and produces a weighted combination of these rankings. We analyze the $L_1$ distance between the DataApprox scores and the true authority flow ranking scores and show that DataApprox has a theoretical bound. Our experiments on the DBLP data graph show that DataApprox performs well in practice and allows fast and accurate personalized authority flow ranking.
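A simplified sketch can show the basic setting behind ranking a subgraph with an aggregate external node: run a PageRank-style power iteration over the subgraph plus one node standing in for all external pages. This is only a toy illustration; the IdealRank and ApproxRank algorithms, their transition-matrix modifications, and their error bounds are those described in the abstract, and the graph, node names, and damping factor below are assumptions.

```python
# Simplified power-iteration sketch of PageRank-style ranking over a small
# subgraph in which all outside pages are collapsed into one external node
# "LAMBDA". This only illustrates the general idea; the actual IdealRank and
# ApproxRank constructions and guarantees are in the thesis.
DAMPING = 0.85

def pagerank(adj, iters=50):
    """adj maps each node to the list of nodes it links to."""
    nodes = list(adj)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1.0 - DAMPING) / len(nodes) for n in nodes}
        for u, outlinks in adj.items():
            if not outlinks:                      # dangling node: spread uniformly
                for v in nodes:
                    nxt[v] += DAMPING * score[u] / len(nodes)
            else:
                for v in outlinks:
                    nxt[v] += DAMPING * score[u] / len(outlinks)
        score = nxt
    return score

# Hypothetical 3-page subgraph; "LAMBDA" stands in for every page outside it.
subgraph = {
    "a": ["b", "LAMBDA"],
    "b": ["a", "c"],
    "c": ["a"],
    "LAMBDA": ["a", "b", "c"],   # crude assumption: external mass returns uniformly
}
for page, s in sorted(pagerank(subgraph).items(), key=lambda kv: -kv[1]):
    print(f"{page:7s} {s:.3f}")
```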
Item: Lexical Features for Statistical Machine Translation (2009)
Devlin, Jacob; Dorr, Bonnie; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

In modern phrasal and hierarchical statistical machine translation systems, two major features model translation: rule translation probabilities and lexical smoothing scores. The rule translation probabilities are computed as maximum likelihood estimates (MLEs) of an entire source (or target) phrase translating to a target (or source) phrase. The lexical smoothing scores are also a likelihood estimate of a source (target) phrase translating to a target (source) phrase, but they are computed using independent word-to-word translation probabilities. Intuitively, it would seem that the lexical smoothing score is a less powerful estimate of translation likelihood due to this independence assumption, but I present the somewhat surprising result that lexical smoothing is far more important to the quality of a state-of-the-art hierarchical SMT system than rule translation probabilities. I posit that this is due to a fundamental data sparsity problem: the average word-to-word translation is seen many more times than the average phrase-to-phrase translation, so the word-to-word translation probabilities (or lexical probabilities) are far better estimated. Motivated by this result, I present a number of novel methods for modifying the lexical probabilities to improve the quality of our MT output. First, I examine two methods of lexical probability biasing, where for each test document a set of secondary lexical probabilities is extracted and interpolated with the primary lexical probability distribution. Biasing each document with the probabilities extracted from its own first-pass decoding output provides a small but consistent gain of about 0.4 BLEU. Second, I contextualize the lexical probabilities by factoring in additional information such as the previous or next word. The key to the success of this context-dependent lexical smoothing is a backoff model, where our "trust" of a context-dependent probability estimate is directly proportional to how many times it was seen in the training data. In this way, I avoid the estimation problem seen in translation rules, where the amount of context is high but the probability estimation is inaccurate. When using the surrounding words as context, this feature provides a gain of about 0.6 BLEU on Arabic and Chinese. Finally, I describe several types of discriminatively trained lexical features, along with a new optimization procedure called Expected-BLEU optimization. This new optimization procedure is able to robustly estimate weights for thousands of decoding features, which can in effect discriminatively optimize a set of lexical probabilities to maximize BLEU. I also describe two other discriminative feature types, one of which is the part-of-speech analogue to lexical probabilities, and the other of which estimates training corpus weights based on lexical translations. The discriminative features produce a gain of 0.8 BLEU on Arabic and 0.4 BLEU on Chinese.

Item: Sequential Search With Ordinal Ranks and Cardinal Values: An Infinite Discounted Secretary Problem (2009)
Palley, Asa Benjamin; Cramton, Peter; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

We consider an extension of the classical secretary problem in which a decision maker observes only the relative ranks of a sequence of up to N applicants, whose true values are i.i.d. U[0,1] random variables. Applicants arrive according to a homogeneous Poisson process, and the decision maker seeks to maximize the expected time-discounted value of the applicant whom she ultimately selects. This provides a straightforward and natural objective while retaining the structure of limited information based on relative ranks. We derive the optimal policy in the sequential search and show that the solution converges as N goes to infinity. We compare these results with a closely related full-information problem in order to quantify these informational limitations.
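For readers unfamiliar with the baseline this abstract extends: in the classical rank-only secretary problem, the optimal policy is to skip roughly the first n/e applicants and then accept the first one who beats everything seen so far, succeeding with probability about 1/e. The sketch below is a Monte Carlo check of that classical rule only; it has no discounting and no Poisson arrivals, and the parameters are assumed.

```python
# Monte Carlo check of the classical secretary rule that this thesis extends:
# skip the first n/e applicants, then accept the first one better than all seen.
# This is only the undiscounted, fixed-n baseline -- not the thesis's model,
# which adds i.i.d. U[0,1] values, Poisson arrivals, and time discounting.
import math
import random

def run_once(n, cutoff):
    values = [random.random() for _ in range(n)]     # latent applicant qualities
    best_seen = max(values[:cutoff], default=float("-inf"))
    for v in values[cutoff:]:
        if v > best_seen:                            # first applicant beating the sample
            return v == max(values)                  # did we pick the overall best?
    return values[-1] == max(values)                 # forced to take the last applicant

def best_pick_rate(n=100, trials=20000):
    cutoff = round(n / math.e)
    wins = sum(run_once(n, cutoff) for _ in range(trials))
    return wins / trials

print(best_pick_rate())   # roughly 0.37, i.e. about 1/e, as the classical theory predicts
```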
Item: Combinatorial Problems in Online Advertising (2009)
Malekian, Azarakhsh; Khuller, Samir; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Electronic commerce, or e-commerce, refers to the process of buying and selling goods and services over the Internet. In fact, the Internet has so completely transformed traditional media-based advertising that billions of dollars of advertising revenue now flow to search companies such as Microsoft, Yahoo!, and Google. In addition, the new advertising landscape has opened up the advertising industry to all players, big and small. However, this transformation has led to a host of new problems faced by the search companies as they make decisions about how much to charge for advertisements, whose ads to display to users, and how to maximize their revenue. In this thesis we focus on an entire suite of problems motivated by the central question of "Which advertisement should be displayed to which user?" Targeted advertising happens when a user enters a relevant search query; the ads are usually displayed on the sides of the search result page. Internet advertising also takes place by displaying ads on the side of webpages with relevant content. While large advertisers (e.g., Coca-Cola) pursue brand recognition through advertising, small advertisers are happy with instant revenue as a result of a user following their ad and performing a desired action (e.g., making a purchase). Therefore, small advertisers are often happy to get any ad slot related to their ad, while large advertisers prefer contracts that guarantee their ads will be delivered to a sufficient number of desired users. We first focus on two problems that come up in the context of small advertisers. The first problem deals with the allocation of ads to slots, considering the fact that users enter search queries over a period of time and, as a result, the slots become available gradually. We use a greedy method for allocation and show that the online ad allocation problem with a fixed distribution of queries over time can be modeled as maximizing a continuous non-decreasing submodular sequence function, for which we can guarantee a solution within a factor of at least (1 - 1/e) of the optimal. The second problem we consider is the query rewriting problem in the context of keyword advertisement. This problem can be posed as a family of graph covering problems to maximize profit. We obtain constant-factor approximation algorithms for these covering problems under two sets of constraints and a realistic notion of ad benefit. We perform experiments on real data and show that our algorithms are capable of outperforming a competitive baseline algorithm in terms of the benefit due to rewrites. We next consider two problems related to premium customers, who need guaranteed delivery of a large number of ads for the purpose of brand recognition and would require signing a contract. In this context, we consider the allocation problem with the objective of maximizing either revenue or fairness. The problems considered in this thesis address just a few of the current challenges in e-commerce and Internet advertising. There are many interesting new problems arising in this field as the technology evolves and online connectivity through interactive media and the Internet becomes ubiquitous. We believe that this is one of the areas that will continue to receive greater attention from researchers in the near future.

Item: Algorithmic Issues in Visual Object Recognition (2009)
Hussein, Mohamed Elsayed Ahmed; Davis, Larry; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

This thesis is divided into two parts covering two aspects of research in the area of visual object recognition. Part I is about human detection in still images. Human detection is a challenging computer vision task due to the wide variability in human visual appearances and body poses. In this part, we present several enhancements to human detection algorithms. First, we present an extension to the integral images framework that allows constant-time computation of non-uniformly weighted summations over rectangular regions using a bundle of integral images. Such a computational element is commonly used in constructing gradient-based feature descriptors, which are the most successful in shape-based human detection. Second, we introduce deformable features as an alternative to the conventional static features used in classifiers based on boosted ensembles. Deformable features can enhance the accuracy of human detection by adapting to pose changes that can be described as translations of body features. Third, we present a comprehensive evaluation framework for cascade-based human detectors. The presented framework facilitates comparison between cascade-based detection algorithms, provides a confidence measure for results, and deploys a practical evaluation scenario. Part II explores the possibilities of enhancing the speed of core algorithms used in visual object recognition using the computing capabilities of Graphics Processing Units (GPUs). First, we present an implementation of Graph Cut on GPUs, which achieves up to a 4x speedup compared to a CPU implementation. The Graph Cut algorithm has many applications related to visual object recognition, such as segmentation and 3D point matching. Second, we present an efficient sparse approximation of kernel matrices for GPUs that can significantly speed up kernel-based learning algorithms, which are widely used in object detection and recognition. We present an implementation of the Affinity Propagation clustering algorithm based on this representation, which is about 6 times faster than another GPU implementation based on a conventional sparse matrix representation.
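The building block behind the integral-images extension mentioned above is the standard summed-area table, which makes the sum over any axis-aligned rectangle a constant-time operation after one linear pass over the image. The sketch below shows only this plain, unweighted building block with invented data; the thesis's bundle of integral images for non-uniformly weighted sums goes beyond it.

```python
# Standard integral-image building block: after one O(rows*cols) pass, the sum
# over any axis-aligned rectangle costs four lookups. The thesis extends this
# idea to non-uniformly weighted sums via a bundle of such tables; this sketch
# shows only the plain, unweighted case.
import numpy as np

def integral_image(img):
    """S[i, j] = sum of img[:i, :j], with a zero row/column for easy indexing."""
    S = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    S[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return S

def rect_sum(S, top, left, bottom, right):
    """Sum of img[top:bottom, left:right] in O(1) using four corner lookups."""
    return S[bottom, right] - S[top, right] - S[bottom, left] + S[top, left]

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(480, 640))
S = integral_image(img)
# The O(1) lookup agrees with the brute-force sum over the same region.
print(rect_sum(S, 100, 200, 180, 260) == img[100:180, 200:260].sum())   # True
```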
Item: Combining Static and Dynamic Typing in Ruby (2009)
Furr, Michael; Foster, Jeffrey S.; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Many popular scripting languages such as Ruby, Python, and Perl are dynamically typed. Dynamic typing provides many advantages, such as terse, flexible code and the ability to use highly dynamic language constructs, such as an eval method that evaluates a string as program text. However, these dynamic features have traditionally obstructed static analyses, leaving the programmer without the benefits of static typing, including early error detection and the documentation provided by type annotations. In this dissertation, we present Diamondback Ruby (DRuby), a tool that blends static and dynamic typing for Ruby. DRuby provides a type language that is rich enough to precisely type Ruby code, without unneeded complexity. DRuby uses static type inference to automatically discover type errors in Ruby programs and provides a type annotation language that serves as verified documentation of a method's behavior. When necessary, these annotations can be checked dynamically using runtime contracts. This allows statically and dynamically checked code to safely coexist, and any runtime errors are properly blamed on dynamic code. To handle dynamic features such as eval, DRuby includes a novel dynamic analysis and transformation that gathers per-application profiles of dynamic feature usage via a program's test suite. Based on these profiles, DRuby transforms the program before applying its type inference algorithm, enforcing type safety for dynamic constructs. By leveraging a program's test suite, our technique gives the programmer an easy-to-understand trade-off: the more dynamic features covered by their tests, the more static checking is achieved. We evaluated DRuby on a benchmark suite of sample Ruby programs. We found that our profile-guided analysis and type inference algorithms worked well, discovering several previously unknown type errors. Furthermore, our results give us insight into what kind of Ruby code programmers "want" to write but is not easily amenable to traditional static typing. This dissertation shows that it is possible to effectively integrate static typing into Ruby without losing the feel of a dynamic language.
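The notion of type annotations that "can be checked dynamically using runtime contracts" has a rough analogue in most dynamic languages. The sketch below is only a loose illustration, written in Python rather than DRuby's Ruby annotation language; the decorator, its blame messages, and the example function are invented for this example and say nothing about DRuby's actual design.

```python
# Loose analogy only (in Python, not DRuby): a tiny runtime contract that checks
# a function's declared argument and return types when static reasoning is
# unavailable, and names the offending side in the error it raises.
import functools
import inspect

def contract(fn):
    """Check simple type hints at call time; raise with a blame message on mismatch."""
    hints = fn.__annotations__
    sig = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            if name in hints and not isinstance(value, hints[name]):
                raise TypeError(f"contract violated by caller: {name}={value!r} "
                                f"is not {hints[name].__name__}")
        result = fn(*args, **kwargs)
        if "return" in hints and not isinstance(result, hints["return"]):
            raise TypeError(f"contract violated by {fn.__name__}: return value "
                            f"{result!r} is not {hints['return'].__name__}")
        return result
    return wrapper

@contract
def repeat(text: str, times: int) -> str:
    return text * times

print(repeat("ab", 3))     # "ababab": both contracts pass
try:
    repeat("ab", "3")      # the caller is blamed for passing a str where int is declared
except TypeError as e:
    print(e)
```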