Computer Science Theses and Dissertations

Permanent URI for this collectionhttp://hdl.handle.net/1903/2756

Browse

Search Results

Now showing 1 - 10 of 229
  • Thumbnail Image
    Item
    On the Foundations of Data Interoperability and Semantic Search on the Web
    (2011) Haidarian Shahri, Hamid; Perlis, Donald; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This dissertation studies the problem of facilitating semantic search across disparate ontologies that are developed by different organizations. There is tremendous potential in enabling users to search independent ontologies and discover knowledge in a serendipitous fashion, i.e., often completely unintended by the developers of the ontologies. The main difficulty with such search is that users generally do not have any control over the naming conventions and content of the ontologies. Thus terms must be appropriately mapped across ontologies based on their meaning. The meaning-based search of data is referred to as semantic search, and its facilitation (aka semantic interoperability) then requires mapping between ontologies. In relational databases, searching across organizational boundaries currently involves the difficult task of setting up a rigid information integration system. Linked Data representations more flexibly tackle the problem of searching across organizational boundaries on the Web. However, there exists no consensus on how ontology mapping should be performed for this scenario, and the problem is open. We lay out the foundations of semantic search on the Web of Data by comparing it to keyword search in the relational model and by providing effective mechanisms to facilitate data interoperability across organizational boundaries. We identify two sharply distinct goals for ontology mapping based on real-world use cases. These goals are: (i) ontology development, and (ii) facilitating interoperability. We systematically analyze these goals, side-by-side, and contrast them. Our analysis demonstrates the implications of the goals on how to perform ontology mapping and how to represent the mappings. We rigorously compare facilitating interoperability between ontologies to information integration in databases. Based on the comparison, class matching is emphasized as a critical part of facilitating interoperability. For class matching, various class similarity metrics are formalized and an algorithm that utilizes these metrics is designed. We also experimentally evaluate the effectiveness of the class similarity metrics on real-world ontologies. In order to encode the correspondences between ontologies for interoperability, we develop a novel W3C-compliant representation, named skeleton.
  • Thumbnail Image
    Item
    Understanding Scientific Literature Networks: Case Study Evaluations of Integrating Visualizations and Statistics
    (2011) Gove, Robert Paul; Shneiderman, Ben; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Investigators frequently need to quickly learn new research domains in order to advance their research. This thesis presents five contributions to understanding how software helps researchers explore scientific literature networks. (1) A taxonomy which summarizes capabilities in existing bibliography tools, revealing patterns of capabilities by system type. (2) Six participants in two user studies evaluate Action Science Explorer (ASE), which is designed to create surveys of scientific literature and integrates visualizations and statistics. Users found document-level statistics and attribute rankings to be convenient when beginning literature exploration. (3) User studies also identify users' questions when exploring academic literature, which include examining the evolution of a field, identifying author relationships, and searching for review papers. (4) The evaluations suggest shortcomings of ASE, and this thesis outlines improvements to ASE and lists user requirements for bibliographic exploration. (5) I recommend strategies for evaluating bibliographic exploration tools based on experiences evaluating ASE.
  • Thumbnail Image
    Item
    MATRIX REDUCTION IN NUMERICAL OPTIMIZATION
    (2011) Park, Sungwoo; O'Leary, Dianne P; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Matrix reduction by eliminating some terms in the expansion of a matrix has been applied to a variety of numerical problems in many different areas. Since matrix reduction has different purposes for particular problems, the reduced matrices also have different meanings. In regression problems in statistics, the reduced parts of the matrix are considered to be noise or observation error, so the given raw data are purified by the matrix reduction. In factor analysis and principal component analysis (PCA), the reduced parts are regarded as idiosyncratic (unsystematic) factors, which are not shared by multiple variables in common. In solving constrained convex optimization problems, the reduced terms correspond to unnecessary (inactive) constraints which do not help in the search for an optimal solution. In using matrix reduction, it is both critical and difficult to determine how and how much we will reduce the matrix. This decision is very important since it determines the quality of the reduced matrix and the final solution. If we reduce too much, fundamental properties will be lost. On the other hand, if we reduce too little, we cannot expect enough benefit from the reduction. It is also a difficult decision because the criteria for the reduction must be based on the particular type of problem. In this study, we investigatematrix reduction for three numerical optimization problems. First, the total least squares problem uses matrix reduction to remove noise in observed data which follow an underlying linear model. We propose a new method to make the matrix reduction successful under relaxed noise assumptions. Second, we apply matrix reduction to the problem of estimating a covariance matrix of stock returns, used in financial portfolio optimization problem. We summarize all the previously proposed estimation methods in a common framework and present a new and effective Tikhonov method. Third, we present a new algorithm to solve semidefinite programming problems, adaptively reducing inactive constraints. In the constraint reduction, the Schur complement matrix for the Newton equations is the object of the matrix reduction. For all three problems, we propose appropriate criteria to determine the intensity of the matrix reduction. In addition, we verify the correctness of our criteria by experimental results and mathematical proof.
  • Thumbnail Image
    Item
    USING AND MANIPULATING PROBABILISTIC CONNECTIVITY IN SOCIAL NETWORKS
    (2011) DuBois, Thomas M.; Srinivasan, Aravind; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Probabilistic connectivity problems arise naturally in many social networks. In particular the spread of an epidemic across a population and social trust inference motivate much of our work. We examine problems where some property, such as an infection or influence, starts from some initially seeded set of nodes and every affected node transmits the property to its neighbors with a probability determined by the connecting edge. Many problems in this area involve connectivity in a random- graph - the probability of a node being affected is the probability that there is a path to it in the random-graph from one of the seed nodes. We may wish to aid, disrupt, or simply monitor this connectivity. In our core applications, public health officials wish to minimize an epidemic's spread over a population, and connectivity in a social network suggests how closely tied its users are. In support of these and other applications, we study several combinatorial optimization problems on random-graphs. We derive algorithms and demonstrate their effectiveness through simulation, mathematical proof, or both.
  • Thumbnail Image
    Item
    Diamond-based models for scientific visualization
    (2011) Weiss, Kenneth; De Floriani, Leila; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Hierarchical spatial decompositions are a basic modeling tool in a variety of application domains including scientific visualization, finite element analysis and shape modeling and analysis. A popular class of such approaches is based on the regular simplex bisection operator, which bisects simplices (e.g. line segments, triangles, tetrahedra) along the midpoint of a predetermined edge. Regular simplex bisection produces adaptive simplicial meshes of high geometric quality, while simplifying the extraction of crack-free, or conforming, approximations to the original dataset. Efficient multiresolution representations for such models have been achieved in 2D and 3D by clustering sets of simplices sharing the same bisection edge into structures called diamonds. In this thesis, we introduce several diamond-based approaches for scientific visualization. We first formalize the notion of diamonds in arbitrary dimensions in terms of two related simplicial decompositions of hypercubes. This enables us to enumerate the vertices, simplices, parents and children of a diamond. In particular, we identify the number of simplices involved in conforming updates to be factorial in the dimension and group these into a linear number of subclusters of simplices that are generated simultaneously. The latter form the basis for a compact pointerless representation for conforming meshes generated by regular simplex bisection and for efficiently navigating the topological connectivity of these meshes. Secondly, we introduce the supercube as a high-level primitive on such nested meshes based on the atomic units within the underlying triangulation grid. We propose the use of supercubes to associate information with coherent subsets of the full hierarchy and demonstrate the effectiveness of such a representation for modeling multiresolution terrain and volumetric datasets. Next, we introduce Isodiamond Hierarchies, a general framework for spatial access structures on a hierarchy of diamonds that exploits the implicit hierarchical and geometric relationships of the diamond model. We use an isodiamond hierarchy to encode irregular updates to a multiresolution isosurface or interval volume in terms of regular updates to diamonds. Finally, we consider nested hypercubic meshes, such as quadtrees, octrees and their higher dimensional analogues, through the lens of diamond hierarchies. This allows us to determine the relationships involved in generating balanced hypercubic meshes and to propose a compact pointerless representation of such meshes. We also provide a local diamond-based triangulation algorithm to generate high-quality conforming simplicial meshes.
  • Thumbnail Image
    Item
    Time-based Location Techniques Using Inexpensive, Unsynchronized Clocks in Wireless Networks
    (2011) Mah, Matthew Yew Mun; Agrawala, Ashok K; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The ability to measure location using time of flight in IEEE 802.11 networks is impeded by the standard clock resolution, imprecise synchronization of the 802.11 protocol, and the inaccuracy of available clocks. To achieve real-time location with accuracy goals of a few meters, we derive new consensus synchronization techniques for free-running clocks. Using consensus synchronization, we improve existing time of arrival (TOA) techniques and introduce new time difference of arrival (TDOA) techniques. With this common basis, we show how TOA is theoretically superior to TDOA. Using TOA measurements, we can locate wireless nodes that participate in the location system, and using TDOA measurements, we can locate nodes that do not participate. We demonstrate applications using off-the-shelf 802.11 hardware that can determine location to within 3m using simple, existing optimization methods. The synchronization techniques extend existing ones providing distributed synchronization for free-running clocks to cases where send times cannot be controlled and adjusted precisely, as in 802.11 networks. These location and synchronization techniques may be applied to transmitting wireless nodes using any communication protocol where cooperating nodes can produce send and receive timestamps.
  • Thumbnail Image
    Item
    A Computational Theory of the Use-Mention Distinction in Natural Language
    (2011) Wilson, Shomir; Perlis, Donald R.; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    To understand the language we use, we sometimes must turn language on itself, and we do this through an understanding of the use-mention distinction. In particular, we are able to recognize mentioned language: that is, tokens (e.g., words, phrases, sentences, letters, symbols, sounds) produced to draw attention to linguistic properties that they possess. Evidence suggests that humans frequently employ the use-mention distinction, and we would be severely handicapped without it; mentioned language frequently occurs for the introduction of new words, attribution of statements, explanation of meaning, and assignment of names. Moreover, just as we benefit from mutual recognition of the use-mention distinction, the potential exists for us to benefit from language technologies that recognize it as well. With a better understanding of the use-mention distinction, applications can be built to extract valuable information from mentioned language, leading to better language learning materials, precise dictionary building tools, and highly adaptive computer dialogue systems. This dissertation presents the first computational study of how the use-mention distinction occurs in natural language, with a focus on occurrences of mentioned language. Three specific contributions are made. The first is a framework for identifying and analyzing instances of mentioned language, in an effort to reconcile elements of previous theoretical work for practical use. Definitions for mentioned language, metalanguage, and quotation have been formulated, and a procedural rubric has been constructed for labeling instances of mentioned language. The second is a sequence of three labeled corpora of mentioned language, containing delineated instances of the phenomenon. The corpora illustrate the variety of mentioned language, and they enable analysis of how the phenomenon relates to sentence structure. Using these corpora, inter-annotator agreement studies have quantified the concurrence of human readers in labeling the phenomenon. The third contribution is a method for identifying common forms of mentioned language in text, using patterns in metalanguage and sentence structure. Although the full breadth of the phenomenon is likely to elude computational tools for the foreseeable future, some specific, common rules for detecting and delineating mentioned language have been shown to perform well.
  • Thumbnail Image
    Item
    Learning Visual Patterns: Imposing Order on Objects, Trajectories and Networks
    (2011) Farrell, Ryan; Davis, Larry S; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Fundamental to many tasks in the field of computer vision, this work considers the understanding of observed visual patterns in static images and dynamic scenes . Within this broad domain, we focus on three particular subtasks, contributing novel solutions to: (a) the subordinate categorization of objects (avian species specifically), (b) the analysis of multi-agent interactions using the agent trajectories, and (c) the estimation of camera network topology. In contrast to object recognition, where the presence or absence of certain parts is generally indicative of basic-level category, the problem of subordinate categorization rests on the ability to establish salient distinctions amongst the characteristics of those parts which comprise the basic-level category. Focusing on an avian domain due to the fine-grained structure of the category taxonomy, we explore a pose-normalized appearance model based on a volumetric poselet scheme. The variation in shape and appearance properties of these parts across a taxonomy provides the cues needed for subordinate categorization. Our model associates the underlying image pattern parameters used for detection with corresponding volumetric part location, scale and orientation parameters. These parameters implicitly define a mapping from the image pixels into a pose-normalized appearance space, removing view and pose dependencies, facilitating fine-grained categorization with relatively few training examples. We next examine the problem of leveraging trajectories to understand interactions in dynamic multi-agent environments. We focus on perceptual tasks, those for which an agent's behavior is governed largely by the individuals and objects around them. We introduce kinetic accessibility, a model for evaluating the perceived, and thus anticipated, movements of other agents. This new model is then applied to the analysis of basketball footage. The kinetic accessibility measures are coupled with low-level visual cues and domain-specific knowledge for determining which player has possession of the ball and for recognizing events such as passes, shots and turnovers. Finally, we present two differing approaches for estimating camera network topology. The first technique seeks to partition a set of observations made in the camera network into individual object trajectories. As exhaustive consideration of the partition space is intractable, partitions are considered incrementally, adding observations while pruning unlikely partitions. Partition likelihood is determined by the evaluation of a probabilistic graphical model, balancing the consistency of appearances across a hypothesized trajectory with the latest predictions of camera adjacency. A primarily benefit of estimating object trajectories is that higher-order statistics, as opposed to just first-order adjacency, can be derived, yielding resilience to camera failure and the potential for improved tracking performance between cameras. Unlike the former centralized technique, the latter takes a decentralized approach, estimating the global network topology with local computations using sequential Bayesian estimation on a modified multinomial distribution. Key to this method is an information-theoretic appearance model for observation weighting. The inherently distributed nature of the approach allows the simultaneous utilization of all sensors as processing agents in collectively recovering the network topology.
  • Thumbnail Image
    Item
    Spatio-Temporal Reasoning About Agent Behavior
    (2011) Shakarian, Paulo; Subrahmanian, V.S.; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    There are many applications where we wish to reason about spatio-temporal aspects of an agent's behavior. This dissertation examines several facets of this type of reasoning. First, given a model of past agent behavior, we wish to reason about the probability that an agent takes a given action at a certain time. Previous work combining temporal and probabilistic reasoning has made either independence or Markov assumptions. This work introduces Annotated Probabilistic Temporal (APT) logic which makes neither assumption. Statements in APT logic consist of rules of the form "Formula G becomes true with a probability [L,U] within T time units after formula F becomes true'' and can be written by experts or extracted automatically. We explore the problem of entailment - finding the probability that an agent performs a given action at a certain time based on such a model. We study this problem's complexity and develop a sound, but incomplete fixpoint operator as a heuristic - implementing it and testing it on automatically generated models from several datasets. Second, agent behavior often results in "observations'' at geospatial locations that imply the existence of other, unobserved, locations we wish to find ("partners"). In this dissertation, we formalize this notion with "geospatial abduction problems" (GAPs). GAPs try to infer a set of partner locations for a set of observations and a model representing the relationship between observations and partners for a given agent. This dissertation presents exact and approximate algorithms for solving GAPs as well as an implemented software package for addressing these problems called SCARE (the Spatio-Cultural Abductive Reasoning Engine). We tested SCARE on counter-insurgency data from Iraq and obtained good results. We then provide an adversarial extension to GAPs as follows: given a fixed set of observations, if an adversary has probabilistic knowledge of how an agent were to find a corresponding set of partners, he would place the partners in locations that minimize the expected number of partners found by the agent. We examine this problem, along with its complement by studying their computational complexity, developing algorithms, and implementing approximation schemes. We also introduce a class of problems called geospatial optimization problems (GOPs). Here the agent has a set of actions that modify attributes of a geospatial region and he wishes to select a limited number of such actions (with respect to some budget and other constraints) in a manner that maximizes a benefit function. We study the complexity of this problem and develop exact methods. We then develop an approximation algorithm with a guarantee. For some real-world applications, such as epidemiology, there is an underlying diffusion process that also affects geospatial proprieties. We address this with social network optimization problems (SNOPs) where given a weighted, labeled, directed graph we seek to find a set of vertices, that if given some initial property, optimize an aggregate study with respect to such diffusion. We develop and implement a heuristic that obtains a guarantee for a large class of such problems.
  • Thumbnail Image
    Item
    Scalable Techniques for Behavioral Analysis and Forecasting
    (2011) Sliva, Amy; Subrahmanian, V.S.; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The ability to model, forecast, and analyze the behaviors of other agents has applications in many diverse contexts. For example, behavioral models can be used in multi-player games to forecast an opponent's next move, in economics to forecast a merger decision by a CEO, or in international politics to predict the behavior of a rival state or group. Such models can facilitate formulation of effective mitigating responses and provide a foundation for decision-support technologies. Behavioral modeling is a computationally challenging problem--real world data sets can contain on the order of 10^30,000 possible behaviors in any given situation. This work presents several scalable frameworks for modeling and forecasting agent behavior, particularly in the realm of international security dynamics. A probabilistic logic formalism for modeling and forecasting behavior is described, as well as distributed algorithms for efficient reasoning in this framework. To further cope with the scale of this problem, forecasting methods are also introduced that operate directly on time series data, rather than an intermediate behavioral model, to forecast actions and situations at some time in the future. Agent behavior can be adaptive, and in rare circumstances can deviate from the statistically "normal" past behavior. A system is also presented that can forecast when and how such behavioral changes will occur. These forecasting techniques, as well as any arbitrary time series forecasting approach, can be classified by a general axiomatic framework for forecasting in temporal databases. The knowledge gained from behavioral models and forecasts can be employed by decision-makers to develop effective response policies. An efficient framework is provided for identifying the optimal changes to the state of the world to elicit desired behaviors from another agent, balancing cost with likelihood of success. These modeling and analysis tools have also been incorporated into a prototype decision-support system and used in several case studies of real-world international security situations.