Computer Science Theses and Dissertations
Permanent URI for this collection: http://hdl.handle.net/1903/2756
Item: Collective Relational Data Integration with Diverse and Noisy Evidence (2019)
Memory, Alexander; Getoor, Lise; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Driven by the growth of the Internet, online applications, and data sharing initiatives, available structured data sources are now vast in number. There is a growing need to integrate these structured sources to support a variety of data science tasks, including predictive analysis, data mining, improving search results, and generating recommendations. A particularly important integration challenge is dealing with the heterogeneous structures of relational data sources. In addition to the large number of sources, the difficulty also lies in the growing complexity of sources and in the noise and ambiguity present in real-world sources. Existing automated integration approaches handle the number and complexity of sources, but nearly all are too brittle to handle noise and ambiguity. Corresponding progress has been made in probabilistic learning approaches that handle noise and ambiguity in their inputs, but until recently those technologies have not scaled to the size and complexity of relational data integration problems. My dissertation addresses key challenges arising from this gap in existing approaches.

I begin the dissertation by introducing a common probabilistic framework for reasoning about both metadata and data in integration problems. I demonstrate that this approach allows us to mitigate noise in metadata. The type of transformation I generate is particularly rich, taking into account multi-relational structure in both the source and target databases. I introduce a new objective for selecting this type of relational transformation and demonstrate its effectiveness on particularly challenging problems in which only partial outputs to the target are possible.
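The idea of combining noisy metadata with data-level evidence can be illustrated with a minimal sketch. This is not the dissertation's actual probabilistic model; the function names, the similarity measures, and the linear weighting are all hypothetical, chosen only to show how a candidate attribute match can survive a noisy column name when the underlying data agrees.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Metadata evidence: string similarity between attribute names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def value_overlap(src_values, tgt_values):
    """Data evidence: fraction of target values also seen in the source."""
    src, tgt = set(src_values), set(tgt_values)
    return len(src & tgt) / len(tgt) if tgt else 0.0

def score_match(src_attr, tgt_attr, src_values, tgt_values, w_meta=0.5):
    """Blend noisy metadata and data evidence into a single match score."""
    return (w_meta * name_similarity(src_attr, tgt_attr)
            + (1 - w_meta) * value_overlap(src_values, tgt_values))

# Toy example: the column name is noisy, but the data-level overlap is strong.
s = score_match("emp_nm", "employee_name",
                ["alice", "bob", "carol"], ["alice", "bob", "dave"])
```

A real system would score whole multi-relational transformations jointly rather than single attribute pairs, but the principle of pooling diverse, noisy evidence is the same.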
Next, I present a novel method for reasoning about ambiguity in integration problems and show that it handles complex schemas with many alternative transformations. To discover transformations beyond those derivable from explicit source and target metadata, I introduce an iterative mapping search framework. In a complementary approach, I introduce a framework for reasoning jointly over both transformations and underlying semantic attribute matches, which are allowed to have uncertainty. Finally, I consider an important case in which multiple sources need to be fused but traditional transformations are not sufficient. I demonstrate that we can learn statistical transformations for an important practical application of the multiple-sources problem.

Item: Spatio-Temporal Reasoning About Agent Behavior (2011)
Shakarian, Paulo; Subrahmanian, V.S.; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

There are many applications where we wish to reason about spatio-temporal aspects of an agent's behavior.
This dissertation examines several facets of this type of reasoning. First, given a model of past agent behavior, we wish to reason about the probability that an agent takes a given action at a certain time. Previous work combining temporal and probabilistic reasoning has made either independence or Markov assumptions. This work introduces Annotated Probabilistic Temporal (APT) logic, which makes neither assumption. Statements in APT logic consist of rules of the form "formula G becomes true with a probability in [L,U] within T time units after formula F becomes true" and can be written by experts or extracted automatically. We explore the problem of entailment: finding the probability that an agent performs a given action at a certain time based on such a model. We study this problem's complexity and develop a sound but incomplete fixpoint operator as a heuristic, implementing it and testing it on models generated automatically from several datasets.

Second, agent behavior often results in "observations" at geospatial locations that imply the existence of other, unobserved locations we wish to find ("partners"). In this dissertation, we formalize this notion with geospatial abduction problems (GAPs). A GAP seeks to infer a set of partner locations from a set of observations and a model representing the relationship between observations and partners for a given agent. This dissertation presents exact and approximate algorithms for solving GAPs, as well as an implemented software package for addressing these problems called SCARE (the Spatio-Cultural Abductive Reasoning Engine). We tested SCARE on counter-insurgency data from Iraq and obtained good results. We then provide an adversarial extension to GAPs: given a fixed set of observations, if an adversary has probabilistic knowledge of how an agent would find a corresponding set of partners, he would place the partners in locations that minimize the expected number of partners found by the agent.
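The APT rule form quoted above can be made concrete with a simplified sketch. APT logic itself defines frequency functions over probability distributions of threads (sequences of worlds); this toy version, with hypothetical names, only checks one observed thread of ground atoms against a single rule.

```python
def frequency(thread, f, g, t):
    """Fraction of time points where atom f holds and atom g then
    holds within the next t time units (a simplified frequency function)."""
    starts = [i for i, world in enumerate(thread) if f in world]
    if not starts:
        return 0.0
    hits = sum(1 for i in starts
               if any(g in thread[j]
                      for j in range(i + 1, min(i + t + 1, len(thread)))))
    return hits / len(starts)

def satisfies(thread, f, g, t, lo, hi):
    """Check an APT-style rule 'g follows f within t time units with
    probability in [lo, hi]' against one observed thread."""
    return lo <= frequency(thread, f, g, t) <= hi

# Thread: each entry is the set of atoms true at that time step.
thread = [{"attack"}, set(), {"retreat"}, {"attack"}, {"retreat"}, set()]
ok = satisfies(thread, "attack", "retreat", 2, 0.8, 1.0)
```

Entailment in the full logic asks the harder question of which probability intervals are forced across all distributions consistent with a rule set, which is what the fixpoint heuristic approximates.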
We examine this problem, along with its complement, by studying their computational complexity, developing algorithms, and implementing approximation schemes.

We also introduce a class of problems called geospatial optimization problems (GOPs). Here the agent has a set of actions that modify attributes of a geospatial region, and he wishes to select a limited number of such actions (with respect to a budget and other constraints) in a manner that maximizes a benefit function. We study the complexity of this problem, develop exact methods, and then develop an approximation algorithm with a guarantee. For some real-world applications, such as epidemiology, there is an underlying diffusion process that also affects geospatial properties. We address this with social network optimization problems (SNOPs): given a weighted, labeled, directed graph, we seek a set of vertices that, if given some initial property, optimizes an aggregate objective with respect to the diffusion. We develop and implement a heuristic that obtains a guarantee for a large class of such problems.
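The dissertation's SNOP heuristic is not reproduced here, but the shape of a guarantee-bearing heuristic for seed selection under diffusion can be sketched with standard greedy selection. All names are hypothetical, the diffusion here is a simple deterministic reachability rather than a weighted, labeled process, and the classic (1 - 1/e) guarantee applies when the spread objective is monotone and submodular.

```python
def spread(graph, seeds):
    """Vertices eventually reached from the seed set under a simple
    deterministic diffusion (reachability over directed edges)."""
    reached, frontier = set(seeds), list(seeds)
    while frontier:
        v = frontier.pop()
        for u in graph.get(v, []):
            if u not in reached:
                reached.add(u)
                frontier.append(u)
    return reached

def greedy_seeds(graph, k):
    """Greedily pick k seeds maximizing marginal spread; for monotone
    submodular objectives this greedy rule carries a (1 - 1/e) guarantee."""
    seeds = set()
    for _ in range(k):
        best = max((v for v in graph if v not in seeds),
                   key=lambda v: len(spread(graph, seeds | {v})))
        seeds.add(best)
    return seeds

# Toy directed graph as an adjacency list.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": [], "e": ["d"]}
chosen = greedy_seeds(graph, 1)
```

With one seed allowed, the greedy rule picks the vertex whose reachable set is largest, which is the aggregate being optimized in this toy setting.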