Collective Relational Data Integration with Diverse and Noisy Evidence

dc.contributor.advisor: Getoor, Lise
dc.contributor.author: Memory, Alexander
dc.contributor.department: Computer Science
dc.contributor.publisher: Digital Repository at the University of Maryland
dc.contributor.publisher: University of Maryland (College Park, Md.)
dc.date.accessioned: 2020-02-14T06:30:13Z
dc.date.available: 2020-02-14T06:30:13Z
dc.date.issued: 2019
dc.description.abstract: Driven by the growth of the Internet, online applications, and data sharing initiatives, available structured data sources are now vast in number. There is a growing need to integrate these structured sources to support a variety of data science tasks, including predictive analysis, data mining, improving search results, and generating recommendations. A particularly important integration challenge is dealing with the heterogeneous structures of relational data sources. In addition to the large number of sources, the difficulty also lies in the growing complexity of sources and in the noise and ambiguity present in real-world sources. Existing automated integration approaches handle the number and complexity of sources, but nearly all are too brittle to handle noise and ambiguity. Corresponding progress has been made in probabilistic learning approaches that handle noise and ambiguity in inputs, but until recently those technologies have not scaled to the size and complexity of relational data integration problems. My dissertation addresses key challenges arising from this gap in existing approaches. I begin the dissertation by introducing a common probabilistic framework for reasoning about both metadata and data in integration problems. I demonstrate that this approach allows us to mitigate noise in metadata. The type of transformation I generate is particularly rich, taking into account multi-relational structure in both the source and target databases. I introduce a new objective for selecting this type of relational transformation and demonstrate its effectiveness on particularly challenging problems in which only partial outputs to the target are possible. Next, I present a novel method for reasoning about ambiguity in integration problems and show that it handles complex schemas with many alternative transformations. To discover transformations beyond those derivable from explicit source and target metadata, I introduce an iterative mapping search framework. In a complementary approach, I introduce a framework for reasoning jointly over both transformations and the underlying semantic attribute matches, which are allowed to have uncertainty. Finally, I consider an important case in which multiple sources need to be fused but traditional transformations are not sufficient. I demonstrate that we can learn statistical transformations for an important practical application involving multiple sources.
dc.identifier: https://doi.org/10.13016/giv3-fotq
dc.identifier.uri: http://hdl.handle.net/1903/25568
dc.language.iso: en
dc.subject.pqcontrolled: Computer science
dc.subject.pquncontrolled: Data integration
dc.subject.pquncontrolled: Probabilistic reasoning
dc.subject.pquncontrolled: Schema mapping
dc.subject.pquncontrolled: Structured prediction
dc.title: Collective Relational Data Integration with Diverse and Noisy Evidence
dc.type: Dissertation
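
Note on the abstract above: it mentions reasoning jointly over transformations and uncertain attribute matches. As a purely illustrative aside, not taken from the dissertation and not its method, the following Python sketch shows the basic intuition behind collective matching: candidate matches are scored and resolved together under a one-to-one constraint rather than accepted independently. All attribute names and similarity scores below are hypothetical.

from itertools import permutations

# Hypothetical similarity scores between source and target attributes.
# (All names and numbers here are made up for illustration.)
similarity = {
    ("src.emp_name", "tgt.person.name"):  0.90,
    ("src.emp_name", "tgt.company.name"): 0.60,
    ("src.firm",     "tgt.person.name"):  0.85,
    ("src.firm",     "tgt.company.name"): 0.80,
}

source_attrs = ["src.emp_name", "src.firm"]
target_attrs = ["tgt.person.name", "tgt.company.name"]

def best_joint_match(sources, targets, scores):
    """Search for the one-to-one assignment with the highest total score,
    so conflicting matches are resolved jointly rather than one at a time."""
    best, best_total = None, float("-inf")
    for assignment in permutations(targets, len(sources)):
        total = sum(scores.get((s, t), 0.0) for s, t in zip(sources, assignment))
        if total > best_total:
            best, best_total = list(zip(sources, assignment)), total
    return best, best_total

matches, total = best_joint_match(source_attrs, target_attrs, similarity)
print(matches, total)
# Matching each source attribute independently would map both to
# tgt.person.name (scores 0.90 and 0.85); the joint search instead prefers
# the consistent pairing emp_name -> person.name, firm -> company.name
# (total 1.70).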

Files

Original bundle
Name: Memory_umd_0117E_20497.pdf
Size: 2.11 MB
Format: Adobe Portable Document Format