Identifying Graphs from Noisy Observational Data

Namata Jr., Galile Mark Supapo

dc.contributor.advisor	Getoor, Lise	en_US
dc.contributor.author	Namata Jr., Galile Mark Supapo	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2012-10-11T05:39:16Z
dc.date.available	2012-10-11T05:39:16Z
dc.date.issued	2012	en_US
dc.description.abstract	There is a growing amount of data describing networks; examples include social networks, communication networks, and biological networks. As the amount of available data increases, so does our interest in analyzing the properties and characteristics of these networks. However, in most cases the data is noisy, incomplete, and passively acquired through observation; naively analyzing these networks without taking these errors into account can result in inaccurate and misleading conclusions. In my dissertation, I study the tasks of entity resolution, link prediction, and collective classification to address these deficiencies. I describe these tasks in detail and discuss my own work on each of them. For entity resolution, I develop a method for resolving the identities of name mentions in email communications. For link prediction, I develop a method for inferring subordinate-manager relationships between individuals in an email communication network. For collective classification, I propose an adaptive active surveying method to address node labeling in a query-driven setting on network data. In many real-world settings, however, these deficiencies are not found in isolation, and all need to be addressed to infer the desired complete and accurate network. Furthermore, because of the dependencies typically found in these tasks, the tasks are inherently inter-related and must be performed jointly. I define the general problem of graph identification, which performs these tasks simultaneously, removing the noise and missing values in the observed input network and inferring the complete and accurate output network. I present a novel approach to graph identification using a collection of Coupled Collective Classifiers, C3, which, in addition to capturing the variety of features typically used for each task, can capture the intra- and inter-dependencies required to correctly infer nodes, edges, and labels in the output network. I discuss variants of C3 using different learning and inference paradigms and show the superior performance of C3, in terms of both prediction quality and runtime performance, over various previous approaches. I then conclude by presenting the Graph Alignment, Identification, and Analysis (GAIA) open-source software library, which provides not only an implementation of C3 but also algorithms for various tasks in network data such as entity resolution, link prediction, collective classification, clustering, active learning, data generation, and analysis.	en_US
dc.identifier.uri	http://hdl.handle.net/1903/13137
dc.subject.pqcontrolled	Computer science	en_US
dc.subject.pquncontrolled	Active Learning	en_US
dc.subject.pquncontrolled	Collective Classification	en_US
dc.subject.pquncontrolled	Entity Resolution	en_US
dc.subject.pquncontrolled	Graph Identification	en_US
dc.subject.pquncontrolled	Graphs and Networks	en_US
dc.subject.pquncontrolled	Link Prediction	en_US
dc.title	Identifying Graphs from Noisy Observational Data	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Name: NamataJr_umd_0117E_13331.pdf
Size: 1.51 MB
Format: Adobe Portable Document Format

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations