Declarative Cleaning, Analysis, and Querying of Graph-structured Data

dc.contributor.advisorDeshpande, Amolen_US
dc.contributor.advisorGetoor, Liseen_US
dc.contributor.authorMoustafa, Walaa Eldinen_US
dc.contributor.departmentComputer Scienceen_US
dc.contributor.publisherDigital Repository at the University of Marylanden_US
dc.contributor.publisherUniversity of Maryland (College Park, Md.)en_US
dc.date.accessioned2013-10-02T05:31:39Z
dc.date.available2013-10-02T05:31:39Z
dc.date.issued2013en_US
dc.description.abstractMuch of today's data including social, biological, sensor, computer, and transportation network data is naturally modeled and represented by graphs. Typically, data describing these networks is observational, and thus noisy and incomplete. Therefore, methods for efficiently managing graph-structured data of this nature are needed, especially with the abundance and increasing sizes of such data. In my dissertation, I develop declarative methods to perform cleaning, analysis and querying of graph-structured data efficiently. For declarative cleaning of graph-structured data, I identify a set of primitives to support the extraction and inference of the underlying true network from observational data, and describe a framework that enables a network analyst to easily implement and combine new extraction and cleaning techniques. The task specification language is based on Datalog with a set of extensions designed to enable different graph cleaning primitives. For declarative analysis, I introduce 'ego-centric pattern census queries', a new type of graph analysis query that supports searching for structural patterns in every node's neighborhood and reporting their counts for further analysis. I define an SQL-based declarative language to support this class of queries, and develop a series of efficient query evaluation algorithms for it. Finally, I present an approach for querying large uncertain graphs that supports reasoning about uncertainty of node attributes, uncertainty of edge existence, and a new type of uncertainty, called identity linkage uncertainty, where a group of nodes can potentially refer to the same real-world entity. I define a probabilistic graph model to capture all these types of uncertainties, and to resolve identity linkage merges. I propose 'context-aware path indexing' and 'join-candidate reduction' methods to efficiently enable subgraph matching queries over large uncertain graphs of this type.en_US
dc.identifier.urihttp://hdl.handle.net/1903/14452
dc.subject.pqcontrolledComputer scienceen_US
dc.subject.pquncontrolledGraph Data Managementen_US
dc.subject.pquncontrolledProbabilistic Graph Modelingen_US
dc.subject.pquncontrolledSocial Network Analysisen_US
dc.titleDeclarative Cleaning, Analysis, and Querying of Graph-structured Dataen_US
dc.typeDissertationen_US
Files
Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Moustafa_umd_0117E_14371.pdf
Size:
6.81 MB
Format:
Adobe Portable Document Format
Description: