University of Maryland DRUM  
University of Maryland Digital Repository at the University of Maryland

Digital Repository at the University of Maryland (DRUM) >
Theses and Dissertations from UMD >
UMD Theses and Dissertations >

Please use this identifier to cite or link to this item: http://hdl.handle.net/1903/12347

Title: Social Network Data Management
Authors: Broecheler, Matthias
Advisors: Subrahmanian, V.S.
Department/Program: Computer Science
Type: Dissertation
Sponsors: Digital Repository at the University of Maryland
University of Maryland (College Park, Md.)
Subjects: Computer science
Keywords: Graph database
machine learning
probabilistic inference
relational learning
Social network
subgraph matching
Issue Date: 2011
Abstract: With the increasing usage of online social networks and the semantic web's graph structured RDF framework, and the rising adoption of networks in various fields from biology to social science, there is a rapidly growing need for indexing, querying, and analyzing massive graph structured data. Facebook has amassed over 500 million users creating huge volumes of highly connected data. Governments have made RDF datasets containing billions of triples available to the public. In the life sciences, researches have started to connect disparate data sets of research results into one giant network of valuable information. Clearly, networks are becoming increasingly popular and growing rapidly in size, requiring scalable solutions for network data management. This thesis focuses on the following aspects of network data management. We present a hierarchical index structure for external memory storage of network data that aims to maximize data locality. We propose efficient algorithms to answer subgraph matching queries against network databases and discuss effective pruning strategies to improve performance. We show how adaptive cost models can speed up subgraph matching query answering by assigning budgets to index retrieval operations and adjusting the query plan while executing. We develop a cloud oriented social network database, COSI, which handles massive network datasets too large for a single computer by partitioning the data across multiple machines and achieving high performance query answering through asynchronous parallelization and cluster-aware heuristics. Tracking multiple standing queries against a social network database is much faster with our novel multi-view maintenance algorithm, which exploits common substructures between queries. To capture uncertainty inherent in social network querying, we define probabilistic subgraph matching queries over deterministic graph data and propose algorithms to answer them efficiently. Finally, we introduce a general relational machine learning framework and rule-based language, Probabilistic Soft Logic, to learn from and probabilistically reason about social network data and describe applications to information integration and information fusion.
URI: http://hdl.handle.net/1903/12347
Appears in Collections:UMD Theses and Dissertations
Computer Science Theses and Dissertations

Files in This Item:

File Description SizeFormatNo. of Downloads
Broecheler_umd_0117E_12801.pdfRESTRICTED ACCESS4.95 MBAdobe PDF279View/Open

All items in DRUM are protected by copyright, with all rights reserved.

 

DRUM is brought to you by the University of Maryland Libraries
University of Maryland, College Park, MD 20742-7011 (301)314-1328.
Please send us your comments