Social Network Data Management

dc.contributor.advisor: Subrahmanian, V.S. (en_US)
dc.contributor.author: Broecheler, Matthias (en_US)
dc.contributor.department: Computer Science (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2012-02-17T07:03:31Z
dc.date.available: 2012-02-17T07:03:31Z
dc.date.issued: 2011 (en_US)
dc.description.abstract: With the increasing usage of online social networks and the Semantic Web's graph-structured RDF framework, and the rising adoption of networks in fields ranging from biology to social science, there is a rapidly growing need for indexing, querying, and analyzing massive graph-structured data. Facebook has amassed over 500 million users, creating huge volumes of highly connected data. Governments have made RDF datasets containing billions of triples available to the public. In the life sciences, researchers have started to connect disparate datasets of research results into one giant network of valuable information. Clearly, networks are becoming increasingly popular and growing rapidly in size, requiring scalable solutions for network data management. This thesis focuses on the following aspects of network data management. We present a hierarchical index structure for external memory storage of network data that aims to maximize data locality. We propose efficient algorithms to answer subgraph matching queries against network databases and discuss effective pruning strategies to improve performance. We show how adaptive cost models can speed up subgraph matching query answering by assigning budgets to index retrieval operations and adjusting the query plan during execution. We develop a cloud-oriented social network database, COSI, which handles massive network datasets too large for a single computer by partitioning the data across multiple machines and achieves high-performance query answering through asynchronous parallelization and cluster-aware heuristics. Tracking multiple standing queries against a social network database is much faster with our novel multi-view maintenance algorithm, which exploits common substructures between queries. To capture uncertainty inherent in social network querying, we define probabilistic subgraph matching queries over deterministic graph data and propose algorithms to answer them efficiently. Finally, we introduce a general relational machine learning framework and rule-based language, Probabilistic Soft Logic, to learn from and probabilistically reason about social network data and describe applications to information integration and information fusion. (en_US)
dc.identifier.uri: http://hdl.handle.net/1903/12347
dc.subject.pqcontrolled: Computer science (en_US)
dc.subject.pquncontrolled: Graph database (en_US)
dc.subject.pquncontrolled: machine learning (en_US)
dc.subject.pquncontrolled: probabilistic inference (en_US)
dc.subject.pquncontrolled: relational learning (en_US)
dc.subject.pquncontrolled: Social network (en_US)
dc.subject.pquncontrolled: subgraph matching (en_US)
dc.title: Social Network Data Management (en_US)
dc.type: Dissertation (en_US)

Files

Original bundle

Name: Broecheler_umd_0117E_12801.pdf
Size: 4.84 MB
Format: Adobe Portable Document Format