Probabilistic Models for Scalable Knowledge Graph Construction

Pujara, Jay

Probabilistic Models for Scalable Knowledge Graph Construction

dc.contributor.advisor	Getoor, Lise C	en_US
dc.contributor.author	Pujara, Jay	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2016-06-22T05:48:56Z
dc.date.available	2016-06-22T05:48:56Z
dc.date.issued	2016	en_US
dc.description.abstract	In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, due to noise and ambiguity in the underlying data or errors made by the extraction system and the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs using 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied on a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.	en_US
dc.identifier	https://doi.org/10.13016/M2420K
dc.identifier.uri	http://hdl.handle.net/1903/18221
dc.language.iso	en	en_US
dc.subject.pqcontrolled	Artificial intelligence	en_US
dc.subject.pqcontrolled	Computer science	en_US
dc.subject.pqcontrolled	Epistemology	en_US
dc.subject.pquncontrolled	knowledge base	en_US
dc.subject.pquncontrolled	knowledge graph	en_US
dc.subject.pquncontrolled	markov random field	en_US
dc.subject.pquncontrolled	online inference	en_US
dc.subject.pquncontrolled	statistical relational learning	en_US
dc.title	Probabilistic Models for Scalable Knowledge Graph Construction	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Now showing 1 - 1 of 1

Name:: Pujara_umd_0117E_16954.pdf
Size:: 1.06 MB
Format:: Adobe Portable Document Format

Download

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations