Scalable Ontology Systems

dc.contributor.advisor: Subrahmanian, Venkataraman S [en_US]
dc.contributor.author: Udrea, Octavian [en_US]
dc.contributor.department: Computer Science [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.date.accessioned: 2008-10-11T05:44:07Z
dc.date.available: 2008-10-11T05:44:07Z
dc.date.issued: 2008-07-28 [en_US]
dc.description.abstract: Since the adoption of the Resource Description Framework (RDF) by the World Wide Web Consortium (W3C), ontologies have become commonplace as a way to represent both knowledge and data. RDF databases have flexible schemas, are easy to integrate, and allow a semantically rich query language. Unfortunately, these advantages come at the expense of increased query and application complexity. Existing RDF systems have attempted to address this problem by representing RDF data in relational format and translating queries and answers to and from SQL. As we will show, typical access patterns in RDF are substantially different from those in relational databases, to the extent that the performance of relational-backed systems degrades significantly for large datasets or complex queries. In this dissertation, we propose two solutions to the scalability issue in RDF databases. First, we introduce Annotated RDF, a representation language that extends the semantics of RDF by allowing triples to be annotated with partially ordered information such as temporal validity intervals, probabilities, provenance and many others. In standard RDF, using such information creates a blowup in the size of the database and therefore greatly increases the data complexity of queries. We define a query language for Annotated RDF that extends the RDF query language SPARQL, and we provide query processing and view maintenance algorithms. Our experimental evaluation shows Annotated RDF can answer queries 1.5 to 3.5 times faster than widely used systems such as Jena2, Sesame2 or Oracle 11g. Second, we introduce GRIN, to our knowledge the first index structure designed specifically for SPARQL queries. We describe query and update processing algorithms and a theoretical analysis of index optimization. GRIN is extended to Annotated RDF and evaluated thoroughly on real-world datasets of up to 26 million triples and benchmark synthetic datasets of up to 1 billion triples. Our results show that for SPARQL queries, GRIN outperforms all relational index structures at comparable resource expenditure. Moreover, we show GRIN can be integrated not only with Annotated RDF, but also with existing systems such as Jena2 or LucidDB. [en_US]
dc.format.extent: 1293709 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/1903/8548
dc.language.iso: en_US
dc.subject.pqcontrolled: Computer Science [en_US]
dc.subject.pquncontrolled: semantic web [en_US]
dc.subject.pquncontrolled: RDF databases [en_US]
dc.subject.pquncontrolled: RDF query languages [en_US]
dc.subject.pquncontrolled: RDF indexing [en_US]
dc.subject.pquncontrolled: ontologies [en_US]
dc.title: Scalable Ontology Systems [en_US]
dc.type: Dissertation [en_US]

Files

Original bundle
Name: umi-umd-5631.pdf
Size: 1.23 MB
Format: Adobe Portable Document Format