Using Ontologies to Improve Answer Quality in Databases
Abstract
One of the known shortcomings of relational and XML databases is that they overlook the semantics of terms when answering queries. Ontologies constitute a useful tool to convey the semantics of terms in databases. However, the problem of effectively using semantic information from ontologies is challenging.
We first address this problem for relational databases by introducing the notion of an ontology extended relation (OER). An OER contains an ordinary relation together with an associated ontology that conveys the semantic meaning of the terms being used. We then extend the relational algebra to query OERs. We build a prototype for the OER model and show that the system scales to handle large datasets.
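To make the OER idea concrete, the following is a minimal sketch and not the prototype described above. It assumes the ontology is a simple is-a hierarchy and shows a selection that matches a concept or any of its descendants. The names `IS_A`, `PRODUCTS`, `ancestors`, and `select_is_a` are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical is-a ontology: each term maps to its direct parent terms.
IS_A = {
    "laptop": {"computer"},
    "desktop": {"computer"},
    "computer": {"electronics"},
    "phone": {"electronics"},
}

# Hypothetical ordinary relation, stored as a list of rows.
PRODUCTS = [
    {"id": 1, "category": "laptop"},
    {"id": 2, "category": "phone"},
    {"id": 3, "category": "desktop"},
    {"id": 4, "category": "furniture"},
]


def ancestors(term, is_a):
    """Return the term itself plus every term reachable through is-a edges."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in is_a.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def select_is_a(rows, attribute, concept, is_a):
    """Keep rows whose attribute value is the concept or one of its descendants."""
    return [row for row in rows if concept in ancestors(row[attribute], is_a)]


if __name__ == "__main__":
    for row in select_is_a(PRODUCTS, "category", "computer", IS_A):
        print(row)
```

A purely syntactic selection on `category = "computer"` would return no rows here, whereas the ontology-aware selection returns the laptop and desktop rows.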
We then propose the concept of a similarity enhanced ontology (SEO), which brings a notion of similarity to a graph ontology. We extend TAX, one of the best-known algebras for XML databases, with SEOs. The result is our TOSS system, which provides much higher answer quality than TAX alone. We experimentally evaluate the TOSS system on the DBLP and SIGMOD bibliographic databases and show that TOSS has acceptable performance.
Both projects involve ontology integration to support semantic queries across heterogeneous databases. We show how to efficiently compute the canonical witness to the integrability of graph ontologies given a set of interoperation constraints. We have also developed a polynomial-time algorithm to compute a minimal witness to the integrability of RDF ontologies under a set of Horn clauses and negative constraints, and we show experimentally that the algorithm works very well on real-life ontologies and scales to massive ontologies.
Finally, we present our work on ontology-based similarity measures for finding relationships between ontologies and searching for similar objects. These measures are applicable to practical classification systems, where ontologies can be DAG-structured, objects can be labeled with multiple terms, and ambiguity can be introduced by an evolving ontology or by classifiers with imperfect knowledge. Experiments on a bioinformatics application show that our measures outperform previous approaches.
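As a rough illustration of the setting, and not of the measures developed in this work, the sketch below compares two objects that are each labeled with multiple terms from a DAG-structured ontology. It uses a generic Jaccard similarity over the ancestor closures of the two label sets. The ontology `PARENTS` and the function names are hypothetical.

```python
# Hypothetical DAG ontology: each term maps to its direct parent terms.
# "kinase" has two parents, so the structure is a DAG rather than a tree.
PARENTS = {
    "kinase": {"enzyme", "signaling"},
    "enzyme": {"protein"},
    "signaling": {"protein"},
    "receptor": {"signaling"},
}


def closure(terms, parents):
    """Return the given terms together with all of their ancestors."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, ()))
    return seen


def jaccard_similarity(terms_a, terms_b, parents):
    """Jaccard similarity of the ancestor closures of two label sets."""
    a = closure(terms_a, parents)
    b = closure(terms_b, parents)
    union = a | b
    # Two unlabeled objects are treated as identical by convention.
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    print(jaccard_similarity({"kinase"}, {"receptor"}, PARENTS))
```

Comparing closures rather than raw labels lets two objects with no term in common still score above zero when their terms share ancestors, which is the kind of relationship a label-only comparison would miss.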