HIGH PERFORMANCE XPATH EVALUATION IN XML STREAMS

dc.contributor.advisor: Chawathe, Sudarshan S (en_US)
dc.contributor.author: Peng, Feng (en_US)
dc.contributor.department: Computer Science (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2006-06-14T05:41:09Z
dc.date.available: 2006-06-14T05:41:09Z
dc.date.issued: 2006-04-18 (en_US)
dc.description.abstract: This thesis presents methods for efficiently evaluating structural queries over tree-structured data streams. A data stream usually consists of a sequence of items that arrive in an order determined by the source. An application that uses such data cannot revisit an earlier item in the stream unless it buffers the item itself. Naive buffering methods are not practical due to the high throughput and indefinite length of data streams. Compared with the flat, relational-like data model for data streams that has received recent attention, processing a tree-structured XML data stream poses additional challenges, since a data item cannot, in general, be interpreted without taking structural information into account. In this thesis, we focus on the evaluation of XPath queries on streaming XML. As a W3C standard, XPath has become a core XML technology not only as a standalone query language but also as the foundation of XQuery and XSLT. Features such as subqueries and reverse axes make XPath a powerful query language but they also complicate XPath query processing. We present our work on XSQ, a streaming XPath query engine. Our methods are based on a novel segment-based evaluation scheme. XSQ uses very little memory and is able to process unbounded and unsegmented streaming data because it does not build a DOM tree in memory. It also provides high throughput by only processing the relevant portions of the data and low response time by returning results as early as possible. XSQ is the first streaming system to support complex XPath features such as multiple predicates, closure axes, aggregations, reverse axes, and subqueries. We also describe our work on XPaSS, an XPath-based publish-subscribe system that simultaneously evaluates a large number of XPath queries over XML streams. Unlike other similar systems that filter pre-segmented documents as results, XPaSS returns only the precisely delineated data specified by a user query. It uses a segment-sharing scheme instead of prefix- and suffix-sharing that are commonly used. In our experiments, XPaSS supports up to one million XPath subscriptions using a modest PC-class server, with a throughput comparable to that of the simpler filtering systems. (en_US)
dc.format.extent: 1483676 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/1903/3417
dc.language.iso: en_US
dc.subject.pqcontrolled: Computer Science (en_US)
dc.title: HIGH PERFORMANCE XPATH EVALUATION IN XML STREAMS (en_US)
dc.type: Dissertation (en_US)

Files

Original bundle

Name: umi-umd-3232.pdf
Size: 1.41 MB
Format: Adobe Portable Document Format