Recovering Information from Summary Data

Faloutsos, Christos; Jagadish, H.V.; Sidiropoulos, N.D.

dc.contributor.author	Faloutsos, Christos	en_US
dc.contributor.author	Jagadish, H.V.	en_US
dc.contributor.author	Sidiropoulos, N.D.	en_US
dc.contributor.department	ISR	en_US
dc.date.accessioned	2007-05-23T10:03:39Z
dc.date.available	2007-05-23T10:03:39Z
dc.date.issued	1997	en_US
dc.description.abstract	Data is often stored in summarized form, as a histogram of aggregates (COUNTs, SUMs, or AVeraGes) over specified ranges. Queries regarding specific values, or ranges different from those stored, cannot be answered exactly from the summarized data. In this paper we study how to estimate the original detail data from the stored summary. We formulate this task as an inverse problem, specifying a well-defined cost function that has to be optimized under constraints. In particular, we propose the use of a Linear Regularization method, which maximizes the smoothness of the estimate. Our main theoretical contribution is a Theorem, which shows that, for smooth enough distributions, we can achieve full recovery from summary data. Our theorem is closely related to the well known Shannon-Nyquist sampling theorem. We describe how to apply this theory to a variety of database problems, that involve partial information, such as OLAP, data warehousing and histograms in query optimization. Our main practical contribution is that the Linear Regularization method is extremely effective, both on synthetic and on real data. Our experiments show that the proposed approach almost consistently outperforms the uniformity assumption, achieving significant savings in root-mean-square error: up to 20% for stock price data, and up to 90% for smoother data sets.	en_US
dc.format.extent	884858 bytes
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/1903/5846
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	ISR; TR 1997-8	en_US
dc.subject	estimation	en_US
dc.subject	filtering	en_US
dc.subject	signal processing	en_US
dc.subject	database management systems	en_US
dc.subject	Systems Integration Methodology	en_US
dc.title	Recovering Information from Summary Data	en_US
dc.type	Technical Report	en_US
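
Illustrative sketch (not taken from the report): the abstract describes estimating detail data from bucket aggregates by optimizing a smoothness cost under constraints. The Python below is a minimal sketch of that idea under stated assumptions: equal-width buckets, SUM aggregates, and roughness measured as the sum of squared first differences. The report's actual cost function and constraints may differ. The function names solve_linear and smooth_estimate are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                if factor:
                    aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def smooth_estimate(bucket_sums, width):
    """Estimate detail values from equal-width bucket SUMs (assumed setup).

    Minimizes sum_i (x[i+1] - x[i])^2 subject to each bucket's values
    adding up to its stored SUM, by solving the KKT linear system.
    """
    k = len(bucket_sums)
    n = k * width

    # Q = D^T D, where D is the first-difference operator.
    q = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        q[i][i] += 1.0
        q[i + 1][i + 1] += 1.0
        q[i][i + 1] -= 1.0
        q[i + 1][i] -= 1.0

    # KKT system: [2Q  A^T; A  0] [x; lambda] = [0; bucket_sums],
    # where row b of A marks the cells belonging to bucket b.
    size = n + k
    kkt = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            kkt[i][j] = 2.0 * q[i][j]
    for j in range(n):
        b = j // width
        kkt[n + b][j] = 1.0
        kkt[j][n + b] = 1.0

    rhs = [0.0] * n + [float(s) for s in bucket_sums]
    return solve_linear(kkt, rhs)[:n]
```

For example, smooth_estimate([10, 30], 2) returns four values that rise across the bucket boundary while each pair still adds up to its stored sum, whereas the uniformity assumption would assign 5, 5, 15, 15. A dense solver is used here only to keep the sketch self-contained; it is not meant for large histograms.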

Files

Original bundle

Name: TR_97-8.pdf
Size: 864.12 KB
Format: Adobe Portable Document Format

Collections

Institute for Systems Research Technical Reports