Quantifiable Data Mining Using Principal Component Analysis
dc.contributor.author | Faloutsos, Christos | en_US |
dc.contributor.author | Korn, Flip | en_US |
dc.contributor.author | Labrinidis, Alexandros | en_US |
dc.contributor.author | Kotidis, Yannis | en_US |
dc.contributor.author | Kaplunovich, Alex | en_US |
dc.contributor.author | Perkovic, Dejan | en_US |
dc.contributor.department | ISR | en_US |
dc.date.accessioned | 2007-05-23T10:03:50Z | |
dc.date.available | 2007-05-23T10:03:50Z | |
dc.date.issued | 1997 | en_US |
dc.description.abstract | Association Rule Mining algorithms operate on a data matrix (e.g., customers x products) to derive rules [2,23]. We propose a single-pass algorithm for mining linear rules in such a matrix based on Principal Component Analysis. PCA detects correlated columns of the matrix, which correspond to, e.g., products that sell together. The first contribution of this work is that we propose to quantify the "goodness" of a set of discovered rules. We define the "guessing error": the root-mean-square error of the reconstructed values of the cells of the given matrix, when we pretend that they are unknown. The second contribution is a novel method to guess missing/hidden values from the linear rules that our method derives. For example, if somebody bought $10 of milk and $3 of bread, our rules can "guess" the amount spent on, say, butter. Thus, we can perform a variety of important tasks such as forecasting, "what-if" scenarios, outlier detection, and visualization. Moreover, we show that we can compute the principal components with a single pass over the dataset. Experiments on real datasets (e.g., NBA statistics) demonstrate that the proposed method consistently achieves a "guessing error" up to 5 times lower than the straightforward competitor. | en_US |
dc.format.extent | 1037117 bytes | |
dc.format.mimetype | application/pdf | |
dc.identifier.uri | http://hdl.handle.net/1903/5855 | |
dc.language.iso | en_US | en_US |
dc.relation.ispartofseries | ISR; TR 1997-25 | en_US |
dc.subject | database mining | en_US |
dc.subject | Systems Integration Methodology | en_US |
dc.title | Quantifiable Data Mining Using Principal Component Analysis | en_US |
dc.type | Technical Report | en_US |
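The abstract names three mechanics: computing principal components in a single pass, guessing hidden cells from the resulting linear rules, and scoring the rules by the "guessing error" (an RMSE over hidden-then-reconstructed cells). The Python/NumPy sketch below shows one way these pieces can fit together. It is an illustration under stated assumptions, not the report's algorithm: the function names (`single_pass_covariance`, `top_k_rules`, `guess_missing`, `guessing_error`) are hypothetical, and it assumes mean-centered data, a population-normalized covariance matrix, a caller-chosen number of rules `k`, and a least-squares fit over the known cells of a row. The report's exact reconstruction procedure and its rule for choosing `k` may differ.

```python
import numpy as np


def single_pass_covariance(rows):
    """Accumulate n, column sums and the sum of outer products in one scan.

    `rows` may be any iterable (e.g. a generator reading from disk), so the
    data set is visited exactly once and only O(M^2) memory is kept.
    """
    n = 0
    total = None
    cross = None
    for row in rows:
        x = np.asarray(row, dtype=float)
        if total is None:
            total = np.zeros_like(x)
            cross = np.zeros((x.size, x.size))
        n += 1
        total += x
        cross += np.outer(x, x)
    if n == 0:
        raise ValueError("empty data set")
    mean = total / n
    # Population covariance: E[x x^T] - mean mean^T.
    cov = cross / n - np.outer(mean, mean)
    return mean, cov


def top_k_rules(cov, k):
    """Return the k eigenvectors of `cov` with the largest eigenvalues (M x k)."""
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]


def guess_missing(row, known_mask, mean, rules):
    """Fill the cells where known_mask is False using the linear rules.

    Assumed method: find coefficients a minimising ||x_K - mean_K - V_K a||^2
    over the known cells K, then predict hidden cells as mean_H + V_H a.
    """
    x = np.asarray(row, dtype=float)
    known = np.asarray(known_mask, dtype=bool)
    coeffs, *_ = np.linalg.lstsq(rules[known], x[known] - mean[known], rcond=None)
    filled = x.copy()
    filled[~known] = mean[~known] + rules[~known] @ coeffs
    return filled


def guessing_error(data, mean, rules):
    """RMSE over all cells when each cell, in turn, is hidden and guessed."""
    data = np.asarray(data, dtype=float)
    sq_errors = []
    for row in data:
        for j in range(row.size):
            mask = np.ones(row.size, dtype=bool)
            mask[j] = False  # pretend cell j is unknown
            guess = guess_missing(row, mask, mean, rules)[j]
            sq_errors.append((guess - row[j]) ** 2)
    return float(np.sqrt(np.mean(sq_errors)))


if __name__ == "__main__":
    # Toy data (made up): spend on (milk, bread, butter) in a fixed 1:2:3 ratio.
    data = [[t, 2 * t, 3 * t] for t in range(1, 6)]
    mean, cov = single_pass_covariance(iter(data))
    rules = top_k_rules(cov, k=1)
    # Known: milk = 10, bread = 20; butter is hidden (0 is only a placeholder).
    filled = guess_missing([10, 20, 0], [True, True, False], mean, rules)
    print(round(filled[2], 6))  # 30.0
    print(guessing_error(data, mean, rules))  # ~0 for exactly collinear data
```

Accumulating raw sums keeps the scan to one pass and a fixed M x M state, but the `E[x x^T] - mean mean^T` formula can lose precision when values are large relative to their spread; an incremental (Welford-style) covariance update is a common alternative with the same single-pass property.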