Please use this identifier to cite or link to this item:
http://hdl.handle.net/1903/879
| Title: | Quantifiable Data Mining Using Principal Component Analysis |
| Authors: | Korn, Flip; Labrinidis, Alexandros; Kotidis, Yannis; Faloutsos, Christos; Kaplunovich, Alex; Perkovic, Dejan |
| Type: | Technical Report |
| Issue Date: | 15-Oct-1998 |
| Series/Report no.: | UM Computer Science Department, CS-TR-3754; UMIACS, UMIACS-TR-97-13 |
| Abstract: | Association Rule Mining algorithms operate on a data matrix (e.g.,
customers x products) to derive rules. We propose a single-pass
algorithm for mining linear rules in such a matrix based on Principal
Component Analysis (PCA). PCA detects correlated columns of the matrix,
which correspond to, e.g., products that sell together.
The first contribution of this work is a way to quantify the
"goodness" of a set of discovered rules. We define the "guessing
error": the root-mean-square error of the reconstructed values of the
cells of the given matrix, when we pretend that they are unknown. The
second contribution is a novel method to guess missing/hidden values
from the linear rules that our method derives. For example, if
somebody bought $10 of milk and $3 of bread, our rules can "guess"
the amount spent on, say, butter. Thus, we can perform a variety of
important tasks such as forecasting, "what-if" scenarios, outlier
detection, and visualization. Moreover, we show that we can compute
the principal components with a single pass over the dataset.
Experiments on real datasets (e.g., NBA statistics) demonstrate that
the proposed method consistently achieves a "guessing error" up to
5 times lower than that of the straightforward competitor.
(Also cross-referenced as UMIACS-TR-97-13) |
| URI: | http://hdl.handle.net/1903/879 |
| Appears in Collections: | Technical Reports of the Computer Science Department; Technical Reports from UMIACS |
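
The ideas in the abstract (single-pass computation, guessing a hidden value from a linear rule, and the "guessing error") can be illustrated with a minimal sketch. It is not the report's implementation. It assumes a single linear rule (only the first principal component), uses plain power iteration instead of a full eigensolver, and runs on made-up spending data. All function names and the toy rows are hypothetical.

```python
import math

def covariance_single_pass(rows):
    """Accumulate column sums and cross-products in ONE pass over the rows,
    then derive the column means and the covariance matrix from them."""
    n, sums, cross = 0, None, None
    for row in rows:
        if sums is None:
            m = len(row)
            sums = [0.0] * m
            cross = [[0.0] * m for _ in range(m)]
        n += 1
        for i, xi in enumerate(row):
            sums[i] += xi
            for j, xj in enumerate(row):
                cross[i][j] += xi * xj
    means = [s / n for s in sums]
    cov = [[cross[i][j] / n - means[i] * means[j] for j in range(m)]
           for i in range(m)]
    return means, cov

def top_component(cov, iterations=200):
    """Power iteration: approximates the dominant eigenvector of cov,
    i.e. the first principal component (assumption: one rule suffices)."""
    m = len(cov)
    v = [1.0] * m
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def guess_hidden(known, hidden_index, means, v):
    """Guess one hidden cell: find the point on the line means + t * v that
    best fits the known cells (least squares), then read off the hidden one."""
    num = sum((x - means[i]) * v[i] for i, x in known.items())
    den = sum(v[i] ** 2 for i in known)
    t = num / den
    return means[hidden_index] + t * v[hidden_index]

def guessing_error(rows, means, v):
    """Root-mean-square error when each cell is hidden in turn and guessed."""
    squared, count = 0.0, 0
    for row in rows:
        for h in range(len(row)):
            known = {i: x for i, x in enumerate(row) if i != h}
            squared += (guess_hidden(known, h, means, v) - row[h]) ** 2
            count += 1
    return math.sqrt(squared / count)

# Hypothetical toy data: dollars spent on (milk, bread, butter).
rows = [(10.0, 3.0, 2.0), (20.0, 6.0, 4.0), (5.0, 1.5, 1.0), (15.0, 4.5, 3.0)]
means, cov = covariance_single_pass(rows)
v = top_component(cov)

# Somebody bought $10 of milk and $3 of bread; guess the butter column.
butter = guess_hidden({0: 10.0, 1: 3.0}, 2, means, v)
print(butter)  # close to 2.0, because the toy data is exactly linear
print(guessing_error(rows, means, v))
```

For brevity, the sketch measures the guessing error on the same rows the rule was derived from. A fairer evaluation would derive the rule from one set of rows and hide cells in a separate set.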