University of Maryland DRUM  
University of Maryland Digital Repository at the University of Maryland

DRUM >
College of Computer, Mathematical & Natural Sciences >
Computer Science >
Technical Reports from UMIACS >

Please use this identifier to cite or link to this item: http://hdl.handle.net/1903/879

Title: Quantifiable Data Mining Using Principal Component Analysis
Authors: Korn, Flip
Labrinidis, Alexandros
Kotidis, Yannis
Faloutsos, Christos
Kaplunovich, Alex
Perkovic, Dejan
Type: Technical Report
Issue Date: 15-Oct-1998
Series/Report no.: UM Computer Science Department; CS-TR-3754
UMIACS; UMIACS-TR-97-13
Abstract: Association Rule Mining algorithms operate on a data matrix (e.g., customers x products) to derive rules. We propose a single-pass algorithm for mining linear rules in such a matrix based on Principal Component Analysis. PCA detects correlated columns of the matrix, which correspond to, e.g., products that sell together. The first contribution of this work is that we propose to quantify the ``goodness'' of a set of discovered rules. We define the ``guessing error'': the root-mean-square error of the reconstructed values of the cells of the given matrix, when we pretend that they are unknown. The second contribution is a novel method to guess missing/hidden values from the linear rules that our method derives. For example, if somebody bought $10 of milk and $3 of bread, our rules can ``guess'' the amount spent on, say, butter. Thus, we can perform a variety of important tasks such as forecasting, `what-if' scenarios, outlier detection, and visualization. Moreover, we show that we can compute the principal components with a single pass over the dataset. Experiments on real datasets (e.g., NBA statistics) demonstrate that the proposed method consistently achieves a ``guessing error'' of up to 5 times lower than the straightforward competitor. (Also cross-referenced as UMIACS-TR-97-13)
URI: http://hdl.handle.net/1903/879
Appears in Collections:Technical Reports of the Computer Science Department
Technical Reports from UMIACS

Files in This Item:

File Description SizeFormatNo. of Downloads
CS-TR-3754.pdfAuto-generated copy of CS-TR-3754.ps567.98 kBAdobe PDF560View/Open
CS-TR-3754.ps676.72 kBPostscript328View/Open

All items in DRUM are protected by copyright, with all rights reserved.

 

DRUM is brought to you by the University of Maryland Libraries
University of Maryland, College Park, MD 20742-7011 (301)314-1328.
Please send us your comments. -
All Contents