Browsing by Author "Korn, Flip"
Item Fast Nearest Neighbor Search in Medical Image Databases (1998-10-15) Korn, Flip; Sidiropoulos, Nikolaos; Faloutsos, Christos; Siegel, Eliot; Protopapas, Zenon

We examine the problem of finding similar tumor shapes. Starting from a natural similarity function (the so-called `max morphological distance'), we showed how to lower-bound it and how to search for nearest neighbors in large collections of tumor-like shapes. Specifically, we used state-of-the-art concepts from morphology, namely the `pattern spectrum' of a shape, to map each shape to a point in $n$-dimensional space. Following \cite{Faloutsos94Fast,Jagadish91Retrieval}, we organized the $n$-d points in an R-tree. We showed that the $L_\infty$ (= max) norm in the $n$-d space lower-bounds the actual distance. This guarantees no false dismissals for range queries. In addition, we developed a nearest-neighbor algorithm that also guarantees no false dismissals. Finally, we implemented the method, and we tested it against a testbed of realistic tumor shapes, using an established tumor-growth model of Murray Eden \cite{Eden:61}. The experiments showed that our method is up to 27 times faster than straightforward sequential scanning. (Also cross-referenced as UMIACS-TR-96-17)

Item Fast Nearest Neighbor Search in Medical Image Databases (1996) Korn, Flip; Sidiropoulos, N.; Faloutsos, Christos; ISR
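The filter-and-refine idea behind the abstract above can be sketched in a few lines: since a cheap distance in feature space lower-bounds the expensive true distance, any candidate whose feature distance already exceeds the query radius can be discarded without risking a false dismissal. This is a minimal illustrative sketch, not the paper's implementation; the function names and the use of the Euclidean norm as a stand-in for the true (max morphological) distance are assumptions for the example (for vectors, $L_\infty \le L_2$, so the lower-bounding property holds).

```python
import numpy as np

def range_query(features, shapes, query_feat, query_shape, eps, true_dist):
    """Filter-and-refine range query with no false dismissals.

    `features[i]` is the n-d feature vector (e.g., pattern spectrum) of
    `shapes[i]`. Because the L_inf norm on features lower-bounds the true
    distance, any shape whose feature distance exceeds eps cannot be a
    true match and is safely filtered out.
    """
    results = []
    for feat, shape in zip(features, shapes):
        # Filter step: cheap L_inf distance in feature space.
        if np.max(np.abs(feat - query_feat)) <= eps:
            # Refine step: expensive true distance, on survivors only.
            if true_dist(shape, query_shape) <= eps:
                results.append(shape)
    return results
```

In the real system the filter step would be an R-tree range search over the feature points rather than a linear scan, but the correctness argument (lower bound implies no false dismissals) is the same.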
Item Quantifiable Data Mining Using Principal Component Analysis (1998-10-15) Korn, Flip; Labrinidis, Alexandros; Kotidis, Yannis; Faloutsos, Christos; Kaplunovich, Alex; Perkovic, Dejan

Association Rule Mining algorithms operate on a data matrix (e.g., customers x products) to derive rules. We propose a single-pass algorithm for mining linear rules in such a matrix based on Principal Component Analysis. PCA detects correlated columns of the matrix, which correspond to, e.g., products that sell together. The first contribution of this work is that we propose to quantify the ``goodness'' of a set of discovered rules. We define the ``guessing error'': the root-mean-square error of the reconstructed values of the cells of the given matrix, when we pretend that they are unknown. The second contribution is a novel method to guess missing/hidden values from the linear rules that our method derives. For example, if somebody bought $10 of milk and $3 of bread, our rules can ``guess'' the amount spent on, say, butter. Thus, we can perform a variety of important tasks such as forecasting, `what-if' scenarios, outlier detection, and visualization. Moreover, we show that we can compute the principal components with a single pass over the dataset. Experiments on real datasets (e.g., NBA statistics) demonstrate that the proposed method consistently achieves a ``guessing error'' up to 5 times lower than the straightforward competitor. (Also cross-referenced as UMIACS-TR-97-13)

Item Quantifiable Data Mining Using Principal Component Analysis (1997) Faloutsos, Christos; Korn, Flip; Labrinidis, Alexandros; Kotidis, Yannis; Kaplunovich, Alex; Perkovic, Dejan; ISR
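The "guess a hidden cell from linear rules" idea in the abstract above can be sketched with standard PCA machinery: fit the top-k principal components on the rest of the matrix, then reconstruct the hidden cell by least-squares projection of the row's known entries onto that subspace. This is an illustrative sketch, not the paper's single-pass algorithm; the function name and the hold-out-one-row fitting scheme are assumptions made for the example.

```python
import numpy as np

def guess_hidden(X, row, col, k=1):
    """Guess one hidden cell X[row, col] from the matrix's linear rules.

    Fits the top-k principal components on the remaining rows (column-mean
    centered), then solves a least-squares problem using only the target
    row's known entries to place it in the principal subspace.
    """
    train = np.delete(X, row, axis=0)
    mu = train.mean(axis=0)
    # Top-k principal directions of the centered training rows.
    _, _, Vt = np.linalg.svd(train - mu, full_matrices=False)
    V = Vt[:k].T                      # shape: (n_cols, k)
    known = [c for c in range(X.shape[1]) if c != col]
    # Coordinates of the row in the subspace, from its known entries only.
    coords, *_ = np.linalg.lstsq(V[known], X[row, known] - mu[known],
                                 rcond=None)
    # Reconstruct the hidden cell from those coordinates.
    return float(mu[col] + V[col] @ coords)
```

On a matrix whose rows obey an exact linear rule (rank k), this reconstruction recovers the hidden cell exactly; the paper's "guessing error" is the RMSE of such reconstructions over many hidden cells.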