Show simple item record

Fast Sequential and Parallel Algorithms for Association Rule Mining: A Comparison

dc.contributor.authorMueller, Andreasen_US
dc.description.abstractThe field of knowledge discovery in databases, or "Data Mining", has received increasing attention during recent years as large organizations have begun to realize the potential value of the information that is stored implicitly in their databases. One specific data mining task is the mining of Association Rules, particularly from retail data. The task is to determine patterns (or rules) that characterize the shopping behavior of customers from a large database of previous consumer transactions. The rules can then be used to focus marketing efforts such as product placement and sales promotions. Because early algorithms required an unpredictably large number of IO operations, reducing IO cost has been the primary target of the algorithms presented in the literature. One of the most recent proposed algorithms, called PARTITION, uses a new TID-list data representation and a new partitioning technique. The partitioning technique reduces IO cost to a constant amount by processing one database portion at a time in memory. We implemented an algorithm called SPTID that incorporates both TID-lists and partitioning to study their benefits. For comparison, a non-partitioning algorithm called SEAR, which is based on a new prefix-tree data structure, is used. Our experiments with SPTID and SEAR indicate that TID-lists have inherent inefficiencies; furthermore, because all of the algorithms tested tend to be CPU-boundn trading CPU-overhead against I/O operations by partitioning did not lead to better performance. In order to scale mining algorithms to the huge databases (e.g., multiple Terabytes) that large organizations will manage in the near future, we implemented parallel versions of SEAR and SPEAR (its partitioned counterpart). The performance results show that, while both algorithms parallelize easily and obtain good speedup and scale-up results, the parallel SEAR version performs better than parallel SPEAR, despite the fact that it uses more communication.en_US
dc.format.extent814579 bytes
dc.relation.ispartofseriesUM Computer Science Department; CS-TR-3515en_US
dc.titleFast Sequential and Parallel Algorithms for Association Rule Mining: A Comparisonen_US
dc.typeTechnical Reporten_US
dc.relation.isAvailableAtDigital Repository at the University of Marylanden_US
dc.relation.isAvailableAtUniversity of Maryland (College Park, Md.)en_US
dc.relation.isAvailableAtTech Reports in Computer Science and Engineeringen_US
dc.relation.isAvailableAtComputer Science Department Technical Reportsen_US

Files in this item


This item appears in the following Collection(s)

Show simple item record