Decision Tree Construction for Data Mining on Cluster of Shared-Memory Multiprocessors

dc.contributor.author: Andrade, Henrique [en_US]
dc.contributor.author: Kurc, Tahsin [en_US]
dc.contributor.author: Sussman, Alan [en_US]
dc.contributor.author: Saltz, Joel [en_US]
dc.date.accessioned: 2004-05-31T23:08:28Z
dc.date.available: 2004-05-31T23:08:28Z
dc.date.created: 2000-12 [en_US]
dc.date.issued: 2001-05-10 [en_US]
dc.description.abstract: Classification of very large datasets is a challenging problem in data mining. It is desirable to have decision-tree classifiers that can handle large datasets, because a large dataset often increases the accuracy of the resulting classification model. Classification tree algorithms can benefit from parallelization because of large memory and computation requirements for handling large datasets. Clusters of shared-memory multiprocessors (SMPs), in which each shared-memory node has a small number of processors (e.g., 2-8 processors) and is connected to the other nodes via a high-speed interconnect, have become a popular alternative to pure distributed-memory and shared-memory machines. A cluster of SMPs provides a two-tier architecture, in which a combination of shared-memory and distributed-memory paradigms can be employed. In this paper we investigate decision tree construction on a cluster of SMPs. We present an algorithm that employs a hybrid approach. The classification training dataset is partitioned across the SMP nodes so that each SMP node performs tree construction using a subset of the records in the dataset. Within each SMP node, on the other hand, tasks associated with an attribute are dynamically scheduled to the light-weight threads running on the SMP node. We present experimental results on a Linux PC cluster with dual-processor SMP nodes. (Also cross-referenced as UMIACS-TR-2000-78) [en_US]
dc.format.extent: 282381 bytes
dc.format.mimetype: application/postscript
dc.identifier.uri: http://hdl.handle.net/1903/1113
dc.language.iso: en_US
dc.relation.isAvailableAt: Digital Repository at the University of Maryland [en_US]
dc.relation.isAvailableAt: University of Maryland (College Park, Md.) [en_US]
dc.relation.isAvailableAt: Tech Reports in Computer Science and Engineering [en_US]
dc.relation.isAvailableAt: UMIACS Technical Reports [en_US]
dc.relation.ispartofseries: UM Computer Science Department; CS-TR-4203 [en_US]
dc.relation.ispartofseries: UMIACS; UMIACS-TR-2000-78 [en_US]
dc.title: Decision Tree Construction for Data Mining on Cluster of Shared-Memory Multiprocessors [en_US]
dc.type: Technical Report [en_US]
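
The hybrid approach described in the abstract can be illustrated with a minimal Python sketch. It is not the report's implementation, and it rests on several assumptions that the abstract does not state: SMP nodes are simulated as list partitions in one process, attributes are categorical, each per-attribute task builds a table of class counts, the per-node tables are merged by summation, and the split is chosen by weighted Gini impurity. All function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_nodes):
    """Round-robin split of the training records across simulated SMP nodes."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def attribute_counts(local_records, attr):
    """One attribute task: count (attribute value, class label) pairs locally."""
    return Counter((rec[attr], rec["label"]) for rec in local_records)


def node_counts(local_records, attrs, num_threads=2):
    """Within a node, idle pool threads pick up the next pending attribute task."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = {a: pool.submit(attribute_counts, local_records, a) for a in attrs}
        return {a: f.result() for a, f in futures.items()}


def merge(per_node):
    """Sum per-node counts (assumed stand-in for inter-node communication)."""
    merged = {}
    for counts in per_node:
        for attr, table in counts.items():
            merged.setdefault(attr, Counter()).update(table)
    return merged


def weighted_gini(counts):
    """Weighted Gini impurity of a multiway split on one attribute."""
    by_value = {}
    for (value, label), n in counts.items():
        by_value.setdefault(value, Counter())[label] += n
    total = sum(counts.values())
    score = 0.0
    for labels in by_value.values():
        size = sum(labels.values())
        impurity = 1.0 - sum((n / size) ** 2 for n in labels.values())
        score += size / total * impurity
    return score


def best_split(records, attrs, num_nodes=2):
    """Pick the attribute with the lowest impurity from the merged counts."""
    parts = partition(records, num_nodes)
    merged = merge([node_counts(p, attrs) for p in parts])
    return min(attrs, key=lambda a: weighted_gini(merged[a]))
```

Calling `best_split(records, ["outlook", "windy"])` on a list of dictionaries that each carry a `"label"` key returns the attribute name chosen for the root split. Because the merged counts equal the counts over the whole dataset, the choice does not depend on how the records are partitioned.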

Files

Original bundle
Name: CS-TR-4203.ps
Size: 275.76 KB
Format: Postscript Files