Constructing Inverted Files on a Cluster of Multicore Processors Near Peak I/O Throughput

Wei, Zheng; JaJa, Joseph

dc.contributor.author	Wei, Zheng
dc.contributor.author	JaJa, Joseph
dc.date.accessioned	2011-03-07T04:39:46Z
dc.date.available	2011-03-07T04:39:46Z
dc.date.issued	2011-03-03
dc.description.abstract	We develop a new strategy for processing a collection of documents on a cluster of multicore processors to build the inverted files at almost the peak I/O throughput of the underlying system. Our algorithm is based on a number of novel techniques, including: (i) a high-throughput pipelined strategy that produces parallel parsed streams that are consumed at the same rate by parallel indexers; (ii) a hybrid trie and B-tree dictionary data structure that enables efficient parallel construction of the global dictionary; and (iii) a strategy for partitioning the indexers' work using random sampling, which achieves extremely good load balancing with minimal communication overhead. We have performed extensive tests of our algorithm on a cluster of 32 nodes, each with two Intel Xeon X5560 quad-core processors, and were able to achieve a throughput close to the peak throughput of the I/O system. In particular, we achieve a throughput of 280 MB/s on a single node and 6.12 GB/s on the 32-node cluster when processing the ClueWeb09 dataset. Similar results were obtained for widely different datasets. The throughput of our algorithm is superior to that of the best-known algorithms reported in the literature, even when compared to those running on much larger clusters.	en_US
dc.identifier.uri	http://hdl.handle.net/1903/11311
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	UMIACS;UMIACS-TR-2011-03
dc.title	Constructing Inverted Files on a Cluster of Multicore Processors Near Peak I/O Throughput	en_US
dc.type	Technical Report	en_US
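The abstract mentions partitioning the indexers' work by random sampling but does not spell out the procedure. The following is a minimal sketch of one generic way to do this, sample-based range partitioning as used in sample sort, and not the report's actual method. It assumes, hypothetically, that each indexer owns a contiguous lexicographic range of terms and that a sample of term occurrences is available up front. The names `choose_splitters`, `owner`, and `partition_loads` are illustrative only.

```python
import bisect
import random
from collections import Counter


def choose_splitters(term_stream, num_indexers, sample_size, seed=0):
    """Pick num_indexers - 1 splitter terms from a random sample of the stream.

    Sampling term *occurrences* (with repetition) gives frequent terms more
    weight, so the resulting ranges roughly balance postings rather than
    distinct terms. (Assumption: this is a generic technique, not the
    report's exact procedure.)
    """
    rng = random.Random(seed)
    sample = sorted(rng.choices(term_stream, k=sample_size))
    # Evenly spaced order statistics of the sample become range boundaries.
    return [sample[(i * sample_size) // num_indexers] for i in range(1, num_indexers)]


def owner(term, splitters):
    """Hypothetical indexer id for a term: the number of splitters <= term."""
    return bisect.bisect_right(splitters, term)


def partition_loads(term_stream, splitters):
    """Count how many postings each indexer would receive."""
    loads = Counter(owner(t, splitters) for t in term_stream)
    return [loads[i] for i in range(len(splitters) + 1)]


if __name__ == "__main__":
    # Hypothetical stream: 10,000 distinct terms, each occurring once.
    stream = [f"t{i:05d}" for i in range(10_000)]
    splitters = choose_splitters(stream, num_indexers=4, sample_size=1_000)
    print(splitters)
    print(partition_loads(stream, splitters))
```

Only the small sample needs to be gathered and sorted centrally. The splitters can then be broadcast to every parser, and each parser routes a term with a local binary search, which keeps communication low. A single very frequent term cannot be split across indexers in this simple scheme, so duplicate splitters (and an empty range) are possible on highly skewed data.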

Files

Name: UMIACS-TR-2011-03.pdf
Size: 795.81 KB
Format: Adobe Portable Document Format

Collections

Technical Reports from UMIACS