Parallel Algorithms for Processing Massive Texts and Graphs

Saleh Mohammadabad, Hamed

Parallel Algorithms for Processing Massive Texts and Graphs

dc.contributor.advisor	HajiAghayi, MohammadTaghi	en_US
dc.contributor.author	Saleh Mohammadabad, Hamed	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2023-06-23T05:40:44Z
dc.date.available	2023-06-23T05:40:44Z
dc.date.issued	2022	en_US
dc.description.abstract	Designing parallel algorithms with sublinear space is an inevitable scenario when the input data does not fit in the memory of available systems. In recent years, with the abundance of data and the increasing demand for large scale data processing, the size of datasets easily surpasses that of the typical machine’s memory. Examples of the problem domain vary from social network graphs to DNA sequences. However, to address any problem subject to this restriction efficiently, we need alternative models of computation as the traditional RAM model is no longer an option. The main focus of our research is studying classical algorithms on texts and graphs in the Massively Parallel Computation model. During the last decade, the Massively Parallel Computation (MPC) model attracted a considerable amount of attention. The MPC model was originally proposed to provide a theoretical foundation to algorithms implemented on modern large scale data processing frameworks such as MapReduce, Hadoop, and Spark. The key idea behind this model, and also aforementioned frameworks, is to use many machines to compensate for the shortage of space in individual machines. In this model, the data is distributed among a set of machines each with a sublinear memory, and the process is consisted of several rounds. In each round, machines perform an arbitrary amount of computation on their local data independently. At the end of each round, machines can communicate with each other. We propose constant rounds MPC algorithms for Edge Coloring, Pattern Matching, constructing Suffix Trees, Longest Common Substring, and Longest Palindrome Substring. Recently, the Adaptive Massively Parallel Computation (AMPC) model was introduced as an extension of MPC. Despite the significant progress in designing highly efficient MPC algorithms, the community is facing an impenetrable barrier due to a hardness conjecture that Graph Connectivity requires at least a logarithmic number of rounds. Inspired by this limitation, the AMPC model adds distributed hash-tables to the setup where each machine can adaptively read from them during each round. This tweak not only leads to a constant-round Graph Connectivity algorithm and a plethora of logarithmic improvements in many graph problems, but it is also practical thanks to the recent developments in hardware infrastructure and technologies such as RDMA and eRPC. We explore the limitations of this model by introducing a general Tree Contraction framework which translates well-studied applications in the context of PRAM into constant-round AMPC algorithms, and solving the Minimum Cut problem in sublogarithmic rounds.	en_US
dc.identifier	https://doi.org/10.13016/dspace/8pi1-bau4
dc.identifier.uri	http://hdl.handle.net/1903/29927
dc.language.iso	en	en_US
dc.subject.pqcontrolled	Computer science	en_US
dc.title	Parallel Algorithms for Processing Massive Texts and Graphs	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Now showing 1 - 1 of 1

Name:: SalehMohammadabad_umd_0117E_23110.pdf
Size:: 2.14 MB
Format:: Adobe Portable Document Format

Download

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations