Parallel Algorithms for Processing Massive Texts and Graphs

dc.contributor.advisorHajiAghayi, MohammadTaghien_US
dc.contributor.authorSaleh Mohammadabad, Hameden_US
dc.contributor.departmentComputer Scienceen_US
dc.contributor.publisherDigital Repository at the University of Marylanden_US
dc.contributor.publisherUniversity of Maryland (College Park, Md.)en_US
dc.date.accessioned2023-06-23T05:40:44Z
dc.date.available2023-06-23T05:40:44Z
dc.date.issued2022en_US
dc.description.abstractDesigning parallel algorithms with sublinear space is an inevitable scenario when the input data does not fit in the memory of available systems. In recent years, with the abundance of data and the increasing demand for large scale data processing, the size of datasets easily surpasses that of the typical machine’s memory. Examples of the problem domain vary from social network graphs to DNA sequences. However, to address any problem subject to this restriction efficiently, we need alternative models of computation as the traditional RAM model is no longer an option. The main focus of our research is studying classical algorithms on texts and graphs in the Massively Parallel Computation model. During the last decade, the Massively Parallel Computation (MPC) model attracted a considerable amount of attention. The MPC model was originally proposed to provide a theoretical foundation to algorithms implemented on modern large scale data processing frameworks such as MapReduce, Hadoop, and Spark. The key idea behind this model, and also aforementioned frameworks, is to use many machines to compensate for the shortage of space in individual machines. In this model, the data is distributed among a set of machines each with a sublinear memory, and the process is consisted of several rounds. In each round, machines perform an arbitrary amount of computation on their local data independently. At the end of each round, machines can communicate with each other. We propose constant rounds MPC algorithms for Edge Coloring, Pattern Matching, constructing Suffix Trees, Longest Common Substring, and Longest Palindrome Substring. Recently, the Adaptive Massively Parallel Computation (AMPC) model was introduced as an extension of MPC. Despite the significant progress in designing highly efficient MPC algorithms, the community is facing an impenetrable barrier due to a hardness conjecture that Graph Connectivity requires at least a logarithmic number of rounds. Inspired by this limitation, the AMPC model adds distributed hash-tables to the setup where each machine can adaptively read from them during each round. This tweak not only leads to a constant-round Graph Connectivity algorithm and a plethora of logarithmic improvements in many graph problems, but it is also practical thanks to the recent developments in hardware infrastructure and technologies such as RDMA and eRPC. We explore the limitations of this model by introducing a general Tree Contraction framework which translates well-studied applications in the context of PRAM into constant-round AMPC algorithms, and solving the Minimum Cut problem in sublogarithmic rounds.en_US
dc.identifierhttps://doi.org/10.13016/dspace/8pi1-bau4
dc.identifier.urihttp://hdl.handle.net/1903/29927
dc.language.isoenen_US
dc.subject.pqcontrolledComputer scienceen_US
dc.titleParallel Algorithms for Processing Massive Texts and Graphsen_US
dc.typeDissertationen_US

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
SalehMohammadabad_umd_0117E_23110.pdf
Size:
2.14 MB
Format:
Adobe Portable Document Format