Towards Generalized and Scalable Machine Learning on Structured Data

Kong, Kezhi

dc.contributor.advisor	Goldstein, Tom	en_US
dc.contributor.author	Kong, Kezhi	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2024-06-28T05:43:56Z
dc.date.available	2024-06-28T05:43:56Z
dc.date.issued	2024	en_US
dc.description.abstract	Deep learning and neural networks have ushered in a transformative era for the field of machine learning, significantly influencing how we approach and utilize structured data. This dissertation is dedicated to exploring machine learning methodologies specifically designed for structured graphs and tables, aiming to enhance the performance of neural networks on these important data modalities. Graph Neural Networks (GNNs) have emerged as powerful architectures for learning and analyzing graph representations. However, the training of GNNs on large-scale datasets usually suffers from overfitting, posing significant generalization challenges for prediction problems. Meanwhile, conventional GNNs are hindered by scalability problems when deployed on industrial-level graph datasets. Moreover, for the table reasoning task, Large Language Models (LLMs) have shown competitive ability, but cannot fully process large tables due to context limits and may fail to comprehend the complex relationships within tabular data. In this dissertation, we investigate algorithms and techniques to address the generalization and scalability issues of GNNs, as well as effective and efficient approaches to the table reasoning task. In the first work, we propose to leverage data augmentation to generalize GNNs. We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training. In the second and third works, we look into the scalability problem of GNNs. We propose VQ-GNN, a universal framework to scale up any convolution-based GNN using Vector Quantization (VQ) without compromising performance. We further propose GOAT, a global graph transformer that scales to large graphs with millions of nodes and is competitive on tasks involving both homophilous and heterophilous graphs.
Lastly, we propose OpenTab, an effective method for the open-domain table reasoning task, built on advanced Large Language Models.	en_US
dc.identifier	https://doi.org/10.13016/ml6u-6zx2
dc.identifier.uri	http://hdl.handle.net/1903/32795
dc.language.iso	en	en_US
dc.subject.pqcontrolled	Artificial intelligence	en_US
dc.subject.pqcontrolled	Computer science	en_US
dc.title	Towards Generalized and Scalable Machine Learning on Structured Data	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Name: Kong_umd_0117E_24052.pdf
Size: 2.81 MB
Format: Adobe Portable Document Format

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations