Understanding and Enhancing Machine Learning Models with Theoretical Foundations

dc.contributor.advisor: Huang, Heng (en_US)
dc.contributor.author: Hu, Zhengmian (en_US)
dc.contributor.department: Computer Science (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2024-09-23T06:07:52Z
dc.date.available: 2024-09-23T06:07:52Z
dc.date.issued: 2024 (en_US)
dc.description.abstract: Machine learning has become a key driver of many contemporary technological advances. Alongside this empirical success, there is an urgent need for theoretical research that explains and complements practical achievements: understanding why machine learning, and deep learning in particular, works, and guiding the design of algorithms with better performance, efficiency, and security. This dissertation advances the understanding and practical development of machine learning through three interrelated research directions, emphasizing reliable theoretical guarantees throughout. In the first part, we study deep learning theory in the overparameterized regime. The central objects of study are the Conjugate Kernel and the Neural Tangent Kernel, which are closely tied to the training dynamics of deep networks. Based on the analysis of these kernels, we prove several new concentration results characterizing the trainability and generalization of infinitely wide neural networks. In the second part, we focus on training algorithms. On one hand, we propose new algorithms to improve learning efficiency, including a new underdamped Langevin MCMC method called ALUM, whose complexity we prove matches the theoretical lower bound. On the other hand, we develop new theoretical tools to analyze existing algorithms and obtain tighter convergence results. For ProxSkip, our analysis shows that it still achieves an improvement in communication complexity under a stochastic oracle, with convergence improving from sublinear to linear. We also generalize the notion of Lipschitz smoothness to obtain tighter analyses in non-convex optimization. In the third part, we develop new Monte Carlo methods for large language models (LLMs) to improve their efficiency and security. We develop unbiased watermarking techniques to protect model outputs and propose an Accelerated Speculative Sampling method for faster inference. We also investigate the trade-off between watermark strength and inference sampling efficiency, highlighting the conflict between the two. (en_US)
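
As background for the speculative sampling work described in the abstract, the following minimal sketch illustrates the standard accept/reject rule on which speculative sampling builds. The helper name speculative_step and the toy distributions are hypothetical illustrations; this is not the dissertation's Accelerated Speculative Sampling method itself.

import numpy as np

def speculative_step(p_target, q_draft, draft_token, rng):
    # Accept the draft model's proposed token with probability min(1, p/q).
    accept_prob = min(1.0, p_target[draft_token] / q_draft[draft_token])
    if rng.random() < accept_prob:
        return draft_token
    # On rejection, resample from the normalized residual (p - q)_+;
    # this correction keeps the output distributed exactly as p_target.
    residual = np.maximum(p_target - q_draft, 0.0)
    residual /= residual.sum()
    return int(rng.choice(len(p_target), p=residual))

# Toy usage over a 4-token vocabulary (distributions are illustrative only).
rng = np.random.default_rng(0)
p = np.array([0.4, 0.3, 0.2, 0.1])      # target model probabilities
q = np.array([0.25, 0.25, 0.25, 0.25])  # draft model probabilities
print(speculative_step(p, q, draft_token=2, rng=rng))

Because a draft token x is accepted with probability min(1, p(x)/q(x)) and rejections are resampled from the normalized residual (p - q)_+, the accepted token is distributed exactly according to the target model, which is why speculative decoding can accelerate inference without changing the output distribution.
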
dc.identifier: https://doi.org/10.13016/lixf-enm5
dc.identifier.uri: http://hdl.handle.net/1903/33393
dc.language.iso: en (en_US)
dc.subject.pqcontrolled: Computer science (en_US)
dc.subject.pquncontrolled: Deep Learning Theory (en_US)
dc.subject.pquncontrolled: Large Language Model (en_US)
dc.subject.pquncontrolled: Machine Learning (en_US)
dc.subject.pquncontrolled: Optimization (en_US)
dc.subject.pquncontrolled: Probabilistic Method (en_US)
dc.title: Understanding and Enhancing Machine Learning Models with Theoretical Foundations (en_US)
dc.type: Dissertation (en_US)

Files

Original bundle

Name: Hu_umd_0117E_24579.pdf
Size: 3.97 MB
Format: Adobe Portable Document Format