Understanding and Enhancing Machine Learning Models with Theoretical Foundations
dc.contributor.advisor | Huang, Heng | en_US |
dc.contributor.author | Hu, Zhengmian | en_US |
dc.contributor.department | Computer Science | en_US |
dc.contributor.publisher | Digital Repository at the University of Maryland | en_US |
dc.contributor.publisher | University of Maryland (College Park, Md.) | en_US |
dc.date.accessioned | 2024-09-23T06:07:52Z | |
dc.date.available | 2024-09-23T06:07:52Z | |
dc.date.issued | 2024 | en_US |
dc.description.abstract | Machine learning has become a key driver of many contemporary technological advancements. Alongside its empirical success, there is a pressing need for theoretical research that explains and complements these practical achievements. This includes understanding the empirical success of machine learning, especially deep learning, and aiding the design of better algorithms in terms of performance, efficiency, and security. This dissertation aims to advance both the understanding and the practical development of machine learning through three interrelated research directions, emphasizing reliable theoretical guarantees throughout. In the first part, we study deep learning theory in the overparameterized regime. The core objects of study are the Conjugate Kernel and the Neural Tangent Kernel, which are deeply connected to the training dynamics of deep learning. Based on the analysis of these kernels, we prove several new concentration results characterizing the trainability and generalization of infinitely wide neural networks. In the second part, we focus on training algorithms. On one hand, we propose new algorithms to improve learning efficiency, including a new underdamped Langevin MCMC method called ALUM, whose complexity we prove matches the theoretical lower bound. On the other hand, we propose new theoretical tools to analyze existing algorithms and obtain tighter convergence results. For ProxSkip, our analysis shows that it still achieves improved communication complexity under a stochastic oracle, with the convergence guarantee strengthened from sublinear to linear. We also generalize the concept of Lipschitz smoothness to obtain tighter analyses of non-convex optimization. In the third part, we develop new Monte Carlo methods for large language models (LLMs) to improve their efficiency and security. We develop unbiased watermarking techniques to protect model outputs and propose an Accelerated Speculative Sampling method for faster inference. We also investigate the trade-off between watermark strength and inference sampling efficiency, highlighting the conflict between the two. | en_US |
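For reference, the Conjugate Kernel (CK) and Neural Tangent Kernel (NTK) named in the abstract admit standard textbook definitions; the sketch below uses assumed notation f(x; θ) for the network output at input x with parameters θ drawn at random initialization (this notation is not taken from the record itself).

% Standard kernel definitions at random initialization; notation is an assumption.
% f(x;\theta): network output for input x with parameters \theta sampled at init.
\[
K_{\mathrm{CK}}(x,x') \;=\; \mathbb{E}_{\theta}\!\left[f(x;\theta)\,f(x';\theta)\right],
\qquad
K_{\mathrm{NTK}}(x,x') \;=\; \mathbb{E}_{\theta}\!\left[\big\langle \nabla_{\theta} f(x;\theta),\, \nabla_{\theta} f(x';\theta)\big\rangle\right].
\]
% In the infinite-width limit these kernels concentrate around deterministic limits,
% which is the setting of the concentration results summarized in the abstract.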
dc.identifier | https://doi.org/10.13016/lixf-enm5 | |
dc.identifier.uri | http://hdl.handle.net/1903/33393 | |
dc.language.iso | en | en_US |
dc.subject.pqcontrolled | Computer science | en_US |
dc.subject.pquncontrolled | Deep Learning Theory | en_US |
dc.subject.pquncontrolled | Large Language Model | en_US |
dc.subject.pquncontrolled | Machine Learning | en_US |
dc.subject.pquncontrolled | Optimization | en_US |
dc.subject.pquncontrolled | Probabilistic Method | en_US |
dc.title | Understanding and Enhancing Machine Learning Models with Theoretical Foundations | en_US |
dc.type | Dissertation | en_US |
Files
Original bundle
- Name: Hu_umd_0117E_24579.pdf
- Size: 3.97 MB
- Format: Adobe Portable Document Format