Understanding and Enhancing Machine Learning Models with Theoretical Foundations
dc.contributor.advisor | Huang, Heng | en_US |
dc.contributor.author | Hu, Zhengmian | en_US |
dc.contributor.department | Computer Science | en_US |
dc.contributor.publisher | Digital Repository at the University of Maryland | en_US |
dc.contributor.publisher | University of Maryland (College Park, Md.) | en_US |
dc.date.accessioned | 2024-09-23T06:07:52Z | |
dc.date.available | 2024-09-23T06:07:52Z | |
dc.date.issued | 2024 | en_US |
dc.description.abstract | Machine learning has become a key driver of many contemporary technological advancements. Alongside its empirical success, there is a pressing need for theoretical research that explains and complements these practical achievements. This includes understanding the empirical success of machine learning, especially deep learning, and aiding the design of better algorithms in terms of performance, efficiency, and security. This dissertation aims to advance both the understanding and the practical development of machine learning through three interrelated research directions, emphasizing reliable theoretical guarantees throughout. In the first part, we study deep learning theory in the overparameterized regime. The core objects of study are the Conjugate Kernel and the Neural Tangent Kernel, which are deeply connected to the training dynamics of deep learning. Based on the analysis of these kernels, we prove several new concentration results characterizing the trainability and generalization of infinitely wide neural networks. In the second part, we focus on training algorithms. On one hand, we propose new algorithms to improve learning efficiency, including a new underdamped Langevin MCMC method called ALUM, whose complexity we prove matches the theoretical lower bound. On the other hand, we propose new theoretical tools to analyze existing algorithms and obtain tighter convergence results. For ProxSkip, our analysis shows that it still achieves improved communication complexity under a stochastic oracle, with the convergence guarantee strengthened from sublinear to linear. We also generalize the concept of Lipschitz smoothness to obtain tighter analyses of non-convex optimization. In the third part, we develop new Monte Carlo methods for large language models (LLMs) to improve their efficiency and security. We develop unbiased watermarking techniques to protect model outputs and propose an Accelerated Speculative Sampling method for faster inference. We also investigate the trade-off between watermark strength and inference sampling efficiency, highlighting the conflict between the two. | en_US |
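For reference, the Conjugate Kernel (CK) and Neural Tangent Kernel (NTK) named in the abstract admit standard textbook definitions; the sketch below uses assumed notation f(x; θ) for the network output at input x with parameters θ drawn at random initialization (this notation is not taken from the record itself).

% Standard kernel definitions at random initialization; notation is an assumption.
% f(x;\theta): network output for input x with parameters \theta sampled at init.
\[
K_{\mathrm{CK}}(x,x') \;=\; \mathbb{E}_{\theta}\!\left[f(x;\theta)\,f(x';\theta)\right],
\qquad
K_{\mathrm{NTK}}(x,x') \;=\; \mathbb{E}_{\theta}\!\left[\big\langle \nabla_{\theta} f(x;\theta),\, \nabla_{\theta} f(x';\theta)\big\rangle\right].
\]
% In the infinite-width limit these kernels concentrate around deterministic limits,
% which is the setting of the concentration results summarized in the abstract.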
dc.identifier | https://doi.org/10.13016/lixf-enm5 | |
dc.identifier.uri | http://hdl.handle.net/1903/33393 | |
dc.language.iso | en | en_US |
dc.subject.pqcontrolled | Computer science | en_US |
dc.subject.pquncontrolled | Deep Learning Theory | en_US |
dc.subject.pquncontrolled | Large Language Model | en_US |
dc.subject.pquncontrolled | Machine Learning | en_US |
dc.subject.pquncontrolled | Optimization | en_US |
dc.subject.pquncontrolled | Probabilistic Method | en_US |
dc.title | Understanding and Enhancing Machine Learning Models with Theoretical Foundations | en_US |
dc.type | Dissertation | en_US |
Files
Original bundle
- Name: Hu_umd_0117E_24579.pdf
- Size: 3.97 MB
- Format: Adobe Portable Document Format