Science of Deep Learning: From Initialization to Emergent Structures

dc.contributor.advisor: Barkeshli, Maissam
dc.contributor.advisor: Gromov, Andrey
dc.contributor.author: Doshi, Darshil
dc.contributor.department: Physics
dc.contributor.publisher: Digital Repository at the University of Maryland
dc.contributor.publisher: University of Maryland (College Park, Md.)
dc.date.accessioned: 2025-08-08T12:18:18Z
dc.date.issued: 2025
dc.description.abstract: As artificial intelligence (AI) systems grow increasingly powerful and permeate every aspect of our lives, their impact on individuals and society is an urgent concern. Questions of safety and robustness in AI stem largely from our limited understanding of deep learning. Research in this domain has traditionally followed two parallel paths: an empirical approach that prioritizes practical advances, and a theoretical approach that seeks a mathematical understanding from first principles. Despite notable progress, a significant gap remains between deep learning practice and its theoretical underpinnings. This dissertation advocates a phenomenological approach to understanding AI systems: one that integrates empirical observation with theoretical model-building. This methodology has been instrumental in the physical sciences, and it holds similar promise for advancing the science of deep learning. Across two broad parts, this work demonstrates the effectiveness of this approach in characterizing model architectures and their emergent capabilities. In the first part, we explore how signal-propagation analysis in large-N limits can inform the design and initialization of model architectures. We develop a diagnostic observable that distinguishes ordered from chaotic behavior in neural networks, guiding optimal parameter initialization for training. Our analysis establishes the theoretical soundness of this observable in simple networks and confirms its empirical utility in state-of-the-art architectures. The findings reveal an architecture-design paradigm that eliminates the need for careful initialization, shedding light on widely used heuristic practices. Additionally, we introduce an algorithm that automates initialization across diverse model architectures, enhancing their trainability. In the second part, we highlight the importance of the system-identification approach for characterizing AI systems. We explore several stylized setups where model capabilities emerge as a function of compute, data quantity, and data diversity. Using arithmetic and cryptographic tasks as examples, we demonstrate that emergent abilities such as grokking and in-context learning arise alongside the formation of interpretable structures within the model's parameters, hidden representations, and outputs. Through targeted experiments, we identify these structures using (i) black-box probing, which examines model responses to characteristic inputs, and (ii) open-box analysis, which leverages curated task-specific observables and metrics to study internal model states. This dissertation promotes a paradigm for understanding deep learning that complements both heuristic-driven and hypothesis-driven approaches. By integrating experimental methodologies and analytical tools from established scientific disciplines, this framework has the potential to steer the field toward safer, more robust, and more efficient AI systems.
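The signal-propagation analysis mentioned in the abstract builds on a standard mean-field calculation for wide networks. As a minimal sketch (the textbook order-to-chaos analysis for a deep tanh network, not the dissertation's own diagnostic observable), one can iterate the layer-to-layer variance map and compute the slope chi of the correlation map at its fixed point; chi < 1 signals the ordered phase, chi > 1 the chaotic phase, and chi = 1 the critical initialization at which deep networks remain trainable:

```python
import numpy as np

# Mean-field signal propagation in a deep tanh network (illustrative sketch).
# Weights ~ N(0, sigma_w^2 / N), biases ~ N(0, sigma_b^2), in the large-N limit.
# The pre-activation variance evolves as q -> sigma_w^2 E[tanh(sqrt(q) z)^2] + sigma_b^2
# with z ~ N(0, 1); the slope chi of the correlation map at its fixed point
# separates the ordered phase (chi < 1) from the chaotic phase (chi > 1).

def chi(sigma_w, sigma_b, n_gauss=61, n_iter=500):
    """Slope of the correlation map at the variance fixed point, tanh activation."""
    # Gauss-Hermite quadrature (probabilists' convention) for Gaussian expectations
    z, w = np.polynomial.hermite_e.hermegauss(n_gauss)
    w = w / w.sum()  # normalize so sums approximate E[.] under N(0, 1)
    # Iterate the variance map to its fixed point q*
    q = 1.0
    for _ in range(n_iter):
        q = sigma_w**2 * np.sum(w * np.tanh(np.sqrt(q) * z) ** 2) + sigma_b**2
    # chi = sigma_w^2 E[tanh'(sqrt(q*) z)^2], with tanh'(x) = 1 - tanh(x)^2
    return sigma_w**2 * np.sum(w * (1.0 - np.tanh(np.sqrt(q) * z) ** 2) ** 2)

# With zero bias, the "edge of chaos" sits at sigma_w = 1
print(chi(0.5, 0.0))  # ordered phase: chi < 1
print(chi(1.0, 0.0))  # critical: chi close to 1
print(chi(2.0, 0.0))  # chaotic phase: chi > 1
```

Initializing at the chi = 1 boundary keeps both forward signals and gradients from vanishing or exploding with depth, which is the rationale behind critical-initialization schemes.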
dc.identifier: https://doi.org/10.13016/zn4m-3h1j
dc.identifier.uri: http://hdl.handle.net/1903/34284
dc.language.iso: en
dc.subject.pqcontrolled: Physics
dc.subject.pqcontrolled: Artificial intelligence
dc.subject.pquncontrolled: AI interpretability
dc.subject.pquncontrolled: Critical initialization
dc.subject.pquncontrolled: Deep Learning
dc.subject.pquncontrolled: Emergence
dc.subject.pquncontrolled: Grokking
dc.subject.pquncontrolled: In-context learning
dc.title: Science of Deep Learning: From Initialization to Emergent Structures
dc.type: Dissertation

Files

Original bundle

Name: Doshi_umd_0117E_25152.pdf
Size: 58.65 MB
Format: Adobe Portable Document Format