Science of Deep Learning: From Initialization to Emergent Structures

dc.contributor.advisor: Barkeshli, Maissam
dc.contributor.advisor: Gromov, Andrey
dc.contributor.author: Doshi, Darshil
dc.contributor.department: Physics
dc.contributor.publisher: Digital Repository at the University of Maryland
dc.contributor.publisher: University of Maryland (College Park, Md.)
dc.date.accessioned: 2025-08-08T12:18:18Z
dc.date.issued: 2025
dc.description.abstract: As artificial intelligence (AI) systems grow increasingly powerful and permeate every aspect of our lives, their impact on individuals and society is an urgent concern. Questions of safety and robustness in AI stem largely from our limited understanding of deep learning. Research in this domain has traditionally followed two parallel paths: an empirical approach that prioritizes practical advances, and a theoretical approach that seeks a mathematical understanding from first principles. Despite notable progress, a significant gap remains between deep learning practice and its theoretical underpinnings. This dissertation advocates a phenomenological approach to understanding AI systems: one that integrates empirical observation with theoretical model-building. This methodology has been instrumental in the physical sciences, and it holds similar promise for advancing the science of deep learning. Across two broad parts, this work demonstrates the effectiveness of this approach in characterizing model architectures and their emergent capabilities. In the first part, we explore how signal-propagation analysis in large-N limits can inform the design and initialization of model architectures. We develop a diagnostic observable that distinguishes ordered from chaotic behavior in neural networks, guiding optimal parameter initialization for training. Our analysis establishes the theoretical soundness of this observable in simple networks and confirms its empirical utility in state-of-the-art architectures. The findings reveal an architecture-design paradigm that eliminates the need for careful initialization, shedding light on widely used heuristic practices. Additionally, we introduce an algorithm that automates initialization across diverse model architectures, enhancing their trainability. In the second part, we highlight the importance of the system-identification approach for characterizing AI systems. We explore several stylized setups where model capabilities emerge as a function of compute, data quantity, and data diversity. Using arithmetic and cryptographic tasks as examples, we demonstrate that emergent abilities such as grokking and in-context learning arise alongside the formation of interpretable structures within the model's parameters, hidden representations, and outputs. Through targeted experiments, we identify these structures using (i) black-box probing, which examines model responses to characteristic inputs, and (ii) open-box analysis, which leverages curated task-specific observables and metrics to study internal model states. This dissertation promotes a paradigm for understanding deep learning that complements both heuristic-driven and hypothesis-driven approaches. By integrating experimental methodologies and analytical tools from established scientific disciplines, this framework has the potential to steer the field toward safer, more robust, and more efficient AI systems.
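The signal-propagation analysis mentioned in the abstract builds on a standard mean-field calculation for wide networks. As a minimal sketch (the textbook order-to-chaos analysis for a deep tanh network, not the dissertation's own diagnostic observable), one can iterate the layer-to-layer variance map and compute the slope chi of the correlation map at its fixed point; chi < 1 signals the ordered phase, chi > 1 the chaotic phase, and chi = 1 the critical initialization at which deep networks remain trainable:

```python
import numpy as np

# Mean-field signal propagation in a deep tanh network (illustrative sketch).
# Weights ~ N(0, sigma_w^2 / N), biases ~ N(0, sigma_b^2), in the large-N limit.
# The pre-activation variance evolves as q -> sigma_w^2 E[tanh(sqrt(q) z)^2] + sigma_b^2
# with z ~ N(0, 1); the slope chi of the correlation map at its fixed point
# separates the ordered phase (chi < 1) from the chaotic phase (chi > 1).

def chi(sigma_w, sigma_b, n_gauss=61, n_iter=500):
    """Slope of the correlation map at the variance fixed point, tanh activation."""
    # Gauss-Hermite quadrature (probabilists' convention) for Gaussian expectations
    z, w = np.polynomial.hermite_e.hermegauss(n_gauss)
    w = w / w.sum()  # normalize so sums approximate E[.] under N(0, 1)
    # Iterate the variance map to its fixed point q*
    q = 1.0
    for _ in range(n_iter):
        q = sigma_w**2 * np.sum(w * np.tanh(np.sqrt(q) * z) ** 2) + sigma_b**2
    # chi = sigma_w^2 E[tanh'(sqrt(q*) z)^2], with tanh'(x) = 1 - tanh(x)^2
    return sigma_w**2 * np.sum(w * (1.0 - np.tanh(np.sqrt(q) * z) ** 2) ** 2)

# With zero bias, the "edge of chaos" sits at sigma_w = 1
print(chi(0.5, 0.0))  # ordered phase: chi < 1
print(chi(1.0, 0.0))  # critical: chi close to 1
print(chi(2.0, 0.0))  # chaotic phase: chi > 1
```

Initializing at the chi = 1 boundary keeps both forward signals and gradients from vanishing or exploding with depth, which is the rationale behind critical-initialization schemes.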
dc.identifier: https://doi.org/10.13016/zn4m-3h1j
dc.identifier.uri: http://hdl.handle.net/1903/34284
dc.language.iso: en
dc.subject.pqcontrolled: Physics
dc.subject.pqcontrolled: Artificial intelligence
dc.subject.pquncontrolled: AI interpretability
dc.subject.pquncontrolled: Critical initialization
dc.subject.pquncontrolled: Deep Learning
dc.subject.pquncontrolled: Emergence
dc.subject.pquncontrolled: Grokking
dc.subject.pquncontrolled: In-context learning
dc.title: Science of Deep Learning: From Initialization to Emergent Structures
dc.type: Dissertation

Files

Original bundle

Name: Doshi_umd_0117E_25152.pdf
Size: 58.65 MB
Format: Adobe Portable Document Format