Latent Space Explorations for Generative AI

dc.contributor.advisor: Shrivastava, Abhinav
dc.contributor.advisor: Yacoob, Yaser
dc.contributor.author: Oorloff, Trevine Shane Jude
dc.contributor.department: Electrical Engineering
dc.contributor.publisher: Digital Repository at the University of Maryland
dc.contributor.publisher: University of Maryland (College Park, Md.)
dc.date.accessioned: 2025-08-08T11:33:45Z
dc.date.issued: 2024
dc.description.abstract:
Generative AI has revolutionized content creation through models such as Generative Adversarial Networks (GANs) and diffusion models, which produce high-quality, realistic outputs across various domains. These advancements rely on the ability of generative models to learn and encode complex patterns and semantic relationships within high-dimensional latent spaces, which serve as a foundation for their capacity to generate coherent and diverse outputs. Beyond image generation, these latent spaces hold immense potential for adaptation to a variety of downstream applications, making them a critical focus for research. This thesis systematically explores latent spaces for generative AI along three key dimensions.

The first and primary dimension investigates how latent spaces can be harnessed to extend generative models beyond traditional image synthesis. By leveraging the structured latent spaces of StyleGAN2 and Stable Diffusion, this work introduces novel methodologies for expressive face video encoding, robust one-shot face reenactment, and training-free visual in-context learning. Key contributions include methods for encoding fine-grained facial expressions and motions for video generation, decomposing identity and motion for seamless reenactment within StyleGAN's latent space, and reformulating self-attention in Stable Diffusion for multi-task visual in-context learning.

The second dimension addresses a critical limitation of generative models: hallucinations in diffusion models. A novel framework, Adaptive Attention Modulation (AAM), is proposed to dynamically modulate self-attention distributions during early denoising stages. By introducing temperature scaling and a masked perturbation strategy, AAM mitigates the emergence of unrealistic artifacts, significantly improving the fidelity and reliability of diffusion-generated content.

The third dimension focuses on mitigating societal risks posed by generative AI, particularly the proliferation of deepfakes. Through a multi-modal framework called Audio-Visual Feature Fusion (AVFF), this thesis develops a robust deepfake detection method that explicitly captures audio-visual correspondences. Combining self-supervised representation learning with a novel complementary masking and cross-modal fusion strategy, AVFF achieves state-of-the-art performance in identifying manipulated multimedia content, addressing a pressing ethical challenge in generative AI.
dc.identifier: https://doi.org/10.13016/sgey-i8b1
dc.identifier.uri: http://hdl.handle.net/1903/34033
dc.language.iso: en
dc.subject.pqcontrolled: Artificial intelligence
dc.subject.pqcontrolled: Computer science
dc.subject.pqcontrolled: Computer engineering
dc.subject.pquncontrolled: Deepfake detection
dc.subject.pquncontrolled: Diffusion hallucinations
dc.subject.pquncontrolled: Generative AI
dc.subject.pquncontrolled: Generative priors
dc.subject.pquncontrolled: Latent explorations/manipulations
dc.subject.pquncontrolled: Visual in-context learning
dc.title: Latent Space Explorations for Generative AI
dc.type: Dissertation
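
The abstract's second dimension turns on one concrete mechanism: applying a temperature to the self-attention softmax during the early denoising steps of a diffusion model. The sketch below is a minimal PyTorch illustration of that idea only, not the dissertation's AAM implementation (which also includes a masked perturbation strategy). The linear schedule, the tau_max value, and the function names temperature_scaled_attention and temperature_for_step are all assumptions made for this example.

import torch
import torch.nn.functional as F

def temperature_scaled_attention(q, k, v, temperature=1.0):
    # Standard scaled dot-product attention with one extra knob: the
    # softmax logits are divided by `temperature`. temperature > 1
    # flattens the attention distribution; temperature = 1 recovers
    # ordinary attention.
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / (d_k ** 0.5)
    attn = F.softmax(scores / temperature, dim=-1)
    return attn @ v

def temperature_for_step(step, num_steps, tau_max=1.5):
    # Hypothetical linear schedule: strongest modulation at the first
    # (noisiest) denoising step, decaying to 1.0 (no modulation) by the
    # last step. tau_max = 1.5 is an arbitrary illustrative value.
    frac = step / max(num_steps - 1, 1)
    return tau_max - (tau_max - 1.0) * frac

# Toy usage: one self-attention call per denoising step over 16 latent
# tokens of width 64 (shapes chosen arbitrarily for the example).
q = k = v = torch.randn(1, 16, 64)
for step in range(50):
    tau = temperature_for_step(step, num_steps=50)
    out = temperature_scaled_attention(q, k, v, temperature=tau)

In an actual Stable Diffusion pipeline this would be wired in by patching the U-Net's self-attention modules for the early fraction of sampling steps; the functions above only isolate the arithmetic. Whether the temperature should sit above 1 (flattening attention) or below 1 (sharpening it) in those early steps is likewise an assumption here, not a claim about the dissertation's method.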

Files

Original bundle

Name: Oorloff_umd_0117E_24866.pdf
Size: 48.45 MB
Format: Adobe Portable Document Format