Learning and Composing Primitives for the Visual World

dc.contributor.advisor: Shrivastava, Abhinav (en_US)
dc.contributor.advisor: Davis, Larry (en_US)
dc.contributor.author: Gupta, Kamal (en_US)
dc.contributor.department: Computer Science (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2023-06-26T05:35:45Z
dc.date.available: 2023-06-26T05:35:45Z
dc.date.issued: 2023 (en_US)
dc.description.abstract: Compositionality is at the core of how humans understand and create visual data. In order for the computational approaches to assist humans in creative tasks, it is crucial for them to understand and perform composition. The recent advances in deep generative models have enabled us to convert noise to highly realistic scenes. However, in order to harness these models for building real-world applications, I argue that we need to be able to represent and control the generation process with the composition of interpretable primitives. In the first half of this talk, I’ll discuss how deep models can discover such primitives from visual data. By playing a cooperative referential game between two neural network agents, we can represent images with discrete meaningful concepts without supervision. I further extend this work for applications in image and video editing by learning a dense correspondence of primitives across images. In the second half, I’ll focus on learning how to compose primitives for both 2D and 3D visual data. By expressing the scenes as an assembly of smaller parts, we can easily perform generation from scratch or from partial scenes as input. I’ll conclude the talk with a discussion of possible future directions and applications of generative models, and how we can better enable users to guide the creative process. (en_US)
dc.identifier: https://doi.org/10.13016/dspace/y4ic-ixbx
dc.identifier.uri: http://hdl.handle.net/1903/30182
dc.language.iso: en (en_US)
dc.subject.pqcontrolled: Computer science (en_US)
dc.subject.pquncontrolled: Computer Graphics (en_US)
dc.subject.pquncontrolled: Computer Vision (en_US)
dc.subject.pquncontrolled: Deep Learning (en_US)
dc.subject.pquncontrolled: Generative Modeling (en_US)
dc.subject.pquncontrolled: Machine Learning (en_US)
dc.subject.pquncontrolled: Natural Language Processing (en_US)
dc.title: Learning and Composing Primitives for the Visual World (en_US)
dc.type: Dissertation (en_US)

Files

Original bundle

Name: Gupta_umd_0117E_23291.pdf
Size: 41.76 MB
Format: Adobe Portable Document Format