Gupta, KamalCompositionality is at the core of how humans understand and create visual data. In order for the computational approaches to assist humans in creative tasks, it is crucial for them to understand and perform composition. The recent advances in deep generative models have enabled us to convert noise to highly realistic scenes. However, in order to harness these models for building real-world applications, I argue that we need to be able to represent and control the generation process with the composition of interpretable primitives. In the first half of this talk, I’ll discuss how deep models can discover such primitives from visual data. By playing a cooperative referential game between two neural network agents, we can represent images with discrete meaningful concepts without supervision. I further extend this work for applications in image and video editing by learning a dense correspondence of primitives across images. In the second half, I’ll focus on learning how to compose primitives for both 2D and 3D visual data. By expressing the scenes as an assembly of smaller parts, we can easily perform generation from scratch or from partial scenes as input. I’ll conclude the talk with a discussion of possible future directions and applications of generative models, and how we can better enable users to guide the creative process.enLearning and Composing Primitives for the Visual WorldDissertationComputer scienceComputer GraphicsComputer VisionDeep LearningGenerative ModelingMachine LearningNatural Language Processing