Object-Attribute Compositionality for Visual Understanding

dc.contributor.advisor: Shrivastava, Abhinav D (en_US)
dc.contributor.author: Saini, Nirat (en_US)
dc.contributor.department: Computer Science (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2024-09-23T06:27:25Z
dc.date.available: 2024-09-23T06:27:25Z
dc.date.issued: 2024 (en_US)
dc.description.abstract: Object appearances evolve over time, resulting in visually discernible changes in their colors, shapes, sizes, and materials. Humans are innately good at recognizing and understanding the evolution of object states, which is also crucial for visual understanding across images and videos. However, current vision models still struggle to capture these subtle changes and to recognize both the objects and the underlying actions causing the changes. This thesis focuses on compositional learning for the recognition and generation of attribute-object pairs. In the first part, we propose to disentangle visual features for objects and attributes in order to generalize recognition to novel object-attribute pairs. Next, we extend this approach to learn entirely unseen attribute-object pairs using semantic language priors, label smoothing, and propagation techniques. Further, we use object states for action recognition in videos, where subtle changes in object attributes and affordances help identify state-modifying and context-transforming actions. All of these methods for decomposing and composing objects and states generalize to unseen pairs and out-of-domain datasets across various compositional zero-shot learning and action recognition tasks. In the second part, we propose a new benchmark suite, Chop & Learn, for the novel task of Compositional Image Generation, and discuss the implications of these approaches for other compositional tasks in images, videos, and beyond. We further extend the insertion and editing of object attributes consistently across video frames using an off-the-shelf, training-free architecture, and discuss future challenges and opportunities of compositionality for visual understanding. (en_US)
dc.identifier: https://doi.org/10.13016/anle-scci
dc.identifier.uri: http://hdl.handle.net/1903/33459
dc.language.iso: en (en_US)
dc.subject.pqcontrolled: Computer science (en_US)
dc.subject.pquncontrolled: Compositional Learning (en_US)
dc.subject.pquncontrolled: Computer Vision (en_US)
dc.subject.pquncontrolled: Machine Learning (en_US)
dc.title: Object-Attribute Compositionality for Visual Understanding (en_US)
dc.type: Dissertation (en_US)

Files

Original bundle

Name: Saini_umd_0117E_24649.pdf
Size: 103.72 MB
Format: Adobe Portable Document Format