Theses and Dissertations from UMD
Permanent URI for this community: http://hdl.handle.net/1903/2
New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means there may be up to a four-month delay before a given thesis/dissertation appears in DRUM.
More information is available at Theses and Dissertations at University of Maryland Libraries.
Search Results (2 results)
Item: Improving Efficiency and Generalization of Visual Recognition (2018)
Yu, Ruichi; Davis, Larry S; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Deep Neural Networks (DNNs) are heavy in terms of their number of parameters and computational cost. This leads to two major challenges: first, training and deploying deep networks is expensive; second, without tremendous amounts of annotated training data, which are very costly to obtain, DNNs easily overfit and generalize poorly. We propose approaches to these two challenges in the context of specific computer vision problems to improve their efficiency and generalization.

First, we study network pruning using neuron importance score propagation. To reduce the significant redundancy in DNNs, we formulate network pruning as a binary integer optimization problem that minimizes the reconstruction error on the final responses produced by the network, and we derive a closed-form solution to it for pruning neurons in earlier layers. Based on this theoretical analysis, we propose the Neuron Importance Score Propagation (NISP) algorithm, which propagates the importance scores of the final responses back to every neuron in the network and then prunes neurons across the entire network jointly.

Second, we study visual relationship detection (VRD) with linguistic knowledge distillation. Because the semantic space of visual relationships is huge and training data are limited, especially for long-tail relationships with few instances, detecting visual relationships from images is a challenging problem. To improve predictive capability, especially generalization to unseen relationships, we use linguistic statistics obtained from both training annotations (internal knowledge) and publicly available text, e.g., Wikipedia (external knowledge), to regularize visual model learning.

Third, we study the role of context selection in object detection. We investigate why context has limited utility in object detection by isolating and evaluating the predictive power of different context cues under ideal conditions in which context is provided by an oracle. Based on this study, we propose a region-based context re-scoring method with dynamic context selection that removes noise and emphasizes informative context.

Fourth, we study efficient relevant motion event detection for large-scale home surveillance videos. Traditional methods for detecting motion events of objects of interest, based on object detection and tracking, are extremely slow and require expensive GPU devices. To dramatically speed up relevant motion event detection and improve its performance, we propose ReMotENet, a unified, end-to-end, data-driven network that uses spatial-temporal attention-based 3D ConvNets to jointly model the appearance and motion of objects of interest in a video.
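To make the attention-based 3D ConvNet idea concrete, here is a minimal PyTorch-style sketch of a clip classifier in the spirit of ReMotENet. All layer sizes, the specific sigmoid-gated attention form, and the names (AttnBlock3D, TinyReMotENet) are illustrative assumptions, not the architecture from the dissertation.

```python
import torch
import torch.nn as nn

class AttnBlock3D(nn.Module):
    """3D conv block followed by a learned spatial-temporal attention mask."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
        )
        # 1x1x1 conv producing a single attention map over (T, H, W)
        self.attn = nn.Conv3d(out_ch, 1, kernel_size=1)

    def forward(self, x):
        feat = self.conv(x)
        mask = torch.sigmoid(self.attn(feat))  # (N, 1, T, H, W), values in [0, 1]
        return feat * mask                     # re-weight features by attention

class TinyReMotENet(nn.Module):
    """Binary classifier: does the clip contain relevant object motion?"""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            AttnBlock3D(3, 16),
            nn.MaxPool3d(2),
            AttnBlock3D(16, 32),
            nn.AdaptiveAvgPool3d(1),
        )
        self.head = nn.Linear(32, 2)

    def forward(self, clip):                   # clip: (N, 3, T, H, W)
        feat = self.backbone(clip).flatten(1)  # (N, 32)
        return self.head(feat)                 # logits over {no motion, motion}

# Example: score a batch of two 16-frame RGB clips at 112x112 resolution.
model = TinyReMotENet()
logits = model(torch.randn(2, 3, 16, 112, 112))
print(logits.shape)  # torch.Size([2, 2])
```

The design point this sketch illustrates is that the attention mask is computed from, and applied to, the same 3D feature volume, so the network learns to suppress static background and emphasize regions with object motion without any explicit tracking stage.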
In the last part, we address the recognition of agent-in-place actions, which are associated with the agents who perform them and the places where they occur, in the context of outdoor home surveillance. We introduce a representation of the geometry and topology of scene layouts so that a network can generalize from the layouts observed in the training set to unseen layouts in the test set. This Layout-Induced Video Representation (LIVR) abstracts away low-level appearance variance and encodes the geometric and topological relationships of places in a specific scene layout. LIVR partitions the semantic features of a video clip into different places to force the network to learn place-based feature descriptions; to predict the confidence of each action, it aggregates features from the place associated with that action and its adjacent places on the scene layout. We introduce the Agent-in-Place Action dataset to show that our method allows neural network models to generalize significantly better to unseen scenes.

Item: The Effectiveness of Point-of-View Video Modeling in Teaching Social Initiation Skills to Children with Autism Spectrum Disorders (2016)
Kouo, Jennifer Lee; Kohl, Frances L; Lieber, Joan; Special Education; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Deficits in social communication and interaction have been identified as distinguishing impairments for individuals with an autism spectrum disorder (ASD). As a pivotal skill, the successful development of social communication and interaction in individuals with ASD is a lifelong objective. Point-of-view video modeling has the potential to address these deficits. This type of video involves filming the completion of a targeted skill or behavior from a first-person perspective. By presenting only what a person might see from his or her own viewpoint, it has been identified as more effective at limiting irrelevant stimuli, providing a clear frame of reference that facilitates imitation.

The current study investigated the use of point-of-view video modeling in teaching social initiations (e.g., greetings). Using a multiple-baseline-across-participants design, five kindergarten participants were taught social initiations using point-of-view video modeling and video priming. Immediately before and after viewing the entire point-of-view video model, the participants were evaluated on their social initiations with a trained, typically developing peer serving as a communication partner. Specifically, the social initiations involved the participants' ability to shift attention toward the peer who entered the classroom, maintain attention toward the peer, and engage in an appropriate social initiation (e.g., "hi," "hello"). Both generalization and maintenance were tested.

Overall, the data suggest that point-of-view video modeling is an effective intervention for increasing social initiations in young students with ASD. However, retraining was necessary for acquisition of the skills in the classroom environment. Generalization to novel environments, to a novel communication partner, and to other social initiation skills was limited. Additionally, maintenance of the gained social initiation skills occurred only in the intervention room. Despite the study's limitations and variable results, there are a number of implications moving forward for both practitioners and future researchers examining point-of-view video modeling and its potential impact on the social initiation skills of individuals with ASD.