A. James Clark School of Engineering

Permanent URI for this community: http://hdl.handle.net/1903/1654

The collections in this community comprise faculty research works, as well as graduate theses and dissertations.

Search Results

Now showing 1 - 2 of 2
  • Item
    Active Attention for Target Detection and Recognition in Robot Vision
    (2017) Luan, Wentao; Baras, John S; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In this thesis, we address problems in building an efficient and reliable target detection and recognition system for robot applications, where the vision module is only one component of the overall system executing the task. The different modules interact with each other to achieve the goal. In this interaction, the role of vision is not only to recognize but also to select what and where to process. In other words, attention is an essential process for efficient task execution. We introduce attention mechanisms that serve the overall system at different levels of integration and formulate four problems, as follows.
    At the most basic level of integration, attention interacts with vision only. We consider the problem of detecting a target in an input image using a trained binary classifier of the target and formulate target detection as a sampling process. The goal is to localize the windows containing targets in the image, and attention controls which part of the image to process next. We observe that the detector's response scores of sampled windows fade gradually away from the peak-response window in the detection area, and we approximate this scoring pattern with an exponential decay function. Exploiting this property, we propose an active sampling procedure that detects the target efficiently while avoiding an exhaustive and expensive search over all possible window locations (a minimal sketch of such a sampling loop appears at the end of this entry).
    With more knowledge about the target, we describe the target as template graphs over segmented surfaces. Constraint functions are defined to find the node and edge matchings between an input scene graph and the target's template graph. We propose to introduce recognition early into the traditional candidate-proposal process to achieve fast and reliable detection. Target detection then becomes finding subgraphs of the segmented input scene graph that match the template graphs. In this problem, attention provides the order in which constraints are checked during graph matching, and a reasonable sequence can help filter out negatives early, reducing computation time. We put forward a sub-optimal checking order and prove that its time cost is bounded with respect to the optimal checking sequence, which is not obtainable in polynomial time. Experiments on rigid and non-rigid object detection validate our pipeline.
    With more freedom in control, we allow the robot to actively choose another viewpoint if the current view cannot deliver a reliable detection and recognition result. We develop a practical viewpoint control system and apply it to two human-robot interaction applications, where the detection task becomes more challenging because of the additional randomness introduced by the human. Here attention represents an active process of deciding the location of the camera. Our viewpoint selection module not only considers the viewing-condition constraints of the vision algorithms but also incorporates the low-level robot kinematics to guarantee the reachability of the desired viewpoint. By selecting viewpoints quickly with a linear-time score function, the system delivers a smooth user interaction experience. Additionally, we provide a learning-from-human-demonstration method to obtain score function parameters that better serve the task's preferences.
    Finally, when recognition results from multiple sources under different environmental factors are available, attention amounts to deciding how to fuse the observations to obtain reliable output. We consider the problem of object recognition in 3D using an ensemble of attribute-based classifiers and propose two new concepts to improve classification in practical situations, demonstrating them in an approach for recognition from point-cloud data. First, we study the impact of the distance between the camera and the object on the classifier's accuracy and propose an approach that incorporates this distance into the decision making. Second, to avoid the difficulties arising from the lack of representative training examples when learning an optimal threshold, our attribute classifiers use two threshold values, distinguishing positive, negative, and uncertain classes instead of relying on a single threshold. We prove the theoretical correctness of this approach for an active agent that can observe the object multiple times.
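    The active sampling idea above can be illustrated with a minimal sketch. It assumes a hypothetical window-scoring function score_fn, a fixed grid of candidate window centers, and a single known decay constant; the purely greedy search loop below is an illustrative simplification, not the exact procedure developed in the thesis.

        import numpy as np

        def active_search(score_fn, grid, decay=0.05, tau=0.9, budget=100, seeds=5):
            """Greedy active sampling over candidate window centers.

            score_fn -- hypothetical detector response in [0, 1] for a window at (x, y)
            grid     -- candidate window centers, shape (N, 2)
            decay    -- assumed exponential-decay rate of the response around a peak
            tau      -- detection threshold
            """
            rng = np.random.default_rng(0)
            seed_idx = rng.choice(len(grid), size=seeds, replace=False)
            observed = {int(i): score_fn(*grid[i]) for i in seed_idx}
            for _ in range(budget):
                best_idx, best_score = max(observed.items(), key=lambda kv: kv[1])
                if best_score >= tau:
                    return grid[best_idx], best_score          # confident detection
                # Predict unobserved responses with the decay model:
                # s_hat(p) = s_peak * exp(-decay * ||p - p_peak||)
                preds = best_score * np.exp(-decay * np.linalg.norm(grid - grid[best_idx], axis=1))
                preds[list(observed)] = -np.inf                # never revisit a window
                nxt = int(np.argmax(preds))
                observed[nxt] = score_fn(*grid[nxt])           # evaluate the most promising window
            best_idx, best_score = max(observed.items(), key=lambda kv: kv[1])
            return grid[best_idx], best_score

        # Toy usage: a synthetic detector whose response peaks near (40, 25)
        xs, ys = np.meshgrid(np.arange(0, 80, 4), np.arange(0, 60, 4))
        grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
        loc, s = active_search(lambda x, y: np.exp(-0.05 * np.hypot(x - 40, y - 25)), grid)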
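    The two-threshold decision rule from the last paragraph can likewise be sketched directly; the threshold values, the observe callable, and the view budget are hypothetical placeholders for whatever attribute scores and viewpoint control the surrounding system provides.

        from dataclasses import dataclass

        @dataclass
        class TwoThresholdClassifier:
            t_low: float    # scores at or below this are labelled negative
            t_high: float   # scores at or above this are labelled positive

            def decide(self, score: float) -> str:
                if score >= self.t_high:
                    return "positive"
                if score <= self.t_low:
                    return "negative"
                return "uncertain"          # in-between scores defer the decision

        def classify_with_reobservation(clf, observe, max_views=5):
            """Re-observe (e.g. from a new viewpoint) while the attribute score
            stays in the uncertainty band, up to a fixed view budget.
            `observe` is a hypothetical callable returning a fresh score per view."""
            for _ in range(max_views):
                label = clf.decide(observe())
                if label != "uncertain":
                    return label
            return "uncertain"

        # Example: TwoThresholdClassifier(t_low=0.3, t_high=0.7) defers a 0.5 score
        # to another observation instead of forcing a positive/negative call.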
  • Item
    Single-Microphone Speech Enhancement Inspired by Auditory System
    (2014) Mirbagheri, Majid; Shamma, Shihab; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Enhancing the quality of speech in noisy environments has been an active area of research because of the abundance of applications dealing with the human voice and the dependence of their performance on this quality. While early approaches in the field mostly addressed the problem in a purely statistical framework, in which the goal was to estimate speech from its sum with other independent processes (noise), over the last decade the attention of the scientific community has turned to the functionality of the human auditory system. Much effort has been put into bridging the gap between the performance of speech processing algorithms and that of the average human listener by borrowing models proposed for sound processing in the auditory system. In this thesis, we introduce algorithms for speech enhancement inspired by two of these models: the cortical representation of sounds and the hypothesized role of temporal coherence in auditory scene analysis.
    After an introduction to the auditory system and the speech enhancement framework, we first show how traditional speech enhancement techniques such as Wiener filtering can benefit, at the feature extraction level, from the discriminatory capabilities of the spectro-temporal representation of sounds in the cortex, i.e., the cortical model. We next focus on the feature processing stage, as opposed to the extraction stage, of speech enhancement systems by taking advantage of models hypothesized for human attention in sound segregation. We demonstrate a mask-based enhancement method in which the temporal coherence of features is used as a criterion to elicit information about their sources and, more specifically, to form the masks needed to suppress the noise (a toy illustration of a coherence-based mask follows this entry).
    Lastly, we explore how the two blocks for feature extraction and feature manipulation can be merged into one in a manner consistent with our knowledge of the auditory system. We do this through regularized non-negative matrix factorization, which optimizes the feature extraction while simultaneously accounting for temporal dynamics to separate noise from speech.
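    As a toy illustration of coherence-based masking, the sketch below assumes a magnitude spectrogram and a reference temporal envelope for the attended source are already available; the sliding-window Pearson correlation is a simplified stand-in for the temporal-coherence measure developed in the thesis.

        import numpy as np

        def coherence_mask(spec, ref, win=50):
            """Soft time-frequency mask from temporal coherence with a reference envelope.

            spec -- magnitude spectrogram, shape (freq_bins, frames)   (assumed input)
            ref  -- reference temporal envelope, shape (frames,), e.g. the envelope of
                    a channel dominated by the attended source           (assumed given)
            win  -- length of the sliding window over which coherence is measured
            """
            F, T = spec.shape
            mask = np.zeros((F, T))
            for t in range(T):
                lo, hi = max(0, t - win // 2), min(T, t + win // 2 + 1)
                seg = spec[:, lo:hi]
                r = ref[lo:hi]
                # Pearson correlation of each channel with the reference in this window
                seg_c = seg - seg.mean(axis=1, keepdims=True)
                r_c = r - r.mean()
                num = seg_c @ r_c
                den = np.linalg.norm(seg_c, axis=1) * np.linalg.norm(r_c) + 1e-8
                mask[:, t] = np.clip(num / den, 0.0, 1.0)   # keep only positive coherence
            return mask

        # Applying the mask attenuates channels that do not co-vary with the reference:
        # enhanced = coherence_mask(spec, ref) * spec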
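    For the final stage, a generic sketch of non-negative matrix factorization with a temporal-smoothness regularizer on the activations, solved by multiplicative updates, is given below; the penalty and its weight lam are illustrative assumptions rather than the thesis's exact formulation.

        import numpy as np

        def smooth_nmf(V, rank=20, lam=0.1, iters=200, seed=0):
            """NMF with a temporal-smoothness penalty on the activations H.

            Approximately minimises ||V - W H||_F^2 + lam * sum_t ||H[:, t] - H[:, t-1]||^2
            with multiplicative updates (a generic sketch, not the thesis's exact model).
            V -- non-negative spectrogram, shape (freq_bins, frames).
            """
            rng = np.random.default_rng(seed)
            F, T = V.shape
            W = rng.random((F, rank)) + 1e-3
            H = rng.random((rank, T)) + 1e-3
            eps = 1e-9
            for _ in range(iters):
                # Neighbouring activations (zero-padded at the edges)
                H_prev = np.pad(H, ((0, 0), (1, 0)))[:, :T]
                H_next = np.pad(H, ((0, 0), (0, 1)))[:, 1:]
                # Smoothness term pulls each column of H toward its temporal neighbours
                H *= (W.T @ V + lam * (H_prev + H_next)) / (W.T @ W @ H + 2 * lam * H + eps)
                W *= (V @ H.T) / (W @ H @ H.T + eps)
            return W, H

    To obtain a speech estimate from W and H, one would typically group the basis vectors into speech and noise sets (for example, by pretraining part of W on clean speech) and reconstruct from the corresponding activations; that grouping step is omitted from the sketch.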