Skip to content
University of Maryland LibrariesDigital Repository at the University of Maryland
    • ログイン
    アイテム表示 
    •   ホーム
    • Theses and Dissertations from UMD
    • UMD Theses and Dissertations
    • アイテム表示
    •   ホーム
    • Theses and Dissertations from UMD
    • UMD Theses and Dissertations
    • アイテム表示
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    DEEP NEURAL NETWORKS AND REGRESSION MODELS FOR OBJECT DETECTION AND POSE ESTIMATION

    Thumbnail
    閲覧/開く
    Hara_umd_0117E_17577.pdf (2.509Mb)
    No. of downloads: 1439

    日付
    2016
    著者
    Hara, Kota
    Advisor
    Chellappa, Rama
    DRUM DOI
    https://doi.org/10.13016/M2ZC1H
    Metadata
    アイテムの詳細レコードを表示する
    抄録
    Estimating the pose, orientation and the location of objects has been a central problem addressed by the computer vision community for decades. In this dissertation, we propose new approaches for these important problems using deep neural networks as well as tree-based regression models. For the first topic, we look at the human body pose estimation problem and propose a novel regression-based approach. The goal of human body pose estimation is to predict the locations of body joints, given an image of a person. Due to significant variations introduced by pose, clothing and body styles, it is extremely difficult to address this task by a standard application of the regression method. Thus, we address this task by dividing the whole body pose estimation problem into a set of local pose estimation problems by introducing a dependency graph which describes the dependency among different body joints. For each local pose estimation problem, we train a boosted regression tree model and estimate the pose by progressively applying the regression along the paths in a dependency graph starting from the root node. Our next work is on improving the traditional regression tree method and demonstrate its effectiveness for pose/orientation estimation tasks. The main issues of the traditional regression training are, 1) the node splitting is limited to binary splitting, 2) the form of the splitting function is limited to thresholding on a single dimension of the input vector and 3) the best splitting function is found by exhaustive search. We propose a novel node splitting algorithm for regression tree training which does not have the issues mentioned above. The algorithm proceeds by first applying k-means clustering in the output space, conducting multi-class classification by support vector machine (SVM) and determining the constant estimate at each leaf node. We apply the regression forest that includes our regression tree models to head pose estimation, car orientation estimation and pedestrian orientation estimation tasks and demonstrate its superiority over various standard regression methods. Next, we turn our attention to the role of pose information for the object detection task. In particular, we focus on the detection of fashion items a person is wearing or carrying. It is clear that the locations of these items are strongly correlated with the pose of the person. To address this task, we first generate a set of candidate bounding boxes by using an object proposal algorithm. For each candidate bounding box, image features are extracted by a deep convolutional neural network pre-trained on a large image dataset and the detection scores are generated by SVMs. We introduce a pose-dependent prior on the geometry of the bounding boxes and combine it with the SVM scores. We demonstrate that the proposed algorithm achieves significant improvement in the detection performance. Lastly, we address the object detection task by exploring a way to incorporate an attention mechanism into the detection algorithm. Humans have the capability of allocating multiple fixation points, each of which attends to different locations and scales of the scene. However, such a mechanism is missing in the current state-of-the-art object detection methods. Inspired by the human vision system, we propose a novel deep network architecture that imitates this attention mechanism. For detecting objects in an image, the network adaptively places a sequence of glimpses at different locations in the image. Evidences of the presence of an object and its location are extracted from these glimpses, which are then fused for estimating the object class and bounding box coordinates. Due to the lack of ground truth annotations for the visual attention mechanism, we train our network using a reinforcement learning algorithm. Experiment results on standard object detection benchmarks show that the proposed network consistently outperforms the baseline networks that do not employ the attention mechanism.
    URI
    http://hdl.handle.net/1903/19037
    Collections
    • Electrical & Computer Engineering Theses and Dissertations
    • UMD Theses and Dissertations

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.
    Please send us your comments.
    Web Accessibility
     

     

    ブラウズ

    リポジトリ全体コミュニティ/コレクション公開日著者タイトル主題このコレクション公開日著者タイトル主題

    登録利用者

    ログイン登録
    Pages
    About DRUMAbout Download Statistics

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.
    Please send us your comments.
    Web Accessibility