OPTIMIZATION ALGORITHMS USING PRIORS IN COMPUTER VISION
Shah, Sohil Atul
Over the years, many computer vision models, some inspired by human behavior, have been developed for various applications. However, only a handful of them are popular and widely used. Why? There are two major factors: 1) most of these models lack an efficient numerical algorithm and are therefore computationally very expensive; 2) many models, being too generic, cannot capitalize on problem-specific prior information and thus demand rigorous hyper-parameter tuning. In this dissertation, we design fast and efficient algorithms that leverage application-specific priors to solve unsupervised and weakly supervised problems. Specifically, we focus on developing algorithms that impose structured priors, model priors, and label priors during the inference and/or learning of vision models. In many applications, it is known a priori that a signal is smooth and continuous in space. The first part of this work focuses on improving unsupervised learning mechanisms by explicitly imposing these structured priors in an optimization framework using different regularization schemes. This led to the development of fast algorithms for robust recovery of signals from compressed measurements, image denoising, and data clustering. Moreover, by employing a re-descending robust penalty on the structured regularization terms and applying duality, we reduce our clustering formulation to the optimization of a single continuous objective. This enabled the integration of clustering processes into an end-to-end feature learning pipeline. In the second part of our work, we exploit inherent properties of established models to develop efficient solvers for semidefinite programs (SDPs), generative adversarial networks (GANs), and semantic segmentation. We consider models for several different problem classes. a) Certain non-convex models in computer vision (e.g., BQP) are popularly solved using convex SDPs after lifting to a high-dimensional space. However, this computationally expensive approach limits these methods to small matrices.
We develop a fast approximate algorithm that directly solves the original non-convex formulation using biconvex relaxations and known rank information. b) Adversarial networks, though widely popular, are difficult to train because they suffer from instability. This is because optimizing adversarial networks corresponds to finding a saddle point of a loss function. We propose a simple prediction method that enables faster training of various adversarial networks using larger learning rates without any instability problems. c) Semantic segmentation models must learn long-distance contextual information while retaining high spatial resolution at the output. Existing models achieve this at the cost of computationally expensive and memory-intensive training and inference. We design a stacked U-Net model that repeatedly processes top-down and bottom-up features. Our smallest model exceeds ResNet-101 performance on PASCAL VOC 2012 by 4.5% IoU with ∼7× fewer parameters. Next, we address the problem of learning heterogeneous concepts from internet videos using mined label tags. Given a large number of videos, each with multiple concepts and labels, the idea is to teach machines to automatically learn these concepts by leveraging weak labels. We formulate this as a co-clustering problem and develop a novel Bayesian nonparametric weakly supervised Indian buffet process model that additionally enforces the paired label prior between concepts. In the final part of this work, we consider an inverse approach: learning data priors from a given model. Specifically, we develop a numerically efficient algorithm for estimating the log-likelihood of data samples from GANs. The approximate log-likelihood function is used for outlier detection and for data augmentation when training classifiers.
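
As a rough illustration of why a prediction step can help with the saddle-point problems mentioned in item b), the sketch below runs gradient descent-ascent on the toy objective L(x, y) = x·y, whose saddle point is the origin. The abstract does not spell out the update rule, so the extrapolation used here (look ahead along the most recent change in x before updating y) is an assumption about one plausible form of such a step, and the function names, step size, and starting point are hypothetical choices made only for this example.

```python
import math


def grad_x(x, y):
    # dL/dx for the toy objective L(x, y) = x * y
    return y


def grad_y(x, y):
    # dL/dy for the toy objective L(x, y) = x * y
    return x


def run_descent_ascent(steps, lr, predict):
    """Minimize over x and maximize over y; return the distance to the saddle (0, 0).

    With predict=False this is plain simultaneous gradient descent-ascent.
    With predict=True the y update sees an extrapolated x (assumed form of
    the prediction step, not taken from the abstract).
    """
    x, y = 1.0, 1.0  # hypothetical starting point
    for _ in range(steps):
        x_new = x - lr * grad_x(x, y)          # descent step on x
        if predict:
            x_bar = x_new + (x_new - x)        # extrapolate x one step ahead
        else:
            x_bar = x                          # no lookahead
        y = y + lr * grad_y(x_bar, y)          # ascent step on y
        x = x_new
    return math.hypot(x, y)


if __name__ == "__main__":
    print("with prediction:   ", run_descent_ascent(2000, 0.1, True))
    print("without prediction:", run_descent_ascent(2000, 0.1, False))
```

On this bilinear toy problem the predicted variant contracts toward the saddle point, while the plain simultaneous update spirals outward. It says nothing by itself about behavior on real adversarial networks.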