Object Detection and Instance Segmentation for Real-world Applications
Abstract
Modern visual recognition systems have achieved great success in the past decade. Aided by this progress, instance localization and recognition have improved significantly, benefiting many applications, e.g. face recognition, autonomous driving, and smart cities.
Three key factors play important roles in the success of visual recognition: big computation, big data, and big models. Recent advances in hardware have increased available computation exponentially, making it feasible to train deep, large models on large-scale datasets. Large-scale visual datasets, e.g. ImageNet~\cite{deng2009imagenet}, COCO~\cite{lin2014microsoft}, and YouTube-VIS~\cite{yang2019video}, provide accurate and rich information for deep learning models. Moreover, advanced designs of deep neural networks~\cite{he2016deep,xie2017aggregated,liu2021swin,liu2022convnet} have greatly increased the capacity of deep models.
Instance localization and recognition, the core of modern visual systems, have many downstream applications, e.g. autonomous driving, augmented reality, virtual reality, and smart cities. Thanks to the advances of deep learning in the last decade, these applications have made great progress recently.
In this thesis, we introduce a series of published works that improve the performance of instance localization and address issues in modeling instance localization and recognition with deep learning models. We also discuss future directions and some potential research projects.