Autonomous Learning of Semantic Segmentation from Internet Images

Abstract

Collecting a large amount of manually labeled training data is labor-intensive, and it often becomes the major bottleneck when applying semantic segmentation techniques to real-world applications, especially for new categories where no labeled data is available. In this paper, we aim at solving the problem of "webly-supervised" semantic segmentation, relying purely on web-searched images, where users only need to provide a single keyword for each target category. A major challenge in this task is label noise in web images. To deal with the label noise, we design a noise-erasing network that learns cross-image knowledge from credible attention regions within a mini-batch and then erases regions unrelated to the search keywords from the web images. With this network, our system can automatically generate high-quality "proxy ground truth" for training semantic segmentation models. Extensive experiments on the popular PASCAL VOC 2012 benchmark show surprisingly good results in both our task (mIoU = 62.0%) and the weakly-supervised setting (mIoU = 66.1%).
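The idea sketched in the abstract — mining credible attention regions across a mini-batch and erasing keyword-unrelated pixels before emitting proxy labels — can be illustrated with a toy NumPy sketch. Everything below is an illustrative assumption, not the paper's actual network: the batch "prototype" built from top-attention pixels, the cosine-similarity test for noise, and all thresholds and function names are hypothetical stand-ins for the learned cross-image mechanism.

```python
import numpy as np

def batch_prototype(features, attentions, top_frac=0.25):
    """Average the features of the most-attended pixels across a mini-batch.

    features:   list of (H, W, C) per-pixel feature maps
    attentions: list of (H, W) attention maps in [0, 1]
    Returns a unit-norm prototype vector summarizing the keyword category.
    """
    vecs = []
    for feat, att in zip(features, attentions):
        k = max(1, int(top_frac * att.size))
        idx = np.argsort(att.ravel())[-k:]            # top-k attention pixels
        vecs.append(feat.reshape(-1, feat.shape[-1])[idx])
    proto = np.concatenate(vecs, axis=0).mean(axis=0)
    return proto / (np.linalg.norm(proto) + 1e-8)

def proxy_ground_truth(feat, att, proto, fg_thr=0.5, sim_thr=0.5):
    """Label pixels as keyword (1), background (0), or erased/ignored (255).

    High-attention pixels whose features disagree with the cross-image
    prototype are treated as label noise and erased (marked 255, the
    common "ignore" index in segmentation training).
    """
    flat = feat.reshape(-1, feat.shape[-1])
    norm = np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8
    sim = ((flat / norm) @ proto).reshape(att.shape)  # cosine similarity

    label = np.zeros(att.shape, dtype=np.uint8)       # 0 = background
    label[att >= fg_thr] = 1                          # 1 = keyword category
    label[(att >= fg_thr) & (sim < sim_thr)] = 255    # erased as noise
    return label
```

In this toy version, a web image whose attended region looks unlike the rest of the batch (e.g., a mis-retrieved image for the search keyword) gets its confident-but-inconsistent pixels erased rather than propagated into the proxy ground truth.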

Publication
Hou, Qibin, Linghao Han, Jiangjiang Liu, and Ming-Ming Cheng. "Autonomous Learning of Semantic Segmentation from Internet Images." SCIENTIA SINICA Informationis (2021).