Top 10 GitHub Papers :: Semantic Segmentation

datahacker.rs, 26.02.2020


In computer vision, image segmentation is the process of partitioning a digital image into multiple segments, commonly called image objects. The main objective is to change the representation of the image into something that is simpler to analyze. The technique is commonly used to locate objects and boundaries (lines, curves, etc.) in an image.

In this section, you can find state-of-the-art papers on semantic segmentation, along with the authors’ names, a link to the paper, the GitHub link and star count, the number of citations, the datasets used and the publication date. Enjoy.

1. Searching for MobileNetV3

Figure: Comparison of original last stage and efficient last stage.

Abstract: We present the next generation of MobileNets based on a combination of complementary search techniques as well as a novel architecture design. MobileNetV3 is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm and then subsequently improved through novel architecture advances. This paper starts the exploration of how automated search algorithms and network design can work together to harness complementary approaches improving the overall state of the art. Through this process we create two new MobileNet models for release: MobileNetV3-Large and MobileNetV3-Small which are targeted for high and low resource use cases. These models are then adapted and applied to the tasks of object detection and semantic segmentation. For the task of semantic segmentation (or any dense pixel prediction), we propose a new efficient segmentation decoder Lite Reduced Atrous Spatial Pyramid Pooling (LR-ASPP). We achieve new state of the art results for mobile classification, detection and segmentation. MobileNetV3-Large is 3.2% more accurate on ImageNet classification while reducing latency by 15% compared to MobileNetV2. MobileNetV3-Small is 4.6% more accurate while reducing latency by 5% compared to MobileNetV2. MobileNetV3-Large detection is 25% faster at roughly the same accuracy as MobileNetV2 on COCO detection. MobileNetV3-Large LR-ASPP is 30% faster than MobileNetV2 R-ASPP at similar accuracy for Cityscapes segmentation.

Authors: Andrew Howard • Mark Sandler • Grace Chu • Liang-Chieh Chen • Bo Chen • Mingxing Tan • Weijun Wang • Yukun Zhu • Ruoming Pang • Vijay Vasudevan • Quoc V. Le • Hartwig Adam
Paper: https://arxiv.org/pdf/1905.02244v5.pdf
Github: https://github.com/tensorflow/models/tree/master/research/object_detection
Dataset: COCO, Cityscapes
Github ⭐: 61,954 and the stars were counted on 01/03/2020
Citations: Cited by 77
Published: 6 May 2019

2. FEELVOS: Fast End-to-End Embedding Learning for Video Object Segmentation

Abstract: Many of the recent successful methods for video object segmentation (VOS) are overly complicated, heavily rely on fine-tuning on the first frame, and/or are slow, and are hence of limited practical use. In this work, we propose FEELVOS as a simple and fast method which does not rely on fine-tuning. In order to segment a video, for each frame FEELVOS uses a semantic pixel-wise embedding together with a global and a local matching mechanism to transfer information from the first frame and from the previous frame of the video to the current frame. In contrast to previous work, our embedding is only used as an internal guidance of a convolutional network. Our novel dynamic segmentation head allows us to train the network, including the embedding, end-to-end for the multiple object segmentation task with a cross entropy loss. We achieve a new state of the art in video object segmentation without fine-tuning with a J&F measure of 71.5% on the DAVIS 2017 validation set. We make our code and models available at https://github.com/tensorflow/models/tree/master/research/feelvos.

Authors: Paul Voigtlaender • Yuning Chai • Florian Schroff • Hartwig Adam • Bastian Leibe • Liang-Chieh Chen
Paper: https://arxiv.org/pdf/1902.09513v2.pdf
Github: https://github.com/tensorflow/models/tree/master/research/feelvos
Dataset: DAVIS 2017
Github ⭐: 61,989 and the stars were counted on 01/03/2020
Citations: Cited by 28
Published: 28 Feb 2019

3. Searching for Efficient Multi-Scale Architectures for Dense Image Prediction

Abstract: The design of neural network architectures is an important component for achieving state-of-the-art performance with machine learning systems across a broad array of tasks. Much work has endeavored to design and build architectures automatically through clever construction of a search space paired with simple learning algorithms. Recent progress has demonstrated that such meta-learning methods may exceed scalable human-invented architectures on image classification tasks. An open question is the degree to which such methods may generalize to new domains. In this work we explore the construction of meta-learning techniques for dense image prediction focused on the tasks of scene parsing, person-part segmentation, and semantic image segmentation. Constructing viable search spaces in this domain is challenging because of the multi-scale representation of visual information and the necessity to operate on high resolution imagery. Based on a survey of techniques in dense image prediction, we construct a recursive search space and demonstrate that even with efficient random search, we can identify architectures that outperform human-invented architectures and achieve state-of-the-art performance on three dense prediction tasks including 82.7% on Cityscapes (street scene parsing), 71.3% on PASCAL-Person-Part (person-part segmentation), and 87.9% on PASCAL VOC 2012 (semantic image segmentation). Additionally, the resulting architecture is more computationally efficient, requiring half the parameters and half the computational cost as previous state of the art systems.

Authors: Liang-Chieh Chen • Maxwell D. Collins • Yukun Zhu • George Papandreou • Barret Zoph • Florian Schroff • Hartwig Adam • Jonathon Shlens
Paper: https://arxiv.org/pdf/1809.04184v1.pdf
Github: https://github.com/tensorflow/models/tree/master/research/deeplab
Dataset: Cityscapes
Github ⭐: 61,918 and the stars were counted on 01/03/2020
Citations: Cited by 100
Published: 11 September 2018

4. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Abstract: Spatial pyramid pooling modules or encoder-decoder structures are used in deep neural networks for the semantic segmentation task. The former networks are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter networks can capture sharper object boundaries by gradually recovering the spatial information. In this work, we propose to combine the advantages from both methods. Specifically, our proposed model, DeepLabv3+, extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results especially along object boundaries. We further explore the Xception model and apply the depthwise separable convolution to both Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network. We demonstrate the effectiveness of the proposed model on PASCAL VOC 2012 and Cityscapes datasets, achieving the test set performance of 89.0% and 82.1% without any post-processing.

Authors: Liang-Chieh Chen • Yukun Zhu • George Papandreou • Florian Schroff • Hartwig Adam
Paper: https://arxiv.org/pdf/1802.02611v3.pdf
Github: https://github.com/tensorflow/models/tree/master/research/deeplab
Dataset: PASCAL VOC 2012, Cityscapes
Github ⭐: 61,887 and the stars were counted on 01/03/2020
Citations: Cited by 1254
Published: 7 February 2018

5. MobileNetV2: Inverted Residuals and Linear Bottlenecks

Figure: Comparison of convolutional blocks for different architectures.

Abstract: In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. The MobileNetV2 architecture is based on an inverted residual structure where the input and output of the residual block are thin bottleneck layers, opposite to traditional residual models which use expanded representations in the input. MobileNetV2 uses lightweight depthwise convolutions to filter features in the intermediate expansion layer. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet classification, COCO object detection, and VOC image segmentation. We evaluate the trade-offs between accuracy, and number of operations measured by multiply-adds (MAdd), as well as the number of parameters.

Authors: Mark Sandler • Andrew Howard • Menglong Zhu • Andrey Zhmoginov • Liang-Chieh Chen
Paper: https://arxiv.org/pdf/1801.04381v4.pdf
Github: https://github.com/tensorflow/models/tree/master/research/object_detection
Dataset: COCO, ImageNet
Github ⭐: 61,955 and the stars were counted on 01/03/2020
Citations: Cited by 1302
Published: 13 January 2018
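
As a rough illustration of why the depthwise convolutions mentioned in the MobileNetV2 abstract are described as lightweight, the sketch below counts multiply-adds (MAdd) for a standard convolution and for a depthwise convolution followed by a 1x1 pointwise convolution. The function names and the layer sizes (56x56 output, 64 to 128 channels, 3x3 kernel) are assumptions chosen for this example, stride 1 with "same" padding is assumed, and the code does not model the full inverted residual block from the paper.

```python
def standard_conv_madds(h, w, c_in, c_out, k):
    """Multiply-adds for a k x k standard convolution with an h x w x c_out output."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_madds(h, w, c_in, c_out, k):
    """Multiply-adds for a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
h, w, c_in, c_out, k = 56, 56, 64, 128, 3

std = standard_conv_madds(h, w, c_in, c_out, k)
sep = depthwise_separable_madds(h, w, c_in, c_out, k)

print(f"standard conv:       {std:,} MAdds")
print(f"depthwise separable: {sep:,} MAdds")
print(f"reduction factor:    {std / sep:.2f}x")
```

Under these assumptions the reduction factor simplifies to k²·c_out / (k² + c_out), so for 3x3 kernels it approaches 9 as the number of output channels grows.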

6. Rethinking Atrous Convolution for Semantic Image Segmentation

Abstract: In this work, we revisit atrous convolution, a powerful tool to explicitly adjust filter’s field-of-view as well as control the resolution of feature responses computed by Deep Convolutional Neural Networks, in the application of semantic image segmentation. To handle the problem of segmenting objects at multiple scales, we design modules which employ atrous convolution in cascade or in parallel to capture multi-scale context by adopting multiple atrous rates. Furthermore, we propose to augment our previously proposed Atrous Spatial Pyramid Pooling module, which probes convolutional features at multiple scales, with image-level features encoding global context and further boost performance. We also elaborate on implementation details and share our experience on training our system. The proposed ‘DeepLabv3’ system significantly improves over our previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.

Authors: Liang-Chieh Chen • George Papandreou • Florian Schroff • Hartwig Adam
Paper: https://arxiv.org/pdf/1706.05587v3.pdf
Github: https://github.com/tensorflow/models/tree/master/research/deeplab
Dataset: PASCAL VOC 2012
Github ⭐: 61,965 and the stars were counted on 01/03/2020
Citations: Cited by 1172
Published: 17 June 2017

7. Mask R-CNN

Figure: The Mask R-CNN framework for instance segmentation.

Abstract: We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code has been made available at: https://github.com/facebookresearch/Detectron

Authors: Kaiming He • Georgia Gkioxari • Piotr Dollár • Ross Girshick
Paper: https://arxiv.org/pdf/1703.06870v3.pdf
Github: https://github.com/facebookresearch/Detectron
Dataset: COCO, Cityscapes
Github ⭐: 22,874 and the stars were counted on 27/02/2020
Citations: Cited by 5207
Published: 20 March 2017

8. Pyramid Scene Parsing Network

Figure: An overview of the Augmented-CE2P framework.

Abstract: Scene parsing is challenging for unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective to produce good quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction tasks. The proposed approach achieves state-of-the-art performance on various datasets. It came first in ImageNet scene parsing challenge 2016, PASCAL VOC 2012 benchmark and Cityscapes benchmark. A single PSPNet yields new record of mIoU accuracy 85.4% on PASCAL VOC 2012 and accuracy 80.2% on Cityscapes.

Authors: Hengshuang Zhao • Jianping Shi • Xiaojuan Qi • Xiaogang Wang • Jiaya Jia
Paper: https://arxiv.org/pdf/1612.01105v2.pdf
Github: https://github.com/tensorflow/models/tree/master/research/deeplab
Dataset: Cityscapes, PASCAL VOC 2012, ADE20K
Github ⭐: 61,965 and the stars were counted on 01/03/2020
Citations: Cited by 2211
Published: 4 December 2016
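
The mIoU figures quoted in the PSPNet abstract refer to mean intersection-over-union. A minimal sketch of how that metric can be computed on flattened label maps is shown below. The function name `mean_iou` and the toy labels are assumptions for illustration; real benchmarks accumulate the counts over a whole dataset and follow their own rules for ignored pixels, whereas this version simply skips classes that appear in neither the prediction nor the ground truth.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes for two flat label sequences."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps (an assumption)
            ious.append(intersection / union)
    return sum(ious) / len(ious)


# Toy example: six pixels, three classes.
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]

score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

In this toy case the per-class IoU values are 1/3, 2/3 and 1/2, so the mean is 0.5.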

9. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

Abstract: In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or ‘atrous convolution’, as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but has a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed “DeepLab” system sets the new state-of-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU in the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.

Authors: Liang-Chieh Chen • George Papandreou • Iasonas Kokkinos • Kevin Murphy • Alan L. Yuille
Paper: https://arxiv.org/pdf/1606.00915v2.pdf
Github: https://github.com/tensorflow/models/tree/master/research/deeplab
Dataset: PASCAL VOC 2012, PASCAL-Context, PASCAL-Person-Part, Cityscapes
Github ⭐: 61,965 and the stars were counted on 01/03/2020
Citations: Cited by 4199
Published: 2 June 2016
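
The DeepLab abstract's claim that atrous convolution enlarges the field of view without adding parameters can be seen in one dimension. The sketch below is a simplified illustration rather than the authors' implementation: the function name `atrous_conv1d`, the toy signal and the kernel are assumptions, and the operation is written as cross-correlation with no padding, the form most deep learning frameworks use.

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution without padding.

    The kernel taps are spaced `rate` samples apart, so the same number of
    weights covers a span of (len(kernel) - 1) * rate + 1 input samples.
    """
    k = len(kernel)
    span = (k - 1) * rate + 1  # effective field of view
    output = []
    for start in range(len(signal) - span + 1):
        output.append(sum(kernel[j] * signal[start + j * rate] for j in range(k)))
    return output


signal = [1, 2, 3, 4, 5, 6, 7, 8, 9]
kernel = [1, 0, -1]  # three weights in both cases

dense = atrous_conv1d(signal, kernel, rate=1)  # field of view: 3 samples
wide = atrous_conv1d(signal, kernel, rate=2)   # field of view: 5 samples

print(dense)
print(wide)
```

Both calls use the same three weights; only the spacing between taps changes. An ASPP-style module would run several such rates in parallel on the same input and combine the results.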

10. Deep Residual Learning for Image Recognition

Figure: Residual learning: a building block.

Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

Authors: Kaiming He • Xiangyu Zhang • Shaoqing Ren • Jian Sun
Paper: https://arxiv.org/pdf/1512.03385v1.pdf
Github: https://github.com/tensorflow/models/tree/master/research/slim
Dataset: ImageNet, COCO
Github ⭐: 61,965 and the stars were counted on 01/03/2020
Citations: Cited by 40,397
Published: 10 December 2015
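
The residual formulation described in the ResNet abstract, where layers learn a residual function F(x) that is added back to the input, can be sketched as follows. This is a toy version under stated assumptions: it uses small fully connected layers on plain Python lists instead of the convolutional layers in the paper, and the helper names (`linear`, `relu`, `residual_block`) are hypothetical.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def linear(values, weights, bias):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, bias)]


def residual_block(x, w1, b1, w2, b2):
    """Compute relu(F(x) + x), where F is two linear layers with a ReLU between."""
    hidden = relu(linear(x, w1, b1))
    residual = linear(hidden, w2, b2)                    # F(x)
    return relu([r + xi for r, xi in zip(residual, x)])  # shortcut connection


# With all-zero weights, F(x) = 0 and the block passes its input through.
x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
zero_bias = [0.0] * 3

y = residual_block(x, zeros, zero_bias, zeros, zero_bias)
print(y)
```

The zero-weight case illustrates the intuition behind the design: the block only has to learn a correction to the identity mapping, which makes very deep stacks easier to optimize.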
