A summary of Fully Convolutional Networks for Semantic Segmentation by Jonathan Long et al.
Nicholas M. Synovic
Jonathan Long et al. arXiv, 2015
For the summary of the paper, go to the Summary section of this article.
First Pass
Read the title, abstract, introduction, section and sub-section headings, and conclusion
Problem
What is the problem addressed in the paper?
Previous state-of-the-art (SOTA) models for semantic segmentation did not use fully convolutional networks.
Motivation
Why should we care about this paper?
It presents a methodology for adapting existing SOTA image classification convolutional neural networks to semantic segmentation.
Category
What type of paper is this work?
This is a deep learning paper in computer vision, focused on semantic segmentation.
Context
What other types of papers is the work related to?
This paper is related to works that develop semantic segmentation techniques.
Contributions
What are the authors’ main contributions?
Their main contribution is an analysis of using fully convolutional networks for semantic segmentation.
Second Pass
A proper read-through of the paper is required to answer these questions
Background Work
What has been done prior to this paper?
Prior work has developed both semantic segmentation models and convolutional neural networks for image classification.
Figures, Diagrams, Illustrations, and Graphs
Are the axes properly labeled? Are results shown with error bars, so that conclusions are statistically significant?
The charts and figures are clear and easy to read.
Clarity
Is the paper well written?
The paper is well written.
Relevant Work
Mark relevant work for review
The following relevant work can be found in the Citations section of this article.
- F. Ning, D. Delhomme, Y. LeCun, F. Piano, L. Bottou, and P. E. Barbano. Toward automatic phenotyping of developing embryos from videos. Image Processing, IEEE Transactions on, 14(9):1360–1371, 2005.
- C. Farabet, C. Couprie, L. Najman, and Y. LeCun. Learning hierarchical features for scene labeling. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2013.
- P. H. Pinheiro and R. Collobert. Recurrent convolutional neural networks for scene labeling. In ICML, 2014.
Methodology
What methodology did the authors use to validate their contributions?
The authors converted several image classification networks (AlexNet, VGG16, and GoogLeNet) into semantic segmentation models and compared them, then evaluated their approach on several datasets: PASCAL VOC, NYUDv2, and SIFT Flow. The reported metrics were pixel accuracy, mean accuracy, mean IU (intersection over union), and frequency weighted IU.
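The paper defines these four metrics in terms of n_ij, the number of pixels of class i predicted to belong to class j, the number of classes n_cl, and t_i = Σ_j n_ij, the total number of pixels of class i: pixel accuracy is Σ_i n_ii / Σ_i t_i; mean accuracy is (1/n_cl) Σ_i n_ii / t_i; mean IU is (1/n_cl) Σ_i n_ii / (t_i + Σ_j n_ji − n_ii); and frequency weighted IU is (Σ_k t_k)^(-1) Σ_i t_i n_ii / (t_i + Σ_j n_ji − n_ii). A minimal pure-Python sketch that computes all four from a confusion matrix might look like the following. The function name is hypothetical, and it assumes every class appears at least once in the ground truth (otherwise t_i is zero and the divisions fail).

```python
from typing import Dict, List


def segmentation_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Compute the four metrics from a square confusion matrix.

    confusion[i][j] is n_ij: pixels whose ground truth is class i and whose
    prediction is class j. Assumes every class occurs in the ground truth.
    """
    n_cl = len(confusion)
    # t_i: pixels of ground-truth class i (row sums)
    t = [sum(row) for row in confusion]
    # predicted[i] = sum_j n_ji: pixels predicted as class i (column sums)
    predicted = [sum(confusion[j][i] for j in range(n_cl)) for i in range(n_cl)]
    diag = [confusion[i][i] for i in range(n_cl)]
    total = sum(t)

    # Per-class IU: intersection n_ii over union t_i + sum_j n_ji - n_ii
    iu = [diag[i] / (t[i] + predicted[i] - diag[i]) for i in range(n_cl)]

    return {
        "pixel_accuracy": sum(diag) / total,
        "mean_accuracy": sum(diag[i] / t[i] for i in range(n_cl)) / n_cl,
        "mean_iu": sum(iu) / n_cl,
        "frequency_weighted_iu": sum(t[i] * iu[i] for i in range(n_cl)) / total,
    }


# Hypothetical two-class example: 10 pixels in total, 7 labelled correctly.
metrics = segmentation_metrics([[3, 1], [2, 4]])
print(metrics)  # pixel accuracy 0.7, mean IU about 0.536
```

Mean IU weights every class equally, while frequency weighted IU weights each class's IU by how many pixels of that class appear, so a rare class with poor IU pulls down the former more than the latter.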
Author Assumptions
What assumptions does the author(s) make? Are they justified assumptions?
The authors rely on image classification models that use fully connected layers for classification, because their method converts these fully connected layers into convolutional layers.
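To see why this conversion works: a fully connected layer applied to a flattened C×H×W feature map computes the same dot products as a convolution whose kernels are that layer's weight rows reshaped to C×H×W. The pure-Python sketch below checks this numerically on random data. The helper names and tensor sizes are hypothetical, and in practice the conversion would be a weight reshape inside a deep learning framework rather than hand-written loops. On a larger input, the same kernels slide across the feature map and produce a grid of outputs instead of a single vector, which is what lets the converted network make spatial predictions.

```python
import random
from typing import List

Tensor3 = List[List[List[float]]]  # indexed as [channel][row][column]


def fully_connected(x: Tensor3, weights: List[List[float]], bias: List[float]) -> List[float]:
    """Dense layer on a C x H x W input flattened in (channel, row, column) order."""
    flat = [v for channel in x for row in channel for v in row]
    return [sum(w * v for w, v in zip(w_row, flat)) + b for w_row, b in zip(weights, bias)]


def fc_weights_to_conv_kernels(weights: List[List[float]], c: int, h: int, w: int) -> List[Tensor3]:
    """Reshape each length-(C*H*W) weight row into one C x H x W convolution kernel."""
    return [
        [[row[(ci * h + hi) * w:(ci * h + hi + 1) * w] for hi in range(h)] for ci in range(c)]
        for row in weights
    ]


def conv2d_valid(x: Tensor3, kernels: List[Tensor3], bias: List[float]) -> Tensor3:
    """Stride-1, unpadded cross-correlation (what frameworks call convolution)."""
    c, height, width = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    out = []
    for kernel, b in zip(kernels, bias):
        fmap = []
        for i in range(height - kh + 1):
            row = []
            for j in range(width - kw + 1):
                s = sum(
                    kernel[ci][di][dj] * x[ci][i + di][j + dj]
                    for ci in range(c) for di in range(kh) for dj in range(kw)
                )
                row.append(s + b)
            fmap.append(row)
        out.append(fmap)
    return out


def random_tensor(c: int, h: int, w: int) -> Tensor3:
    return [[[random.uniform(-1.0, 1.0) for _ in range(w)] for _ in range(h)] for _ in range(c)]


random.seed(0)
C, H, W, K = 2, 3, 3, 4  # hypothetical sizes: 2 channels, 3x3 map, 4 outputs
x = random_tensor(C, H, W)
weights = [[random.uniform(-1.0, 1.0) for _ in range(C * H * W)] for _ in range(K)]
bias = [random.uniform(-1.0, 1.0) for _ in range(K)]
kernels = fc_weights_to_conv_kernels(weights, C, H, W)

# Same-size input: the conv output is K x 1 x 1 and matches the dense layer.
dense_out = fully_connected(x, weights, bias)
conv_out = conv2d_valid(x, kernels, bias)
assert all(abs(d - f[0][0]) < 1e-9 for d, f in zip(dense_out, conv_out))

# Larger 5x5 input: the same kernels now yield a 3x3 grid of K-dim outputs.
big = random_tensor(C, 5, 5)
grid = conv2d_valid(big, kernels, bias)
print(len(grid), len(grid[0]), len(grid[0][0]))  # 4 3 3
```

Each cell of `grid` equals what the original dense layer would output on the corresponding 3×3 window, so the converted layer behaves like the classifier evaluated at every window position, computed in one pass.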
Correctness
Do the assumptions seem valid?
This reliance seems valid, as most SOTA image classification models that I’m aware of use fully connected layers for classification. However, the proposed method may not transfer to present or future convolutional networks that no longer rely on fully connected layers for classification.
Future Directions
My own proposed future directions for the work
I’d like to take the ideas and methodology proposed in this paper and apply them to single-shot object detection models to see whether it is possible to create something like a YOLO-Segmentation model.
Open Questions
What open questions do I have about the work?
Is it possible to combine a fully connected layer for classification and convolutional layers for semantic segmentation within the same model by using a complex branching architecture (in the vein of GoogLeNet)?
Author Feedback
What feedback would I give to the authors?
This is a good paper. I am concerned that the methodology presented may not transfer to future models that do not rely on fully connected layers for image classification.
Summary
A summary of the paper
The paper Fully Convolutional Networks for Semantic Segmentation by Jonathan Long et al. [1] describes a methodology for converting an existing image classification network into a semantic segmentation network. This is done by replacing the fully connected layers at the head of the classification network with one or more convolutional layers, which makes the resulting segmentation network fully convolutional.
However, not all models follow this pattern, so each conversion requires its own architectural choices and testing. For example, the segmentation architecture derived from GoogLeNet differs from the one derived from VGG.
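In the converted networks, the final classifier becomes a 1×1 convolution that outputs a score for every class at every location of the coarse feature map; the paper then upsamples these scores to the input resolution inside the network. The pure-Python sketch below illustrates only the first part, under stated assumptions: a hypothetical 2-channel, 2×3 feature map standing in for the backbone's output, a hypothetical 2-class classifier, and a per-location argmax that turns the score map into a label map. The backbone and the upsampling step are omitted, and the function names are illustrative.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # indexed as [channel][row][column]


def conv1x1(features: Tensor3, weights: List[List[float]], bias: List[float]) -> Tensor3:
    """1x1 convolution: the same linear classifier applied at every spatial location.

    features: C x H x W feature map; weights: K x C (one row per class).
    Returns a K x H x W map of class scores.
    """
    c, h, w = len(features), len(features[0]), len(features[0][0])
    return [
        [
            [bias[k] + sum(weights[k][ci] * features[ci][i][j] for ci in range(c)) for j in range(w)]
            for i in range(h)
        ]
        for k in range(len(weights))
    ]


def argmax_labels(scores: Tensor3) -> List[List[int]]:
    """Assign each location its highest-scoring class (ties go to the lower index)."""
    k, h, w = len(scores), len(scores[0]), len(scores[0][0])
    return [[max(range(k), key=lambda cls: scores[cls][i][j]) for j in range(w)] for i in range(h)]


# Hypothetical feature map and classifier: class 0 responds to channel 0,
# class 1 responds to channel 1.
features = [
    [[1.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # channel 0
    [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0]],  # channel 1
]
class_weights = [[1.0, 0.0], [0.0, 1.0]]
class_bias = [0.0, 0.0]

label_map = argmax_labels(conv1x1(features, class_weights, class_bias))
print(label_map)  # [[0, 1, 0], [1, 1, 0]]
```

Because the 1×1 convolution has no fixed input size, the same weights produce a label map for a feature map of any height and width, which is the property the fully convolutional design relies on.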
Summarization Technique
This paper was summarized using a modified technique proposed by S. Keshav in his work How to Read a Paper [0].