A summary of Very Deep Convolutional Networks for Large-Scale Image Recognition by Karen Simonyan and Andrew Zisserman
Nicholas M. Synovic
Karen Simonyan and Andrew Zisserman; https://doi.org/10.48550/arXiv.1409.1556
For the summary of the paper, go to the Summary section of this article.
First Pass
Discussion about the title, abstract, introduction, section and sub-section headings, and conclusion
The paper Very Deep Convolutional Networks for Large-Scale Image Recognition by Karen Simonyan and Andrew Zisserman discusses the SOTA performance of their model on the localization and classification tasks of the 2014 ImageNet Challenge. They show that by extending the depth of convolutional neural networks to 16 to 19 layers, with a 3x3 filter size, SOTA performance is possible without redeveloping the architecture of existing convolutional neural networks. This is in contrast to the work of Szegedy et al., who propose the Inception architecture for classification and object detection, and whose reference implementation also came first in its respective tasks in the 2014 ImageNet Challenge. Simonyan et al. discuss the architecture and training that went into their model (VGG) and how to architect future models to perform as well or better.
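The paper's core design argument is that a stack of small 3x3 convolutions covers the same receptive field as a single larger filter while using fewer weights and adding extra non-linearities. A minimal sketch of that arithmetic, assuming C input and C output channels (the channel count 512 below is only illustrative):
```python
# Parameter-count argument for stacked 3x3 convolutions: three stacked 3x3
# conv layers have the same 7x7 receptive field as one 7x7 conv layer, but
# fewer weights and two extra non-linearities in between.
C = 512  # illustrative channel count

params_single_7x7 = 7 * 7 * C * C       # 49 * C^2 weights
params_three_3x3 = 3 * (3 * 3 * C * C)  # 27 * C^2 weights

print(params_single_7x7)  # 12845056
print(params_three_3x3)   # 7077888
```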
Category
What type of paper is this work?
This is both a computer vision model evaluation and architecture paper.
Context
What other types of papers is the work related to?
This paper is most closely related to other publications reporting SOTA performance of computer vision models and architectures.
Contributions
What are the author’s main contributions?
Their main contribution is an exploration of depth in traditional convolutional neural networks to achieve SOTA performance.
Second Pass
Background Work
What has been done prior to this paper?
Prior work has gone into optimizing the width and initial convolutions of convolutional neural networks.
Szegedy et al. proposed a new architecture (Inception) that achieved SOTA performance in the 2014 ImageNet Challenge. Separately, Krizhevsky et al. [4] and others have proposed improvements to the convolutional neural network architecture.
Motivation
Why should we care about this paper?
We should care about the authors' work because increasing the depth of a neural network via their proposed architecture allows existing convolutional neural networks to be easily extended without redesigning the libraries used to create them.
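As a minimal sketch of that point, in PyTorch (not the authors' implementation): depth can be varied by editing a configuration list alone, loosely following the convolutional parts of the paper's configurations A (11 weight layers), D (16), and E (19).
```python
import torch.nn as nn

# Convolutional parts of configurations A, D, and E from the paper; a number
# is the output channel count of a 3x3 convolution, "M" is 2x2 max pooling.
# Counting the 3 fully connected layers, the totals are 11, 16, and 19.
CFGS = {
    "A": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "D": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
          512, 512, 512, "M", 512, 512, 512, "M"],
    "E": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
          512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

def make_features(cfg):
    layers, in_ch = [], 3
    for v in cfg:
        if v == "M":
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers += [nn.Conv2d(in_ch, v, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            in_ch = v
    return nn.Sequential(*layers)

# Deepening the network is a one-line configuration change, not a redesign.
vgg16_features = make_features(CFGS["D"])
vgg19_features = make_features(CFGS["E"])
```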
Figures, Diagrams, Illustrations, and Graphs
Are the axes properly labeled? Are results shown with error bars, so that conclusions are statistically significant?
The tables that are presented are easy to read, but could be improved. Often, multiple rows correspond to a single model configuration; this is fine, but it is difficult to make out which configuration each row corresponds to. That said, the tables do make it easy to compare error percentages across model configurations.
Clarity
Is the paper well written?
The paper is well written and can be understood.
Relevant Work
Mark relevant work for review
The following relevant work can be found in the Citations section of this article.
- Classical convolutional neural network architecture - [3]
- GoogLeNet - [2]
- Clarifai
- ImageNet classification with deep convolutional neural networks - [4]
- Isotropically-rescaled training image
- ImageNet 2013 submissions (Localization and Detection using Convolutional Networks) - [5], [6]
Author Assumptions
What assumptions does the author(s) make? Are they justified assumptions?
The authors assume that the performance improvements convolutional neural networks have been achieving are based on larger datasets and better compute optimization.
Correctness
Do the assumptions seem valid?
I agree with their assumption. However, [2] created a SOTA model utilizing a new architecture, rather than improving upon an existing one.
Future Directions
My own proposed future directions for the work
I would love to try to optimize the input layer of convolutional neural networks with a computation that looks not only at the color space of an image, but also at its opacity. This would allow images to have their background removed for the purposes of classification by making the background less opaque than the foreground.
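A hypothetical sketch of this idea in PyTorch (my own illustration, not something from the paper): treat the input as RGBA so the alpha channel can suppress background pixels, either by pre-multiplying or by letting the first convolution read all four channels.
```python
import torch
import torch.nn as nn

# Hypothetical RGBA input: 3 color channels plus an opacity (alpha) channel.
rgba = torch.rand(1, 4, 224, 224)
rgb, alpha = rgba[:, :3], rgba[:, 3:]

# Option 1: pre-multiply so low-opacity (background) pixels contribute less.
premultiplied = rgb * alpha

# Option 2: widen the first convolution to read the alpha channel directly.
first_conv = nn.Conv2d(in_channels=4, out_channels=64, kernel_size=3, padding=1)
features = first_conv(rgba)

print(premultiplied.shape, features.shape)
# torch.Size([1, 3, 224, 224]) torch.Size([1, 64, 224, 224])
```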
Summary
A summary of the paper
The paper Very Deep Convolutional Networks for Large-Scale Image Recognition by Karen Simonyan and Andrew Zisserman discusses the SOTA performance of their model on the localization and classification tasks of the 2014 ImageNet Challenge. They show that by extending the depth of convolutional neural networks to 16 to 19 layers, with a 3x3 filter size, SOTA performance is possible without redeveloping the architecture of existing convolutional neural networks. Their work builds on previous efforts to improve convolutional neural network performance by optimizing the filter size and initial layer, but contrasts with contemporaries [2] by not developing a new architecture. Their work is important because it shows that the existing convolutional neural network architecture is capable of SOTA performance when the depth of the model is increased. They justify this by evaluating six different model configurations and finding that the models with 16 to 19 layers performed best on the 2014 ImageNet Challenge classification and localization tasks.
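For concreteness, a minimal end-to-end sketch of the 16-layer configuration (D) in PyTorch, assuming the paper's 224x224 RGB input and 1000-way classifier; this is illustrative, not the authors' reference implementation:
```python
import torch
import torch.nn as nn

def vgg_block(in_ch, out_ch, n_convs):
    """n_convs 3x3 convolutions (stride 1, padding 1) + 2x2 max pooling."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                             kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# Configuration D: 13 conv layers + 3 fully connected = 16 weight layers.
model = nn.Sequential(
    vgg_block(3, 64, 2), vgg_block(64, 128, 2), vgg_block(128, 256, 3),
    vgg_block(256, 512, 3), vgg_block(512, 512, 3),
    nn.Flatten(),
    nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True),
    nn.Linear(4096, 4096), nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),  # 1000 ImageNet classes
)

x = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image, as in the paper
print(model(x).shape)            # torch.Size([1, 1000])
```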
Summarization Technique
This paper was summarized using a modified technique proposed by S. Keshav in his work How to Read a Paper [0].