A summary of Fast R-CNN by Ross Girshick

Nicholas M. Synovic

11-10-2022 - 4 minutes read - 806 words

A summary of Fast R-CNN

Ross Girshick, IEEE International Conference on Computer Vision, 2015; DOI

For the summary of the paper, go to the Summary section of this article.

A summary of Fast R-CNN

First Pass

Read the title, abstract, introduction, section and sub-section headings, and conclusion

Problem

What is the problem addressed in the paper?

The problem addressed in this paper is that there exists a method that is better at performing object detection and semantic segmentation within region proposals that is not implemented in either the original R-CNN model or the SPPNet model.

Motivation

Why should we care about this paper?

Because it provides an updated model architecture for performing object detection and semantic segmentation within region proposals, thereby speeding up inference time and reducing computational cost.

Context

What other types of papers is the work related to?

Papers that discuss CNN models with respect to region proposals.

Contributions

What are the author’s main contributions?

An updated R-CNN model that is substantially faster than the original R-CNN model and the competing SPPNet model. This Fast R-CNN model achieves SOTA mean average precision (mAP) on the PASCAL VOC 2007, 2010, and 2012 datasets. Fast training and testing compared to R-CNN and SPPNet. And that fine tuning ConvNet layers in VGG 16 improves mAP.

Second Pass

A proper read through of the paper is required to answer this

Background Work

What has been done prior to this paper?

Region proposal convolutional neural networks have been created prior to this work. Furthermore, this work utilizes techniques that other successful CV DL models have utilized to achieve SOTA results.

Figures, Diagrams, Illustrations, and Graphs

Are the axes properly labeled? Are results shown with error bars, so that conclusions are statistically significant?

The majority of the figures are clear. Figure 2 is a bit difficult to read due to how squished the text is to each other. Additionally, the model architecture in Figure 1 uses an identical image as presented in the seminal R-CNN paper. It would have been nicer to see a different test image utilized for this paper.

Clarity

Is the paper well written?

This paper is well written, if a bit technical. However, the technicality is important as it distinguishes the improvements made to the original R-CNN and SPPNet models.

Relevant Work

Mark relevant work for review

The following relevant work can be found in the Citations section of this article.

Methodology

What methodology did the author’s use to validate their contributions?

They performed a similar study to their previous paper (R-CNN) where they compared the mAP of competing models against their model. Additionally, they performed an analysis of their model where they tested different improvements and DL techniques used in other models to improve performance.

Author Assumptions

What assumptions does the author(s) make? Are they justified assumptions?

They utilized the VGG 16 model as their CNN model. However, other existing models could’ve been used/ re-implemented with their fast region proposal model to potentially improve performance.

Correctness

Do the assumptions seem valid?

This assumption makes sense to a degree as VGG 16 is a popular model for research purposes. However, evaluating other CNN models would have been more interesting in my opinion.

Future Directions

My own proposed future directions for the work

I’d like to implement there work on non-VGG 16 models, such as ResNet or on a MobileNet.

Open Questions

What open questions do I have about the work?

Why weren’t other models implemented with the fast region proposal component?

Author Feedback

What feedback would I give to the authors?

Overall good paper. I don’t recommend on creating a paper of a third variation of this model unless there are substantial improvements made. These improvements can be in further reducing computational or energy cost, an even simpler architecture, or an substantial overall increase of mAP on the PASCAL VOC datasets.

Summary

A summary of the paper

The paper Fast R-CNN by Ross Girshick [1] proposes a new method to perform region proposal CNN tasks that is significantly faster than the previously proposed method. To do so, both the region proposals and the image itself are passed into the CNN layer for analysis. Additionally, many layers of the previous architectures are collapsed into one to reduce the complexity. Furthermore, the SVM classifier was replaced with a Softmax classifier which is both faster and more accurate than the previous SVM classifier.

Summarization Technique

This paper was summarized using a modified technique proposed by S. Keshav in his work How to Read a Paper [0].