A summary of Deep Learning by Yann LeCun et al.
Nicholas M. Synovic
A summary of Deep Learning
Yann LeCun et al., Nature, 2015 DOI
For the summary of the paper, go to the Summary section of this article.
First Pass
Read the title, abstract, introduction, section and sub-section headings, and conclusion
Problem
What is the problem addressed in the paper?
This paper discusses how deep learning (DL) models have led to improvements in speech recognition, visual object recognition, object detection, drug discovery, and genomics. It covers how these models are created, which types of models are typically applied to which domains, and how the backpropagation algorithm is used to train them. Additionally, it discusses Recurrent Neural Networks (RNNs) and their benefits.
Motivation
Why should we care about this paper?
We should care about this paper because it reviews DL techniques across a range of problem domains.
Category
What type of paper is this work?
This paper is a literature review.
Context
What other types of papers is the work related to?
It is most closely related to papers that summarize a body of literature to establish the current state-of-the-art (SOTA) techniques for a problem.
Contributions
What are the author’s main contributions?
Their main contributions are a discussion of DL and its uses, a discussion of RNNs, and a general summary of the SOTA DL techniques for different problem domains.
Second Pass
A proper read-through of the paper is required to answer these questions
Background Work
What has been done prior to this paper?
Prior work developed the techniques this paper reviews: backpropagation-trained networks and ConvNets, unsupervised pre-training of deep networks, and RNN architectures such as long short-term memory (LSTM).
Figures, Diagrams, Illustrations, and Graphs
Are the axes properly labeled? Are results shown with error bars, so that conclusions are statistically significant?
All of the figures are clear and easy to understand.
Clarity
Is the paper well written?
This paper is well written.
Relevant Work
Mark relevant work for review
The following relevant work can be found in the Citations section of this article.
- Krizhevsky, A., Sutskever, I. & Hinton, G. ImageNet classification with deep convolutional neural networks. In Proc. Advances in Neural Information Processing Systems 25 1090–1098 (2012).
- Hinton, G. et al. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine 29, 82–97 (2012).
- Sutskever, I., Vinyals, O. & Le, Q. V. Sequence to sequence learning with neural networks. In Proc. Advances in Neural Information Processing Systems 27 3104–3112 (2014).
- Glorot, X., Bordes, A. & Bengio, Y. Deep sparse rectifier neural networks. In Proc. 14th International Conference on Artificial Intelligence and Statistics 315–323 (2011).
- Hinton, G. E., Osindero, S. & Teh, Y.-W. A fast learning algorithm for deep belief nets. Neural Comp. 18, 1527–1554 (2006).
- Bengio, Y., Lamblin, P., Popovici, D. & Larochelle, H. Greedy layer-wise training of deep networks. In Proc. Advances in Neural Information Processing Systems 19 153–160 (2006).
- LeCun, Y. et al. Handwritten digit recognition with a back-propagation network. In Proc. Advances in Neural Information Processing Systems 396–404 (1990).
- LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
- Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
Methodology
What methodology did the authors use to validate their contributions?
The authors of this paper reviewed the literature for examples of DL and RNN techniques applied to different problem domains.
Author Assumptions
What assumptions does the author(s) make? Are they justified assumptions?
The authors assume that unsupervised learning will become far more important in the future than supervised learning.
Correctness
Do the assumptions seem valid?
Potentially. Unsupervised learning presents problems and challenges that this paper does not explore; it is treated as the next logical evolution of these techniques rather than as a series of unknowns that must be solved first.
Future Directions
My own proposed future directions for the work
I’d like to apply the different DL techniques to the problem domains suggested in this paper.
Open Questions
What open questions do I have about the work?
Why was there no discussion of Generative Adversarial Networks (GANs) in this work? What are the performance differences of the presented loss functions?
Author Feedback
What feedback would I give to the authors?
Overall, a pretty good paper. A follow-up paper on unsupervised learning would be nice to read.
Summary
A summary of the paper
The review paper Deep Learning by Yann LeCun et al. [1] discusses the advances and advantages of deep learning (DL) techniques made up to 2015. The authors discuss what DL is, how and where it is applied, commercial and academic uses of DL, the advantages of merging two different architectures to solve challenging tasks, and the usage of Recurrent Neural Networks (RNNs) for natural language processing and speech recognition tasks. As their paper is purely a survey of work that others have done before them, their contribution is mostly the synthesis of that information into a digestible document. With that said, each section of their work can be summarized, which is what I have done here.
DL allows machine learning to surpass its previous limitation of needing data to be manually transformed into a suitable internal representation (through feature extraction), because the model learns the representation itself. Current DL models are typically trained using labeled datasets in what is known as supervised learning. A subset of the data is used for training: it is run through the model, and the model's hidden weights are adjusted using a technique called stochastic gradient descent (SGD). The gradients that SGD needs are computed by working backwards through the model and taking the derivative of the error with respect to each weight, a procedure known as backpropagation. Between layers, the model applies non-linear activation functions such as tanh(x) and ReLU, with ReLU currently being the most popular choice.
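To make this concrete, below is a minimal NumPy sketch of a two-layer network trained with gradient descent and backpropagation. The network shape, squared-error loss, and learning rate are my own illustrative assumptions, not details from the paper, and for brevity it uses full-batch updates rather than the random mini-batches true SGD would draw.

```python
import numpy as np

# Sketch of supervised training with backpropagation. The toy data,
# two-layer shape, squared-error loss, and learning rate are assumed
# for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                       # toy inputs
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)  # toy labels

W1 = rng.normal(scale=0.5, size=(4, 8))  # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(8, 1))  # hidden -> output weights
lr = 0.05                                # learning rate (assumed)

for step in range(500):
    # Forward pass: hidden layer with ReLU non-linearity, linear output.
    h = np.maximum(0, X @ W1)            # ReLU(x) = max(0, x)
    pred = h @ W2
    loss = ((pred - y) ** 2).mean()

    # Backward pass: work backwards through the model, applying the
    # chain rule to get the derivative of the loss w.r.t. each weight.
    grad_pred = 2 * (pred - y) / len(X)
    grad_W2 = h.T @ grad_pred
    grad_h = grad_pred @ W2.T
    grad_h[h <= 0] = 0                   # ReLU gradient: zero when inactive

    # Gradient-descent update: nudge each weight against its gradient.
    W1 -= lr * (X.T @ grad_h)
    W2 -= lr * grad_W2

print(f"final loss: {loss:.4f}")
```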
Convolutional neural networks (ConvNets) are useful for analyzing data structured as a series of multi-dimensional arrays; a typical application of ConvNets is analyzing images.
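As a rough illustration of what a single ConvNet layer computes, here is a toy 2D convolution in NumPy. The tiny edge-detecting filter and the synthetic image are assumptions for demonstration, not details from the paper.

```python
import numpy as np

# Sketch of the 2D convolution at the heart of a ConvNet layer: a small
# filter slides over the image and produces a feature map, sharing the
# same weights at every position.
def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output value is the dot product of the filter with
            # one local patch of the image.
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

image = np.zeros((8, 8))
image[:, 4:] = 1.0                        # toy image: dark left, bright right
edge_filter = np.array([[-1.0, 1.0]])     # responds to vertical edges
feature_map = np.maximum(0, conv2d(image, edge_filter))  # ReLU after conv
print(feature_map)                        # activates along the edge column
```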
RNNs are useful for analyzing data that depends on prior context. Chat bots, speech recognition, and answering questions about data (e.g., where is a character in a book?) are all problems that rely on the model having some sort of “memory”. Memory solutions include long short-term memory (LSTM), which has been useful for accomplishing these tasks.
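To illustrate where that “memory” lives, here is a sketch of a single LSTM step using the standard gate equations (biases omitted for brevity); the layer sizes and random weights are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Sketch of one LSTM step: the cell state c carries memory forward,
# while sigmoid gates decide what to write, keep, and expose.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 5
# One weight matrix per gate, acting on [input, previous hidden state];
# biases are omitted here for brevity.
Wi, Wf, Wo, Wc = (rng.normal(scale=0.1, size=(n_in + n_hidden, n_hidden))
                  for _ in range(4))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    i = sigmoid(z @ Wi)            # input gate: what to write to memory
    f = sigmoid(z @ Wf)            # forget gate: what memory to keep
    o = sigmoid(z @ Wo)            # output gate: what memory to expose
    c_tilde = np.tanh(z @ Wc)      # candidate new memory content
    c = f * c_prev + i * c_tilde   # cell state: the network's "memory"
    h = o * np.tanh(c)             # hidden state passed to the next step
    return h, c

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in rng.normal(size=(4, n_in)):  # a toy sequence of 4 inputs
    h, c = lstm_step(x, h, c)
print(h)
```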
Summarization Technique
This paper was summarized using a modified technique proposed by S. Keshav in his work How to Read a Paper [0].