A summary of The Random Subspace Method for Constructing Decision Forests by Tin Kam Ho
Nicholas M. Synovic
Tin Kam Ho, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998
For the summary of the paper, go to the Summary section of this article.
First Pass
Read the title, abstract, introduction, section and sub-section headings, and conclusion
Problem
What is the problem addressed in the paper?
This paper addresses the problem of constructing decision forests: classifiers built from many decision trees whose generalization accuracy keeps improving as trees are added, rather than overfitting the way a single fully grown tree can.
Motivation
Why should we care about this paper?
We should care about this paper because it compares eight forest construction algorithms against the author’s algorithm on publicly available datasets. This lets the reader understand the pros and cons of choosing one algorithm over another, and it also validates the author’s claims. Furthermore, her algorithm can monotonically increase in generalization accuracy while preserving perfect accuracy on the training data.
Category
What type of paper is this work?
This is an algorithms paper.
Context
What other types of papers is the work related to?
This paper is related to papers that present ways of constructing random forests.
Contributions
What are the author’s main contributions?
Her main contributions were:
- An efficient algorithm (the random subspace method) for constructing decision forests
- A comparison of eight forest construction algorithms on publicly available datasets
Second Pass
A proper read-through of the paper is required to answer these questions
Background Work
What has been done prior to this paper?
Prior work has described what decision trees are, as well as how to generate many of them for the purposes of classification.
Figures, Diagrams, Illustrations, and Graphs
Are the axes properly labeled? Are results shown with error bars, so that conclusions are statistically significant?
All of the tables are clear and easy to read. However, the line charts are difficult to read because every line is the same color in my copy of the paper. Additionally, it is hard to tell what Figure 1 is supposed to represent.
Clarity
Is the paper well written?
I found this work hard to follow, though I think that is due to my unfamiliarity with the problem domain rather than her explanations.
Relevant Work
Mark relevant work for review
The following relevant work can be found in the Citations section of this article.
- Y. Amit, D. Geman, and K. Wilder, “Joint Induction of Shape Features and Tree Classifiers,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 11, pp. 1300-1305, Nov. 1997.
- L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone, Classification and Regression Trees. Belmont, Calif.: Wadsworth, 1984.
- D. Heath, S. Kasif, and S. Salzberg, “Induction of Oblique Decision Trees,” Proc. 13th Int’l Joint Conf. Artificial Intelligence, vol. 2, pp. 1002-1007, Chambery, France, 28 Aug.-3 Sept. 1993.
- T.K. Ho, “Random Decision Forests,” Proc. Third Int’l Conf. Document Analysis and Recognition, pp. 278-282, Montreal, Canada, 14-18 Aug. 1995.
- T.K. Ho, “C4.5 Decision Forests,” Proc. 14th Int’l Conf. Pattern Recognition, Brisbane, Australia, 17-20 Aug. 1998.
Methodology
What methodology did the author use to validate their contributions?
The author compared the performance of her own forest generation method against eight other methods:
- Single feature split with best gain ratio (see the sketch after this list)
- Distribution mapping
- Class centroids
- Unsupervised clustering
- Supervised clustering
- Central axis projection
- Perceptron
- Support Vector Machine
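As an aside, here is a minimal sketch of what I understand the first method, a single-feature split chosen by best gain ratio (the C4.5-style criterion), to compute. The function names and the toy data are my own illustration, not code from the paper:

```python
import numpy as np

def entropy(labels: np.ndarray) -> float:
    """Shannon entropy of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gain_ratio(feature: np.ndarray, labels: np.ndarray, threshold: float) -> float:
    """Gain ratio of the binary split `feature <= threshold`."""
    left, right = labels[feature <= threshold], labels[feature > threshold]
    w = np.array([len(left), len(right)]) / len(labels)
    info_gain = entropy(labels) - (w[0] * entropy(left) + w[1] * entropy(right))
    split_info = float(-np.sum(w * np.log2(w)))  # penalizes lopsided splits
    return info_gain / split_info

def best_split(feature: np.ndarray, labels: np.ndarray):
    """Scan midpoints between distinct feature values; keep the best gain ratio.
    Assumes the feature takes at least two distinct values."""
    values = np.unique(feature)
    thresholds = (values[:-1] + values[1:]) / 2  # midpoints keep both sides nonempty
    scores = [gain_ratio(feature, labels, t) for t in thresholds]
    best = int(np.argmax(scores))
    return thresholds[best], scores[best]

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])
print(best_split(x, y))  # (2.5, 1.0): a perfectly separating split scores 1.0
```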
Author Assumptions
What assumptions does the author(s) make? Are they justified assumptions?
The author assumes that the reader has worked with decision trees prior to reading.
Correctness
Do the assumptions seem valid?
Yes.
Future Directions
My own proposed future directions for the work
I’d like to learn more about decision trees and compare them against Deep Learning models.
Open Questions
What open questions do I have about the work?
When would I ever use a decision tree over an SVM or a deep learning model?
Author Feedback
What feedback would I give to the authors?
I’d appreciate the use of color to separate the different lines in the figures. Additionally (though this could be due to the limited literature available at the time), please reduce the number of self-citations in future works.
Summary
A summary of the paper
The paper The Random Subspace Method for Constructing Decision Forests by Tin Kam Ho [1] discusses a method for efficiently generating many decision trees without sacrificing accuracy. She validates this method by comparing it against eight other forest construction methods, all on publicly available datasets. A further benefit of her method is that it is parallelizable, meaning that with some tuning it can run across multiple CPU cores or threads (and potentially even faster on GPU cores).
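To make the method concrete, here is a minimal sketch of the random subspace idea as I understand it: every tree sees all of the training rows but only a random subset of the feature dimensions, and the trees’ class-probability estimates are averaged at prediction time. This is my own illustration, not the author’s code; the dataset, the tree count, and the use of scikit-learn’s DecisionTreeClassifier are stand-ins (drawing half of the dimensions follows the paper’s experiments):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

n_trees = 100
n_dims = X.shape[1] // 2  # each tree sees a random half of the feature dimensions

forest = []
for _ in range(n_trees):
    # Unlike bagging, the training rows are never resampled; only the
    # feature dimensions visible to each tree are randomized.
    dims = rng.choice(X.shape[1], size=n_dims, replace=False)
    tree = DecisionTreeClassifier().fit(X_train[:, dims], y_train)
    forest.append((dims, tree))

# Combine the trees by averaging their class-probability estimates.
probs = np.mean(
    [tree.predict_proba(X_test[:, dims]) for dims, tree in forest], axis=0
)
print("accuracy:", np.mean(probs.argmax(axis=1) == y_test))
```

Each iteration of the loop is independent of the others, which is exactly why the method parallelizes so naturally across CPU cores or threads.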
Summarization Technique
This paper was summarized using a modified technique proposed by S. Keshav in his work How to Read a Paper [0].