On the optimal size of candidate feature set in random forest

Sunwoo Han, Hyunjoong Kim

Research output: Contribution to journal › Article

Abstract

Random forest is an ensemble method that combines many decision trees. Each level of trees is determined by an optimal rule among a candidate feature set. The candidate feature set is a random subset of all features, and is different at each level of trees. In this article, we investigated whether the accuracy of Random forest is affected by the size of the candidate feature set. We found that the optimal size differs from data to data without any specific pattern. To estimate the optimal size of feature set, we proposed a novel algorithm which uses the out-of-bag error and the 'SearchSize' exploration. The proposed method is significantly faster than the standard grid search method while giving almost the same accuracy. Finally, we demonstrated that the accuracy of Random forest using the proposed algorithm has increased significantly compared to using a typical size of feature set.
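The two ingredients the abstract relies on, a random candidate feature set and out-of-bag data, can be illustrated with a minimal sketch. This is not the authors' 'SearchSize' algorithm, which the abstract does not specify. The function names `candidate_features` and `bootstrap_and_oob` are hypothetical, and using floor(sqrt(p)) as the "typical size" is an assumption based on the common default for classification forests, not something stated in this record.

```python
import math
import random


def candidate_features(n_features, size, rng):
    """Draw one candidate feature set: `size` distinct feature indices."""
    return sorted(rng.sample(range(n_features), size))


def bootstrap_and_oob(n_rows, rng):
    """Return (in-bag rows drawn with replacement, rows never drawn)."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    drawn = set(in_bag)
    oob = [i for i in range(n_rows) if i not in drawn]
    return in_bag, oob


rng = random.Random(0)
n_features = 16

# Assumed "typical" size: floor(sqrt(p)), a common classification default.
typical_size = max(1, math.isqrt(n_features))

# A fresh candidate set is drawn every time a split rule is chosen.
for split in range(3):
    print(split, candidate_features(n_features, typical_size, rng))

# Rows left out of a tree's bootstrap sample can score that tree
# without a separate validation set (the out-of-bag error).
in_bag, oob = bootstrap_and_oob(100, rng)
print(len(oob), "of 100 rows are out-of-bag for this tree")
```

Varying `size` between 1 and `n_features` and comparing the resulting out-of-bag error is the kind of search the abstract contrasts with a standard grid search.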

Original language: English
Article number: 898
Journal: Applied Sciences (Switzerland)
Volume: 9
Issue number: 5
DOIs: 10.3390/app9050898
Publication status: Published - 2019 Jan 1

Fingerprint

Decision trees
bags
set theory
grids
estimates

All Science Journal Classification (ASJC) codes

  • Materials Science(all)
  • Instrumentation
  • Engineering(all)
  • Process Chemistry and Technology
  • Computer Science Applications
  • Fluid Flow and Transfer Processes

Cite this

@article{761c11f60c744f13a5d59c40f11a80ac,
title = "On the optimal size of candidate feature set in random forest",
abstract = "Random forest is an ensemble method that combines many decision trees. Each level of trees is determined by an optimal rule among a candidate feature set. The candidate feature set is a random subset of all features, and is different at each level of trees. In this article, we investigated whether the accuracy of Random forest is affected by the size of the candidate feature set. We found that the optimal size differs from data to data without any specific pattern. To estimate the optimal size of feature set, we proposed a novel algorithm which uses the out-of-bag error and the 'SearchSize' exploration. The proposed method is significantly faster than the standard grid search method while giving almost the same accuracy. Finally, we demonstrated that the accuracy of Random forest using the proposed algorithm has increased significantly compared to using a typical size of feature set.",
author = "Sunwoo Han and Hyunjoong Kim",
year = "2019",
month = "1",
day = "1",
doi = "10.3390/app9050898",
language = "English",
volume = "9",
journal = "Applied Sciences (Switzerland)",
issn = "2076-3417",
publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",
number = "5",

}

On the optimal size of candidate feature set in random forest. / Han, Sunwoo; Kim, Hyunjoong.

In: Applied Sciences (Switzerland), Vol. 9, No. 5, 898, 01.01.2019.

TY - JOUR

T1 - On the optimal size of candidate feature set in random forest

AU - Han, Sunwoo

AU - Kim, Hyunjoong

PY - 2019/1/1

Y1 - 2019/1/1

AB - Random forest is an ensemble method that combines many decision trees. Each level of trees is determined by an optimal rule among a candidate feature set. The candidate feature set is a random subset of all features, and is different at each level of trees. In this article, we investigated whether the accuracy of Random forest is affected by the size of the candidate feature set. We found that the optimal size differs from data to data without any specific pattern. To estimate the optimal size of feature set, we proposed a novel algorithm which uses the out-of-bag error and the 'SearchSize' exploration. The proposed method is significantly faster than the standard grid search method while giving almost the same accuracy. Finally, we demonstrated that the accuracy of Random forest using the proposed algorithm has increased significantly compared to using a typical size of feature set.

UR - http://www.scopus.com/inward/record.url?scp=85063750084&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063750084&partnerID=8YFLogxK

U2 - 10.3390/app9050898

DO - 10.3390/app9050898

M3 - Article

AN - SCOPUS:85063750084

VL - 9

JO - Applied Sciences (Switzerland)

JF - Applied Sciences (Switzerland)

SN - 2076-3417

IS - 5

M1 - 898

ER -