Classification trees with unbiased multiway splits

Hyunjoong Kim, Wei Yin Loh

Research output: Contribution to journal (Article)

188 Citations (Scopus)

Abstract

Two univariate split methods and one linear combination split method are proposed for the construction of classification trees with multiway splits. Examples are given where the trees are more compact and hence easier to interpret than binary trees. A major strength of the univariate split methods is that they have negligible bias in variable selection, both when the variables differ in the number of splits they offer and when they differ in the number of missing values. This is an advantage because inferences from the tree structures can be adversely affected by selection bias. The new methods are shown to be highly competitive in terms of computational speed and classification accuracy of future observations.
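The selection bias mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' algorithm; it is a hypothetical toy setup in which the class label is independent of two categorical predictors, one with 2 levels and one with 6. It compares two selection rules: picking the variable with the largest Gini impurity decrease from a multiway split, and picking the variable with the smallest Pearson chi-squared p-value. The function names, sample sizes, and the choice of three classes (which keeps the degrees of freedom even, so the chi-squared tail probability has a closed form) are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def gini(counts):
    """Gini impurity of a mapping from class label to count."""
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_gain(x, y):
    """Impurity decrease from a multiway split with one child per level of x."""
    n = len(y)
    children = {}
    for xi, yi in zip(x, y):
        children.setdefault(xi, Counter())[yi] += 1
    weighted = sum(sum(c.values()) / n * gini(c) for c in children.values())
    return gini(Counter(y)) - weighted


def chi2_pvalue(x, y):
    """Pearson chi-squared p-value for independence of x and y.

    Assumption: the degrees of freedom are even, so the survival function
    is exp(-s/2) * sum_{i < df/2} (s/2)^i / i!.
    """
    n = len(y)
    rows, cols = Counter(x), Counter(y)
    cells = Counter(zip(x, y))
    stat = sum(
        (cells[(r, c)] - rows[r] * cols[c] / n) ** 2 / (rows[r] * cols[c] / n)
        for r in rows
        for c in cols
    )
    df = (len(rows) - 1) * (len(cols) - 1)
    assert df > 0 and df % 2 == 0, "closed form below needs even df"
    half = stat / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def simulate(reps=400, n=200, seed=0):
    """Fraction of null data sets in which the 6-level variable is selected."""
    rng = random.Random(seed)
    wins_gini = wins_chi2 = 0
    for _ in range(reps):
        y = [rng.randrange(3) for _ in range(n)]   # class label, pure noise
        x1 = [rng.randrange(2) for _ in range(n)]  # 2-level predictor
        x2 = [rng.randrange(6) for _ in range(n)]  # 6-level predictor
        wins_gini += gini_gain(x2, y) > gini_gain(x1, y)
        wins_chi2 += chi2_pvalue(x2, y) < chi2_pvalue(x1, y)
    return wins_gini / reps, wins_chi2 / reps


frac_gini, frac_chi2 = simulate()
print(f"6-level variable chosen by Gini gain:  {frac_gini:.2f}")
print(f"6-level variable chosen by chi2 p-val: {frac_chi2:.2f}")
```

Because neither predictor carries any information about the class, an unbiased rule should pick each variable about half the time. In this setup the impurity-based rule is expected to choose the 6-level variable in the large majority of runs, while the p-value rule, which accounts for the degrees of freedom, should stay close to one half.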

Original language: English
Pages (from-to): 589-604
Number of pages: 16
Journal: Journal of the American Statistical Association
Volume: 96
Issue number: 454
DOI: 10.1198/016214501753168271
Publication status: Published - 2001 Jun 1

Fingerprint

Classification Tree
Univariate
Selection Bias
Missing Values
Binary Tree
Variable Selection
Tree Structure
Linear Combination

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

@article{39780dff671b432fbcf32115717d86e9,
title = "Classification trees with unbiased multiway splits",
author = "Kim, Hyunjoong and Loh, {Wei Yin}",
year = "2001",
month = "6",
day = "1",
doi = "10.1198/016214501753168271",
language = "English",
volume = "96",
pages = "589--604",
journal = "Journal of the American Statistical Association",
issn = "0162-1459",
publisher = "Taylor and Francis Ltd.",
number = "454",

}



TY - JOUR

T1 - Classification trees with unbiased multiway splits

AU - Kim, Hyunjoong

AU - Loh, Wei Yin

PY - 2001/6/1

Y1 - 2001/6/1

UR - http://www.scopus.com/inward/record.url?scp=1542573450&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=1542573450&partnerID=8YFLogxK

U2 - 10.1198/016214501753168271

DO - 10.1198/016214501753168271

M3 - Article

VL - 96

SP - 589

EP - 604

JO - Journal of the American Statistical Association

JF - Journal of the American Statistical Association

SN - 0162-1459

IS - 454

ER -