Impacts of Fractional Hot-Deck Imputation on Learning and Prediction of Engineering Data

Ikkyun Song, Yicheng Yang, Jongho Im, Tong Tong, Halil Ceylan, In Ho Cho

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)

Abstract

In broad engineering fields, missing data is a common issue that often causes undesired bias and sparseness, impeding rigorous data analyses. To tackle this problem, many imputation theories have been proposed and widely used. However, prior methods often require distributional assumptions and prior knowledge about the data, which may cause difficulty in engineering research. Fractional hot-deck imputation (FHDI), by contrast, is essentially an assumption-free imputation method with broad applicability in engineering domains. FHDI's internal parameters and its impact on statistical and machine learning methods, however, have rarely been investigated. Thus, this study investigates the behavior and impacts of FHDI on prediction methods, including the generalized additive model, support vector machine, extremely randomized trees, and artificial neural network, using four practical datasets (appliance energy, air quality, phenotypes, and weather). Results show that FHDI improves prediction accuracy more than a simple naive method that fills in missing data with the mean value of each attribute, and that FHDI has an asymptotically positive effect on prediction accuracy with decreasing response rates. Regarding an optimal setting, 30 to 35 is recommended for FHDI's internal categorization number, while 5 is recommended for the number of FHDI donors, which is aligned with Rubin's recommendation.
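To make the comparison in the abstract concrete, the following is a minimal sketch of the naive mean-imputation baseline next to a toy, single-variable hot-deck with fractional weights. It is not the authors' FHDI implementation: the function names, the equal-width categorization, the equal 1/m weights, and the small defaults `k=3` and `m=2` are all assumptions chosen for illustration (the abstract recommends a categorization number of 30 to 35 and 5 donors for real data).

```python
import random
from statistics import mean


def mean_impute(column):
    """Naive baseline: replace each missing entry (None) with the observed mean."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def equal_width_category(value, low, high, k):
    """Map a value to one of k equal-width categories over [low, high]."""
    if high == low:
        return 0
    idx = int((value - low) / (high - low) * k)
    return min(idx, k - 1)  # the maximum value falls in the last category


def toy_fractional_hot_deck(x, y, k=3, m=2, seed=0):
    """Toy hot-deck (hypothetical helper, not the FHDI algorithm itself).

    x is fully observed; y may contain None. Each missing y receives up to
    m donor values drawn from observed units in the same category of x,
    each carrying an equal fractional weight, so the weights sum to 1.
    Returns one list of (value, weight) pairs per unit.
    """
    rng = random.Random(seed)
    low, high = min(x), max(x)

    # Group observed y values by the category of their x value.
    cells = {}
    for xi, yi in zip(x, y):
        if yi is not None:
            cells.setdefault(equal_width_category(xi, low, high, k), []).append(yi)
    all_observed = [yi for yi in y if yi is not None]

    result = []
    for xi, yi in zip(x, y):
        if yi is not None:
            result.append([(yi, 1.0)])  # observed value keeps full weight
            continue
        # Fall back to all observed values if the unit's cell has no donors.
        pool = cells.get(equal_width_category(xi, low, high, k), all_observed)
        donors = rng.sample(pool, min(m, len(pool)))
        weight = 1.0 / len(donors)
        result.append([(d, weight) for d in donors])
    return result


x = [1.0, 1.2, 1.1, 5.0, 5.2, 5.1]
y = [10.0, None, 12.0, 50.0, 52.0, None]

print(mean_impute(y))
for unit in toy_fractional_hot_deck(x, y):
    print(sum(value * weight for value, weight in unit))
```

In this toy data the mean baseline fills both gaps with the same overall average, while the donor-based version draws values only from units with similar `x`, which is the intuition behind using donors rather than a single global mean.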

Original language: English
Article number: 8735753
Pages (from-to): 2363-2373
Number of pages: 11
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 32
Issue number: 12
Publication status: Published - 2020 Dec 1

Bibliographical note

Funding Information:
This research is supported by the research funding of the Department of Civil, Construction, and Environmental Engineering of Iowa State University. The parallel computing research reported herein is partially supported by the HPC@ISU equipment at ISU, some of which has been purchased through funding provided by the US National Science Foundation under MRI grant number CNS 1229081 and CRI grant number 1205413. I. Cho’s research is also supported by the US National Science Foundation under grant CBET-1605275, and J. Im’s research is supported by the National Research Foundation (NRF) Korea, NRF-2018R1D1A1B07045220. The data sharing of Dr. Lawrence and Dr. Cetin is appreciated.

Publisher Copyright:
© 1989-2012 IEEE.

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
