A Study of Missing Data Imputation in Predictive Modeling of a Wood-Composite Manufacturing Process

Authors: Zeng Yan*; Young Timothy M; Edwards David J; Guess Frank M; Chen Chung Hao
Source: Journal of Quality Technology, 2016, 48(3): 284-296.
DOI: 10.1080/00224065.2016.11918167

Abstract

Problem: Real-time process data and destructive test data were collected and merged from a wood-composite manufacturer in the southeastern US for the purpose of developing real-time predictive models for strength properties of manufactured particleboard. Sensor malfunctions and other real-time data problems lead to null fields in the company's data warehouse, resulting in information loss. Many manufacturers attempt to build accurate predictive models by excluding entire records with null fields or by substituting summary statistics such as the average or median for the null field. However, predictive-model errors in validation may be higher in the presence of information loss and may misguide the production process.

Approach: This paper summarizes an application of missing-data imputation methods in predictive modeling of a wood-composite manufacturing process. Variable selection was applied prior to imputing missing data. Two missing-data imputation methods were selected after comparing six possible methods. Predictive models of imputed data were developed using partial least squares regression (PLSR) and were compared with models developed from nonimputed data.

Results: Maximum likelihood-based imputation using the expectation-maximization (EM) algorithm and multiple imputation (MI) using Markov chain Monte Carlo (MCMC) simulation achieved lower root-mean-square error of prediction than imputation based on mean/median substitution, last-observation-carried-forward (LOCF), or a "hot-deck" method using single imputation. Predictive models based on the imputed dataset generated more precise prediction results than models based on nonimputed datasets. Outcomes of the study included avoided rework and scrap when model predictions warned of an imminent strength failure, as well as minor reductions in resin input set points. Senior management of the company indicated that savings resulted from the study through lower resin usage (resin being the second-highest cost component of the manufactured product).
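
To make the contrast between the imputation families named in the abstract concrete, the following is a minimal sketch, not the authors' code. It assumes NumPy only, a multivariate normal model for the EM step, and synthetic data that stands in for the plant's sensor records. The function names (`impute_mean`, `impute_locf`, `impute_em`, `compare_on_synthetic`) are hypothetical, and neither the MCMC-based multiple imputation nor the PLSR modeling step is shown.

```python
import numpy as np


def impute_mean(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = np.array(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def impute_locf(X):
    """Last observation carried forward, assuming rows are in time order.
    A NaN in the first row has no earlier value and is left as NaN."""
    X = np.array(X, dtype=float)
    for i in range(1, X.shape[0]):
        gap = np.isnan(X[i])
        X[i, gap] = X[i - 1, gap]
    return X


def impute_em(X, n_iter=50):
    """EM under an assumed multivariate normal model (at least two columns).
    Returns X with each NaN replaced by its conditional mean."""
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    n, p = X.shape
    filled = impute_mean(X)  # starting values
    mu = filled.mean(axis=0)
    sigma = np.cov(filled, rowvar=False, bias=True)
    for _ in range(n_iter):
        extra = np.zeros((p, p))  # conditional covariance of the missing parts
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():  # nothing observed in this row
                filled[i] = mu
                extra += sigma
                continue
            s_mo = sigma[np.ix_(m, o)]
            coef = s_mo @ np.linalg.pinv(sigma[np.ix_(o, o)])
            # E-step: conditional mean of missing entries given observed ones
            filled[i, m] = mu[m] + coef @ (filled[i, o] - mu[o])
            extra[np.ix_(m, m)] += sigma[np.ix_(m, m)] - coef @ s_mo.T
        # M-step: update mean and covariance from the completed data
        mu = filled.mean(axis=0)
        centered = filled - mu
        sigma = (centered.T @ centered + extra) / n
    return filled


def rmse(a, b):
    """Root-mean-square error between two equally shaped arrays."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))


def compare_on_synthetic(seed=0, n=200, frac_missing=0.2):
    """Synthetic illustration only: two correlated columns, with values in the
    second column removed completely at random."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 2.0 * x + rng.normal(scale=0.3, size=n)
    truth = np.column_stack([x, y])
    holes = rng.random(n) < frac_missing
    observed = truth.copy()
    observed[holes, 1] = np.nan
    return {
        "mean": rmse(impute_mean(observed)[holes, 1], truth[holes, 1]),
        "em": rmse(impute_em(observed)[holes, 1], truth[holes, 1]),
    }


if __name__ == "__main__":
    print(compare_on_synthetic())
```

On synthetic data like this, where the second column is strongly correlated with the first, the EM fill-in uses that correlation while mean substitution ignores it. That is the intuition behind the ranking the abstract reports. The error here is measured on the imputed values themselves, not on downstream PLSR predictions as in the study.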

  • Publication date: 2016-7