Abstract
In this study, we propose a time-series clustering approach that selects optimal training data for the development of predictive models. The optimal number of clusters was determined from the variation of the within-cluster sum of squares. A predictive model was developed using training data drawn from each cluster at a specified selection ratio. Based on the results, a regression model was developed to predict the performance of the predictive model. The search space was applied to the regression model, and the optimal training data ratios that satisfy the objective function and constraints were selected. The effectiveness of the method is demonstrated on a commercial bio 2,3-butanediol distillation process. As a result, the amount of training data was reduced by 49.20% compared with the base case without clustering. The coefficient of determination (R²) remained at the same level, and the root-mean-square error improved by up to 14.07%.
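The first two steps of the abstract (choosing the number of clusters from the within-cluster sum of squares, then drawing training data from each cluster at a given ratio) can be illustrated with a minimal sketch. It is not the authors' implementation. The plain NumPy k-means with a single random initialization, the second-difference elbow rule, the synthetic data, the uniform ratio of 0.5, and all function names are assumptions made for illustration. Each row of `X` stands in for a feature vector extracted from one time-series window.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (single random init); returns (labels, wcss)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # copy of k rows
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    wcss = d2[np.arange(len(X)), labels].sum()
    return labels, float(wcss)

def pick_k_by_elbow(wcss):
    """wcss[i] belongs to k = i + 1. Assumed rule: largest second difference."""
    curvature = np.diff(wcss, n=2)  # entry i is centered on k = i + 2
    return int(curvature.argmax()) + 2

def select_training_indices(labels, ratios, seed=0):
    """Draw round(ratio_j * n_j) samples without replacement from cluster j."""
    rng = np.random.default_rng(seed)
    chosen = []
    for j, ratio in enumerate(ratios):
        idx = np.flatnonzero(labels == j)
        n_take = int(round(ratio * len(idx)))
        chosen.append(rng.permutation(idx)[:n_take])
    return np.sort(np.concatenate(chosen))

# Synthetic stand-in data: three groups of 100 two-dimensional feature vectors.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(c, 0.3, size=(100, 2)) for c in (0.0, 5.0, 10.0)])

wcss = [kmeans(X, k)[1] for k in range(1, 7)]  # WCSS for k = 1..6
k = pick_k_by_elbow(wcss)
labels, _ = kmeans(X, k)

ratios = [0.5] * k  # hypothetical per-cluster selection ratios
train_idx = select_training_indices(labels, ratios)
print(k, len(train_idx), len(X))
```

In the workflow described above, the per-cluster ratios would not be fixed in advance. They would be varied, the resulting model performance recorded, and a regression model fitted to those results so that the ratios can be optimized under the stated objective and constraints.
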
Original language | English |
---|---|
Article number | 107758 |
Journal | Computers and Chemical Engineering |
Volume | 161 |
Publication status | Published - 2022 May |
Bibliographical note
Funding Information: This work was supported by the Korean Institute of Industrial Technology within the framework of the following projects: “Development of Global Optimization System for Energy Process [grant number EM-21–0022, IR-21–0029, IZ-21–0052]” and “Development of AI Platform Technology for Smart Chemical Process [grant number JH-21–0005]”.
Publisher Copyright:
© 2022
All Science Journal Classification (ASJC) codes
- Chemical Engineering(all)
- Computer Science Applications