Automated classification of building information modeling (BIM) case studies by BIM use based on natural language processing (NLP) and unsupervised learning

Namcheol Jung, Ghang Lee

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

This paper comparatively analyzes a method to automatically classify case studies of building information modeling (BIM) in construction projects by BIM use. Identifying a project that has used BIM in a manner of interest generally takes at least thirty minutes, and up to hours, of collection and review, and an average of four information sources. To automate and expedite these analysis tasks, this study deployed natural language processing (NLP) and two commonly used unsupervised learning methods for text classification, namely latent semantic analysis (LSA) and latent Dirichlet allocation (LDA). The results were validated against a representative supervised learning method for text classification, the support vector machine (SVM). When LSA and LDA detected phrases in a BIM case study whose similarity to the definition of a BIM use exceeded the threshold value, the system determined that the project had deployed BIM for that use. The BIM uses specified by Pennsylvania State University served as the classification categories. The approach was validated using 240 BIM case studies (512,892 features). When a BIM use was employed in a project, the project was labeled “1”; when it was not, the project was labeled “0.” Performance was analyzed while varying the following parameters: document segmentation, feature weighting, dimensionality reduction coefficient (k-value), the number of topics, and the number of iterations. LDA yielded the highest F1 score, 80.75% on average. LDA and LSA yielded high recall and low precision in most cases. Conversely, SVM yielded high precision and low recall in most cases, with fluctuating F1 scores.
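The thresholding rule and the F1 evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: plain bag-of-words cosine similarity stands in for the LSA and LDA representations, and the function names, the example BIM-use definition, the case-study segments, and the 0.5 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(segments, definition, threshold):
    """Label a case study 1 if any segment exceeds the similarity threshold, else 0."""
    target = bag_of_words(definition)
    return int(any(cosine(bag_of_words(s), target) > threshold for s in segments))


def precision_recall_f1(predicted, actual):
    """Compute precision, recall, and F1 for binary labels (1 = BIM use present)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical definition and case-study segments, for illustration only.
DEFINITION = "detect clashes between building systems using the model"
CASE_A = ["The team used the model to detect clashes between systems.",
          "The project finished on time."]
CASE_B = ["The owner requested weekly progress photos."]

if __name__ == "__main__":
    print(classify(CASE_A, DEFINITION, threshold=0.5))
    print(classify(CASE_B, DEFINITION, threshold=0.5))
    print(precision_recall_f1([1, 1, 0, 1], [1, 0, 0, 1]))
```

Lowering the threshold in a sketch like this flags more case studies as positive, which tends to raise recall at the cost of precision; that is the same trade-off direction the abstract reports for LSA and LDA.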

Original language: English
Journal: Advanced Engineering Informatics
DOI: 10.1016/j.aei.2019.04.007
Publication status: Published - 2019 Jan 1

Fingerprint

Unsupervised learning
Semantics
Processing
Supervised learning

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Artificial Intelligence

Cite this

@article{2e35402810b845a8bb1263e7e78556d2,
title = "Automated classification of building information modeling (BIM) case studies by BIM use based on natural language processing (NLP) and unsupervised learning",
abstract = "This paper comparatively analyzes a method to automatically classify case studies of building information modeling (BIM) in construction projects by BIM use. It generally takes a minimum of thirty minutes to hours of collection and review and an average of four information sources to identify a project that has used BIM in a manner that is of interest. To automate and expedite the analysis tasks, this study deployed natural language processing (NLP) and commonly used unsupervised learning for text classification, namely latent semantic analysis (LSA) and latent Dirichlet allocation (LDA). The results were validated against one of representative supervised learning methods for text classification—support vector machine (SVM). When LSA and LDA detected phrases in a BIM case study that had higher similarity values to the definition of each BIM use than the threshold values, the system determined that the project had deployed BIM in the detected approach. For the classification of BIM use, the BIM uses specified by Pennsylvania State University were utilized. The approach was validated using 240 BIM case studies (512,892 features). When BIM uses were employed in a project, the project was labeled as “1”; when they were not, the project was labeled as “0.” The performance was analyzed by changing parameters: namely, document segmentation, feature weighting, dimensionality reduction coefficient (k-value), the number of topics, and the number of iterations. LDA yielded the highest F1 score, 80.75{\%} on average. LDA and LSA yielded high recall and low precision in most cases. Conversely, SVM yielded high precision and low recall in most cases and fluctuations in F1 scores.",
author = "Namcheol Jung and Ghang Lee",
year = "2019",
month = "1",
day = "1",
doi = "10.1016/j.aei.2019.04.007",
language = "English",
journal = "Advanced Engineering Informatics",
issn = "1474-0346",
publisher = "Elsevier Limited",
}

TY - JOUR

T1 - Automated classification of building information modeling (BIM) case studies by BIM use based on natural language processing (NLP) and unsupervised learning

AU - Jung, Namcheol

AU - Lee, Ghang

PY - 2019/1/1

Y1 - 2019/1/1

AB - This paper comparatively analyzes a method to automatically classify case studies of building information modeling (BIM) in construction projects by BIM use. It generally takes a minimum of thirty minutes to hours of collection and review and an average of four information sources to identify a project that has used BIM in a manner that is of interest. To automate and expedite the analysis tasks, this study deployed natural language processing (NLP) and commonly used unsupervised learning for text classification, namely latent semantic analysis (LSA) and latent Dirichlet allocation (LDA). The results were validated against one of representative supervised learning methods for text classification—support vector machine (SVM). When LSA and LDA detected phrases in a BIM case study that had higher similarity values to the definition of each BIM use than the threshold values, the system determined that the project had deployed BIM in the detected approach. For the classification of BIM use, the BIM uses specified by Pennsylvania State University were utilized. The approach was validated using 240 BIM case studies (512,892 features). When BIM uses were employed in a project, the project was labeled as “1”; when they were not, the project was labeled as “0.” The performance was analyzed by changing parameters: namely, document segmentation, feature weighting, dimensionality reduction coefficient (k-value), the number of topics, and the number of iterations. LDA yielded the highest F1 score, 80.75% on average. LDA and LSA yielded high recall and low precision in most cases. Conversely, SVM yielded high precision and low recall in most cases and fluctuations in F1 scores.

UR - http://www.scopus.com/inward/record.url?scp=85064926758&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85064926758&partnerID=8YFLogxK

U2 - 10.1016/j.aei.2019.04.007

DO - 10.1016/j.aei.2019.04.007

M3 - Article

AN - SCOPUS:85064926758

JO - Advanced Engineering Informatics

JF - Advanced Engineering Informatics

SN - 1474-0346

ER -