Prompt engineering for zero-shot and few-shot defect detection and classification using a visual-language pretrained model

Gunwoo Yong, Kahyun Jeon, Daeyoung Gil, Ghang Lee

Research output: Contribution to journal › Article › peer-review

Abstract

Zero-shot learning with vision-language pretrained (VLP) models is expected to be an alternative to existing deep learning models for defect detection when datasets are insufficient. However, VLP models, including contrastive language-image pretraining (CLIP), show performance that fluctuates with the prompt (input), motivating research on prompt engineering, the optimization of prompts to improve performance. This study therefore aims to identify the features of a prompt that yield the best performance in classifying and detecting building defects using the zero-shot and few-shot capabilities of CLIP. The results reveal the following: (1) domain-specific definitions are better than general definitions and images; (2) a complete sentence is better than a set of core terms; and (3) multimodal information is better than single-modal information. The detection performance of the proposed prompting method outperformed that of existing supervised models.
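The abstract does not disclose the paper's exact prompts, but the zero-shot mechanism it builds on is CLIP's standard one: embed the image and each candidate class prompt, then pick the prompt with the highest cosine similarity. The sketch below illustrates that mechanism with full-sentence, domain-specific prompts of the kind the findings favor; the `encode_image` and `encode_text` functions are toy stand-ins (fixed vectors) for CLIP's real encoders, and the prompt wordings are illustrative, not the authors'.

```python
import numpy as np

# Hypothetical stand-ins for CLIP's encoders. In practice these would be the
# image/text towers of a real CLIP model; fixed toy vectors keep the sketch
# self-contained and runnable.
def encode_image(image_id):
    toy = {"crack_photo": np.array([0.9, 0.1, 0.2])}
    return toy[image_id]

def encode_text(prompt):
    # Toy text embeddings: the "crack" prompt points roughly the same way
    # as the crack image embedding; the "spalling" prompt points elsewhere.
    toy = {
        "a photo of a crack, a linear fracture in a concrete surface":
            np.array([0.8, 0.2, 0.1]),
        "a photo of spalling, flaking of concrete off a surface":
            np.array([0.1, 0.9, 0.3]),
    }
    return toy[prompt]

def zero_shot_classify(image_id, prompts):
    """CLIP-style zero-shot classification: cosine similarity between the
    normalized image embedding and each prompt's normalized text embedding."""
    img = encode_image(image_id)
    img = img / np.linalg.norm(img)
    sims = []
    for p in prompts:
        txt = encode_text(p)
        txt = txt / np.linalg.norm(txt)
        sims.append(float(img @ txt))
    return prompts[int(np.argmax(sims))], sims

prompts = [
    "a photo of a crack, a linear fracture in a concrete surface",
    "a photo of spalling, flaking of concrete off a surface",
]
best, sims = zero_shot_classify("crack_photo", prompts)
```

Because classification reduces to this text-image similarity, changing only the prompt wording (core terms vs. a full sentence, general vs. domain-specific definition) shifts the text embeddings and hence the decision, which is why prompt engineering matters here.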

Original language: English
Journal: Computer-Aided Civil and Infrastructure Engineering
DOIs
Publication status: Accepted/In press - 2022

Bibliographical note

Funding Information:
National Research Foundation of Korea (NRF), Grant/Award Number: 2021RIA2C300820969

Publisher Copyright:
© 2022 Computer-Aided Civil and Infrastructure Engineering.

All Science Journal Classification (ASJC) codes

  • Civil and Structural Engineering
  • Building and Construction
  • Computer Science Applications
  • Computer Graphics and Computer-Aided Design
  • Computational Theory and Mathematics

