With the proliferation of question and answering (Q&A) services, studies on building a knowledge base (KB) using various information extraction (IE) methodologies from unstructured data on the Web have received significant attention. Existing IE approaches, including machine reading comprehension (MRC), can find the correct answer to a question if the correct answer exists in the document. However, most are prone to extracting incorrect answers rather than producing no answers when the correct answer does not exist in the given documents. This problem is likely to cause serious real-world problems when we apply such technologies to practical services such as AI speakers. We propose a novel open-domain IE system to alleviate the weaknesses of previous approaches. The proposed system integrates an elaborated document selection, sentence selection, and knowledge extraction ensemble method to obtain high specificity while maintaining a realistically achievable level of precision. Based on this framework, we extract answers on Korean open-domain user queries from unstructured documents collected from multiple Web sources. For evaluating our system, we build a benchmark dataset with the SKTelecom AI Speaker log. The baseline models KYLIN infobox generator and BiDAF were used to evaluate the performance of the proposed approach. The experimental results demonstrate that the proposed method outperforms the baseline models and is practically applicable to real-world services.
Bibliographical noteFunding Information:
This work was financially supported by Korea Ministry of Land, Infrastructure and Transport (MOLIT) as [Innovative Talent Education Program for Smart City]. This work was also supported by SK Telecom Company, Ltd.
© 2020 Institute of Electrical and Electronics Engineers Inc.. All rights reserved.
All Science Journal Classification (ASJC) codes
- Computer Science(all)
- Materials Science(all)