Question answering (QA) has become a popular way for humans to access billion-scale knowledge bases. Unlike web search, QA over a knowledge base gives out accurate and concise results, provided that natural language questions can be understood and mapped precisely to structured queries over the knowledge base. The challenge, however, is that a human can ask one question in many different ways. Previous approaches have natural limits due to their representations: Rule based approaches only understand a small set of "canned" questions, while keyword based or synonym based approaches cannot fully understand the questions. In this paper, we design a new kind of question representation: Templates, over a billion scale knowledge base and a million scale QA corpora. For example, for questions about a city's population, we learn templates such as What's the population of $city?, How many people are there in $city?. We learned 27 million templates for 2782 intents. Based on these templates, our QA system KBQA effectively supports binary factoid questions, as well as complex questions which are composed of a series of binary factoid questions. Furthermore, we expand predicates in RDF knowledge base, which boosts the coverage of knowledge base by 57 times. Our QA system beats all other state-of-art works on both effectiveness and efficiency over QALD benchmarks.
|Number of pages||12|
|Journal||Proceedings of the VLDB Endowment|
|Publication status||Published - 2016|
|Event||43rd International Conference on Very Large Data Bases, VLDB 2017 - Munich, Germany|
Duration: 2017 Aug 28 → 2017 Sep 1
Bibliographical noteFunding Information:
This paper was supported by the National Key Basic Research Program of China under No.2015CB358800, by the National NSFC (No.61472085, U1509213), by Shanghai Municipal Science and Technology Commission foundation key project under No.15JC1400900, by Shanghai Municipal Science and Technology project under No.16511102102. Seung-won Hwang was supported by IITP grant funded by the Korea government (MSIP; No. B0101-16-0307) and Microsoft Research.
All Science Journal Classification (ASJC) codes
- Computer Science (miscellaneous)
- Computer Science(all)