Nl2PSQL: Generating pseudo-SQL queries from under-specified natural language questions

Fuxiang Chen, Seung Won Hwang, Jaegul Choo, Jung Woo Ha, Sunghun Kim

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Generating SQL codes from natural language questions (NL2SQL) is an emerging research area. Existing studies have mainly focused on clear scenarios where specified information is fully given to generate a SQL query. However, in developer forums such as Stack Overflow,1 questions cover more diverse tasks including table manipulation or performance issues, where a table is not specified. The SQL query posted in Stack Overflow, Pseudo-SQL (pSQL), does not usually contain table schemas and is not necessarily executable, is sufficient to guide developers. Here we describe a new NL2pSQL task to generate pSQL codes from natural language questions on under-specified database issues, in short, NL2pSQL. In addition, we define two new metrics suitable for the proposed NL2pSQL task, Canonical-BLEU and SQL-BLEU, instead of the conventional BLEU. With a baseline model using sequence-to-sequence architecture integrated with denoising autoencoder, we confirm the validity of our task. Experiments show that the proposed NL2pSQL approach yields well-formed queries (up to 43% more than a standard Seq2Seq model). Our code and datasets are publicly available at http://github.com/clovaai/nl2psql.

Original languageEnglish
Title of host publicationEMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference
PublisherAssociation for Computational Linguistics
Pages2603-2613
Number of pages11
ISBN (Electronic)9781950737901
Publication statusPublished - 2020
Event2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019 - Hong Kong, China
Duration: 2019 Nov 32019 Nov 7

Publication series

NameEMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference

Conference

Conference2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019
CountryChina
CityHong Kong
Period19/11/319/11/7

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Fingerprint Dive into the research topics of 'Nl2PSQL: Generating pseudo-SQL queries from under-specified natural language questions'. Together they form a unique fingerprint.

  • Cite this

    Chen, F., Hwang, S. W., Choo, J., Ha, J. W., & Kim, S. (2020). Nl2PSQL: Generating pseudo-SQL queries from under-specified natural language questions. In EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference (pp. 2603-2613). (EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference). Association for Computational Linguistics.