The automatic summarization of scientific articles differs from other text genres because of the structured format and longer text length. Previous approaches have focused on tackling the lengthy nature of scientific articles, aiming to improve the computational efficiency of summarizing long text using a flat, unstructured abstract. However, the structured format of scientific articles and characteristics of each section have not been fully explored, despite their importance. The lack of a sufficient investigation and discussion of various characteristics for each section and their influence on summarization results has hindered the practical use of automatic summarization for scientific articles. To provide a balanced abstract proportionally emphasizing each section of a scientific article, the community introduced the structured abstract, an abstract with distinct, labeled sections. Using this information, in this study, we aim to understand tasks ranging from data preparation to model evaluation from diverse viewpoints. Specifically, we provide a preprocessed large-scale dataset and propose a summarization method applying the introduction, methods, results, and discussion (IMRaD) format reflecting the characteristics of each section. We also discuss the objective benchmarks and perspectives of state-of-the-art algorithms and present the challenges and research directions in this area.
|Number of pages|15|
|Journal|Journal of the Association for Information Science and Technology|
|Publication status|Published - 2023 Feb|
Bibliographical note
Funding Information:
This work was supported by the Yonsei University Research Fund of 2022 (2022‐22‐0122).
© 2022 Association for Information Science and Technology.
All Science Journal Classification (ASJC) codes
- Information Systems
- Computer Networks and Communications
- Information Systems and Management
- Library and Information Sciences