Analyzing research topics offers potential insights into the direction of scientific development. In particular, analyzing multilingual research topics can help researchers grasp the evolution of topics globally, revealing topic similarity among scientific publications written in different languages. Most studies to date on topic analysis have been based on English-language publications and have relied heavily on citation-based topic evolution analysis. However, since it can be challenging for English publications to cite non-English sources and since many languages do not offer English translations of abstracts, citation-based methodologies are not suitable for analyzing multilingual research topic relations. Since multilingual sentence embeddings can effectively preserve word semantics in multilingual translation tasks, a topic model based on multilingual sentence embeddings could potentially generate topic-word distributions for publications in multilingual analysis. In this paper, which is situated in the field of library and information science, we use multilingual pretrained Bidirectional Encoder Representations from Transformers (BERT) embeddings and the Latent Dirichlet Allocation (LDA) topic model to analyze topic evolution in monolingual and multilingual topic similarity settings. For each topic, we multiply its LDA probability value by the averaged tensor similarity of BERT embeddings to explore the evolution of the topic in scientific publications. As our proposed method does not rely on a machine translator or the author's subjective translation, it avoids confusion and misusages caused by either machine error or the author's subjectively chosen English keywords. Our results show that the proposed approach is well-suited to analyzing the scientific evolutions in monolingual and scientific multilingual topic similarity relations.
Bibliographical noteFunding Information:
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. NRF-2019R1A2C2002577 ).
© 2020 Elsevier Ltd.
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Library and Information Sciences