In this paper, we analyze topic evolution over time within bioinformatics to uncover the underlying dynamics of the field, focusing on developments in the 2000s. We select 33 bioinformatics-related conferences indexed in DBLP from 2000 to 2011. We choose DBLP rather than PubMed as the data source mainly because DBLP covers most bioinformatics-related conferences, and conference papers are more suitable than journal papers for studying the dynamics of the field. We divide the twelve years into four periods: period 1 (2000–2002), period 2 (2003–2005), period 3 (2006–2008), and period 4 (2009–2011). To conduct the topic evolution analysis, we employ three major procedures and develop a novel technique for each: Markov Random Field-based topic clustering, automatic cluster labeling, and topic similarity based on Within-Period Cluster Similarity and Between-Period Cluster Similarity. The experimental results show distinct topic transition patterns between time periods. From period 1 to period 3, new topics appear to emerge and expand, whereas from period 3 to period 4, topics merge and display stronger interaction with one another. This trend is confirmed by the collaboration pattern over time.
Bibliographical note
Funding Information:
Topic Detection and Tracking (TDT), which started with the support of the US Government's Defense Advanced Research Projects Agency (DARPA), consists of a broadcast news understanding program and the TIDES program (Translingual Information Detection, Extraction, and Summarization). Since the first pilot study in 1997, numerous studies on topic detection and tracking tasks were conducted intensively through 2004 and have continued steadily since (Fukumoto and Suzuki 2000; Swan and Allan 2000; Rajaraman and Tan 2001; Makkonen et al. 2004; Morinaga and Yamanishi 2004; Kuo and Chen 2007; Jin et al. 2007; Diaz 2009; Larson et al. 2011; Tu and Seng 2012). First story detection, cluster detection, and topic tracking are three TDT tasks that share a common goal with topic evolution. The early focus of TDT studies on broadcast news data has recently been extended to general topic detection research on a variety of data sets.
Acknowledgments: This work was supported by a National Research Foundation of Korea grant funded by the Korean Government (NRF-2012-2012S1A3A2033291) and by a National Research Foundation of Korea (NRF) grant funded by the Korean Government (No. 2012033242).
All Science Journal Classification (ASJC) codes
- Social Sciences (all)
- Computer Science Applications
- Library and Information Sciences