Detecting the knowledge structure of bioinformatics by mining full-text collections

Min Song, Su Yeon Kim

Research output: Contribution to journal › Article › peer-review

51 Citations (Scopus)


Bioinformatics is a fast-growing, diverse research field that has recently gained much public attention. Although there have been several attempts to understand the field of bioinformatics through bibliometric analysis, the approach proposed in this paper is the first to apply text mining techniques to a large set of full-text articles in order to detect the knowledge structure of the field. To this end, we use PubMed Central full-text articles for bibliometric analysis instead of relying on the citation data provided in Web of Science. In particular, we develop text mining routines that build a custom-made citation database by mining the full text. We present several interesting findings in this study. First, the majority of the papers published in the field of bioinformatics are rarely or never cited by others (63% of papers received fewer than two citations). Second, there is a linear, consistent increase in the number of publications, with 2003 as the turning point in terms of publication growth. Third, most bioinformatics research is driven by USA-based institutes, followed by European institutes. Fourth, the results of topic modeling and word co-occurrence analysis reveal that major topics focus more on the biological aspects of bioinformatics than on the computational ones. However, the top 10 ranked articles identified by PageRank are more related to computational aspects. Fifth, visualization of author co-citation analysis indicates that researchers in molecular biology or genomics play a key role in connecting the sub-disciplines of bioinformatics.
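Two of the measures named in the abstract, the share of rarely cited papers and the PageRank ranking of articles, can be illustrated on a toy citation graph. The following is only a minimal sketch under stated assumptions: the paper identifiers and citation pairs are invented, the function names are hypothetical, and the damping factor of 0.85 is a conventional default rather than a setting reported in this record. It is not the authors' implementation.

```python
from collections import defaultdict


def citation_counts(papers, citations):
    """Count how often each paper is cited by other papers in the collection."""
    counts = {p: 0 for p in papers}
    for _citing, cited in citations:
        if cited in counts:
            counts[cited] += 1
    return counts


def share_cited_fewer_than(counts, threshold=2):
    """Fraction of papers that received fewer than `threshold` citations."""
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c < threshold) / len(counts)


def pagerank(papers, citations, damping=0.85, iterations=100, tol=1e-10):
    """PageRank by power iteration; damping=0.85 is an assumed default."""
    papers = list(papers)
    n = len(papers)
    if n == 0:
        return {}
    known = set(papers)
    out_links = defaultdict(set)
    for citing, cited in citations:
        # Ignore self-citations and references to papers outside the collection.
        if citing in known and cited in known and citing != cited:
            out_links[citing].add(cited)

    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Papers that cite nothing spread their rank evenly over all papers.
        dangling = sum(rank[p] for p in papers if not out_links[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        for p in papers:
            targets = out_links[p]
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in papers)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Invented toy data: (citing paper, cited paper) pairs.
papers = ["A", "B", "C", "D"]
citations = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "B")]

counts = citation_counts(papers, citations)
low_share = share_cited_fewer_than(counts, threshold=2)
ranks = pagerank(papers, citations)
top_paper = max(ranks, key=ranks.get)
```

In this toy graph, paper A is cited three times and the other papers at most once, so three of the four papers fall below the two-citation threshold and A receives the highest PageRank score. On a real collection, the citation pairs would come from the reference lists extracted from the full text.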

Original language: English
Pages (from-to): 183-201
Number of pages: 19
Issue number: 1
Publication status: Published - 2013 Jul

Bibliographical note

Funding Information:
University of California, Harvard Medical School, Stanford, National Institutes of Health, University of Washington, Yale University, University College London, Massachusetts Institute of Technology, Washington University, University of Toronto, Wellcome Trust Genome Campus, University of Illinois, University of Oxford, University of Michigan, University of Cambridge, University of North Carolina, Princeton University, Baylor College of Medicine, Columbia University, Cornell University

All Science Journal Classification (ASJC) codes

  • Social Sciences (all)
  • Computer Science Applications
  • Library and Information Sciences

