GRiD: Gathering rich data from PubMed using one-class SVM

Junbum Cha, Jeongwoo Kim, Sanghyun Park

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

The Medical Subject Headings (MeSH) term search is a typical data-gathering method in biomedical text mining. However, it has two problems: delay in MeSH term allocation and missed valuable literature sources. Because MeSH term allocation is performed by humans, the allocation process introduces a delay. In addition, even after a literature source has been allocated MeSH terms, valuable sources can still be missed during data gathering: some literature sources are not indexed under the MeSH term of a keyword even though they contain valuable information related to that term, and the MeSH term search misses them. To resolve these problems, we propose a novel method to gather rich data using a one-class support vector machine (SVM) and a relevance rule. Term frequency-inverse document frequency (TF-IDF) and paragraph vectors are examined as text vectorization methods with various parameters and relevance factors. We apply our method to lung cancer, prostate cancer, breast cancer, and Alzheimer's disease. As a result, up to 26% of keyword data and 35% of target data are gathered with high quality (a C-score of at least 0.948).
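
As an illustration of the TF-IDF vectorization step mentioned in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation: the tokenizer, the smoothed IDF variant, the function names, and the toy documents are all assumptions made for this example, and the parameters examined in the paper are not given in this record. Vectors of this kind could then be passed to a one-class SVM implementation to score unlabeled documents against a set of seed documents.

```python
import math
from collections import Counter


def tokenize(text):
    # Crude tokenizer (assumption): lowercase, split on non-alphanumerics.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of strings."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    # Smoothed IDF, one common variant: log((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        # Relative term frequency multiplied by IDF.
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Hypothetical toy "abstracts", not data from the paper.
docs = [
    "lung cancer screening with ct",
    "lung cancer gene expression",
    "prostate cancer biomarkers",
]
vocab, vectors = tfidf_vectors(docs)
```

In this toy corpus, a term that occurs in every document (such as "cancer") receives a lower weight than a term that occurs in only one (such as "ct"), which is the behavior that makes TF-IDF useful for separating topic-specific literature from general literature.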

Original language: English
Title of host publication: 2016 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2016 - Conference Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4325-4331
Number of pages: 7
ISBN (Electronic): 9781509018970
Publication status: Published - 2017 Feb 6
Event: 2016 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2016 - Budapest, Hungary
Duration: 2016 Oct 9 → 2016 Oct 12

Publication series

Name: 2016 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2016 - Conference Proceedings

Other

Other: 2016 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2016
Country: Hungary
City: Budapest
Period: 16/10/9 → 16/10/12

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
  • Control and Optimization
  • Human-Computer Interaction
