Text categorization of biomedical data sets using graph kernels and a controlled Vocabulary

Said Bleik, Meenakshi Mishra, Jun Huan, Min Song

Research output: Contribution to journal (Article)

19 Citations (Scopus)

Abstract

Recently, graph representations of text have been showing improved performance over conventional bag-of-words representations in text categorization applications. In this paper, we present a graph-based representation for biomedical articles and use graph kernels to classify those articles into high-level categories. In our representation, common biomedical concepts and semantic relationships are identified with the help of an existing ontology and are used to build a rich graph structure that provides a consistent feature set and preserves additional semantic information that could improve a classifier's performance. We attempt to classify the graphs using both a set-based graph kernel that is capable of dealing with the disconnected nature of the graphs and a simple linear kernel. Finally, we report the results comparing the classification performance of the kernel classifiers to common text-based classifiers.
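The abstract does not spell out how the graphs or kernels are defined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each article has already been reduced to a list of ontology concepts and concept pairs. The helper names (`make_graph`, `set_kernel`, `linear_kernel`) and the toy concepts are hypothetical. The set-based kernel shown here is a plain intersection kernel over node and edge sets, chosen as a stand-in because it needs no paths or walks and so is unaffected by disconnected components.

```python
from collections import Counter


def make_graph(concepts, relations):
    """Build a concept graph as (node set, undirected edge set)."""
    nodes = set(concepts)
    # Keep only edges whose endpoints are both known concepts; ignore self-loops.
    edges = {
        frozenset((a, b))
        for a, b in relations
        if a in nodes and b in nodes and a != b
    }
    return nodes, edges


def set_kernel(g1, g2):
    """Intersection kernel: number of shared nodes plus number of shared edges."""
    (nodes1, edges1), (nodes2, edges2) = g1, g2
    return len(nodes1 & nodes2) + len(edges1 & edges2)


def linear_kernel(counts1, counts2):
    """Dot product of two sparse concept-count vectors (bag-of-concepts baseline)."""
    return sum(v * counts2.get(k, 0) for k, v in counts1.items())


# Toy documents (illustrative concepts only, not taken from the paper).
doc_a = make_graph(["gene", "protein", "disease"], [("gene", "protein")])
doc_b = make_graph(
    ["gene", "protein", "drug"],
    [("protein", "gene"), ("drug", "protein")],
)

print(set_kernel(doc_a, doc_b))
print(linear_kernel(Counter(["gene", "gene", "protein"]), Counter(["gene", "drug"])))
```

Both functions are dot products of indicator or count vectors, so the resulting Gram matrices are positive semi-definite and could be passed to any kernel classifier that accepts a precomputed kernel.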

Original language: English
Article number: 6475935
Pages (from-to): 1211-1217
Number of pages: 7
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 10
Issue number: 5
DOI: 10.1109/TCBB.2013.16
Publication status: Published - 2013 Sep 1

Fingerprint

Controlled Vocabulary
Thesauri
Text Categorization
Semantics
Classifiers
kernel
Graph in graph theory
Classifier
Ontology
Classify
Graph Representation
Datasets

All Science Journal Classification (ASJC) codes

  • Biotechnology
  • Genetics
  • Applied Mathematics

Cite this

@article{7fd3311dc0764244bc359674eb7c823a,
title = "Text categorization of biomedical data sets using graph kernels and a controlled Vocabulary",
abstract = "Recently, graph representations of text have been showing improved performance over conventional bag-of-words representations in text categorization applications. In this paper, we present a graph-based representation for biomedical articles and use graph kernels to classify those articles into high-level categories. In our representation, common biomedical concepts and semantic relationships are identified with the help of an existing ontology and are used to build a rich graph structure that provides a consistent feature set and preserves additional semantic information that could improve a classifier's performance. We attempt to classify the graphs using both a set-based graph kernel that is capable of dealing with the disconnected nature of the graphs and a simple linear kernel. Finally, we report the results comparing the classification performance of the kernel classifiers to common text-based classifiers.",
author = "Said Bleik and Meenakshi Mishra and Jun Huan and Min Song",
year = "2013",
month = "9",
day = "1",
doi = "10.1109/TCBB.2013.16",
language = "English",
volume = "10",
pages = "1211--1217",
journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
issn = "1545-5963",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "5",

}

Text categorization of biomedical data sets using graph kernels and a controlled Vocabulary. / Bleik, Said; Mishra, Meenakshi; Huan, Jun; Song, Min.

In: IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 10, No. 5, 6475935, 01.09.2013, p. 1211-1217.

TY - JOUR

T1 - Text categorization of biomedical data sets using graph kernels and a controlled Vocabulary

AU - Bleik, Said

AU - Mishra, Meenakshi

AU - Huan, Jun

AU - Song, Min

PY - 2013/9/1

Y1 - 2013/9/1

AB - Recently, graph representations of text have been showing improved performance over conventional bag-of-words representations in text categorization applications. In this paper, we present a graph-based representation for biomedical articles and use graph kernels to classify those articles into high-level categories. In our representation, common biomedical concepts and semantic relationships are identified with the help of an existing ontology and are used to build a rich graph structure that provides a consistent feature set and preserves additional semantic information that could improve a classifier's performance. We attempt to classify the graphs using both a set-based graph kernel that is capable of dealing with the disconnected nature of the graphs and a simple linear kernel. Finally, we report the results comparing the classification performance of the kernel classifiers to common text-based classifiers.

UR - http://www.scopus.com/inward/record.url?scp=84894553853&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84894553853&partnerID=8YFLogxK

U2 - 10.1109/TCBB.2013.16

DO - 10.1109/TCBB.2013.16

M3 - Article

C2 - 24384709

AN - SCOPUS:84894553853

VL - 10

SP - 1211

EP - 1217

JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

SN - 1545-5963

IS - 5

M1 - 6475935

ER -