Biomedical text categorization with concept graph representations using a controlled vocabulary

Meenakshi Mishra, Jun Huan, Said Bleik, Min Song

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

Recent work using graph representations for text categorization has shown promising performance over the conventional bag-of-words representation of text documents. In this paper we investigate a graph representation of texts for the task of text categorization. In our representation we identify high-level concepts extracted from a database of controlled biomedical terms and build a rich graph structure that contains important concepts and relationships. This procedure ensures that graphs are described with a regular vocabulary, which makes them easier to compare. We then classify document graphs by applying a set-based graph kernel that is intuitively sensible and able to deal with the disconnectedness of the constructed concept graphs. We compare this approach to standard approaches using non-graph, text-based features. We also compare different kernels to see which performs better.
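To make the idea of a set-based kernel on concept graphs concrete, the following is a minimal sketch and not the kernel defined in the paper. It assumes each document has already been mapped to a set of concept nodes and a set of undirected edges between them, and it scores two documents by counting shared nodes and shared edges. Because it compares sets rather than walks or paths, it is unaffected by a graph being disconnected. The names `ConceptGraph`, `make_graph`, `set_kernel`, `normalized_kernel`, the `edge_weight` parameter, and the concept labels are all hypothetical and chosen for illustration only.

```python
from typing import FrozenSet, Iterable, NamedTuple, Tuple


class ConceptGraph(NamedTuple):
    """A document as a set of concept nodes and undirected concept edges."""
    nodes: FrozenSet[str]
    edges: FrozenSet[Tuple[str, str]]


def make_graph(nodes: Iterable[str],
               edges: Iterable[Tuple[str, str]]) -> ConceptGraph:
    # Sort each edge's endpoints so (a, b) and (b, a) compare equal.
    return ConceptGraph(
        frozenset(nodes),
        frozenset(tuple(sorted(edge)) for edge in edges),
    )


def set_kernel(g: ConceptGraph, h: ConceptGraph,
               edge_weight: float = 1.0) -> float:
    # Intersection sizes are inner products of 0/1 indicator vectors,
    # so this sum is a valid kernel for any edge_weight >= 0.
    shared_nodes = len(g.nodes & h.nodes)
    shared_edges = len(g.edges & h.edges)
    return shared_nodes + edge_weight * shared_edges


def normalized_kernel(g: ConceptGraph, h: ConceptGraph,
                      edge_weight: float = 1.0) -> float:
    # Cosine-style normalization so that large documents do not dominate.
    denom = (set_kernel(g, g, edge_weight)
             * set_kernel(h, h, edge_weight)) ** 0.5
    return set_kernel(g, h, edge_weight) / denom if denom else 0.0


# Illustrative concept labels only; they are not taken from the paper.
doc_a = make_graph(
    {"Neoplasms", "Breast Neoplasms", "Tamoxifen"},
    [("Neoplasms", "Breast Neoplasms")],
)
doc_b = make_graph(
    {"Neoplasms", "Breast Neoplasms", "Mammography"},
    [("Breast Neoplasms", "Neoplasms")],
)

print(set_kernel(doc_a, doc_b))         # 2 shared nodes + 1 shared edge
print(normalized_kernel(doc_a, doc_b))
```

A kernel matrix built from `normalized_kernel` over all document pairs could then be passed to any classifier that accepts precomputed kernels; which classifier and which weighting work best is an empirical question this sketch does not answer.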

Original language: English
Title of host publication: Proc. of the 11th Int. Workshop on Data Mining in Bioinformatics, BIOKDD 2012 - Held in Conjunction with the 18th ACM SIGKDD Int. Conference on Knowledge Discovery and Data Mining, SIGKDD'12
Pages: 26-32
Number of pages: 7
DOI: https://doi.org/10.1145/2350176.2350181
Publication status: Published - 2012 Sep 28
Event: 11th International Workshop on Data Mining in Bioinformatics, BIOKDD 2012 - Held in Conjunction with the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, SIGKDD'12 - Beijing, China
Duration: 2012 Aug 12 → 2012 Aug 12

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other: 11th International Workshop on Data Mining in Bioinformatics, BIOKDD 2012 - Held in Conjunction with the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, SIGKDD'12
Country: China
City: Beijing
Period: 2012 Aug 12 → 2012 Aug 12

Fingerprint

Thesauri

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Cite this

Mishra, M., Huan, J., Bleik, S., & Song, M. (2012). Biomedical text categorization with concept graph representations using a controlled vocabulary. In Proc. of the 11th Int. Workshop on Data Mining in Bioinformatics, BIOKDD 2012 - Held in Conjunction with the 18th ACM SIGKDD Int. Conference on Knowledge Discovery and Data Mining, SIGKDD'12 (pp. 26-32). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). https://doi.org/10.1145/2350176.2350181
Mishra, Meenakshi ; Huan, Jun ; Bleik, Said ; Song, Min. / Biomedical text categorization with concept graph representations using a controlled vocabulary. Proc. of the 11th Int. Workshop on Data Mining in Bioinformatics, BIOKDD 2012 - Held in Conjunction with the 18th ACM SIGKDD Int. Conference on Knowledge Discovery and Data Mining, SIGKDD'12. 2012. pp. 26-32 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).
@inproceedings{abbab2138358470ebdf2be59761748ad,
title = "Biomedical text categorization with concept graph representations using a controlled vocabulary",
abstract = "Recent work using graph representations for text categorization has shown promising performance over conventional bag-of-words representation of text documents. In this paper we investigate a graph representation of texts for the task of text categorization. In our representation we identify high level concepts extracted from a database of controlled biomedical terms and build a rich graph structure that contains important concepts and relationships. This procedure ensures that graphs are described with a regular vocabulary, leading to increased ease of comparison. We then classify document graphs by applying a set-based graph kernel that is intuitively sensible and able to deal with the disconnectedness of the constructed concept graphs. We compare this approach to standard approaches using non-graph, text-based features. We also do a comparison amongst different kernels that can be used to see which performs better.",
author = "Meenakshi Mishra and Jun Huan and Said Bleik and Min Song",
year = "2012",
month = "9",
day = "28",
doi = "10.1145/2350176.2350181",
language = "English",
isbn = "9781450315524",
series = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
pages = "26--32",
booktitle = "Proc. of the 11th Int. Workshop on Data Mining in Bioinformatics, BIOKDD 2012 - Held in Conjunction with the 18th ACM SIGKDD Int. Conference on Knowledge Discovery and Data Mining, SIGKDD'12",

}

Mishra, M, Huan, J, Bleik, S & Song, M 2012, Biomedical text categorization with concept graph representations using a controlled vocabulary. in Proc. of the 11th Int. Workshop on Data Mining in Bioinformatics, BIOKDD 2012 - Held in Conjunction with the 18th ACM SIGKDD Int. Conference on Knowledge Discovery and Data Mining, SIGKDD'12. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 26-32, 11th International Workshop on Data Mining in Bioinformatics, BIOKDD 2012 - Held in Conjunction with the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, SIGKDD'12, Beijing, China, 12/8/12. https://doi.org/10.1145/2350176.2350181

Biomedical text categorization with concept graph representations using a controlled vocabulary. / Mishra, Meenakshi; Huan, Jun; Bleik, Said; Song, Min.

Proc. of the 11th Int. Workshop on Data Mining in Bioinformatics, BIOKDD 2012 - Held in Conjunction with the 18th ACM SIGKDD Int. Conference on Knowledge Discovery and Data Mining, SIGKDD'12. 2012. p. 26-32 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).

TY - GEN

T1 - Biomedical text categorization with concept graph representations using a controlled vocabulary

AU - Mishra, Meenakshi

AU - Huan, Jun

AU - Bleik, Said

AU - Song, Min

PY - 2012/9/28

Y1 - 2012/9/28

AB - Recent work using graph representations for text categorization has shown promising performance over conventional bag-of-words representation of text documents. In this paper we investigate a graph representation of texts for the task of text categorization. In our representation we identify high level concepts extracted from a database of controlled biomedical terms and build a rich graph structure that contains important concepts and relationships. This procedure ensures that graphs are described with a regular vocabulary, leading to increased ease of comparison. We then classify document graphs by applying a set-based graph kernel that is intuitively sensible and able to deal with the disconnectedness of the constructed concept graphs. We compare this approach to standard approaches using non-graph, text-based features. We also do a comparison amongst different kernels that can be used to see which performs better.

UR - http://www.scopus.com/inward/record.url?scp=84866635017&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84866635017&partnerID=8YFLogxK

U2 - 10.1145/2350176.2350181

DO - 10.1145/2350176.2350181

M3 - Conference contribution

AN - SCOPUS:84866635017

SN - 9781450315524

T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

SP - 26

EP - 32

BT - Proc. of the 11th Int. Workshop on Data Mining in Bioinformatics, BIOKDD 2012 - Held in Conjunction with the 18th ACM SIGKDD Int. Conference on Knowledge Discovery and Data Mining, SIGKDD'12

ER -

Mishra M, Huan J, Bleik S, Song M. Biomedical text categorization with concept graph representations using a controlled vocabulary. In Proc. of the 11th Int. Workshop on Data Mining in Bioinformatics, BIOKDD 2012 - Held in Conjunction with the 18th ACM SIGKDD Int. Conference on Knowledge Discovery and Data Mining, SIGKDD'12. 2012. p. 26-32. (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). https://doi.org/10.1145/2350176.2350181