Histogram difference string distance for enhancing ontology integration in bioinformatics

Alex Rudniy, James Geller, Min Song

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

Integration of bioinformatics ontologies is an important research task. This paper presents a family of new methods of string distance computation for improving existing ontology integration and alignment techniques. A histogram, the main tool of the introduced methods, is an associative array for storing the number of occurrences of each character in a string. We use histogram difference in combination with Longest Common Prefix, TFIDF, Smith-Waterman, and Jaccard re-scorers to define the four members of our family of string matching methods. We compare the performance of our methods with several well-known string matching algorithms using five Gene Ontology datasets as test beds. Our methods outperformed those algorithms in terms of average precision on four datasets and for maximum F1 measure on three datasets. On the remaining datasets our results were among the best, compared to these well-known methods.
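The histogram described in the abstract can be sketched in a few lines of Python. The abstract does not give the exact formula for the histogram difference, so the definition below (the sum of absolute differences between per-character counts) is an assumption made for illustration, and the function names `char_histogram` and `histogram_difference` are hypothetical, not taken from the paper. The re-scorer combinations are not shown.

```python
from collections import Counter


def char_histogram(s: str) -> Counter:
    """Associative array mapping each character of s to its occurrence count."""
    return Counter(s)


def histogram_difference(a: str, b: str) -> int:
    """Sum of absolute per-character count differences.

    This is an assumed definition for illustration; the paper's actual
    formula may differ (for example, it may be normalized).
    """
    ha, hb = char_histogram(a), char_histogram(b)
    # A Counter returns 0 for characters it has not seen.
    return sum(abs(ha[c] - hb[c]) for c in ha.keys() | hb.keys())


if __name__ == "__main__":
    print(char_histogram("gene"))
    print(histogram_difference("kinase", "kinases"))
    print(histogram_difference("abc", "cab"))
```

Under this assumed definition, character order is ignored, so two strings that are anagrams of each other have a difference of 0.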

Original language: English
Title of host publication: 4th International Conference on Bioinformatics and Computational Biology 2012, BICoB 2012
Pages: 108-113
Number of pages: 6
Publication status: Published - 2012 Dec 1
Event: 4th International Conference on Bioinformatics and Computational Biology 2012, BICoB 2012 - Las Vegas, NV, United States
Duration: 2012 Mar 12 to 2012 Mar 14

Publication series

Name: 4th International Conference on Bioinformatics and Computational Biology 2012, BICoB 2012

Other

Other: 4th International Conference on Bioinformatics and Computational Biology 2012, BICoB 2012
Country: United States
City: Las Vegas, NV
Period: 12/3/12 to 12/3/14

Fingerprint

Bioinformatics
Computational Biology
Ontology
String searching algorithms
Genes
Gene Ontology
Datasets
Research

All Science Journal Classification (ASJC) codes

  • Biomedical Engineering
  • Health Information Management

Cite this

Rudniy, A., Geller, J., & Song, M. (2012). Histogram difference string distance for enhancing ontology integration in bioinformatics. In 4th International Conference on Bioinformatics and Computational Biology 2012, BICoB 2012 (pp. 108-113). (4th International Conference on Bioinformatics and Computational Biology 2012, BICoB 2012).
@inproceedings{b8306f528fcd42f5b97cf03626e55632,
title = "Histogram difference string distance for enhancing ontology integration in bioinformatics",
author = "Alex Rudniy and James Geller and Min Song",
year = "2012",
month = "12",
day = "1",
language = "English",
isbn = "9781618397461",
series = "4th International Conference on Bioinformatics and Computational Biology 2012, BICoB 2012",
pages = "108--113",
booktitle = "4th International Conference on Bioinformatics and Computational Biology 2012, BICoB 2012",

}

TY - GEN

T1 - Histogram difference string distance for enhancing ontology integration in bioinformatics

AU - Rudniy, Alex

AU - Geller, James

AU - Song, Min

PY - 2012/12/1

Y1 - 2012/12/1

UR - http://www.scopus.com/inward/record.url?scp=84883638335&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84883638335&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84883638335

SN - 9781618397461

T3 - 4th International Conference on Bioinformatics and Computational Biology 2012, BICoB 2012

SP - 108

EP - 113

BT - 4th International Conference on Bioinformatics and Computational Biology 2012, BICoB 2012

ER -