Improved shortest path edit distance for synonyms identification

Alex Rudniy, Min Song, James Geller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Integration of proliferous sequencing and related data into the UniProt Knowledgebase is an important ongoing research project. This paper proposes Improved Shortest Path Edit Distance (ISPED) as an algorithm for enhancing existing integration techniques. ISPED is an improved version of the algorithm previously developed by the authors. Three major adjustments have been made: better node weight calculation, score normalization, and implementation of a re-scorer. We apply ISPED as an approximate string similarity metric to five datasets extracted from UNIPROT-GOA during synonym identification experiments. ISPED outperforms nine well-known string similarity metrics and achieves the highest values of average precision and F1 on all selected datasets.

Original language: English
Title of host publication: Proceedings of the 6th International Conference on Bioinformatics and Computational Biology, BICOB 2014
Publisher: International Society for Computers and Their Applications
Pages: 97-102
Number of pages: 6
ISBN (Print): 9781632665140
Publication status: Published - 2014 Jan 1
Event: 6th International Conference on Bioinformatics and Computational Biology, BICOB 2014 - Las Vegas, NV, United States
Duration: 2014 Mar 24 → 2014 Mar 26

Publication series

Name: Proceedings of the 6th International Conference on Bioinformatics and Computational Biology, BICOB 2014

Other

Other: 6th International Conference on Bioinformatics and Computational Biology, BICOB 2014
Country: United States
City: Las Vegas, NV
Period: 14/3/24 → 14/3/26

Fingerprint

Knowledge Bases
Weights and Measures
Research
Experiments
Datasets

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Health Informatics

Cite this

Rudniy, A., Song, M., & Geller, J. (2014). Improved shortest path edit distance for synonyms identification. In Proceedings of the 6th International Conference on Bioinformatics and Computational Biology, BICOB 2014 (pp. 97-102). (Proceedings of the 6th International Conference on Bioinformatics and Computational Biology, BICOB 2014). International Society for Computers and Their Applications.
@inproceedings{b3380f19104445a7981233a1b9478a64,
title = "Improved shortest path edit distance for synonyms identification",
abstract = "Integration of proliferous sequencing and related data into the UniProt Knowledgebase is an important ongoing research project. This paper proposes Improved Shortest Path Edit Distance (ISPED) as an algorithm for enhancing existing integration techniques. ISPED is an improved version of the algorithm previously developed by the authors. Three major adjustments have been made: better node weight calculation, score normalization, and implementation of a re-scorer. We apply ISPED as an approximate string similarity metric to five datasets extracted from UNIPROT-GOA during synonym identification experiments. ISPED outperforms nine well-known string similarity metrics and achieves the highest values of average precision and F1 on all selected datasets.",
author = "Alex Rudniy and Min Song and James Geller",
year = "2014",
month = "1",
day = "1",
language = "English",
isbn = "9781632665140",
series = "Proceedings of the 6th International Conference on Bioinformatics and Computational Biology, BICOB 2014",
publisher = "International Society for Computers and Their Applications",
pages = "97--102",
booktitle = "Proceedings of the 6th International Conference on Bioinformatics and Computational Biology, BICOB 2014",
}

TY - GEN
T1 - Improved shortest path edit distance for synonyms identification
AU - Rudniy, Alex
AU - Song, Min
AU - Geller, James
PY - 2014/1/1
Y1 - 2014/1/1
AB - Integration of proliferous sequencing and related data into the UniProt Knowledgebase is an important ongoing research project. This paper proposes Improved Shortest Path Edit Distance (ISPED) as an algorithm for enhancing existing integration techniques. ISPED is an improved version of the algorithm previously developed by the authors. Three major adjustments have been made: better node weight calculation, score normalization, and implementation of a re-scorer. We apply ISPED as an approximate string similarity metric to five datasets extracted from UNIPROT-GOA during synonym identification experiments. ISPED outperforms nine well-known string similarity metrics and achieves the highest values of average precision and F1 on all selected datasets.
UR - http://www.scopus.com/inward/record.url?scp=84905826168&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84905826168&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84905826168
SN - 9781632665140
T3 - Proceedings of the 6th International Conference on Bioinformatics and Computational Biology, BICOB 2014
SP - 97
EP - 102
BT - Proceedings of the 6th International Conference on Bioinformatics and Computational Biology, BICOB 2014
PB - International Society for Computers and Their Applications
ER -
