Detecting duplicate biological entities using Markov random field-based edit distance

Min Song, Alex Rudniy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

Detecting duplicate entities in biological data has become an important research task [3,5,7,8,9]. In this paper, we propose a novel context-sensitive Markov Random Field-based Edit Distance (MRFED). We apply Markov Random Field (MRF) theory to the Needleman-Wunsch (NW) distance and combine MRFED with TFIDF, a token-based distance algorithm, to obtain SoftMRFED. We evaluate SoftMRFED and other distance algorithms (Levenshtein, SoftTFIDF, and MongeElkan) on biological entity matching and synonym matching. The experimental results show that SoftMRFED significantly outperforms the other distance algorithms, including the token-based ones, in both matching tasks.
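For readers unfamiliar with the Needleman-Wunsch distance that the abstract builds on, the following is a minimal Python sketch of the plain NW global alignment score. It is not the authors' MRFED or SoftMRFED: the function names `needleman_wunsch` and `nw_similarity`, the scoring values (match +1, mismatch -1, gap -1), and the normalization to [0, 1] are all illustrative assumptions, and no Markov Random Field context modeling is included.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score of a and b (plain NW, not MRFED).

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


def nw_similarity(a: str, b: str) -> float:
    """Rescale the +1/-1/-1 NW score to [0, 1] (hypothetical normalization)."""
    n = max(len(a), len(b))
    if n == 0:
        return 1.0  # two empty strings are treated as identical
    # With +1/-1/-1 scoring the score lies in [-n, n], so shift and scale.
    return (needleman_wunsch(a, b) + n) / (2 * n)


if __name__ == "__main__":
    print(needleman_wunsch("abc", "abd"))
    print(nw_similarity("abc", "abc"))
```

A character-level score like this is the kind of building block that a hybrid measure in the SoftTFIDF style would pair with token weights; how MRFED adds context sensitivity is described in the paper itself and is not reproduced here.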

Original language: English
Title of host publication: Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008
Pages: 457-460
Number of pages: 4
DOI: https://doi.org/10.1109/BIBM.2008.34
Publication status: Published - 2008 Dec 1
Event: 2008 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008 - Philadelphia, PA, United States
Duration: 2008 Nov 3 to 2008 Nov 5

Publication series

Name: Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008

Other

Other: 2008 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008
Country: United States
City: Philadelphia, PA
Period: 08/11/3 to 08/11/5

All Science Journal Classification (ASJC) codes

  • Molecular Biology
  • Information Systems
  • Biomedical Engineering

  • Cite this

    Song, M., & Rudniy, A. (2008). Detecting duplicate biological entities using Markov random field-based edit distance. In Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008 (pp. 457-460). [4684939] (Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008). https://doi.org/10.1109/BIBM.2008.34