Detecting duplicate biological entities using Markov random field-based edit distance

Min Song, Alex Rudniy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

Duplicate entity detection in biological data has become an in-demand research task [3,5,7,8,9]. In this paper, we propose a novel context-sensitive Markov Random Field-based Edit Distance (MRFED). We apply Markov Random Field (MRF) theory to the Needleman-Wunsch (NW) distance and combine MRFED with TFIDF, a token-based distance algorithm, to obtain SoftMRFED. We evaluate SoftMRFED and other distance algorithms (Levenshtein, SoftTFIDF, and MongeElkan) on biological entity matching and synonym matching. The experimental results show that SoftMRFED significantly outperforms the other distance algorithms and that its performance is superior to token-based distance algorithms in both matching tasks.
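For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of a Needleman-Wunsch style edit distance between two strings. It is not the authors' MRFED: the abstract does not specify how the Markov Random Field context is incorporated, so that part is not reproduced. The function name `needleman_wunsch_distance` and the parameters `gap_cost` and `sub_cost` are assumptions chosen for illustration. With both costs set to 1 the function reduces to the Levenshtein distance.

```python
def needleman_wunsch_distance(s, t, gap_cost=1.0, sub_cost=1.0):
    """Global-alignment distance between s and t (lower means more similar).

    Illustrative baseline only; gap_cost and sub_cost are assumed parameters,
    not values taken from the paper.
    """
    # prev[j] = cost of aligning the empty prefix of s with the first j chars of t
    prev = [j * gap_cost for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        # Aligning the first i chars of s with an empty prefix of t costs i gaps
        curr = [i * gap_cost]
        for j, b in enumerate(t, start=1):
            substitute = prev[j - 1] + (0.0 if a == b else sub_cost)
            delete = prev[j] + gap_cost       # drop a character of s
            insert = curr[j - 1] + gap_cost   # add a character of t
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Two spellings of the same gene name differ by a single inserted hyphen
    print(needleman_wunsch_distance("BRCA1", "BRCA-1"))
```

A context-sensitive variant such as the one the abstract proposes would presumably replace the fixed `sub_cost` and `gap_cost` with costs that depend on neighbouring characters; that is an interpretation of the abstract, not a detail it confirms.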

Original language: English
Title of host publication: Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008
Pages: 457-460
Number of pages: 4
DOIs: https://doi.org/10.1109/BIBM.2008.34
Publication status: Published - 2008 Dec 1
Event: 2008 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008 - Philadelphia, PA, United States
Duration: 2008 Nov 3 → 2008 Nov 5

Other

Other: 2008 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008
Country: United States
City: Philadelphia, PA
Period: 08/11/3 → 08/11/5

All Science Journal Classification (ASJC) codes

  • Molecular Biology
  • Information Systems
  • Biomedical Engineering

Cite this

Song, M., & Rudniy, A. (2008). Detecting duplicate biological entities using Markov random field-based edit distance. In Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008 (pp. 457-460). [4684939] https://doi.org/10.1109/BIBM.2008.34
@inproceedings{839d3bdf014945aba647420fefbee63f,
title = "Detecting duplicate biological entities using {M}arkov random field-based edit distance",
abstract = "Duplicate entity detection in biological data has become an in-demand research task [3,5,7,8,9]. In this paper, we propose a novel context-sensitive Markov Random Field-based Edit Distance (MRFED). We apply Markov Random Field (MRF) theory to the Needleman-Wunsch (NW) distance and combine MRFED with TFIDF, a token-based distance algorithm, to obtain SoftMRFED. We evaluate SoftMRFED and other distance algorithms (Levenshtein, SoftTFIDF, and MongeElkan) on biological entity matching and synonym matching. The experimental results show that SoftMRFED significantly outperforms the other distance algorithms and that its performance is superior to token-based distance algorithms in both matching tasks.",
author = "Min Song and Alex Rudniy",
year = "2008",
month = "12",
day = "1",
doi = "10.1109/BIBM.2008.34",
language = "English",
isbn = "9780769534527",
pages = "457--460",
booktitle = "Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008",

}

TY  - GEN
T1  - Detecting duplicate biological entities using Markov random field-based edit distance
AU  - Song, Min
AU  - Rudniy, Alex
PY  - 2008/12/1
Y1  - 2008/12/1
AB  - Duplicate entity detection in biological data has become an in-demand research task [3,5,7,8,9]. In this paper, we propose a novel context-sensitive Markov Random Field-based Edit Distance (MRFED). We apply Markov Random Field (MRF) theory to the Needleman-Wunsch (NW) distance and combine MRFED with TFIDF, a token-based distance algorithm, to obtain SoftMRFED. We evaluate SoftMRFED and other distance algorithms (Levenshtein, SoftTFIDF, and MongeElkan) on biological entity matching and synonym matching. The experimental results show that SoftMRFED significantly outperforms the other distance algorithms and that its performance is superior to token-based distance algorithms in both matching tasks.
UR  - http://www.scopus.com/inward/record.url?scp=58049138671&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=58049138671&partnerID=8YFLogxK
U2  - 10.1109/BIBM.2008.34
DO  - 10.1109/BIBM.2008.34
M3  - Conference contribution
SN  - 9780769534527
SP  - 457
EP  - 460
BT  - Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008
ER  -
