Shortest Path Edit Distance for Enhancing UMLS Integration and Audit

Alex Rudniy, James Geller, Min Song

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

Expansion of the UMLS is an important long-term research project. This paper proposes Shortest Path Edit Distance (SPED) as an algorithm for improving existing source-integration and auditing techniques. We use SPED as a string similarity measure for UMLS terms that are known to be synonyms because they are assigned to the same concept. We compare SPED with several other well-known string matching algorithms, using two UMLS samples as a test bed, one of which is SNOMED-based. SPED transforms the task of calculating the edit distance between two strings into the problem of finding a shortest path from a source to a destination in a node-and-link graph that is constructed from the two strings. The Pulling algorithm is applied to find a shortest path, which determines the string similarity value. SPED was superior for one of the data sets, with a precision of 0.6.
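The reduction described in the abstract can be illustrated with a minimal sketch. It assumes the standard edit-graph construction: node (i, j) means the first i characters of one string have been aligned with the first j characters of the other, and the edges carry unit costs for deletion, insertion, and substitution, with cost 0 for a character match. The source is (0, 0) and the destination is (m, n). The shortest path is computed by a pulling-style pass: each node is visited in topological order and takes the minimum, over its predecessors, of the predecessor's label plus the edge cost. The abstract does not give the authors' exact graph, edge weights, or similarity formula. The names `build_edit_graph`, `pulling_shortest_path`, `sped_distance`, and `sped_similarity`, and the normalization `1 - d / max(len)`, are hypothetical.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]
Incoming = Dict[Node, List[Tuple[Node, int]]]


def build_edit_graph(s: str, t: str) -> Incoming:
    """Hypothetical edit graph: node (i, j) = s[:i] aligned with t[:j].

    Each node maps to its incoming edges as (predecessor, cost) pairs,
    which is the form a pulling pass needs. Unit costs are an assumption.
    """
    m, n = len(s), len(t)
    incoming: Incoming = {}
    for i in range(m + 1):
        for j in range(n + 1):
            edges: List[Tuple[Node, int]] = []
            if i > 0:  # delete s[i-1]
                edges.append(((i - 1, j), 1))
            if j > 0:  # insert t[j-1]
                edges.append(((i, j - 1), 1))
            if i > 0 and j > 0:  # match (cost 0) or substitute (cost 1)
                edges.append(((i - 1, j - 1), 0 if s[i - 1] == t[j - 1] else 1))
            incoming[(i, j)] = edges
    return incoming


def pulling_shortest_path(incoming: Incoming, source: Node,
                          order: List[Node]) -> Dict[Node, float]:
    """Pulling pass over a DAG: visit nodes in topological order and let each
    node pull its label from the cheapest already-labelled predecessor."""
    dist: Dict[Node, float] = {node: float("inf") for node in order}
    dist[source] = 0
    for node in order:
        for pred, cost in incoming[node]:
            if dist[pred] + cost < dist[node]:
                dist[node] = dist[pred] + cost
    return dist


def sped_distance(s: str, t: str) -> int:
    """Edit distance as the shortest (0, 0) -> (m, n) path in the edit graph."""
    m, n = len(s), len(t)
    # Row-major order is topological: every edge goes right, down, or diagonally.
    order = [(i, j) for i in range(m + 1) for j in range(n + 1)]
    dist = pulling_shortest_path(build_edit_graph(s, t), (0, 0), order)
    return int(dist[(m, n)])


def sped_similarity(s: str, t: str) -> float:
    """Hypothetical normalization to [0, 1]; the paper's formula may differ."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - sped_distance(s, t) / longest


if __name__ == "__main__":
    print(sped_distance("kitten", "sitting"))  # 3
    print(round(sped_similarity("kitten", "sitting"), 4))  # 0.5714
```

With unit costs, this sketch reproduces the ordinary Levenshtein distance. The point it illustrates is the graph framing: the edit grid is a directed acyclic graph, so a single pulling pass in row-major order labels every node correctly. A Dijkstra-style search would also work, because all edge weights are nonnegative. Changing the edge costs, for example to weight substitutions between similar characters more cheaply, changes the distance without changing the shortest-path machinery.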

Original language: English
Pages (from-to): 697-701
Number of pages: 5
Journal: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
Volume: 2010
Publication status: Published - 2010 Jan 1

Fingerprint

  • Unified Medical Language System
  • Systematized Nomenclature of Medicine
  • Research

All Science Journal Classification (ASJC) codes

  • Medicine(all)

Cite this

@article{b4974ec79d5d497fa31466d2300c5381,
title = "Shortest Path Edit Distance for Enhancing UMLS Integration and Audit",
abstract = "Expansion of the UMLS is an important long-term research project. This paper proposes Shortest Path Edit Distance (SPED) as an algorithm for improving existing source-integration and auditing techniques. We use SPED as a string similarity measure for UMLS terms that are known to be synonyms because they are assigned to the same concept. We compare SPED with several other well-known string matching algorithms, using two UMLS samples as a test bed, one of which is SNOMED-based. SPED transforms the task of calculating the edit distance between two strings into the problem of finding a shortest path from a source to a destination in a node-and-link graph that is constructed from the two strings. The Pulling algorithm is applied to find a shortest path, which determines the string similarity value. SPED was superior for one of the data sets, with a precision of 0.6.",
author = "Alex Rudniy and James Geller and Min Song",
year = "2010",
month = "1",
day = "1",
language = "English",
volume = "2010",
pages = "697--701",
journal = "AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium",
issn = "1559-4076",
publisher = "American Medical Informatics Association",

}

Shortest Path Edit Distance for Enhancing UMLS Integration and Audit. / Rudniy, Alex; Geller, James; Song, Min.

In: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, Vol. 2010, 01.01.2010, p. 697-701.

TY - JOUR

T1 - Shortest Path Edit Distance for Enhancing UMLS Integration and Audit

AU - Rudniy, Alex

AU - Geller, James

AU - Song, Min

PY - 2010/1/1

Y1 - 2010/1/1

AB - Expansion of the UMLS is an important long-term research project. This paper proposes Shortest Path Edit Distance (SPED) as an algorithm for improving existing source-integration and auditing techniques. We use SPED as a string similarity measure for UMLS terms that are known to be synonyms because they are assigned to the same concept. We compare SPED with several other well-known string matching algorithms, using two UMLS samples as a test bed, one of which is SNOMED-based. SPED transforms the task of calculating the edit distance between two strings into the problem of finding a shortest path from a source to a destination in a node-and-link graph that is constructed from the two strings. The Pulling algorithm is applied to find a shortest path, which determines the string similarity value. SPED was superior for one of the data sets, with a precision of 0.6.

UR - http://www.scopus.com/inward/record.url?scp=84883636454&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84883636454&partnerID=8YFLogxK

M3 - Article

C2 - 21347068

AN - SCOPUS:84883636454

VL - 2010

SP - 697

EP - 701

JO - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

JF - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

SN - 1559-4076

ER -