Inferring drug-protein-side effect relationships from biomedical text

Min Song, Seung Han Baek, Go Eun Heo, Jeong Hoon Lee

Research output: Contribution to journal › Article

Abstract

Background: Although there are many studies of drugs and their side effects, the underlying mechanisms of these side effects are not well understood. It is also difficult to understand the specific pathways between drugs and side effects.

Objective: The present study seeks to construct putative paths between drugs and their side effects by applying text-mining techniques to the free text of biomedical studies, and to develop ranking metrics that could identify the most likely paths.

Materials and Methods: We extracted three types of relationships (drug-protein, protein-protein, and protein-side effect) from biomedical texts by using text mining and predefined relation-extraction rules. Based on the extracted relationships, we constructed complete drug-protein-side effect paths. For each path, we calculated a ranking score with a new ranking function that combines corpus- and ontology-based semantic similarity with co-occurrence frequency.

Results: We extracted 13 plausible biomedical paths connecting drugs and their side effects from cancer-related abstracts in the PubMed database. The top 20 paths were examined, and the proposed ranking function outperformed the other methods tested, including co-occurrence, COALS, and UMLS, in precision at ranks 5 through 20 (P@5 to P@20). In addition, we confirmed that the paths are novel hypotheses worth investigating further.

Discussion: The risk of side effects has been an important issue for the US Food and Drug Administration (FDA). However, the causes and mechanisms of such side effects have not been fully elucidated. This study extends previous research on understanding drug side effects by combining techniques such as Named Entity Recognition (NER), Relation Extraction (RE), and semantic similarity.

Conclusion: Revealing the biomedical mechanisms of side effects is difficult because of the very large number of possible paths. However, the proposed approach automatically generated predicted paths that could provide biomedical researchers with meaningful information for generating plausible hypotheses about such mechanisms.
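The abstract describes the pipeline only at a high level, so the Python sketch below illustrates the general idea under explicit assumptions rather than reproducing the authors' method. It assumes paths have the shape drug, protein, protein, side effect; treats protein-protein relations as undirected; and scores a path with a hypothetical function (the mean over its edges of a weighted sum of corpus similarity, ontology similarity, and max-normalized co-occurrence count). The weights, the normalization, the function names (`build_paths`, `path_score`, `precision_at_k`), and the toy entity names are all illustrative assumptions; the study's actual ranking function and its COALS/UMLS baselines are not specified here. P@k is computed as the fraction of the top k ranked paths judged relevant.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Edge = FrozenSet[str]             # unordered pair of entity names
Path = Tuple[str, str, str, str]  # assumed shape: (drug, protein, protein, side effect)


def _index(pairs: Iterable[Tuple[str, str]], symmetric: bool = False) -> Dict[str, Set[str]]:
    """Map each head entity to the set of tail entities it is related to."""
    index: Dict[str, Set[str]] = {}
    for head, tail in pairs:
        index.setdefault(head, set()).add(tail)
        if symmetric:
            index.setdefault(tail, set()).add(head)
    return index


def build_paths(drug_protein, protein_protein, protein_side_effect) -> List[Path]:
    """Chain extracted relations into drug-protein-protein-side effect paths."""
    pp = _index(protein_protein, symmetric=True)  # assumption: interactions are undirected
    pse = _index(protein_side_effect)
    paths: List[Path] = []
    for drug, p1 in drug_protein:
        for p2 in sorted(pp.get(p1, set()) - {p1}):  # skip self-interactions
            for side_effect in sorted(pse.get(p2, set())):
                paths.append((drug, p1, p2, side_effect))
    return paths


def path_score(path: Path, corpus_sim: Dict[Edge, float], onto_sim: Dict[Edge, float],
               cooc: Dict[Edge, int], weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Hypothetical score: mean over the path's edges of a weighted sum of three signals."""
    w_corpus, w_onto, w_cooc = weights
    max_cooc = max(cooc.values(), default=0) or 1  # avoid division by zero
    edges = [frozenset(pair) for pair in zip(path, path[1:])]
    total = 0.0
    for edge in edges:
        total += (w_corpus * corpus_sim.get(edge, 0.0)
                  + w_onto * onto_sim.get(edge, 0.0)
                  + w_cooc * cooc.get(edge, 0) / max_cooc)
    return total / len(edges)


def precision_at_k(ranked: List[Path], relevant: Set[Path], k: int) -> float:
    """P@k: fraction of the top-k ranked paths that were judged relevant."""
    return sum(1 for p in ranked[:k] if p in relevant) / k


# Toy usage with hypothetical entity names (not data from the study).
drug_protein = [("drugA", "P1"), ("drugA", "P2")]
protein_protein = [("P1", "P3"), ("P2", "P1")]
protein_side_effect = [("P3", "SE1"), ("P1", "SE2")]
corpus_sim = {frozenset({"drugA", "P1"}): 0.9, frozenset({"P1", "P3"}): 0.6,
              frozenset({"P3", "SE1"}): 0.3}
onto_sim: Dict[Edge, float] = {}
cooc = {frozenset({"drugA", "P1"}): 4, frozenset({"drugA", "P2"}): 1}

paths = build_paths(drug_protein, protein_protein, protein_side_effect)
ranked = sorted(paths, key=lambda p: path_score(p, corpus_sim, onto_sim, cooc), reverse=True)
print(ranked[0])                               # ('drugA', 'P1', 'P3', 'SE1')
print(precision_at_k(ranked, {ranked[0]}, 2))  # 0.5
```

Averaging over edges keeps scores comparable if paths of different lengths were ever mixed; in practice the placeholder similarity dictionaries would be replaced by whatever corpus- and ontology-based measures are actually used.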

Original language: English
Article number: 159
Journal: Genes
Volume: 10
Issue number: 2
DOI: 10.3390/genes10020159
Publication status: Published - 2019 Feb

Fingerprint

Drug-Related Side Effects and Adverse Reactions
Proteins
Data Mining
Semantics
Unified Medical Language System
United States Food and Drug Administration
PubMed
Research Personnel
Databases
Research
Pharmaceutical Preparations
Neoplasms

All Science Journal Classification (ASJC) codes

  • Genetics
  • Genetics (clinical)

Cite this

Song, Min ; Baek, Seung Han ; Heo, Go Eun ; Lee, Jeong Hoon. / Inferring drug-protein-side effect relationships from biomedical text. In: Genes. 2019 ; Vol. 10, No. 2.
@article{a2b54870d28c41f781c17e85f78b3ae9,
title = "Inferring drug-protein-side effect relationships from biomedical text",
author = "Song, Min and Baek, {Seung Han} and Heo, {Go Eun} and Lee, {Jeong Hoon}",
year = "2019",
month = "2",
doi = "10.3390/genes10020159",
language = "English",
volume = "10",
journal = "Genes",
issn = "2073-4425",
publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",
number = "2",

}

Inferring drug-protein-side effect relationships from biomedical text. / Song, Min; Baek, Seung Han; Heo, Go Eun; Lee, Jeong Hoon.

In: Genes, Vol. 10, No. 2, 159, 02.2019.

TY  - JOUR
T1  - Inferring drug-protein-side effect relationships from biomedical text
AU  - Song, Min
AU  - Baek, Seung Han
AU  - Heo, Go Eun
AU  - Lee, Jeong Hoon
PY  - 2019/2
Y1  - 2019/2
UR  - http://www.scopus.com/inward/record.url?scp=85062830200&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85062830200&partnerID=8YFLogxK
U2  - 10.3390/genes10020159
DO  - 10.3390/genes10020159
M3  - Article
AN  - SCOPUS:85062830200
VL  - 10
JO  - Genes
JF  - Genes
SN  - 2073-4425
IS  - 2
M1  - 159
ER  - 