SSL

Inferring disease-related genes using sentence structure and literature data

Jeongwoo Kim, Won Gi Choi, Jungrim Kim, Sang Hyun Park

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Text mining is widely applied in biology to infer relationships between biological entities. In biology, disease-gene relationships are important to discover the cause of disease. Therefore, we propose a useful method called SSL, which infers disease-related genes, using sentence structure and literature data. Using sentence structure, the proposed method decreases the number of candidate disease-related genes and infers more meaningful disease-related genes than other comparable methods. Furthermore, our method extracts useful sentences that have information on the relationship between specific diseases and genes. By analyzing the structure of the sentences, we can obtain useful knowledge of disease-gene relationships. We applied our method to five diseases, including Alzheimer's disease, prostate cancer, gastric cancer, colorectal cancer, and lung cancer. For validation, we investigated the top 10 inferred genes for five diseases. Our method demonstrated up to 50% higher precision than existing methods, and showed 98% accuracy in inferring disease-related genes.
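The abstract reports precision over the top 10 inferred genes per disease but does not spell out the computation. The following is a minimal sketch of one common way to compute precision over a top-k list. It assumes a ranked list of inferred genes and a reference set of known disease-related genes. The function name `precision_at_k` and the placeholder gene identifiers are hypothetical and are not taken from the paper.

```python
def precision_at_k(ranked_genes, known_genes, k=10):
    """Return the fraction of the top-k ranked genes found in known_genes.

    Assumption: precision is measured against a reference set of known
    disease-related genes; the paper's exact validation protocol may differ.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_genes[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for gene in top_k if gene in known_genes)
    # Divide by the number of genes actually inspected, which is at most k.
    return hits / len(top_k)


# Placeholder identifiers only; these are not results from the paper.
ranked = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
known = {"GENE_A", "GENE_C"}
score = precision_at_k(ranked, known, k=4)
```

Dividing by the number of genes actually inspected, rather than by k, keeps the score well defined when fewer than k genes are inferred. Dividing by k instead would be an equally reasonable convention.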

Original language: English
Title of host publication: 2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 100-107
Number of pages: 8
ISBN (Electronic): 9781509030156
DOI: https://doi.org/10.1109/BIGCOMP.2017.7881723
Publication status: Published - 2017 Mar 17
Event: 2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017 - Jeju Island, Korea, Republic of
Duration: 2017 Feb 13 to 2017 Feb 16

Publication series

Name: 2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017

Other

Other: 2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017
Country: Korea, Republic of
City: Jeju Island
Period: 17/2/13 to 17/2/16

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition

Cite this

Kim, J., Choi, W. G., Kim, J., & Park, S. H. (2017). SSL: Inferring disease-related genes using sentence structure and literature data. In 2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017 (pp. 100-107). [7881723] (2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BIGCOMP.2017.7881723
Kim, Jeongwoo; Choi, Won Gi; Kim, Jungrim; Park, Sang Hyun. / SSL: Inferring disease-related genes using sentence structure and literature data. 2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017. Institute of Electrical and Electronics Engineers Inc., 2017. pp. 100-107 (2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017).
@inproceedings{3266b9f2785f48ecad9333f6500a611a,
title = "SSL: Inferring disease-related genes using sentence structure and literature data",
abstract = "Text mining is widely applied in biology to infer relationships between biological entities. In biology, disease-gene relationships are important to discover the cause of disease. Therefore, we propose a useful method called SSL, which infers disease-related genes, using sentence structure and literature data. Using sentence structure, the proposed method decreases the number of candidate disease-related genes and infers more meaningful disease-related genes than other comparable methods. Furthermore, our method extracts useful sentences that have information on the relationship between specific diseases and genes. By analyzing the structure of the sentences, we can obtain useful knowledge of disease-gene relationships. We applied our method to five diseases, including Alzheimer's disease, prostate cancer, gastric cancer, colorectal cancer, and lung cancer. For validation, we investigated the top 10 inferred genes for five diseases. Our method demonstrated up to 50{\%} higher precision than existing methods, and showed 98{\%} accuracy in inferring disease-related genes.",
author = "Jeongwoo Kim and Choi, {Won Gi} and Jungrim Kim and Park, {Sang Hyun}",
year = "2017",
month = "3",
day = "17",
doi = "10.1109/BIGCOMP.2017.7881723",
language = "English",
series = "2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "100--107",
booktitle = "2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017",
address = "United States",
}

Kim, J, Choi, WG, Kim, J & Park, SH 2017, SSL: Inferring disease-related genes using sentence structure and literature data. in 2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017, 7881723, 2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017, Institute of Electrical and Electronics Engineers Inc., pp. 100-107, 2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017, Jeju Island, Korea, Republic of, 17/2/13. https://doi.org/10.1109/BIGCOMP.2017.7881723

SSL: Inferring disease-related genes using sentence structure and literature data. / Kim, Jeongwoo; Choi, Won Gi; Kim, Jungrim; Park, Sang Hyun.

2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017. Institute of Electrical and Electronics Engineers Inc., 2017. p. 100-107 7881723 (2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017).

TY  - GEN
T1  - SSL
T2  - Inferring disease-related genes using sentence structure and literature data
AU  - Kim, Jeongwoo
AU  - Choi, Won Gi
AU  - Kim, Jungrim
AU  - Park, Sang Hyun
PY  - 2017/3/17
Y1  - 2017/3/17
N2  - Text mining is widely applied in biology to infer relationships between biological entities. In biology, disease-gene relationships are important to discover the cause of disease. Therefore, we propose a useful method called SSL, which infers disease-related genes, using sentence structure and literature data. Using sentence structure, the proposed method decreases the number of candidate disease-related genes and infers more meaningful disease-related genes than other comparable methods. Furthermore, our method extracts useful sentences that have information on the relationship between specific diseases and genes. By analyzing the structure of the sentences, we can obtain useful knowledge of disease-gene relationships. We applied our method to five diseases, including Alzheimer's disease, prostate cancer, gastric cancer, colorectal cancer, and lung cancer. For validation, we investigated the top 10 inferred genes for five diseases. Our method demonstrated up to 50% higher precision than existing methods, and showed 98% accuracy in inferring disease-related genes.
AB  - Text mining is widely applied in biology to infer relationships between biological entities. In biology, disease-gene relationships are important to discover the cause of disease. Therefore, we propose a useful method called SSL, which infers disease-related genes, using sentence structure and literature data. Using sentence structure, the proposed method decreases the number of candidate disease-related genes and infers more meaningful disease-related genes than other comparable methods. Furthermore, our method extracts useful sentences that have information on the relationship between specific diseases and genes. By analyzing the structure of the sentences, we can obtain useful knowledge of disease-gene relationships. We applied our method to five diseases, including Alzheimer's disease, prostate cancer, gastric cancer, colorectal cancer, and lung cancer. For validation, we investigated the top 10 inferred genes for five diseases. Our method demonstrated up to 50% higher precision than existing methods, and showed 98% accuracy in inferring disease-related genes.
UR  - http://www.scopus.com/inward/record.url?scp=85017586725&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85017586725&partnerID=8YFLogxK
U2  - 10.1109/BIGCOMP.2017.7881723
DO  - 10.1109/BIGCOMP.2017.7881723
M3  - Conference contribution
T3  - 2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017
SP  - 100
EP  - 107
BT  - 2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  - 

Kim J, Choi WG, Kim J, Park SH. SSL: Inferring disease-related genes using sentence structure and literature data. In 2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017. Institute of Electrical and Electronics Engineers Inc. 2017. p. 100-107. 7881723. (2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017). https://doi.org/10.1109/BIGCOMP.2017.7881723