A protein embedding model for drug molecular screening

Chao Che, Min Zhu, Yongjun Zhu, Qiang Zhang, Dongsheng Zhou, Bin Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Applying machine learning methods to new drug development can greatly shorten the experimental discovery process and reduce the risk of clinical failure. However, extracting features from protein sequences is difficult because of their high dimensionality. To this end, we propose a Protein Embedding Model (PEM) for drug molecular screening that predicts the interaction between proteins and small molecules. Specifically, PEM first groups the 20 amino acids into 6 categories to reduce the dimension and learns a protein representation by borrowing the idea of word embedding. The model then uses multiple imputation to fill in missing physical and chemical properties of small-molecule compounds. Finally, it uses LightGBM to predict the affinity value Ki between proteins and small molecules. Experiments show that the model effectively extracts features of proteins and small molecules and outperforms other traditional methods on data provided by a drug discovery and development company.
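The first step described in the abstract, reducing the amino acid alphabet and treating a protein as a sequence of "words", can be illustrated with a minimal sketch. The abstract does not state which six categories PEM uses, so the grouping below is a hypothetical physicochemical one chosen only for illustration; the function names and the word length `k = 3` are likewise assumptions, not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical 6-class grouping of the 20 standard amino acids.
# This is an illustrative assumption; the abstract does not list the
# categories actually used by PEM.
GROUPS: Dict[str, str] = {
    "a": "AVLIMC",  # aliphatic / sulfur-containing
    "b": "FWYH",    # aromatic
    "c": "STNQ",    # polar, uncharged
    "d": "KR",      # positively charged
    "e": "DE",      # negatively charged
    "f": "GP",      # glycine and proline
}

# Invert the grouping into a residue -> class-label lookup table.
AA_TO_CLASS: Dict[str, str] = {
    aa: label for label, members in GROUPS.items() for aa in members
}


def reduce_sequence(seq: str) -> str:
    """Map each residue to its class label (6-letter alphabet).

    Raises KeyError for letters outside the 20 standard amino acids.
    """
    return "".join(AA_TO_CLASS[aa] for aa in seq.upper())


def to_words(reduced: str, k: int = 3) -> List[str]:
    """Split a reduced sequence into overlapping k-letter 'words'."""
    return [reduced[i:i + k] for i in range(len(reduced) - k + 1)]


if __name__ == "__main__":
    reduced = reduce_sequence("MKVLAG")
    print(reduced)            # adaaaf
    print(to_words(reduced))  # ['ada', 'daa', 'aaa', 'aaf']
```

With six classes and three-letter words the vocabulary has at most 6^3 = 216 entries, compared with 20^3 = 8000 for the full alphabet, which is the kind of dimension reduction the abstract refers to. The resulting word lists could then be passed to any word-embedding trainer.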

Original language: English
Title of host publication: Proceedings - 2020 IEEE International Conference on Big Data and Smart Computing, BigComp 2020
Editors: Wookey Lee, Luonan Chen, Yang-Sae Moon, Julien Bourgeois, Mehdi Bennis, Yu-Feng Li, Young-Guk Ha, Hyuk-Yoon Kwon, Alfredo Cuzzocrea
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 251-254
Number of pages: 4
ISBN (Electronic): 9781728160344
Publication status: Published - 2020 Feb
Event: 2020 IEEE International Conference on Big Data and Smart Computing, BigComp 2020 - Busan, Korea, Republic of
Duration: 2020 Feb 19 to 2020 Feb 22

Publication series

Name: Proceedings - 2020 IEEE International Conference on Big Data and Smart Computing, BigComp 2020

Conference

Conference: 2020 IEEE International Conference on Big Data and Smart Computing, BigComp 2020
Country/Territory: Korea, Republic of
City: Busan
Period: 20/2/19 to 20/2/22

Bibliographical note

Funding Information:
This research was supported in part by the National Science Fund for Distinguished Young Scholars (No. 61425002), the National Natural Science Foundation of China (No. 61751203), the Program for ChangJiang Scholars and Innovative Research Team in University (No. IRT 15R07), the Program for Dalian High-level Talent Innovation Support (No. 2017RD11), the Science and Technology Innovation Fund of Dalian (No. 2018J12GX036), and the Guidance Program of Liaoning Natural Science Foundation (No. 2019-ZD-0569).

Publisher Copyright:
© 2020 IEEE.

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Information Systems and Management
  • Control and Optimization
