Applying machine learning methods to new drug development can greatly shorten the experimental discovery process and reduce the risk of clinical failure. However, extracting features from protein sequences is very difficult because of their high dimensionality. To this end, we propose a Protein Embedding Model (PEM) for drug molecule screening that predicts the interaction between proteins and small molecules. Specifically, PEM first groups the 20 kinds of amino acids into 6 categories to reduce the dimension and learns a protein representation by borrowing the idea of word embedding. The model then uses multiple imputation to fill in the missing physical and chemical properties of the small-molecule compounds. Finally, it uses LightGBM to predict the affinity value Ki between proteins and small molecules. Experiments show that the model effectively extracts the features of proteins and small molecules and outperforms other traditional methods on data provided by a drug discovery and development company.
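The first step described in the abstract, reducing the 20-letter amino acid alphabet to 6 categories and treating the result as text for word embedding, can be illustrated with a minimal sketch. The abstract does not say which amino acids fall into which category, so the grouping below is an assumption based on common physicochemical classes. The function names and the choice of overlapping 3-mers as "words" are also hypothetical. The resulting word lists are the kind of input a word-embedding trainer would consume; training itself is not shown.

```python
from collections import Counter

# ASSUMPTION: hypothetical 6-class grouping by physicochemical similarity.
# The abstract does not state the actual classes used by PEM.
GROUPS = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "b": "FWYH",    # aromatic
    "c": "STNQ",    # polar, uncharged
    "d": "KR",      # positively charged
    "e": "DE",      # negatively charged
    "f": "GP",      # conformationally special
}
AA_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its group letter; unknown residues become 'x'."""
    return "".join(AA_TO_GROUP.get(aa, "x") for aa in seq.upper())


def kmer_words(reduced, k=3):
    """Split a reduced sequence into overlapping k-mers ("words")."""
    return [reduced[i:i + k] for i in range(len(reduced) - k + 1)]


# Toy sequence, not taken from the paper's data.
reduced = reduce_sequence("MKTAYIAKQR")
words = kmer_words(reduced, k=3)
counts = Counter(words)

print(reduced)                # reduced-alphabet sequence
print(words)                  # overlapping 3-letter "words"
print(counts.most_common(1))  # most frequent word in this toy sequence
```

With 3-letter words, a 6-letter alphabet yields at most 6^3 = 216 distinct words, compared with 20^3 = 8000 for the full amino acid alphabet, which is the kind of dimension reduction the abstract refers to.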
|Title of host publication||Proceedings - 2020 IEEE International Conference on Big Data and Smart Computing, BigComp 2020|
|Editors||Wookey Lee, Luonan Chen, Yang-Sae Moon, Julien Bourgeois, Mehdi Bennis, Yu-Feng Li, Young-Guk Ha, Hyuk-Yoon Kwon, Alfredo Cuzzocrea|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||4|
|Publication status||Published - 2020 Feb|
|Event||2020 IEEE International Conference on Big Data and Smart Computing, BigComp 2020 - Busan, Korea, Republic of|
Duration: 2020 Feb 19 → 2020 Feb 22
Bibliographical note
Funding Information:
This research was supported in part by the National Science Fund for Distinguished Young Scholars (No. 61425002), the National Natural Science Foundation of China (No. 61751203), the Program for ChangJiang Scholars and Innovative Research Team in University (No. IRT 15R07), the Program for Dalian High-level Talent Innovation Support (No. 2017RD11), the Science and Technology Innovation Fund of Dalian (No. 2018J12GX036), and the Guidance Program of Liaoning Natural Science Foundation (No. 2019-ZD-0569).
© 2020 IEEE.
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Information Systems and Management
- Control and Optimization