Evaluation of protein descriptors in computer-aided rational protein engineering tasks and its application in property prediction in SARS-CoV-2 spike glycoprotein

Hocheol Lim, Hyeon Nae Jeon, Seungcheol Lim, Yuil Jang, Taehee Kim, Hyein Cho, Jae Gu Pan, Kyoung Tai No

Research output: Contribution to journal › Article › peer-review

Abstract

Protein engineering has become increasingly important in the research and development of biopharmaceuticals and biomaterials. Machine learning in computer-aided protein engineering can markedly reduce the experimental effort needed to identify, from a large number of possible protein sequences, the optimal sequences that satisfy the desired properties. To develop general protein descriptors for computer-aided protein engineering tasks, we devised four new protein descriptors: one sequence-based descriptor (PCgrades) and three structure-based descriptors (PCspairs, 3D-SPIEs_5.4Å, and 3D-SPIEs_8Å). While PCgrades and PCspairs encode general, statistical information on the physicochemical properties of single and pairwise amino acids, respectively, the 3D-SPIEs encode specific, quantum-mechanical information obtained from parameterized quantum mechanical calculations (FMO2-DFTB3/D/PCM). To evaluate the descriptors, we built prediction models using the new descriptors and previously developed descriptors on diverse protein datasets, including protein expression and binding affinity change in the SARS-CoV-2 spike glycoprotein. The newly devised descriptors performed well across the diverse datasets, with PCspairs performing best (R² = 0.783 for protein expression and R² = 0.711 for binding affinity). Similar approaches using these descriptors would be promising and useful if the prediction models are trained with sufficient quantitative experimental data from high-throughput assays for industrial enzymes or protein drugs.

Original language: English
Pages (from-to): 788-798
Number of pages: 11
Journal: Computational and Structural Biotechnology Journal
Volume: 20
Publication status: Published - 2022 Jan

Bibliographical note

Funding Information:
This research was financially supported by the Ministry of Trade, Industry, and Energy (MOTIE), Korea, under the “Infrastructure Support Program for Industry Innovation” (reference number P0014714) supervised by the Korea Institute for Advancement of Technology (KIAT).

Publisher Copyright:
© 2022 The Author(s)

All Science Journal Classification (ASJC) codes

  • Biotechnology
  • Biophysics
  • Structural Biology
  • Biochemistry
  • Genetics
  • Computer Science Applications
