[Poster] Classification of ion channel toxin peptides using multi-channel protein feature representation learning and its application

ID: 53 | Manuscript ID: 23 | Access: attendees only | Updated: 2022-06-28 16:55:18 | Views: 397 | Poster

Presentation start: 2022-07-23 12:40 (Asia/Shanghai)

Duration: 20 min

Session: [E] Poster » [E] Poster

Abstract
Background: Ion channel toxins are polypeptide toxins with molecular weights between 1 and 10 kDa that contain multiple disulfide bonds and adopt a stable tertiary structure. They typically act potently on ion channel proteins and are highly specific for their binding sites. They can serve directly as therapeutic agents in drug development for neurological diseases, and can also provide sequence and spatial structure templates for the design of new drugs. Identifying new channel toxins with high binding affinity for specific ion channel proteins is therefore an important step in early drug discovery. However, because too few ion channel toxins with known structures and functions are available for modeling, accurately predicting the function of toxin polypeptides remains a major challenge.
Methods: To address this, we propose a representation learning method based on multi-channel protein features and apply it to the classification of ion channel toxin polypeptides. Using multiple sequence homology-based predictors and the UniProt database, we obtained the secondary structure, relative solvent accessibility, and disulfide bond state of each residue in each unlabeled sequence, and used these as tags describing its structure. In parallel, we used a pre-trained word vector model to create embeddings for the amino acid sequences, obtaining the embedding vectors that best fit their properties. To combine these feature representations, we propose a bidirectional long short-term memory network (BiLSTM) model based on multi-channel features, which integrates the different feature inputs of a protein sequence for protein classification. On top of the BiLSTM, we build multiple parallel neural network units, each trained on a different feature channel, so that the model can accept protein features of different dimensions. Finally, the outputs of all channels are aggregated so as to fully mine the hidden information in each feature sequence.
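As an illustration of what "multi-channel" input can look like, the sketch below turns one peptide and its per-residue tags into aligned, fixed-length integer channels. It is a minimal sketch under stated assumptions, not the authors' pipeline: the tag alphabets (`H`/`E`/`C` for secondary structure, `B`/`E` for buried/exposed, `0`/`1` for disulfide state), the function names, and the toy peptide are all hypothetical, and the abstract does not specify the actual encodings.

```python
from typing import Dict, List

# Hypothetical tag vocabularies (assumptions, not from the abstract).
# Index 0 is reserved for padding; unknown tags also map to 0.
AA_VOCAB = {aa: i + 1 for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}
SS_VOCAB = {"H": 1, "E": 2, "C": 3}   # helix, strand, coil
RSA_VOCAB = {"B": 1, "E": 2}          # buried, exposed
BOND_VOCAB = {"0": 1, "1": 2}         # free residue, disulfide-bonded cysteine


def encode(tags: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map a per-residue tag string to integer ids, truncating or right-padding with 0."""
    ids = [vocab.get(t, 0) for t in tags[:max_len]]
    return ids + [0] * (max_len - len(ids))


def build_channels(seq: str, ss: str, rsa: str, bonds: str,
                   max_len: int) -> Dict[str, List[int]]:
    """Return one aligned integer sequence per feature channel."""
    if not (len(seq) == len(ss) == len(rsa) == len(bonds)):
        raise ValueError("every channel needs exactly one tag per residue")
    return {
        "sequence": encode(seq, AA_VOCAB, max_len),
        "secondary_structure": encode(ss, SS_VOCAB, max_len),
        "solvent_accessibility": encode(rsa, RSA_VOCAB, max_len),
        "disulfide": encode(bonds, BOND_VOCAB, max_len),
    }


# Toy example with made-up tags for a five-residue fragment.
channels = build_channels("GCKGF", "CCEEC", "EBBEE", "01000", max_len=8)
for name, ids in channels.items():
    print(name, ids)
```

Each channel could then feed its own embedding layer and BiLSTM branch; keeping the channels position-aligned means residue i refers to the same residue in every branch.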
Results: We evaluated the model on an ion channel toxin polypeptide dataset that includes voltage-gated sodium (Nav) channel toxins, potassium channel inhibitors, calcium channel inhibitors, chloride channel inhibitors, nicotinic acetylcholine receptor toxins, and other ion channel toxins. Experiments show that the BiLSTM with multi-channel features achieves higher accuracy than deep learning models that take only amino acid sequences as input. In addition, we found that appropriate dropout or an added attention mechanism can further improve the model's accuracy.
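The abstract does not say which form of attention was used. As one common possibility, the following sketch shows dot-product attention pooling over the per-residue hidden states of a single channel: each position is scored against a query vector, the scores are normalized with a softmax, and the hidden states are averaged with those weights. The function names and the toy vectors are assumptions for illustration only.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: List[List[float]],
                   query: List[float]) -> Tuple[List[float], List[float]]:
    """Pool T hidden vectors (each of size D) into one D-sized vector.

    Scores are dot products between each hidden vector and the query;
    the pooled vector is the softmax-weighted sum of the hidden vectors.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden]
    weights = softmax(scores)
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden))
              for d in range(len(query))]
    return pooled, weights


# Toy hidden states for three residues; the query favours the first dimension.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
pooled, weights = attention_pool(hidden_states, query=[2.0, 0.0])
print(pooled, weights)
```

In a trained model the query would be a learned parameter, and the pooled vectors from the individual channels would be combined before the classification layer.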
Conclusions: For an ion channel toxin polypeptide, the way its structural and property features are represented is closely tied to the outcome of functional prediction. We therefore propose a new method that integrates these protein features to predict the peptide's target of action. With this method, the large number of ion channel toxin polypeptides with unknown targets found in the venom glands of venomous animals can be screened with higher precision. As more animal venom gland transcriptome projects are completed, the method should help meet the need to classify the venom peptides and proteins expressed in them.

Acknowledgement: The authors thank the National Natural Science Foundation of China (32001313), the Fundamental Research Joint Special Youth Project of Local Undergraduate Universities in Yunnan Province (2018FH001-106), the Yunnan Province Postdoctoral Research Fund Project (ynbh20057), the Major Science and Technology Project of Yunnan Province (202002AA100007), and the Fundamental Research Special Project of Yunnan Province (Research on Data Enhancement and Identification of Yunnan Spider Species Based on Generative Adversarial Networks).

Keywords
Ion Channel, Deep Learning, Classification, Peptides, Drug Discovery
Presenter
钱正坤
Graduate student, Dali University

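Authors
王建明, Dali University
钱正坤, Dali University
崔荣凯, Dali University
杨自忠, Yunnan Provincial Key Laboratory of Entomological Biopharmaceutical R&D, Dali University
李毅, Dali University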