[Oral presentation] Genome-scale prediction of bacterial secreted effectors using a pre-trained protein language model

No.: 41; Manuscript No.: 9; Access: attendees only; Updated: 2022-07-20 21:42:35; Type: oral presentation

Presentation start: 24 July 2022, 17:15 (Asia/Shanghai)

Duration: 15 min

Session: [S2] Parallel Session 2 » [S2-2] Gene expression regulation and macromolecular modification

Abstract

Bacterial secretion systems are versatile membrane-spanning apparatuses that are classified into types according to their structure and function and that deliver effector proteins into host cells or the environment. Secreted effectors can alter gene expression and disrupt signal-transduction pathways in host cells. As such, they often act as virulence factors and play a crucial role in bacterial pathogenesis. Their clinical significance, together with the labor-intensive nature of experimental identification, has encouraged the use of bioinformatics approaches to identify effectors.

Type IV secreted effectors (T4SEs) are reported to promote pathogen virulence after being translocated into eukaryotic cells. A considerable number of experimentally verified T4SEs have been collected in the updated version of SecReT4, the type IV secretion system database. Existing T4SE prediction tools use various machine learning algorithms, but their accuracy and speed remain to be improved. In previous studies, the position-specific scoring matrix (PSSM) has been widely used to capture conserved sequence patterns. However, generating a PSSM requires an extensive search for similar sequences in large protein sequence databases. In our recent study, we applied a sequence embedding strategy based on a pre-trained protein language model (TAPEBert) to the T4SE classification task and developed an online web server named T4SEfinder (https://tool2-mml.sjtu.edu.cn/T4SEfinder_TAPE/). The analogy between natural language and protein sequences motivates the application of Transformer-based models to biological sequence understanding. In a comprehensive comparison, T4SEfinder not only performs competitively with PSSM-based methods but also speeds up prediction, which makes it suitable for genome-scale T4SE identification in pathogenic bacteria.
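The genome-scale workflow described above (read every protein of a genome, turn each sequence into a fixed-length vector, score it with a classifier, keep the hits) can be sketched roughly as follows. This is an illustrative outline only, not the T4SEfinder implementation: the function names `read_fasta`, `composition_embedding`, `scan_proteome` and `toy_score` are hypothetical, the amino-acid composition vector is a simple stand-in for a language-model embedding such as TAPEBert, and the toy scoring rule has no biological meaning.

```python
from typing import Callable, Dict, Iterator, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def composition_embedding(seq: str) -> List[float]:
    """Stand-in embedding (assumption): 20-dimensional residue frequencies.

    A real pipeline would replace this with a pre-trained protein
    language model that maps the sequence to a fixed-length vector.
    """
    n = max(len(seq), 1)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def scan_proteome(
    fasta_text: str,
    embed: Callable[[str], List[float]],
    score: Callable[[List[float]], float],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Embed and score every protein; return those at or above the threshold."""
    hits = {}
    for name, seq in read_fasta(fasta_text):
        p = score(embed(seq))
        if p >= threshold:
            hits[name] = p
    return hits


def toy_score(vec: List[float]) -> float:
    """Placeholder classifier (assumption): summed K and E frequency."""
    return vec[AMINO_ACIDS.index("K")] + vec[AMINO_ACIDS.index("E")]


example = ">p1 hypothetical protein\nKKKK\nEEEE\n>p2\nAAAA\n"
print(scan_proteome(example, composition_embedding, toy_score))
```

Because the embedding step needs only the sequence itself, no database search is required per protein, which is the property that makes this kind of scan practical for whole genomes; in practice `embed` and `score` would be swapped for a trained model.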

With millions of diverse sequences from the UniRef database as the pre-training corpus, protein language models have become more robust in deciphering the evolutionary features of protein sequences. Meanwhile, integrated platforms such as BastionHub provide curated collections of various types of secreted effectors. We are therefore developing an updated version that identifies the typical types of secreted effectors rather than T4SEs only. Locating and analyzing secretion systems and their corresponding effectors in representative bacterial species and clinically important strains might yield novel insights into secretion systems in Gram-negative bacteria. It might also help meet the increasing demand for re-annotating secretion systems and effector proteins in sequenced bacterial genomes.

Keywords

bacterial secreted effectors; pre-trained language model; sequence analysis; deep learning

Presenter

张昱朦

Master's student, Shanghai Jiao Tong University

Authors

张昱朦, Shanghai Jiao Tong University

欧竑宇, Shanghai Jiao Tong University

NCCBB 2022
