[Keynote] Biological Multiple Sequence Alignment: Scoring Functions, Algorithms, and Evaluations


ID: 130 | Access: conference attendees only | Updated: 2022-07-07 14:52:04 | Type: Keynote

Start time: 2022-07-25 09:35 (Asia/Shanghai)

Duration: 25 min

Session: [P] Plenary Sessions » [P-3] Closing Ceremony and Keynote Talks 3


Abstract

Aligning multiple biological sequences is a fundamental task in bioinformatics and sequence analysis. The resulting alignments can contain invaluable information that scientists need to predict the sequences' structures, determine the evolutionary relationships among them, or discover drug-like compounds that can bind to them. Multiple sequence alignment (MSA) also has many applications in next-generation sequencing (NGS) data analysis, such as aligning multiple short reads. Unfortunately, MSA is NP-complete. In addition, the lack of a reliable scoring method makes it very hard both to align sequences reliably and to evaluate the resulting alignments. In this talk, I will describe a new scoring method for biological multiple sequence alignment. Our scoring method encapsulates the stereochemical properties of sequence residues and their substitution probabilities in a tree-structured scoring scheme. In addition to the new scoring scheme, we have designed an overlapping sequence clustering algorithm that is used in our three new multiple sequence alignment algorithms. One of these algorithms performs progressive alignment guided by a dynamically weighted guide tree; because the tree's weights are dynamic, errors made in the early alignment stages can be corrected in subsequent stages. The other two algorithms use sequence knowledge and sequence consistency, respectively, to produce biologically meaningful alignments. The knowledge-based algorithm uses existing biological sequence knowledge databases, such as Swiss-Prot, to guide the alignment. When such databases are not available, the consistency-based algorithm can use consistency information derived from the input sequences themselves to achieve a similar effect. Experimental results and theoretical analysis indicate that our new scoring function and alignment algorithms improve on the current best multiple sequence alignment algorithms.
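The abstract does not spell out the details of the new tree-structured scoring function, so the sketch below instead illustrates the classical sum-of-pairs (SP) objective, the standard baseline scoring function for MSA: every pair of residues in every alignment column is scored and the results are summed. The constants MATCH, MISMATCH, and GAP and the functions pair_score and sum_of_pairs are hypothetical placeholders, not the speaker's scheme; a practical scorer would typically use a substitution matrix such as BLOSUM62 together with affine gap penalties.

```python
from itertools import combinations
from typing import List

# Hypothetical toy scoring constants (NOT the speaker's tree-structured scheme).
MATCH = 1
MISMATCH = -1
GAP = -2


def pair_score(a: str, b: str) -> int:
    """Score one pair of characters from the same alignment column."""
    if a == "-" and b == "-":
        return 0  # gap-gap pairs are conventionally ignored
    if a == "-" or b == "-":
        return GAP  # residue aligned against a gap
    return MATCH if a == b else MISMATCH


def sum_of_pairs(alignment: List[str]) -> int:
    """Sum-of-pairs score: sum pair_score over every column and every row pair."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*alignment):  # iterate over alignment columns
        for a, b in combinations(column, 2):  # every unordered row pair
            total += pair_score(a, b)
    return total


if __name__ == "__main__":
    msa = ["AC-GT", "ACAGT", "A--GT"]
    print(sum_of_pairs(msa))
```

Under these toy parameters the three-row example scores 2: the three fully conserved columns contribute 3 each, while the two gapped columns contribute -3 and -4. Scoring each column pair by pair is what makes the SP objective simple to evaluate but blind to residue properties beyond identity, which is the kind of limitation a richer scoring scheme aims to address.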

Keywords

Speaker

Yi Pan (潘毅)

Shenzhen University of Advanced Technology, Chinese Academy of Sciences

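Yi Pan entered the Department of Computer Science and Engineering at Tsinghua University as the top-ranked science-track student in Jiangsu Province's college entrance examination. He received his bachelor's degree in computer engineering from Tsinghua University in 1982, his master's degree in computer engineering from Tsinghua University in 1984, and his Ph.D. in computer science from the University of Pittsburgh in 1991. At Georgia State University he previously served as Chair of the Department of Computer Science, Chair of the Department of Biology, Associate Dean of the College of Arts and Sciences, Regents' Professor, and University Distinguished Professor. He is currently Dean and Chair Professor of the Faculty of Computer Science and Control Engineering at Shenzhen University of Advanced Technology (in preparation), Chinese Academy of Sciences.

Professor Pan is a Foreign Member of the National Academy of Engineering of Ukraine, a Fellow of the American Institute for Medical and Biological Engineering, a Fellow of the Royal Society for Public Health (UK), a Fellow of the Institution of Engineering and Technology (UK), and a Fellow of the Japan Society for the Promotion of Science, as well as a recipient of a national key talent program award. His research uses cloud computing, big data analytics, artificial intelligence, deep learning, and related techniques as tools for bioinformatics and health informatics. He has designed and developed many bioinformatics algorithms and tools that have substantially advanced biology and medical science. He was named to the World's Top 2% Scientists list (2020) and to the 2021 list of the global top 1,000 computer scientists, on which he was the only bioinformatics expert from China.

He has published more than 250 SCI-indexed journal papers in this field, more than 100 of them in leading IEEE/ACM Transactions and journals, plus more than 150 papers in international conference proceedings, and he has authored or edited 42 books. His work has been cited nearly 18,000 times, and his current h-index is 87. He has supervised more than 20 Ph.D. students and more than 50 master's students. He has received many IEEE awards, has been invited to give keynote talks at more than 60 international conferences, and has given nearly 100 academic talks at leading universities in the United States and around the world.

Professor Pan is currently Editor-in-Chief of Big Data Mining and Analytics (co-published by Tsinghua University and IEEE, and ranked among the top 5% of Chinese journals), Associate Editor-in-Chief of the Journal of Computer Science and Technology (JCST), a leading Chinese English-language computer science journal, and Associate Editor-in-Chief of the Chinese Journal of Electronics, a leading Chinese English-language electronics journal. He previously served as Associate Editor-in-Chief of IEEE/ACM Transactions on Computational Biology and Bioinformatics, a leading international bioinformatics journal; as founder and editor-in-chief of the John Wiley book series on bioinformatics and the John Wiley book series on wireless networks and mobile computing; and as an associate editor of seven IEEE Transactions journals.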

NCCBB 2022
