May 23, 2025 | Light of Evolution Cloud Forum (进化之光云论坛), Session 25: Garyk Brixi, PhD student at Stanford University and Arc Institute
 

 

Abstract:

All of life encodes information with DNA. While tools for sequencing, synthesis, and editing of genomic code have transformed biological research, intelligently composing new biological systems would also require a deep understanding of the immense complexity encoded by genomes. We introduce Evo 2, a biological foundation model trained on 9.3 trillion DNA base pairs from a highly curated genomic atlas spanning all domains of life. We train Evo 2 with 7B and 40B parameters to have an unprecedented 1 million token context window with single-nucleotide resolution. Evo 2 learns from DNA sequence alone to accurately predict the functional impacts of genetic variation, from noncoding pathogenic mutations to clinically significant BRCA1 variants, without task-specific finetuning. Applying mechanistic interpretability analyses, we reveal that Evo 2 autonomously learns a breadth of biological features, including exon–intron boundaries, transcription factor binding sites, protein structural elements, and prophage genomic regions. Beyond its predictive capabilities, Evo 2 generates mitochondrial, prokaryotic, and eukaryotic sequences at genome scale with greater naturalness and coherence than previous methods. Guiding Evo 2 via inference-time search enables controllable generation of epigenomic structure, for which we demonstrate the first inference-time scaling results in biology. We make Evo 2 fully open, including model parameters, training code, inference code, and the OpenGenome2 dataset, to accelerate the exploration and design of biological complexity.

Speaker bio:

Garyk is a PhD student in Professor Brian Hie’s lab at Stanford. He works at the intersection of machine learning and biology to create better models of genes and genomes. Garyk previously studied Applied Mathematics at Harvard. Garyk’s previous work includes developing treatments for acute malnutrition, genetic association studies, machine learning for protein engineering, and connecting statistical and deep learning models of protein sequence.

 

 

Event links:

Koushare (蔻享学术) live stream:

https://www.koushare.com/live/details/43122

Tencent Meeting live stream:

https://meeting.tencent.com/l/WmSQeiT4GINb

WeChat Channels (scan the QR code to reserve a spot or join):

 

 

Replay information:

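With the speaker's consent, this session will be recorded and the video uploaded, so a replay will be available after the event. Stay tuned.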

Option 1: the original WeChat Channels live-stream link

Option 2: the Koushare (蔻享学术) website

https://www.koushare.com/topic-hd/i/ELCF


 
Address: No. 2, Courtyard 1, Beichen West Road, Chaoyang District, Beijing 100101. Phone: 010-64806635, 010-64806529
Genetics Society of China (中国遗传学会). All rights reserved.