[CSB Seminar] Chenyu Wang: Reward-Optimized Discrete Diffusion Models for DNA and Protein Design
Abstract
Diffusion models have shown strong performance on discrete sequence tasks, such as protein inverse folding, by generating natural-like sequences conditioned on structural constraints. However, real-world design problems often require optimizing specific objectives, e.g., generating stable proteins. In this talk, I introduce DRAKES, a new algorithm that fine-tunes pretrained discrete diffusion models to maximize task-specific rewards while preserving sequence naturalness. By leveraging the Gumbel-Softmax trick, DRAKES enables end-to-end reward backpropagation through discrete sampling trajectories. Our theoretical analysis indicates that the approach can generate sequences that are both natural-like (i.e., have high probability under a pretrained model) and yield high rewards. Unlike prior work in continuous domains, our method tackles challenges unique to the discrete setting, rooted in continuous-time Markov chains. We demonstrate the effectiveness of the algorithm in generating DNA and protein sequences that optimize enhancer activity and protein stability, respectively, tasks that are important for gene therapies and protein-based therapeutics.
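For readers unfamiliar with the Gumbel-Softmax trick mentioned in the abstract, the following is a minimal NumPy sketch of the relaxation itself: perturbing logits with Gumbel noise and applying a temperature-scaled softmax yields a differentiable surrogate for a discrete sample. The function name and the four-letter DNA example are illustrative assumptions, not code from the talk or the DRAKES paper.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Draw a relaxed one-hot sample from a categorical distribution.

    Adding Gumbel(0, 1) noise to the logits and taking a softmax at
    temperature tau gives a continuous, differentiable approximation
    of a discrete sample; as tau -> 0 the output approaches a hard
    one-hot vector, which is what makes end-to-end reward
    backpropagation through discrete choices possible.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise via inverse-CDF sampling
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + gumbel) / tau
    y = y - y.max()  # subtract max for numerical stability
    e = np.exp(y)
    return e / e.sum()

# Illustrative example: a relaxed sample over a 4-letter DNA alphabet
logits = np.array([2.0, 0.5, 0.1, -1.0])
sample = gumbel_softmax(logits, tau=0.5, rng=np.random.default_rng(0))
```

At moderate temperatures the output spreads probability mass across categories; at very low temperatures it concentrates almost entirely on one symbol, recovering hard discrete sampling in the limit.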
Speaker Bio
Chenyu Wang is a third-year PhD student at MIT CSAIL, advised by Professor Tommi Jaakkola. Her research interests lie broadly in machine learning, spanning representation learning, generative models, and AI for science. Her recent work focuses on multi-modal learning, diffusion generative models, and controlled generation. She has completed a research internship at Genentech in Aviv Regev’s lab. Before her PhD, she obtained her Bachelor’s degree from Tsinghua University.