PLAID: AI Model That Designs New Proteins From Scratch

April 20, 2026

TL;DR

Berkeley researchers built PLAID, a system that generates protein sequences and their 3D structures together using latent diffusion, opening new possibilities in computational biology.

Researchers at UC Berkeley's BAIR lab have developed PLAID, a model that approaches protein design by reusing the latent spaces of established protein folding predictors such as ESMFold. This lets it generate a 1D amino acid sequence and the corresponding 3D atomic structure at the same time, addressing a key bottleneck in synthetic biology. The model also accepts compositional prompts that specify function and organism, a significant step beyond previous methods, which often struggled to co-generate sequence and structure.
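The article does not say exactly how the function and organism prompts are combined. One widely used way to condition a diffusion model on several attributes is classifier-free guidance, sketched below under that assumption. `toy_denoiser`, `guided_noise`, and the `NULL` token are hypothetical illustrations, not PLAID's code; a real denoiser would be a trained neural network.

```python
import numpy as np

NULL = -1  # hypothetical "no condition" token used for the unconditional prediction


def toy_denoiser(x: np.ndarray, function_id: int, organism_id: int) -> np.ndarray:
    """Hypothetical stand-in for a learned noise predictor.

    Each active condition simply shifts the output, so the guidance
    arithmetic below is easy to inspect by hand.
    """
    shift = 0.0
    if function_id != NULL:
        shift += 0.1 * (function_id + 1)
    if organism_id != NULL:
        shift += 0.01 * (organism_id + 1)
    return 0.5 * x + shift


def guided_noise(x: np.ndarray, function_id: int, organism_id: int,
                 guidance_scale: float) -> np.ndarray:
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward (or past) the conditional one."""
    eps_uncond = toy_denoiser(x, NULL, NULL)
    eps_cond = toy_denoiser(x, function_id, organism_id)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


x = np.zeros((4, 8))  # toy noisy latent: 4 positions, 8 channels
print(guided_noise(x, function_id=3, organism_id=0, guidance_scale=2.0)[0, 0])
```

Either attribute can be dropped independently by passing `NULL` (for example, `organism_id=NULL` to condition on function alone), which is what makes this style of prompting compositional. A `guidance_scale` above 1 strengthens adherence to the prompt, typically at some cost in sample diversity.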

PLAID is trained on sequence databases, which are orders of magnitude larger than structural datasets because sequencing is far cheaper than experimental structure determination. By learning a diffusion model over the latent embeddings of a protein folding model, it can decode novel proteins at inference time without needing structural data for training. This mirrors approaches in robotics and vision-language models, where pretrained weights supply strong priors for generative tasks.
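The training idea can be sketched in a few lines. This is a minimal illustration assuming a standard DDPM-style noise-prediction objective with a linear noise schedule; `toy_encoder` is a hypothetical stand-in for a frozen pretrained encoder such as ESMFold's, and none of this is PLAID's actual code. The point it shows is that building the clean latent `z0` requires only a sequence, not an experimental structure. Decoding sampled latents back into sequence and structure is omitted.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_encoder(sequence: str, dim: int = 8, seed: int = 0) -> np.ndarray:
    """Hypothetical frozen encoder: maps each residue to a fixed random
    vector, giving a (length, dim) latent."""
    rng = np.random.default_rng(seed)
    table = rng.standard_normal((len(AMINO_ACIDS), dim))
    idx = [AMINO_ACIDS.index(a) for a in sequence]
    return table[idx]


def linear_alpha_bar(num_steps: int = 1000, beta_start: float = 1e-4,
                     beta_end: float = 0.02) -> np.ndarray:
    """Cumulative signal fraction alpha_bar_t for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def noise_latent(z0, t, alpha_bar, rng):
    """Sample z_t ~ N(sqrt(alpha_bar_t) z0, (1 - alpha_bar_t) I); also return the noise."""
    eps = rng.standard_normal(z0.shape)
    a = alpha_bar[t]
    return np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps, eps


def diffusion_loss(denoiser, z0, t, alpha_bar, rng) -> float:
    """Epsilon-prediction objective: mean squared error between predicted and true noise."""
    zt, eps = noise_latent(z0, t, alpha_bar, rng)
    eps_hat = denoiser(zt, t)
    return float(np.mean((eps_hat - eps) ** 2))


rng = np.random.default_rng(42)
z0 = toy_encoder("MKTAYIAKQR")  # sequence-only input
alpha_bar = linear_alpha_bar()
untrained = lambda z, t: np.zeros_like(z)  # placeholder denoiser
print(diffusion_loss(untrained, z0, t=500, alpha_bar=alpha_bar, rng=rng))
```

In a real system the placeholder denoiser would be a trained network, and the loss would be averaged over many sequences and randomly sampled timesteps.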

A notable innovation is CHEAP, a compression technique that tackles the poorly regularized, high-dimensional latent spaces of transformer-based models like ESMFold. Compressing these embeddings makes the diffusion model easier to train while keeping enough detail to decode high-resolution outputs, which makes PLAID practical for real-world applications like drug discovery and enzyme engineering. Early results show improved diversity and structural accuracy, including beta-strand patterns that earlier models struggled to produce.
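As a rough illustration of hourglass-style compression (shortening the sequence axis and shrinking the channel axis), here is a sketch assuming mean-pooling and linear projections. It is not CHEAP's architecture: `compress`, `decompress`, `reconstruction_mse`, and the random weights are hypothetical, and a real compressor would learn its down and up maps by minimizing reconstruction error.

```python
import numpy as np


def compress(z: np.ndarray, shorten_factor: int, w_down: np.ndarray) -> np.ndarray:
    """Mean-pool along the sequence axis, then project channels down.

    z: (length, dim) latent; w_down: (dim, small_dim) projection.
    """
    length, dim = z.shape
    pad = (-length) % shorten_factor
    z = np.pad(z, ((0, pad), (0, 0)))  # zero-pad so length divides evenly
    pooled = z.reshape(-1, shorten_factor, dim).mean(axis=1)
    return pooled @ w_down


def decompress(c: np.ndarray, shorten_factor: int, w_up: np.ndarray,
               length: int) -> np.ndarray:
    """Project channels back up, repeat each pooled position, trim the padding."""
    expanded = c @ w_up
    return np.repeat(expanded, shorten_factor, axis=0)[:length]


def reconstruction_mse(z, shorten_factor, w_down, w_up) -> float:
    """Objective a learned compressor would minimize."""
    c = compress(z, shorten_factor, w_down)
    z_hat = decompress(c, shorten_factor, w_up, z.shape[0])
    return float(np.mean((z - z_hat) ** 2))


rng = np.random.default_rng(0)
z = rng.standard_normal((10, 64))                  # toy latent: 10 residues, 64 channels
w_down = rng.standard_normal((64, 8)) / np.sqrt(64)  # hypothetical, untrained
w_up = rng.standard_normal((8, 64)) / np.sqrt(8)
c = compress(z, 4, w_down)
print(c.shape)  # (3, 8)
print(decompress(c, 4, w_up, 10).shape)  # (10, 64)
```

The diffusion model then operates on the much smaller `c` rather than on `z`, which is the practical payoff of compression.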

The research builds on AlphaFold2, whose recognition in the 2024 Nobel Prize in Chemistry highlighted AI's growing role in biology. The project's collaborators, including Genentech, Microsoft Research, and New York University, point to prospects for wet-lab testing and broader adoption. As protein predictors evolve to handle complexes with nucleic acids and ligands, PLAID's framework could extend to more intricate multimodal generation.

This work not only advances generative AI in science but also enables more customizable protein design, potentially accelerating developments in medicine and biotechnology. With code and preprints publicly available, the team invites further collaboration to validate and expand these methods in practical settings.