Protein language models (pLMs) apply the transformer architecture that revolutionized natural language processing to the language of biology: protein sequences. Just as text language models learn grammar and semantics from large text corpora, pLMs learn the rules of protein sequence composition from millions of natural protein sequences shaped by billions of years of evolution. These models develop internal representations that capture protein structure, function, and evolutionary relationships without any explicit structural supervision.
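The masked-prediction objective behind many of these models can be illustrated with a toy example. Everything below is invented for illustration (the amino-acid probabilities stand in for a real model's output, such as ESM-2's softmax over residues); the point is only the shape of the training signal: hide a residue, predict it, penalize by cross-entropy.

```python
import math

# Toy illustration of the masked-language-model objective used to train
# many pLMs: mask a residue in a protein sequence and ask the model to
# predict it from the surrounding context. The "model output" here is a
# hypothetical probability distribution, not a real model.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def masked_lm_loss(predicted_probs: dict, true_residue: str) -> float:
    """Cross-entropy loss at one masked position: -log P(true residue)."""
    return -math.log(predicted_probs[true_residue])

# Invented distribution the model might emit for a masked position:
predicted = {aa: 0.01 for aa in AMINO_ACIDS}
predicted["A"] = 0.81  # confident the masked residue is alanine
# (0.81 + 19 * 0.01 = 1.0, so this is a valid distribution)

loss_if_A = masked_lm_loss(predicted, "A")  # low loss: confident, correct
loss_if_W = masked_lm_loss(predicted, "W")  # high loss: model ruled this out
print(f"loss if true residue is A: {loss_if_A:.3f}")
print(f"loss if true residue is W: {loss_if_W:.3f}")
```

Minimizing this loss over millions of sequences is what forces the model to internalize which residues are plausible in which contexts.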

The ESM (Evolutionary Scale Modeling) family of protein language models originated at Meta AI, whose ESMFold demonstrated that language model representations alone could predict protein structures nearly as accurately as AlphaFold2, at dramatically faster speeds. EvolutionaryScale, the startup founded by the original ESM team, has continued this line of work with ESM3, a generative model that can predict protein structure, assess mutation effects, and generate novel functional sequences. Cradle, a Dutch startup, applies protein language models to accelerate directed evolution campaigns, using model-guided sequence exploration to find improved variants in fewer experimental cycles.

The integration of protein language models into industrial protein engineering workflows is accelerating. These models can score thousands of candidate mutations computationally before any experimental testing, dramatically reducing the cost and time of engineering campaigns. Applications include optimizing therapeutic antibodies, improving industrial enzyme stability, and designing novel protein therapeutics. The combination of protein language models with generative AI approaches is enabling the design of proteins with properties beyond what is observed in nature, opening new frontiers in synthetic biology and biotechnology.
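One common way to score candidate mutations with a pLM is the log-likelihood ratio between the mutant and wild-type residue at a position, taken from the model's predicted distribution there. The sketch below uses an invented per-position distribution; a real campaign would obtain these probabilities from a model such as ESM-2, but the ranking logic is the same.

```python
import math

# Sketch of pLM-based mutation scoring: rank substitutions by
#   log P(mutant aa) - log P(wild-type aa)
# at a position, where P comes from the model's predicted distribution.
# The probabilities below are invented for illustration.

def mutation_score(probs: dict, wt: str, mut: str) -> float:
    """Positive score: model finds the mutant more plausible than wild type."""
    return math.log(probs[mut]) - math.log(probs[wt])

# Hypothetical model distribution at one position (wild type: D).
position_probs = {"D": 0.40, "E": 0.35, "N": 0.15, "K": 0.05, "W": 0.05}

# Rank candidate substitutions, best-scoring first.
candidates = ["E", "N", "K", "W"]
ranked = sorted(
    candidates,
    key=lambda aa: mutation_score(position_probs, "D", aa),
    reverse=True,
)
for aa in ranked:
    print(f"D->{aa}: {mutation_score(position_probs, 'D', aa):+.2f}")
```

Scoring a full single-mutant scan this way (every substitution at every position) costs one forward pass per sequence, which is what makes triaging thousands of variants before any wet-lab work practical.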