OpenAI has launched GPT-Rosalind, a large language model trained specifically on biology workflows. The model, named after scientist Rosalind Franklin, aims to address challenges in handling massive biological datasets and specialized subfields. Access is currently limited to US-based entities due to safety concerns.
OpenAI announced GPT-Rosalind on Thursday, distinguishing it from the more generic science-focused models developed by other tech companies. The model targets key hurdles in biology research: the overwhelming volume of data produced by genome sequencing and protein biochemistry, and the jargon-heavy nature of subfields such as genetics and neurobiology.

Yunyun Wang, OpenAI's Life Sciences Product Lead, highlighted these issues during a press briefing, as reported by Ars Technica. Wang explained that a geneticist studying brain-related genes might struggle to parse the neurobiological literature without specialized tools.

The system was trained on 50 common biological workflows and on methods for accessing public databases. According to OpenAI, it can suggest biological pathways, prioritize drug targets, and connect genotype to phenotype through known mechanisms. "We're connecting genotype to phenotype through known pathways and regulatory mechanisms, infer likely structural or functional properties of proteins, and really leveraging this mechanistic understanding," Wang said.

OpenAI says it tuned GPT-Rosalind to be more skeptical, countering the tendency toward sycophancy seen in other large language models. The company describes the model's reasoning as capable of complex multi-step processes and, based on its benchmarks, characterizes its abilities as expert-level. Concerns persist, however, about the potential for hallucinations.

Access is restricted to US-based entities through a trusted deployment structure, citing risks such as the model being used to optimize virus infectivity. A limited Life Sciences Research Plugin will soon be available to the public.