- AI virtual cells outperform key baselines on well-calibrated metrics, challenging prior reports of poor model performance
- Foundational research reinforces the use of virtual cell models to accelerate target identification pipelines
CAMBRIDGE, England–(BUSINESS WIRE)–Shift Bioscience (Shift), a biotechnology company uncovering the biology of cell rejuvenation to end the morbidity and mortality of aging, today announced the release of new research detailing an improved framework for evaluating benchmark metric calibration in virtual cell models [1]. Using well-calibrated metrics, the study demonstrates that virtual cell models consistently outperform key baselines, providing valuable and actionable biological insights to accelerate target identification pipelines.
Genetic perturbation response models are a subset of AI virtual cells used to predict how cells will respond to various genetic alterations, including up- and down-regulation of genes. These models are a valuable tool to augment target identification pipelines, providing a rapidly scalable, in silico solution to identify promising genetic targets without the time and resource requirements of wet lab experiments. However, recently published papers have questioned the utility of these models to correctly identify gene targets, noting concerns that virtual cell models fail to outperform simple, uninformative baselines in some experiments.
In this latest study from Shift Bioscience, the team demonstrated that instances of poor model performance largely reflect metric miscalibration, with commonly used metrics routinely failing to distinguish robust predictions from uninformative ones, particularly in datasets with weaker perturbations. Building on this finding, the team developed an improved framework for metric calibration. Using 14 Perturb-seq datasets, the team identified several rank-based and differentially expressed gene (DEG)-aware metrics that are well calibrated across datasets.
Virtual cell models evaluated using these well-calibrated metrics consistently outperformed uninformative mean, control and linear baselines, providing clear evidence that virtual cell models can distinguish biologically significant signals when appropriate calibration is applied. These results challenge prior reports that genetic perturbation models do not work, and suggest that AI virtual cells can be effectively applied for target discovery.
Henry Miller, Ph.D., Head of Machine Learning, Shift Bioscience, commented: “This latest research from our talented team provides clear evidence that the reports of poor performance in AI virtual cells are largely due to limitations of metrics, not due to issues with the models. We showed that when models are evaluated on well-calibrated metrics, they perform quite well and consistently outperform key baselines. We believe that this work opens the door to more widespread use of virtual cells and reinforces our confidence in the virtual cell models that are helping to drive our target identification program for cell rejuvenation.”
Contacts
Jake Brown