Artificial intelligence systems designed to diagnose cancer from tissue slides are learning to infer patient demographics, leading to uneven diagnostic performance across racial, gender, and age groups. Researchers at Harvard Medical School and collaborators identified the problem and developed a method that sharply reduces these disparities, underscoring the need for routine bias checks in medical AI.
Pathology has long relied on examining thin tissue slices under microscopes to diagnose cancer, a process that typically does not reveal a patient's demographic characteristics to the human eye. Yet new research shows that AI models entering pathology labs do not share this limitation.
A study led by Kun-Hsing Yu, an associate professor of biomedical informatics in the Blavatnik Institute at Harvard Medical School and an assistant professor of pathology at Brigham and Women's Hospital, analyzed several standard deep-learning pathology systems trained on large collections of labeled slides for cancer diagnosis.
According to Harvard Medical School and the study published in Cell Reports Medicine, the team evaluated four commonly used pathology AI models on a large, multi-institutional repository of pathology slides spanning 20 cancer types.
The researchers found that all four models showed unequal performance across demographic groups defined by patients' self-reported race, gender, and age. In a pan-cancer analysis, they identified significant performance disparities in about 29 percent of diagnostic tasks.
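To make that kind of audit concrete: a subgroup bias check of this sort can be run on any model that outputs diagnostic scores, by computing a performance metric separately for each demographic group and flagging tasks where the gap between groups is large. The Python sketch below illustrates the general procedure only, not the study's evaluation code; the column names, minimum group size, and gap threshold are all hypothetical choices.

```python
# Minimal sketch of a demographic-subgroup performance audit for a
# diagnostic model. Column names and thresholds are hypothetical.
import pandas as pd
from sklearn.metrics import roc_auc_score

def audit_subgroup_gaps(df: pd.DataFrame, group_col: str,
                        min_n: int = 50, gap_threshold: float = 0.05):
    """Compute AUROC per demographic group and flag large gaps.

    `df` is assumed to hold 'y_true' (binary label) and 'y_score'
    (model probability) columns, plus a demographic column `group_col`.
    """
    aucs = {}
    for group, sub in df.groupby(group_col):
        # Skip tiny or single-class subgroups where AUROC is
        # undefined or too noisy to interpret.
        if len(sub) < min_n or sub["y_true"].nunique() < 2:
            continue
        aucs[group] = roc_auc_score(sub["y_true"], sub["y_score"])
    gap = max(aucs.values()) - min(aucs.values()) if len(aucs) > 1 else 0.0
    return aucs, gap, gap > gap_threshold

# Example (hypothetical data): audit one diagnostic task by
# self-reported race.
# per_group_auc, gap, flagged = audit_subgroup_gaps(preds, "race")
```

Running such a check per diagnostic task and per demographic axis yields exactly the kind of tally the study reports, such as the share of tasks with significant disparities.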
Follow-up reporting by outlets including News-Medical notes that disparities were especially apparent in certain lung- and breast-cancer subtype tasks: the models underperformed for African American patients and, on some tasks, male patients when distinguishing lung-cancer subtypes, and for younger patients on several breast-cancer subtype distinctions.
The research team traced these gaps to several factors. One was the uneven representation of demographic groups in the training data. Another involved differences in disease incidence and biology across populations. The Cell Reports Medicine paper further reports that variations in the prevalence of somatic mutations among populations contributed to performance disparities, suggesting that the models were picking up subtle molecular patterns linked to demographics as well as disease.
"Reading demographics from a pathology slide is thought of as a 'mission impossible' for a human pathologist, so the bias in pathology AI was a surprise to us," Yu said, according to Harvard Medical School.
To address the problem, the researchers developed FAIR-Path (Fairness-aware Artificial Intelligence Review for Pathology), a bias-mitigation framework that builds on an existing machine-learning concept known as contrastive learning. The approach encourages models to emphasize differences between cancer types while downplaying differences tied to demographic categories.
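In standard contrastive learning, a model is trained so that embeddings of related examples sit close together while unrelated examples are pushed apart. One way to adapt that objective along the lines described, sketched below in PyTorch, is to treat slides of the same cancer type as positive pairs and up-weight positives drawn from different demographic groups, so the model aligns on disease signal rather than demographic signal. This is a hedged illustration of the concept, not the published FAIR-Path objective; the function name, weighting scheme, and hyperparameters are assumptions.

```python
# Illustrative fairness-aware contrastive loss: pull together
# embeddings that share a cancer label (especially across demographic
# groups) and rely on the softmax denominator to push apart the rest.
# An interpretation of the idea, not the FAIR-Path implementation.
import torch
import torch.nn.functional as F

def fairness_contrastive_loss(emb, cancer_labels, demo_labels,
                              temperature=0.1, cross_group_weight=2.0):
    emb = F.normalize(emb, dim=1)            # unit-norm embeddings
    sim = emb @ emb.T / temperature          # pairwise similarities
    n = emb.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=emb.device)

    same_cancer = cancer_labels.unsqueeze(0) == cancer_labels.unsqueeze(1)
    same_demo = demo_labels.unsqueeze(0) == demo_labels.unsqueeze(1)

    # Positives: pairs with the same cancer type (excluding self-pairs).
    # Positives from *different* demographic groups get extra weight,
    # nudging the embedding space to encode disease, not demographics.
    pos = (same_cancer & ~eye).float()
    weights = torch.where(same_demo, torch.ones_like(sim),
                          torch.full_like(sim, cross_group_weight))

    # Log-softmax over each row, with self-similarity masked out.
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(eye, float("-inf")), dim=1, keepdim=True)

    weighted_pos = pos * weights
    denom = weighted_pos.sum(dim=1).clamp(min=1.0)  # anchors with no
    loss = -(weighted_pos * log_prob).sum(dim=1) / denom  # positives -> 0
    return loss.mean()

# Example (hypothetical shapes): 128 tile embeddings of dim 256, with
# integer cancer-type and demographic-group labels per tile.
# loss = fairness_contrastive_loss(z, cancer_y, demo_y)
```

In practice a term like this would typically be added to the ordinary diagnostic classification loss, so the model still learns the primary task while the contrastive term shapes what its embeddings encode.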
In the Cell Reports Medicine study, FAIR-Path mitigated 88.5 percent of the measured performance disparities across demographic groups in the primary pan-cancer analysis and reduced performance gaps by 91.1 percent in external validation across 15 independent cohorts.
Yu and colleagues report that FAIR-Path improved fairness without requiring perfectly balanced datasets and with relatively modest changes to existing model-training pipelines.
The work, published December 16, 2025, in Cell Reports Medicine, highlights the importance of systematically testing medical AI systems for demographic bias before they are deployed in clinical care.
According to follow-up coverage from Harvard Medical School and SciTechDaily, the team is now exploring how to extend FAIR-Path to settings with limited data and to better understand how AI-driven bias contributes to wider disparities in health outcomes. Their long-term goal is to develop pathology AI tools that support human experts by providing fast, accurate, and fair diagnoses for patients across all backgrounds.