On improving experimental binding affinity predictions with synthetic data

Published in bioRxiv, 2026

Recommended citation: Ryczko, Kevin; Zin, Phyo Phyo; Crivelli-Decker, Jordan; Le, Ly; Jha, Punit K.; Shields, Benjamin J.; Lemos, Pablo; Bandi, Sasaank; van Damme, Maarten; Sood, Amogh; Huntington, Lee; Pitman, Mary; Ganahl, Martin; Bortolato, Andrea. "On improving experimental binding affinity predictions with synthetic data", bioRxiv, 2026. https://doi.org/10.64898/2026.03.02.708607

The success of deep learning binding affinity prediction models depends critically on expanding experimental data with reliable synthetic data. We extend the Structurally Augmented IC50 Repository (SAIR) with approximately 80K absolute free energy perturbation (AFEP) calculations and present two distinct data splits, SAIR-FEP and SAIR-OOD (out-of-distribution), to simulate realistic drug discovery scenarios. By filtering for high-confidence co-folded complexes, we show that performance improves predictably, whereas training blindly on all complexes yields no performance gain.


Benjamin J. Shields, Ph.D.
