Enhanced predictive performance of artificial intelligence in individualized ovarian stimulation of in vitro fertilization: a retrospective cohort study

Document Type

Article

Publication Date

12-1-2026

Abstract

Background: Over 2.5 million in vitro fertilization (IVF) cycles are conducted annually, and numbers are expected to rise with the aging population. The controlled ovarian stimulation (COS) process, key to IVF success, is inherently complex. Given advances in artificial intelligence (AI), this study investigated whether a series of AI models can outperform traditional clinical practices in predictive accuracy and COS optimization.

Methods: This retrospective cohort study analyzed patients undergoing their first ovarian stimulation cycle (Oct 2017–Dec 2020) and was validated using an independent cohort (Jan 2018–Jan 2022). Six AI algorithms and 73 variables were screened. A four-submodel strategy comprised risk prediction models for low and hyper ovarian response (LORRM, HORRM) and strategy deployment models (LORSM, HORSM) for managing critical COS components. Feature importance was assessed using Shapley additive explanations, and sensitivity analyses were performed to test robustness. The ability to propose effective COS strategies was also assessed retrospectively.

Results: A four-submodel system prototype using extreme gradient boosting trees was developed. All submodels showed superior discrimination compared to conventional ovarian reserve markers (AUC, 95% CI: LORSM, 0.95 [0.94–0.96]; LORRM, 0.93 [0.92–0.94]; HORSM, 0.90 [0.88–0.91]; HORRM, 0.89 [0.87–0.91]; DeLong P < 0.001 for all). The submodels showed adequate calibration (Brier scores ranging from 0.064 to 0.072) and promising performance in external validation (AUCs ranging from 0.84 to 0.88) and in sensitivity analyses. Among COS components, COS protocol and recombinant follicle-stimulating hormone (FSH) use had the largest impact on low and hyper response risks, respectively, with FSH starting dose ranking third. Diastolic blood pressure, alanine aminotransferase, and white blood cell count predicted low response, while basal luteinizing hormone (LH) levels and platelet count were key for hyper response. Several of these were newly identified potential biomarkers. LORSM and HORSM identified effective strategies with precision of 95.5% (95% CI, 94.6–96.4%) and 98.4% (95% CI, 98.0–98.9%), respectively.

Conclusions: The AI-based system demonstrated superior detection of abnormal ovarian responses and effective individualized COS design compared to conventional clinical practice while maintaining transparency. The system identified potential biomarkers beyond conventional ovarian reserve markers and offered new insights for optimizing IVF, showing promise for advancing personalized reproductive medicine.
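The abstract reports discrimination as AUC and calibration as the Brier score. As a rough illustration of what those two metrics measure, the following minimal Python sketch computes both from scratch. The function names and the toy labels and probabilities are assumptions made for illustration only; they are not the study's code or data, and the study's confidence intervals and DeLong tests are not reproduced here.

```python
from itertools import product


def brier_score(y_true, y_prob):
    """Mean squared gap between predicted probability and the 0/1 outcome.

    0 is perfect; lower values indicate better calibrated predictions.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """AUC as the chance that a random positive case scores higher than
    a random negative case, with ties counted as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a, b in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Synthetic toy data (hypothetical, NOT from the study):
# 1 = abnormal ovarian response, 0 = normal response.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.7, 0.3]

if __name__ == "__main__":
    print("AUC:", auc(y_true, y_prob))
    print("Brier score:", brier_score(y_true, y_prob))
```

The pairwise AUC above is quadratic in sample size, which is fine for a small illustration; for cohorts of realistic size a rank-based implementation from an established library would be the more practical choice.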

Publication Source (Journal or Book title)

BMC Medicine
