Interpretable machine learning prediction of extracorporeal shock wave lithotripsy outcomes for urinary stones: a retrospective cohort study
Background: Accurately predicting the outcome of extracorporeal shock wave lithotripsy (ESWL) is a persistent clinical challenge. While machine learning (ML) offers potential for improved predictions, the opacity of many models hinders clinical trust and adoption. This study aimed to develop and validate an interpretable ML model to predict ESWL success using routinely available clinical data.
Patients and methods: In this retrospective cohort study, we analyzed data from 1,501 patients treated with a single ESWL session at a single institution (2022-2024). Six ML algorithms were trained on 75% of the data (n=1,125), with performance evaluated on a hold-out test set (n=376). Techniques to manage significant class imbalance were employed. Model interpretability was achieved using SHapley Additive exPlanations (SHAP).
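As a rough illustration of the data partition described above, the sketch below reproduces the 1,125/376 split sizes from 1,501 records and shows one common way to weight classes under imbalance. The abstract does not say how the split was drawn or which imbalance techniques were used, so the shuffled split, the inverse-frequency weighting, the function names, and the toy labels are all assumptions made for illustration, not the authors' pipeline.

```python
import random
from collections import Counter


def split_indices(n, train_frac=0.75, seed=0):
    """Shuffle indices 0..n-1 and cut them into train and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * train_frac)  # floor: 1,501 * 0.75 = 1,125.75 -> 1,125
    return idx[:cut], idx[cut:]


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


train_idx, test_idx = split_indices(1501)
print(len(train_idx), len(test_idx))  # 1125 376

# Hypothetical labels (NOT the study's data): 1 = ESWL failure, 0 = success.
toy_labels = [1] * 20 + [0] * 80
print(balanced_class_weights(toy_labels))  # {1: 2.5, 0: 0.625}
```

Weights of this kind make each error on the minority class count for more during training, which is one of several possible responses to class imbalance alongside resampling and threshold adjustment.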
Results: The extreme gradient boosting (XGBoost) model demonstrated the best discriminative performance, with an area under the receiver operating characteristic curve (ROC-AUC) of 0.723 (95% CI: 0.662-0.784). However, a critical trade-off was observed: the model exhibited high specificity (95.2%) but low sensitivity (35.4%), meaning it correctly classified most treatment successes but missed nearly two-thirds of treatment failures. Stone density and stone size were the most influential predictors, and SHAP analysis provided clinically plausible, individualized explanations for predictions.
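The sensitivity and specificity trade-off can be made concrete with a short calculation. The sketch below reads the abstract's wording as treating ESWL failure as the positive class, which is an interpretation rather than something the abstract states outright. The confusion-matrix counts are hypothetical, chosen only to give rates close to those reported; only the 35.4% sensitivity comes from the abstract.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    Positive class is assumed to be ESWL failure:
      sensitivity = failures correctly flagged / all failures
      specificity = successes correctly identified / all successes
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts (NOT the study's data), picked to resemble the reported rates.
sens, spec = sensitivity_specificity(tp=7, fn=13, tn=76, fp=4)
print(sens, spec)  # 0.35 0.95

# Share of failures missed, using the sensitivity reported in the abstract.
reported_sensitivity = 0.354
missed_fraction = round(1 - reported_sensitivity, 3)
print(missed_fraction)  # 0.646, i.e. nearly two-thirds
```

A missed fraction of about 0.646 is what the phrase "nearly two-thirds of treatment failures" refers to: the model's positive calls are rarely wrong, but it leaves most failures unflagged.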
Conclusions: We present a transparent, interpretable ML framework for ESWL outcome prediction. While the model aligns with clinical reasoning and offers a foundation for trustworthy artificial intelligence, its current low sensitivity limits immediate standalone clinical utility for ruling out ESWL failure. The framework highlights the imperative for future work to improve sensitivity through richer datasets and prospective validation before integration into clinical pathways.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.