Fostering reproducibility and generalizability in machine learning for clinical prediction modeling in spine surgery
Journal Paper/Review - Oct 13, 2020
Azad Tej D, Ehresman Jeff, Ahmed Ali Karim, Staartjes Victor E, Lubelski Daniel, Stienen Martin N, Veeravagu Anand, Ratliff John K
As machine learning (ML) algorithms are increasingly used to develop clinical prediction models, researchers are becoming more aware of the deleterious effects of the lack of reporting standards. One of the most obvious consequences is the insufficient reproducibility of current prediction models. To characterize methods that improve reproducibility and allow for better clinical performance, we use a previously proposed taxonomy that separates reproducibility into 3 components: technical, statistical, and conceptual reproducibility. Following this framework, we discuss common errors that lead to poor reproducibility, highlight the importance of generalizability when evaluating an ML model's performance, and provide suggestions for optimizing generalizability to ensure adequate performance. These efforts are necessary before such models are applied to patient care.