Best practices for supervised machine learning when examining biomarkers in clinical populations

Machine learning approaches are increasingly used in health research. Applications range from identifying disease onset and classifying disease severity to predicting epileptic seizures. Although machine learning can be a powerful tool, there is potential for misuse: overfitting can inflate model performance, and an overfitted model will not generalize to the wider population. The risk of misuse increases when the number of variables that can be extracted from continuous data is almost unlimited, as is the case for neural, movement, and acoustic (e.g., speech and music) data. Because health research often relies on small samples, and outcome variables can be noisier in clinical populations, several points should be considered before applying machine learning. We suggest best practices for machine learning, including data formatting, reducing data dimensionality, model selection and evaluation, and other steps within the machine learning process. We further discuss some common pitfalls in applying machine learning to small sample sizes and high-dimensional data (e.g., speech biomarkers, neural and imaging data). We advocate for parsimonious approaches: selecting the simplest machine learning method that best describes the data, preventing redundancy and overfitting through variable elimination, and ensuring that particular variables or approaches do not inflate machine learning outcomes. We further consider approaches that can identify the best predictors (or combinations thereof), as well as “black box” machine learning methods (e.g., deep learning). Finally, we discuss the limitations of current machine learning methods and pose future directions to broaden the applicability of machine learning tools and ensure that outcomes are robust against random factors.
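One way performance can be inflated with small samples and many variables is when variable selection sees the held-out data. The sketch below is a toy illustration and not the method of the paper: the simulated pure-noise data, the mean-difference ranking, the nearest-centroid classifier, and all function names are assumptions chosen to keep the example dependency-free. It compares leave-one-out accuracy when the top variables are chosen once on all participants versus re-chosen inside each fold.

```python
import random


def mean(values):
    return sum(values) / len(values)


def make_noise_data(n_samples=30, n_features=1000, seed=0):
    """Simulate pure-noise features with alternating 0/1 labels (no real signal)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def top_k_features(X, y, k):
    """Return indices of the k features with the largest absolute class-mean gap."""
    gaps = []
    for j in range(len(X[0])):
        class0 = [row[j] for row, label in zip(X, y) if label == 0]
        class1 = [row[j] for row, label in zip(X, y) if label == 1]
        gaps.append(abs(mean(class0) - mean(class1)))
    return sorted(range(len(gaps)), key=lambda j: gaps[j], reverse=True)[:k]


def predict_nearest_centroid(X_train, y_train, x_new, features):
    """Assign x_new to the class whose centroid (on the chosen features) is closest."""
    def squared_distance(label):
        rows = [row for row, lab in zip(X_train, y_train) if lab == label]
        return sum((x_new[j] - mean([row[j] for row in rows])) ** 2 for j in features)
    return min((0, 1), key=squared_distance)


def loo_accuracy(X, y, k, select_inside_fold):
    """Leave-one-out accuracy, with feature selection inside or outside the folds."""
    # Selecting on all data lets the held-out sample influence the chosen features.
    leaked = None if select_inside_fold else top_k_features(X, y, k)
    hits = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        features = top_k_features(X_train, y_train, k) if select_inside_fold else leaked
        hits += predict_nearest_centroid(X_train, y_train, X[i], features) == y[i]
    return hits / len(X)


X, y = make_noise_data()
leaky = loo_accuracy(X, y, k=10, select_inside_fold=False)
honest = loo_accuracy(X, y, k=10, select_inside_fold=True)
print(f"selection on all data:      {leaky:.2f}")
print(f"selection inside each fold: {honest:.2f}")
```

Because the simulated features carry no information about the labels, an unbiased estimate should sit near chance. Any accuracy well above chance in the first setting comes from the selection step having seen the held-out participant, which is why selection is usually placed inside the cross-validation loop.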
