Supplementary material for machine learning analysis of data from a simplified mobility testing procedure with a single sensor and single task accurately differentiates Parkinson's disease from controls

Abstract

Quantitative mobility analysis using wearable sensors, while promising as a diagnostic tool for Parkinson's disease (PD), is not commonly applied in clinical settings. Major obstacles include uncertainty regarding the best protocol for instrumented mobility testing and subsequent data processing, as well as the added workload and complexity of this multi-step process. To simplify sensor-based mobility testing in diagnosing PD, we analyzed data from 262 PD participants and 50 controls performing several motor tasks while wearing a sensor on the lower back that contained a triaxial accelerometer and a triaxial gyroscope. Using ensembles of heterogeneous machine learning models incorporating a range of classifiers trained on a large set of sensor features, we show that our models effectively differentiate between participants with PD and controls, both for mixed-stage PD (92.6% accuracy) and a group selected for mild PD only (89.4% accuracy). Omitting algorithmic segmentation of complex mobility tasks decreased the diagnostic accuracy of our models, as did the inclusion of kinesiological features. Feature importance analysis revealed that the Timed Up & Go (TUG) tasks contributed the highest-yield predictive features, with only a minor decrease in accuracy for models based on cognitive TUG as a single mobility task. Our machine learning approach facilitates major simplification of instrumented mobility testing without compromising predictive performance.

Notes

  1. sensor_data: folder with the sensor readings from the 32-foot walk, standing with eyes open, standing with eyes closed, two trials of TUG, and two trials of cogTUG. It also contains the calculated kinesiological variables, demographics, and clinical evaluation data in separate files.
  2. code_notebook: notebook with the code used to generate the super learner models. It has sections corresponding to the components of the proposed machine-learning pipeline. To view the notebook, open index.html in a web browser or open notebook.pdf.
  3. rdata: folder with intermediate R objects (a loading sketch follows this list):
     - sensor_features.RData: a list with the feature table of each task.
     - sensor_features_all_tasks.RData: a single feature table across all tasks for the subjects with no missing cogTUG data; the means of repeated tasks and the demographic variables are also added.
     - PD_control_seg.RData: a data frame with rows corresponding to PD participants and controls and columns corresponding to the features selected by the feature reduction technique.
     - HY_control_early.RData: the analogous data frame for mild PD participants and controls.
     - HY_control_mild.RData: the analogous data frame for moderate PD participants and controls.
     - HY_control_severe.RData: the analogous data frame for severe PD participants and controls.
     - var_reduct_PD_control_splits.RData: the training and test splits for the five-repeat, five-fold cross-validation framework used to build the classifier distinguishing PD participants from controls.
     - var_reduct_HY_early_HC_splits.RData: the analogous splits for the classifier distinguishing mild PD participants from controls.
     - var_reduct_HY_mild_HC_splits.RData: the analogous splits for the classifier distinguishing moderate PD participants from controls.
     - var_reduct_HY_severe_HC_splits.RData: the analogous splits for the classifier distinguishing severe PD participants from controls.
  4. files: for each classifier, its folder contains five sl_predictions.csv files with the predictions of the super learner models for the five repeats of the outer loop, train_test_files with the train and test split files, top_imp_scores.csv with the permutation-based importance scores, and top_shap_values.csv with the SHAP values of each feature. Each classifier folder also contains five files (GLM_params.csv, GBM_params.csv, DRF_params.csv, XGBoost_params.csv, DeepLearning_params.csv) with the hyperparameters of the base models used to build the super learners. A sketch for summarising the prediction files follows this list.
  5. models: for each classifier, this folder contains the 25 super learner models built inside the nested-loop framework (to be uploaded). A stacked-ensemble sketch illustrating how such super learners are assembled follows this list.
  6. README: file with detailed instructions on how to set up and run the code, as well as any dependencies or requirements.
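
Code examples

The sketches in this section are illustrative only and are written in R, the language of the intermediate objects above. The first sketch shows how the intermediate objects described in note 3 might be inspected; load() returns the names of the restored objects, so no object names are assumed, only the file paths relative to the root of this supplement.

    # Minimal sketch: inspecting the intermediate R objects described in note 3.
    # Paths assume the working directory is the root of this supplement.
    feature_objs <- load("rdata/sensor_features_all_tasks.RData")
    print(feature_objs)                       # names of the restored objects
    str(get(feature_objs[1]), max.level = 1)  # overview of the combined feature table

    split_objs <- load("rdata/var_reduct_PD_control_splits.RData")
    print(split_objs)                         # train/test splits of the 5-repeat, 5-fold CV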
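
The next sketch, referenced in note 4, averages a simple metric over the five sl_predictions.csv files of one classifier folder. The folder name (files/PD_vs_control) and the column names (label for the true class, p1 for the predicted PD probability) are assumptions made for illustration; check names(read.csv(...)) against the actual files before computing metrics.

    # Sketch: mean accuracy over the five outer-loop repeats of one classifier.
    # The folder name and the column names below are illustrative assumptions.
    pred_files <- list.files("files/PD_vs_control",
                             pattern = "sl_predictions.*\\.csv$",
                             recursive = TRUE, full.names = TRUE)
    acc_per_repeat <- sapply(pred_files, function(f) {
      preds <- read.csv(f)
      mean((preds$p1 > 0.5) == (preds$label == 1))  # accuracy at a 0.5 cut-off
    })
    mean(acc_per_repeat)  # average accuracy across the five repeats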
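
The hyperparameter file names in note 4 (GLM, GBM, DRF, XGBoost, DeepLearning) suggest base learners from the h2o package. Assuming that, the last sketch outlines how a super learner (stacked ensemble) of heterogeneous base models is assembled in general; the file paths, the response column name (diagnosis), and all learner settings are illustrative assumptions and do not reproduce the stored hyperparameters.

    # Sketch: a super learner (h2o stacked ensemble) over heterogeneous base models.
    # Paths, the response name, and all settings are illustrative assumptions.
    library(h2o)
    h2o.init()

    train_hf <- h2o.importFile("files/PD_vs_control/train_test_files/train_repeat1.csv")  # hypothetical path
    test_hf  <- h2o.importFile("files/PD_vs_control/train_test_files/test_repeat1.csv")   # hypothetical path
    y <- "diagnosis"                          # hypothetical response column (PD vs. control)
    x <- setdiff(names(train_hf), y)
    train_hf[, y] <- as.factor(train_hf[, y])
    test_hf[, y]  <- as.factor(test_hf[, y])

    # Base learners must share fold assignments and keep their cross-validated
    # predictions so the meta-learner can be fit on out-of-fold predictions.
    base_models <- list(
      h2o.glm(x, y, train_hf, family = "binomial", nfolds = 5,
              fold_assignment = "Modulo", keep_cross_validation_predictions = TRUE),
      h2o.gbm(x, y, train_hf, nfolds = 5,
              fold_assignment = "Modulo", keep_cross_validation_predictions = TRUE),
      h2o.randomForest(x, y, train_hf, nfolds = 5,
                       fold_assignment = "Modulo", keep_cross_validation_predictions = TRUE)
    )
    super_learner <- h2o.stackedEnsemble(x, y, train_hf, base_models = base_models)
    h2o.performance(super_learner, newdata = test_hf)  # held-out performance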

Rights

CC0 1.0 Universal
http://creativecommons.org/publicdomain/zero/1.0/