Preface

This notebook provides the code used to build the machine learning pipeline described in the manuscript “Characterization of high-yield mobility features to identify Parkinson’s disease with a wearable sensor”. The pipeline starts by filtering and segmenting the composite tasks. A large set of time-domain and frequency-domain features is then calculated, and this set is reduced by forward selection based on feature importance computed with random forests. Next, the data are shuffled multiple times, and in each iteration a cross-validation framework generates the training and testing sets. A super learner model is built from a wide array of base models, and its final prediction is calculated by weighting the prediction of each base learner. Model performance is evaluated using accuracy, sensitivity, specificity, and F1 score. Finally, SHAP analysis is used to interpret the model by showing how each feature contributes to the final prediction.
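To make the last two modelling steps concrete, the following is a minimal, standard-library-only sketch of how a weighted combination of base-learner outputs and the four reported measures could be computed. It is not the notebook's implementation. The function names (`weighted_ensemble`, `binary_metrics`), the fixed weights, the 0.5 decision threshold, the label coding (1 = Parkinson's disease, 0 = control), and the toy numbers are all assumptions made for illustration; in the actual pipeline the weights come from the super learner.

```python
from typing import Dict, List, Sequence


def weighted_ensemble(base_probs: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> List[float]:
    """Weighted average of base-learner probabilities, one value per sample.

    base_probs[m][i] is the probability that model m assigns to sample i.
    The weights are normalised so they do not have to sum to 1.
    """
    if len(base_probs) != len(weights):
        raise ValueError("one weight per base learner is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(base_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, base_probs)) / total
        for i in range(n_samples)
    ]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and F1 for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe_div(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "sensitivity": safe_div(tp, tp + fn),   # true positive rate
        "specificity": safe_div(tn, tn + fp),   # true negative rate
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Hypothetical probabilities from three base learners for four samples.
    toy_probs = [
        [0.9, 0.2, 0.6, 0.4],
        [0.8, 0.4, 0.3, 0.1],
        [0.7, 0.1, 0.8, 0.3],
    ]
    toy_weights = [0.5, 0.3, 0.2]   # assumed; not taken from the manuscript
    toy_labels = [1, 0, 1, 1]       # assumed coding: 1 = PD, 0 = control

    combined = weighted_ensemble(toy_probs, toy_weights)
    predicted = [1 if p >= 0.5 else 0 for p in combined]  # assumed threshold
    print(binary_metrics(toy_labels, predicted))
```

Sensitivity and specificity are reported alongside accuracy because accuracy alone can look good when one class dominates the sample.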