Khalil, Rana M.; Shulman, Lisa M.; Gruber-Baldini, Ann L.; Shakya, Sunita; Hausdorff, Jeffrey M.; von Coelln, Rainer; Cummings, Michael P.
Mobility tasks like the Timed Up and Go test (TUG), cognitive TUG (cogTUG), and walking with turns provide insight into motor control, balance, and cognitive functions affected by Parkinson’s disease (PD). We assess the test-retest reliability of these tasks in 262 PD participants and 50 controls by evaluating machine learning models based on wearable sensor-derived measures and statistical metrics. This evaluation examines total duration, subtask duration, and other quantitative measures across two trials. We show that the diagnostic accuracy for distinguishing PD from controls decreases by a mean of 1.8% between the first and the second trial, suggesting that task repetition may not be necessary for accurate diagnosis. Although the total duration remains relatively consistent between trials (intraclass correlation coefficient (ICC) = 0.62 to 0.95), greater variability is seen in subtask duration and sensor-derived measures, reflected in machine learning performance and statistical differences. Our findings also show that this variability differs not only between controls and PD participants but also among groups with varying levels of PD severity, indicating the need to consider population characteristics. Relying solely on total task duration and conventional statistical metrics to gauge the reliability of mobility tasks may fail to reveal nuanced variations in movement.
en-US
Parkinson's disease; mobility tasks; wearable sensors; test-retest reliability; intraclass correlation coefficient; machine learning
Supplementary material for machine learning and statistical analyses of sensor data reveal variability between repeated trials in Parkinson’s disease mobility assessments
Dataset