Promoting Rich and Low-Burden Self-Tracking With Multimodal Data Input

Manual tracking of personal data offers many benefits such as increased engagement and situated awareness. However, existing self-tracking tools often employ touch-based input to support manual tracking, imposing a heavy input burden and limiting the richness of the collected data. Inspired by speech's fast and flexible nature, this dissertation examines how speech input works with traditional touch input to manually capture personal data in different contexts: food practice, productivity, and exercise.

As a first step, I conducted co-design workshops with registered dietitians to explore opportunities for customizing food trackers that incorporate multimodal input. The workshops generated diverse tracker designs addressing dietitians' information needs, spanning a wide range of tracking items, timing, data formats, and input modalities.

In the second study, I specifically examined how speech input supports capturing everyday food practice. I created FoodScrap, a speech-based food journaling app, and conducted a data collection study in which FoodScrap not only captured rich details about meals and food decisions but was also recognized by participants for encouraging self-reflection.

To further integrate touch and speech on mobile phones, I developed NoteWordy, a multimodal system combining touch and speech input to capture multiple types of data. Through deploying NoteWordy in the context of productivity tracking, I identified several input patterns that varied by data type as well as by participants' input habits, error tolerance, and social surroundings. Additionally, speech input enabled faster entry completion and enhanced the richness of free-form text.

Furthermore, I expanded the research scope to speech input on smart speakers by developing TandemTrack, a multimodal exercise assistant coupling a mobile app with an Alexa skill. In a four-week deployment study, TandemTrack demonstrated the convenience of hands-free speech input for capturing exercise data and highlighted the importance of visual feedback on the mobile app for supporting data exploration.

Across these studies, I describe the strengths and limitations of speech as an input modality for capturing personal data in various contexts, and discuss opportunities for improving the data capture experience with natural language input. Lastly, I conclude the dissertation with design recommendations toward a low-burden, rich, and reflective self-tracking experience.