Exploring Blind and Sighted Users’ Interactions With Error-Prone Speech and Image Recognition

Hong, Jonggi

dc.contributor.advisor	Kacorri, Hernisa	en_US
dc.contributor.author	Hong, Jonggi	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2022-02-04T06:31:26Z
dc.date.available	2022-02-04T06:31:26Z
dc.date.issued	2021	en_US
dc.description.abstract	Speech and image recognition, already employed in many mainstream and assistive applications, hold great promise for increasing independence and improving the quality of life for people with visual impairments. However, their error-prone nature combined with challenges in visually inspecting errors can hold back their use for more independent living. This thesis explores blind users’ challenges and strategies in handling speech and image recognition errors through non-visual interactions, looking at both perspectives: that of an end-user interacting with already trained and deployed models, such as automatic speech recognizers and image recognizers, but also that of an end-user who is empowered to attune the model to their idiosyncratic characteristics, as with teachable image recognizers. To better contextualize the findings and account for human factors beyond visual impairments, user studies also involve sighted participants on a parallel thread. More specifically, Part I of this thesis explores blind and sighted participants' experience with speech recognition errors through audio-only interactions. Here, the recognition result from a pre-trained model is not displayed; instead, it is played back through text-to-speech. Through carefully engineered speech dictation tasks in both crowdsourcing and controlled-lab settings, this part investigates the percentage and type of errors that users miss, their strategies in identifying errors, as well as potential manipulations of the synthesized speech that may help users better identify the errors. Part II investigates blind and sighted participants' experience with image recognition errors. Here, we consider both pre-trained image recognition models and those fine-tuned by the users. Through carefully engineered questions and tasks in both crowdsourcing and semi-controlled remote lab settings, this part investigates the percentage and type of errors that users miss, their strategies in identifying errors, as well as potential interfaces for accessing training examples that may help users better avoid prediction errors when fine-tuning models for personalization.	en_US
dc.identifier	https://doi.org/10.13016/smsx-yu9x
dc.identifier.uri	http://hdl.handle.net/1903/28402
dc.language.iso	en	en_US
dc.subject.pqcontrolled	Computer science	en_US
dc.subject.pqcontrolled	Artificial intelligence	en_US
dc.subject.pqcontrolled	Information science	en_US
dc.subject.pquncontrolled	Accessibility	en_US
dc.subject.pquncontrolled	Machine teaching	en_US
dc.subject.pquncontrolled	Object recognition	en_US
dc.subject.pquncontrolled	Speech recognition	en_US
dc.subject.pquncontrolled	Teachable interface	en_US
dc.subject.pquncontrolled	Visual impairment	en_US
dc.title	Exploring Blind and Sighted Users’ Interactions With Error-Prone Speech and Image Recognition	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Name: Hong_umd_0117E_22040.pdf
Size: 66.04 MB
Format: Adobe Portable Document Format

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations