Digital Words: Moving Forward with Measuring the Readability of Online Texts

CS-TR-5061.pdf (1.47 MB)
Redmiles, Elissa M.
Maszkiewicz, Lisa
Hwang, Emily
Kuchhal, Dhruv
Liu, Everest
Morales, Miraida
Peskov, Denis
Rao, Sudha
Stevens, Rock
Gligoric, Kristina
The readability of a digital text can influence people’s information acquisition (e.g., Wikipedia articles), online security (e.g., how-to articles), and even health (e.g., WebMD). Readability metrics can also alter search rankings and are used to evaluate AI system performance. However, prior work on measuring readability has significant gaps, especially for HCI applications: it has (a) focused on grade-school texts, (b) ignored domain-specific, jargon-heavy texts (e.g., health advice), and (c) failed to compare metrics, especially in how well they scale to online corpora. This paper addresses these shortcomings by comparing well-known readability measures and a novel domain-specific approach across four corpora: crowdworker-generated stories, Wikipedia articles, security and privacy advice, and health information. We evaluate the convergent, discriminant, and content validity of each measure and detail tradeoffs in domain specificity and participant burden. These results provide a foundation for more accurate readability measurement in HCI.
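To make the abstract's "well-known readability measures" concrete, one canonical example is the Flesch-Kincaid Grade Level, which estimates the U.S. school grade needed to understand a text from its average sentence length and syllables per word. The sketch below uses a naive vowel-group syllable heuristic; this heuristic is an illustrative assumption, not the implementation evaluated in the report.

```python
import re

def count_syllables(word):
    # Naive heuristic: count runs of vowels; every word has at least one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    # Flesch-Kincaid Grade Level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Short, monosyllabic sentences score near (or below) grade 0, while long, polysyllabic sentences score far higher, which is exactly the kind of behavior whose validity the report examines across its four corpora.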