FROM EXPLORATORY TO CONFIRMATORY: TOWARDS DATA VISUALIZATION AS A COMPLETE ANALYSIS TOOL
Confirmatory statistical tests, performed and written with equations, are standard in scientific publications, but may represent a barrier to entry for novice analysts who are less familiar with purely calculative methods. Data visualization, often touted as useful for sharing completed analyses with lay audiences, is also commonly used for early-stage exploratory analysis. Could visualization support hypothesis confirmation as well? Do people have the visual intuitions to make use of such a tool? What would a visual statistical test look like, and what features would it require for acceptance by the scientific community?
This research begins with a crowd-sourced experiment that asked respondents to fit a normal curve to a series of data samples, displayed as bar histograms, dot histograms, box plots, or strip plots. The results suggest people have visual intuitions – though biased toward overestimating spread – for linking idealized probability distributions with real sample data. People performed differently depending upon graphic form, suggesting design choices for subsequent experiments.
A second experiment tested whether novice users might be able to perform a statistical test (a t-test) using a visual analogue – two overlapping distributions (shown as overlapping normal curves, box plots, strip plots, bar histograms, or dot histograms). Respondents had some capacity for this task, performing better with normal curves than with more detailed graphics like histograms.
The final investigation of this research paired the design lessons garnered during experiments 1 & 2 with an interview study of experienced statisticians to explore the design requirements for creating acceptable visual tools for inferential statistics. The interviews uncovered three design foci: the tool must display multiple, contrasting facets of analysis; it should connect the test back to raw data; and it should include a visual representation of real effect sizes compared to the p-value of the test statistic. The final chapter of this dissertation uses the design principles determined by these three investigations to propose a prototype visual tool for conducting a two-sample t-test, along with suggested variations for other inferential statistics.