Integrating Statistics and Visualization to Improve Exploratory Social Network Analysis
Social network analysis is emerging as a key technique for understanding social, cultural, and economic phenomena. However, it is inherently complex, since analysts must understand each individual's attributes as well as the relationships between individuals. Many statistical algorithms can reveal nodes that occupy key social positions or that form cohesive social groups, but outliers and patterns are difficult to find in strictly quantitative output. In these situations, information visualizations can help users make sense of their data, yet typical network visualizations are often hard to interpret because of overlapping nodes and tangled edges.

My first contribution improves the process of exploratory social network analysis. I have designed and implemented a novel social network analysis tool, SocialAction (http://www.cs.umd.edu/hcil/socialaction), that integrates statistics and visualization so users can quickly derive the benefits of both. Statistics are used to detect important individuals, relationships, and clusters. Rather than a tabular display of numbers, the results are integrated with a network visualization in which users can easily and dynamically filter nodes and edges. The visualizations simplify the statistical results, facilitating sensemaking and the discovery of features such as distributions, patterns, trends, gaps, and outliers. The statistics, in turn, simplify the comprehension of a sometimes chaotic visualization, allowing users to focus on statistically significant nodes and edges. SocialAction was also designed to help analysts explore non-social networks, such as citation, communication, financial, and biological networks.

My second contribution extends lessons learned from SocialAction and provides design guidelines for interactive techniques that improve exploratory data analysis.
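The core idea of ranking nodes by a statistic and then filtering the network on that computed attribute can be sketched in a few lines. This is an illustrative example only, not SocialAction's actual implementation; the toy graph, node names, and threshold below are hypothetical:

```python
# Illustrative sketch: compute a simple statistic (degree centrality) for
# each node, then keep only the nodes that meet a threshold -- analogous to
# dynamically filtering a network visualization on a computed attribute.

# A hypothetical undirected social network as an adjacency list.
graph = {
    "alice": {"bob", "carol", "dave"},
    "bob":   {"alice", "carol"},
    "carol": {"alice", "bob", "dave", "erin"},
    "dave":  {"alice", "carol"},
    "erin":  {"carol"},
}

def degree_centrality(g):
    """Each node's degree, normalized by the maximum possible degree."""
    n = len(g)
    return {node: len(neighbors) / (n - 1) for node, neighbors in g.items()}

def filter_nodes(g, scores, threshold):
    """Keep only nodes whose statistic meets the threshold."""
    return {node for node in g if scores[node] >= threshold}

scores = degree_centrality(graph)
key_nodes = filter_nodes(graph, scores, threshold=0.75)
# key_nodes -> {"alice", "carol"}: the statistically prominent individuals
```

In a tool like SocialAction, such scores would color or rank the nodes in the visualization, and the threshold would be a slider the analyst drags to prune the display down to the statistically significant part of the network.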
A taxonomy of seven interactive techniques is augmented with computed attributes from statistics and data mining to improve information visualization exploration. Furthermore, systematic yet flexible design goals are provided to help guide domain experts through complex analyses spanning days, weeks, and months.

My third contribution demonstrates the effectiveness of long-term case studies with domain experts for measuring the creative activities of information visualization users. Evaluating information visualization tools is problematic because controlled studies may not faithfully represent the workflow of analysts: discoveries occur over weeks and months, and exploratory tasks may be poorly defined. To capture authentic insights, I designed an evaluation methodology based on structured and replicated long-term case studies. Applying the methodology with domain experts demonstrated the effectiveness of integrating statistics and visualization.