Towards Data-Driven Large Scale Scientific Visualization and Exploration

Technological advances have enabled us to acquire extremely large datasets, but it remains a challenge to store, process, and extract information from them. This dissertation builds upon recent advances in machine learning, visualization, and user interaction to facilitate exploration of large-scale scientific datasets. First, we use data-driven approaches to computationally identify regions of interest in the datasets. Second, we use visual presentation for effective user comprehension. Third, we provide interactions for human users to integrate domain knowledge and semantic information into this exploration process.

Our research shows how to extract, visualize, and explore informative regions on very large 2D landscape images, 3D volumetric datasets, high-dimensional volumetric mouse brain datasets with thousands of spatially-mapped gene expression profiles, and geospatial trajectories that evolve over time. The contributions of this dissertation include: (1) we introduce a sliding-window saliency model that discovers regions of user interest in very large images; (2) we develop visual segmentation of intensity-gradient histograms to identify meaningful components from volumetric datasets; (3) we extract boundary surfaces from a wealth of volumetric gene expression mouse brain profiles to personalize the reference brain atlas; (4) we show how to efficiently cluster geospatial trajectories by mapping each sequence of locations to a high-dimensional point with the kernel distance framework.

We aim to discover patterns, relationships, and anomalies that would lead to new scientific, engineering, and medical advances. This work represents one of the first steps toward better visual understanding of large-scale scientific data by combining machine learning and human