Representation Learning For Large-scale Graphs
Files
Publication or External Link
Date
Authors
Advisor
Citation
Abstract
Graphs are widely used to model object relationships in real world applications such as biology, neuroscience, and communication networks. Traditional graph analysis tools focus on extracting key graph patterns to characterize graphs which are further used in the downstream tasks such as prediction. Common graph characteristics include global and local graph measurements such as clustering coefficient, global efficiency, characteristic path length, diameter and so on. Many applications often involve high dimensional and large-scale graphs for which existing methods, which rely on small numbers of graph characteristics, cannot be used to efficiently perform graph tasks. A major research challenge is to learn graph representations that can be used to efficiently perform graph tasks as required by a wide range of applications.
In this thesis, we have developed a number of novel methods to tackle the challenges associated with processing or representing large-scale graphs. In the first part, we propose a general graph coarsening framework that maps large graphs into smaller ones while preserving important structural graph properties. Based on spectral graph theory, we define a novel distance function that measures the differences between graph spectra of the original and coarse graphs. We show that the proposed spectral distance sheds light on the structural differences in the graph coarsening process. In addition, we propose graph coarsening algorithms that aim to minimize the spectral distance, with provably strong bounds. Experiments show that the proposed algorithms outperform previous graph coarsening methods in applications such as graph classification and stochastic block recovery tasks.
In the second part, we propose a new graph neural network paradigm that improves the expressiveness of the best known graph representations. Graph neural network (GNN) models have recently been introduced to solve challenging graph-related problems. Most GNN models follow the message-passing paradigm where node information is propagated through edges, and graph representations are formed by the aggregation of node representations. Despite their successes, message-passing GNN models are limited in terms of their expressive power, which fail to capture basic characteristic properties of graphs. In our work, we represent graphs as the composition of sequence representations. Through the design of sequence sampling and modeling techniques, the proposed graph representations achieve provably powerful expressiveness while maintaining permutation invariance. Empirical results show that the proposed model achieves superior results in real-world graph classification tasks.
In the third part, we develop a fast implementation of spectral clustering methods on CPU-GPU platforms. Spectral clustering is one of the most popular graph clustering algorithms which achieved state-of-the art performance in a wide range of applications. However, existing implementations in commonly used software platforms such as Matlab and Python do not scale well for many of the emerging Big Data applications. We present a fast implementation of the spectral clustering algorithm on a CPU-GPU heterogeneous platform. Our implementation takes advantage of the computational power of the multi-core CPU and the massive multithreading capabilities of GPUs. We show that the new implementation achieved significantly accelerated computation speeds compared with previous implementations on a wide range of tasks.
In the fourth part, we study structural brain networks derived from Diffusion Tensor Imaging (DTI) data. The processing of DTI data coupled with the use of modern tractographic methods reveal white matter fiber connectivity at a relatively high resolution; this allows us to model the brain as a structural network which encodes pairwise connectivity strengths between brain voxels. We have developed an iterative method to delineate the brain cortex into fine-grained connectivity-based brain parcellations. This allows to map the initial large-scale brain network into a relatively small weighted graph that preserves the essential structural connectivity information. We show that graph representations based on the brain networks from new brain parcellations are more powerful in discriminating between different populations groups, compared with existing brain parcellations.