STATISTICAL INFERENCE ACROSS MULTIPLE NETWORKS: ADVANCEMENTS IN MULTIPLEX GRAPH MATCHING AND JOINT SPECTRAL NETWORK EMBEDDINGS
MetadataShow full item record
Networks are commonly used to model and study complex systems that arise in a variety of scientific domains.One important network data modality is multiplex networks which are comprised of a collection of networks over a common vertex set. Multiplex networks can describe complex relational data where edges within each network can encode different relationship or interaction types. With the rise of network modeling of complex, multi-sample network data, there has been a recent emphasis on multiplex inference methods. In this thesis, we develop novel theory and methodology to study underlying network structures and perform statistical inference on multiple networks. While each chapter of the thesis has its own individual merit, synergistically they constitute a coherent multi-scale spectral network inference framework that accounts for unlabeled and correlated multi-sample network data. Together, these results significantly extend the reach of such procedures in the literature. In the first part of the thesis, we consider the inference task of aligning the vertices across a pair of multiplex networks, a key algorithmic step in routines that assume a priori node-aligned data. This general multiplex matching framework is then adapted to the task of detecting a noisy induced multiplex template network in a larger multiplex background network.Our methodology, which lifts the classical graph matching framework and the matched filters method of Sussman et al. (2018) to the multiple network setting, uses the template network to search for the ``most" similar subgraph(s) in the background network, where the notion of similarity is measured via a multiplex graph matching distance. We present an algorithm which can efficiently match the template to a (induced or not induced) subgraph in the background that approximately minimizes a suitable graph matching distance, and we demonstrate the effectiveness of our approach both theoretically and empirically in synthetic and real-world data settings. In the second part of the thesis, we present a joint embedding procedure for multi-scale spectral network inference.In our proposed framework, a collection of node-aligned networks are jointly embedded into a common Euclidean space by embedding a specialized omnibus matrix which contains the information across the entire collection of networks. Our approach---which builds upon the work of Levin et al. (2017)---is flexible enough to faithfully embed many different network collection types (e.g., network time-series data; test-retest network data, etc.) and is theoretically tractable. Indeed, our method is among the first joint embedding methods in which statistical consistency and asymptotic normality are established across correlated network collections. Moreover, we are able to identify (and fully analyze in our setting) the phenomenon of induced correlation in the embedded space, which is an artifact of joint embedding methodologies. We examine how both the induced (by the method) and inherent correlation can impact inference for correlated network data, and we provide network analogues of classical questions such as the effective sample size for more generally correlated data. In the final part of the thesis, we consider a constrained optimization problem called corr2OMNI, and we present an algorithm that approximates generalized omnibus matrix structure (for jointly embedding networks) that best preserve (in the embedded space) a given correlation structure across a collection of networks. Moreover, we analyze theoretically an important special case of corr2OMNI where we desire a fixed level of correlation across the networks.