A Latent Factor Approach for Social Network Analysis
Abstract
Social network data consist of entities and relational information between
pairs of entities. Observations in a social network are dyadic and interdependent.
Therefore, making appropriate statistical inferences from a network requires a model
that specifies these dependencies. Previous studies suggested that latent factor
models (LFMs) for social network data can account for stochastic equivalence and
transitivity simultaneously, the two primary dependency patterns
observed in real-world social networks. One particular LFM, the
additive and multiplicative effects network model (AME), accounts for the heterogeneity
of second-order dependencies at the actor level. However, no current latent
variable model considers the heterogeneity of third-order dependencies,
such as actor-level transitivity. Failure to model third-order dependency heterogeneity
may result in worse fits to local network structures, which in turn may result
in biased parameter inferences and may negatively influence the goodness-of-fit and
prediction performance of a model.
Motivated by such a gap in the literature, this dissertation proposes to incorporate
a correlation structure between the sender and receiver latent factors in the
AME to account for the distribution of actor-level transitivity. The proposed model
is compared with the existing AME in both simulation studies and real-world data analyses. Models
are evaluated via multiple goodness-of-fit techniques, including mean squared error,
parameter coverage rate, information criteria, receiver operating characteristic (ROC) curves
based on K-fold cross-validation or full data, and posterior predictive checking. This
work may also contribute to the literature on goodness-of-fit methods for network
models, an area that has not been unified.
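
As a rough illustration of the kind of data-generating process such a simulation study might use, the sketch below draws a binary directed network from an AME-style probit model in which each actor's sender factor u_i and receiver factor v_i are correlated. This is a minimal sketch under stated assumptions, not the dissertation's actual model or code: the function name simulate_ame_network, the one-dimensional latent factors, the unit variances, the fixed additive-effect covariance of 0.5, the intercept, and the independent dyadic noise are all illustrative choices.

```python
import numpy as np

def simulate_ame_network(n=200, rho_uv=0.6, intercept=-1.0, seed=0):
    """Simulate a binary directed network from an AME-style probit model.

    Hypothetical sketch: one-dimensional multiplicative factors with
    correlation rho_uv between each actor's sender factor u_i and
    receiver factor v_i. All parameter values are assumptions.
    """
    rng = np.random.default_rng(seed)

    # Additive sender (a_i) and receiver (b_i) effects, correlated within actor.
    cov_ab = np.array([[1.0, 0.5], [0.5, 1.0]])
    ab = rng.multivariate_normal(np.zeros(2), cov_ab, size=n)
    a, b = ab[:, 0], ab[:, 1]

    # Multiplicative sender (u_i) and receiver (v_i) factors with correlation rho_uv.
    cov_uv = np.array([[1.0, rho_uv], [rho_uv, 1.0]])
    uv = rng.multivariate_normal(np.zeros(2), cov_uv, size=n)
    u, v = uv[:, 0], uv[:, 1]

    # Linear predictor for the tie i -> j: intercept + a_i + b_j + u_i * v_j.
    eta = intercept + a[:, None] + b[None, :] + np.outer(u, v)

    # Probit link via a latent normal variable; the noise is independent
    # across dyads here for simplicity (an assumption of this sketch).
    eps = rng.standard_normal((n, n))
    y = (eta + eps > 0).astype(int)
    np.fill_diagonal(y, 0)  # no self-ties
    return y, u, v

y, u, v = simulate_ame_network()
print("network density:", y.mean())
print("sample corr(u, v):", np.corrcoef(u, v)[0, 1])
```

Setting rho_uv to zero in this sketch gives independent sender and receiver factors, which corresponds to the zero-correlation comparison case.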
Both the simulation studies and real-world data analyses showed that adding
the correlation structure provides a better fit to network data as well as higher
prediction accuracy. The proposed method performs equally or similarly to the
AME when the underlying correlation is zero, with regard to the mean squared error
of tie probabilities and the widely applicable information criterion. The present study
did not find any significant impact of the correlation term on the estimation of
node-level covariate coefficients. Future studies include investigating more types of covariates,
such as subgroup-related covariate effects.