Statistical Methods for Analyzing Time Series Data Drawn from Complex Social Systems

Thumbnail Image


Publication or External Link





The rise of human interaction in digital environments has lead to an abundance of behavioral traces. These traces allow for model-based investigation of human-human and human-machine interaction `in the wild.' Stochastic models allow us to both predict and understand human behavior. In this thesis, we present statistical procedures for learning such models from the behavioral traces left in digital environments.

First, we develop a non-parametric method for smoothing time series data corrupted by serially correlated noise. The method determines the simplest smoothing of the data that simultaneously gives the simplest residuals, where simplicity of the residuals is measured by their statistical complexity. We find that complexity regularized regression outperforms generalized cross validation in the presence of serially correlated noise.

Next, we cast the task of modeling individual-level user behavior on social media into a predictive framework. We demonstrate the performance of two contrasting approaches, computational mechanics and echo state networks, on a heterogeneous data set drawn from user behavior on Twitter. We demonstrate that the behavior of users can be well-modeled as processes with self-feedback. We find that the two modeling approaches perform very similarly for most users, but that users where the two methods differ in performance highlight the challenges faced in applying predictive models to dynamic social data.

We then expand the predictive problem of the previous work to modeling the aggregate behavior of large collections of users. We use three models, corresponding to seasonal, aggregate autoregressive, and aggregation-of-individual approaches, and find that the performance of the methods at predicting times of high activity depends strongly on the tradeoff between true and false positives, with no method dominating. Our results highlight the challenges and opportunities involved in modeling complex social systems, and demonstrate how influencers interested in forecasting potential user engagement can use complexity modeling to make better decisions.

Finally, we turn from a predictive to a descriptive framework, and investigate how well user behavior can be attributed to time of day, self-memory, and social inputs. The models allow us to describe how a user processes their past behavior and their social inputs. We find that despite the diversity of observed user behavior, most models inferred fall into a small subclass of all possible finitary processes. Thus, our work demonstrates that user behavior, while quite complex, belies simple underlying computational structures.