Mathematics
http://hdl.handle.net/1903/2261
2015-09-19T23:58:04Z

Statistical Methods for Analyzing Time Series Data Drawn from Complex Social Systems
Darmon, David
http://hdl.handle.net/1903/17111
2015-09-19T02:38:36Z
2015-01-01T00:00:00Z
The rise of human interaction in digital environments has led to an abundance of behavioral traces. These traces allow for model-based investigation of human-human and human-machine interaction "in the wild." Stochastic models allow us both to predict and to understand human behavior. In this thesis, we present statistical procedures for learning such models from the behavioral traces left in digital environments.
First, we develop a nonparametric method for smoothing time series data corrupted by serially correlated noise. The method finds the simplest smoothing of the data that simultaneously yields the simplest residuals, where the simplicity of the residuals is measured by their statistical complexity. We find that complexity-regularized regression outperforms generalized cross-validation in the presence of serially correlated noise.
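Generalized cross-validation (GCV), the baseline named above, chooses the amount of smoothing for a linear smoother ŷ = S_h y. It does this by minimizing GCV(h) = (RSS/n) / (1 - tr(S_h)/n)^2. The sketch below is a minimal illustration of that baseline. It assumes a Gaussian-kernel Nadaraya-Watson smoother with bandwidth h and AR(1) noise; both choices are ours and are not taken from the thesis, and the function names are hypothetical. The sketch does not implement the complexity-regularized criterion, which would additionally require estimating the statistical complexity of the residuals. GCV approximates leave-one-out cross-validation, whose rationale assumes independent errors, so serially correlated noise is a natural stress test for it.

```python
import math
import random

def smoother_matrix(t, h):
    """Gaussian-kernel Nadaraya-Watson weights; row i gives the weights used to fit at t[i]."""
    S = []
    for ti in t:
        w = [math.exp(-0.5 * ((ti - tj) / h) ** 2) for tj in t]
        total = sum(w)  # never zero: the j == i term contributes exp(0) = 1
        S.append([wj / total for wj in w])
    return S

def gcv_score(t, y, h):
    """GCV(h) = (RSS / n) / (1 - tr(S_h) / n)^2 for the smoother with bandwidth h."""
    n = len(y)
    S = smoother_matrix(t, h)
    fitted = [sum(S[i][j] * y[j] for j in range(n)) for i in range(n)]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    trace = sum(S[i][i] for i in range(n))
    denom = (1.0 - trace / n) ** 2
    if denom == 0.0:
        return math.inf  # interpolating fit: GCV is undefined, so rule this bandwidth out
    return (rss / n) / denom

def select_bandwidth(t, y, candidates):
    """Return the candidate bandwidth with the smallest GCV score."""
    return min(candidates, key=lambda h: gcv_score(t, y, h))

# Hypothetical test signal: a sine wave plus AR(1) noise e_i = phi * e_{i-1} + white noise.
random.seed(0)
n = 200
t = [i / n for i in range(n)]
phi, e, y = 0.8, 0.0, []
for ti in t:
    e = phi * e + random.gauss(0.0, 0.2)
    y.append(math.sin(2 * math.pi * ti) + e)

candidates = [0.005, 0.01, 0.02, 0.05, 0.1]
h_star = select_bandwidth(t, y, candidates)
```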
Next, we cast the task of modeling individual-level user behavior on social media as a prediction problem. We compare the performance of two contrasting approaches, computational mechanics and echo state networks, on a heterogeneous data set drawn from user behavior on Twitter. We show that the behavior of users can be well modeled as processes with self-feedback. The two modeling approaches perform very similarly for most users, but the users for whom their performance differs highlight the challenges of applying predictive models to dynamic social data.
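As a concrete picture of a process with self-feedback, the sketch below predicts whether a user is active in the next time bin from the user's own last k bins. It takes a majority vote over past occurrences of the same history. This is a plain order-k Markov baseline. It is not the computational-mechanics (causal state) reconstruction or the echo state networks that the thesis compares. The binary active/inactive encoding, the toy series, and the function names are assumptions for illustration.

```python
from collections import defaultdict

def fit_history_predictor(series, k):
    """Count next-symbol outcomes after each length-k history of a 0/1 series."""
    counts = defaultdict(lambda: [0, 0])  # history tuple -> [times followed by 0, times followed by 1]
    for i in range(k, len(series)):
        history = tuple(series[i - k:i])
        counts[history][series[i]] += 1
    return dict(counts)

def predict_next(counts, history, default=0):
    """Predict the more frequent next symbol after `history`; ties and unseen histories give `default`."""
    zeros, ones = counts.get(tuple(history), (0, 0))
    if zeros == ones:
        return default
    return 1 if ones > zeros else 0

def accuracy(series, counts, k, start):
    """Fraction of positions i >= start (start >= k) whose symbol is predicted correctly from the previous k."""
    hits = sum(predict_next(counts, series[i - k:i]) == series[i] for i in range(start, len(series)))
    return hits / (len(series) - start)

# Hypothetical user who is active in every third time bin (1 = active, 0 = inactive).
series = [0, 0, 1] * 40
train = series[:60]

counts_k2 = fit_history_predictor(train, 2)  # memory of the last two bins
counts_k0 = fit_history_predictor(train, 0)  # no memory: always predicts the majority symbol
acc2 = accuracy(series, counts_k2, 2, 60)
acc0 = accuracy(series, counts_k0, 0, 60)
```

This toy series repeats inactive, inactive, active. On it, a two-bin memory predicts the held-out half perfectly. A memoryless (k = 0) predictor can do no better than always guessing "inactive", which is right two-thirds of the time.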
We then extend the prediction problem of the previous study from individual users to the aggregate behavior of large collections of users. We use three models, corresponding to seasonal, aggregate autoregressive, and aggregation-of-individual approaches. We find that the performance of the methods at predicting times of high activity depends strongly on the tradeoff between true and false positives, with no method dominating. Our results highlight the challenges and opportunities involved in modeling complex social systems. They also demonstrate how influencers interested in forecasting potential user engagement can use complexity modeling to make better decisions.
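The dependence on the true/false positive tradeoff can be made concrete with a receiver operating characteristic (ROC) curve. Sweep a threshold over each model's predicted activity scores, flag a time bin as "high activity" when its score is at least the threshold, and record the false and true positive rates. The sketch below assumes two hypothetical models with made-up scores for eight time bins, four of which are truly high-activity. These numbers are illustrative only and are not results from the thesis.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs for the rule 'predict high activity if score >= threshold', thresholds descending."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= threshold and lab == 1)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= threshold and lab == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

def best_tpr(points, max_fpr):
    """Highest true positive rate achievable without exceeding a false positive budget."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)

# Eight hypothetical time bins: the first four are truly high-activity.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.3, 0.2, 0.7, 0.6, 0.5, 0.4]       # confident on two bins, misses the rest
scores_b = [0.75, 0.65, 0.55, 0.45, 0.95, 0.35, 0.25, 0.15]  # one confident false alarm, then all hits

roc_a = roc_points(scores_a, labels)
roc_b = roc_points(scores_b, labels)
```

Here model B has the larger area under the curve (0.75 versus 0.5). Yet model A is the only one that catches any high-activity bins when no false alarms are tolerated. Which model is "better" therefore depends on the operating point.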
Finally, we turn from a predictive to a descriptive framework and investigate how well user behavior can be attributed to time of day, self-memory, and social inputs. The inferred models describe how a user processes their own past behavior and their social inputs. We find that despite the diversity of observed user behavior, most inferred models fall into a small subclass of all possible finitary processes. Thus, our work demonstrates that user behavior, while quite complex, rests on simple underlying computational structures.

Multiscale and Directional Representations of High-Dimensional Information Content in Remotely Sensed Data
Weinberg, Daniel Eric
http://hdl.handle.net/1903/17103
2015-09-19T02:37:51Z
2015-01-01T00:00:00Z
This thesis explores the theory and applications of directional representations in the field of anisotropic harmonic analysis. Although wavelets are optimal for decomposing functions in one dimension, they do not achieve the same success in two or more dimensions because of the presence of curves and surfaces of discontinuity. To capture the behavior of a function optimally at discontinuities in higher dimensions, our analyzing functions must incorporate directional information in addition to location and scale. Examples of such representations are contourlets, curvelets, ridgelets, bandelets, wedgelets, and shearlets. Using directional representations, in particular shearlets, we tackle several challenging problems in the processing of remotely sensed data. First, we detect roads and ditches in LIDAR data of rural scenes. Second, we develop an algorithm for superresolution of optical and hyperspectral data. We conclude by presenting a stochastic particle model in which the probability of movement in a particular direction is neighbor-weighted.
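For reference, the continuous shearlet system is commonly written as follows. This is the standard form; the thesis may use a different normalization or a discrete, cone-adapted variant. The parabolic scaling A_a makes the elements anisotropic, and the shear S_s sets their orientation. This is how shearlets add direction to the location t and scale a.

```latex
A_a = \begin{pmatrix} a & 0 \\ 0 & \sqrt{a} \end{pmatrix}, \qquad
S_s = \begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix}, \qquad a > 0,\; s \in \mathbb{R},

\psi_{a,s,t}(x) = a^{-3/4}\, \psi\!\left(A_a^{-1} S_s^{-1} (x - t)\right), \qquad t \in \mathbb{R}^2,

\mathcal{SH}_\psi f(a,s,t) = \langle f, \psi_{a,s,t} \rangle .
```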

Identification of Operators on Elementary Locally Compact Abelian Groups
Civan, Gokhan
http://hdl.handle.net/1903/17076
2015-09-19T02:36:13Z
2015-01-01T00:00:00Z
Measurement of time-variant linear channels is an important problem in communications theory, with applications in mobile communications and radar detection. Kailath addressed this problem about half a century ago. He developed a spreading criterion for the identifiability of time-variant channels, analogous to the band-limitation criterion in the classical sampling theory of signals. Roughly speaking, underspread channels are identifiable and overspread channels are not, with the critical spreading area equal to one. Bello later generalized Kailath's analysis from rectangular to arbitrary spreading supports.
Modern developments in time-frequency analysis provide a natural and powerful framework in which to study the channel measurement problem from a rigorous mathematical standpoint. Pfander and Walnut, building on earlier work by Kozek and Pfander, have developed a sophisticated theory of "operator sampling" or "operator identification". This theory not only places the work of Kailath and Bello on a rigorous footing but also takes the subject in new directions, revealing connections with other important problems in time-frequency analysis.
We extend the existing work on operator identification, which is restricted to the real line, to elementary locally compact abelian groups. These are groups built from the real line, the circle, the integers, and finite abelian groups. Our approach is to axiomatize, as it were, the main ideas that have been developed over the real line, working with lattice subgroups. We are thus able to prove identifiability results for operators involving both underspread and overspread conditions, in both general and specific cases. For example, guided by insight from the general theory, we provide a finite-dimensional example illustrating a necessary and sufficient condition for the identifiability of operators.
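A small numerical sketch illustrates the finite-dimensional picture. It assumes the cyclic group Z_N with a single probe vector c and a known spreading support Λ. The function names, the probe, and the support are illustrative choices, not the thesis's construction.

Every operator on C^N can be written as H = Σ η(k,l) M_l T_k, where T_k is cyclic translation and M_l is modulation. The coefficients η on Λ can therefore be recovered from the single output Hc exactly when the vectors M_l T_k c, for (k,l) in Λ, are linearly independent. This requires |Λ| ≤ N, a discrete counterpart of the critical spreading area one. For prime N, a known result states that a randomly chosen probe makes every set of N such vectors linearly independent with probability one. That is why the size-N support below is expected to be identifiable.

```python
import numpy as np

def tf_shift(c, k, l):
    """Return M_l T_k c on Z_N: cyclic translation by k, then modulation by frequency l."""
    n = len(c)
    translated = np.roll(c, k)  # (T_k c)[m] = c[(m - k) mod N]
    return np.exp(2j * np.pi * l * np.arange(n) / n) * translated

def probe_matrix(c, support):
    """Stack the vectors M_l T_k c, one column per (k, l) in the assumed support."""
    return np.column_stack([tf_shift(c, k, l) for k, l in support])

def identifiable(c, support):
    """The coefficients on `support` are determined by Hc iff these columns are independent."""
    return np.linalg.matrix_rank(probe_matrix(c, support)) == len(support)

def recover_spreading(c, support, response):
    """Solve response = sum over support of eta(k, l) M_l T_k c for eta (least squares)."""
    coeffs, *_ = np.linalg.lstsq(probe_matrix(c, support), response, rcond=None)
    return dict(zip(support, coeffs))

N = 5  # prime, so a random probe is expected to work for any N time-frequency shifts
rng = np.random.default_rng(0)
c = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # hypothetical random probe

support = [(0, 0), (1, 2), (2, 4), (3, 1), (4, 3)]  # |support| = N: the critical size
eta = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # hypothetical spreading coefficients
response = sum(e * tf_shift(c, k, l) for e, (k, l) in zip(eta, support))  # the output H c

recovered = recover_spreading(c, support, response)
overspread = support + [(1, 1)]  # N + 1 unknowns from N samples: never identifiable
```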
In working up to our main results, we develop the considerable technical background they require, bringing new perspectives to existing ideas and filling what we consider to be gaps in the literature.

Bred vectors, singular vectors, and Lyapunov vectors in simple and complex models
Norwood, Adrienne
http://hdl.handle.net/1903/17071
2015-09-19T02:35:58Z
2015-01-01T00:00:00Z
We compute and compare three types of vectors frequently used to explore the instability properties of dynamical models: Lyapunov vectors (LVs), singular vectors (SVs), and bred vectors (BVs). The first model is the Lorenz (1963) three-variable model. We find that BVs align with the locally fastest-growing LV, which is often the second-fastest-growing global LV. The growth rates of all three types of vectors predict regime changes and the durations of new regimes, as Evans et al. (2004) showed for BVs.

The second model is the toy ‘atmosphere-ocean model’ developed by Peña and Kalnay (2004). It couples three Lorenz (1963) models with different time scales to test the effects of fast and slow modes of growth on the dynamical vectors. A fast ‘extratropical atmosphere’ is weakly coupled to a fast ‘tropical atmosphere’, which is strongly coupled to a slow ‘ocean’ system; the latter coupling imitates the tropical El Niño–Southern Oscillation. BVs separate the fast and slow modes of growth through appropriate selection of the breeding parameters. LVs successfully separate the fast ‘extratropics’ but cannot completely decouple the ‘tropics’ from the ‘ocean.’ The result is ‘coupled’ LVs that are affected by both systems but dominated mainly by one. SVs identify the fast modes but cannot capture the slow modes until the fast ‘extratropics’ are replaced with faster ‘convection.’ The dissimilar behavior of the three types of vectors reduces the similarity of the subspaces they span (Norwood et al. 2013).

The third model is a quasi-geostrophic channel model (Rotunno and Bao 1996) that simplifies extratropical synoptic-scale motions, retaining only baroclinic instabilities. We were unable to compute LVs for this model. However, randomly initialized BVs quickly converge to a single vector that is the leading LV.

The last model is the SPEEDY model created by Molteni (2003), a simplified atmospheric general circulation model with several types of instabilities that saturate on different time scales. Through proper selection of the breeding parameters, BVs identify baroclinic and convective instabilities. When the amplitude and rescaling period are further reduced, all BVs converge to a single vector associated with Lamb waves, a behavior not previously observed.
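The breeding cycle behind the BV results can be sketched on the Lorenz (1963) model. Run a control trajectory and a perturbed one. After each rescaling period, shrink their difference back to a fixed amplitude; the rescaled difference becomes the next perturbation. The sketch makes several assumptions: the classic parameters (sigma = 10, rho = 28, beta = 8/3), a fourth-order Runge-Kutta step with dt = 0.01, the Euclidean norm, an amplitude of 1e-3, and a rescaling period of 8 steps. These settings and the function names are illustrative, not those used in the thesis. The amplitude and rescaling period are the two breeding parameters the abstract refers to.

```python
import math
import random

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0  # classic Lorenz (1963) parameters

def lorenz63(state):
    """Time derivative of the Lorenz (1963) three-variable system."""
    x, y, z = state
    return (SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z)

def rk4_step(state, dt):
    """One fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = lorenz63(state)
    k2 = lorenz63(shifted(state, k1, dt / 2))
    k3 = lorenz63(shifted(state, k2, dt / 2))
    k4 = lorenz63(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def breed(state, amplitude, steps_per_cycle, cycles, dt=0.01, seed=0):
    """Breed a vector; return the final state, the bred vector, and per-cycle growth rates."""
    rng = random.Random(seed)
    direction = [rng.gauss(0.0, 1.0) for _ in range(3)]  # random initial perturbation
    bv = [d * amplitude / norm(direction) for d in direction]
    growth_rates = []
    for _ in range(cycles):
        control = state
        perturbed = tuple(s + b for s, b in zip(state, bv))
        for _ in range(steps_per_cycle):
            control = rk4_step(control, dt)
            perturbed = rk4_step(perturbed, dt)
        diff = [p - c for p, c in zip(perturbed, control)]
        size = norm(diff)
        # Growth rate: log amplification of the perturbation per unit time over this cycle.
        growth_rates.append(math.log(size / amplitude) / (steps_per_cycle * dt))
        bv = [d * amplitude / size for d in diff]  # rescale back to the fixed amplitude
        state = control  # the control run continues as the reference trajectory
    return state, bv, growth_rates

# Spin up onto the attractor before breeding.
state = (1.0, 1.0, 1.0)
for _ in range(1000):
    state = rk4_step(state, 0.01)
state, bv, rates = breed(state, amplitude=1e-3, steps_per_cycle=8, cycles=500)
mean_rate = sum(rates) / len(rates)
```

Averaged over many cycles, the growth rate should be positive on the chaotic attractor. The regime-change results in the abstract concern how these per-cycle rates vary along the trajectory.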