Physics
Permanent URI for this community: http://hdl.handle.net/1903/2269
2 results
Search Results
Item DEVELOPING MACHINE LEARNING TECHNIQUES FOR NETWORK CONNECTIVITY INFERENCE FROM TIME-SERIES DATA (2022) Banerjee, Amitava; Ott, Edward; Physics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Inference of the connectivity structure of a network from the observed dynamics of the states of its nodes is a key issue in science, with wide-ranging applications such as determining the synapses in nervous systems, mapping interactions between genes and proteins in biochemical networks, and distinguishing ecological relationships between different species in their habitats. In this thesis, we show that certain machine learning models, trained to forecast experimental and synthetic time-series data from complex systems, can automatically learn the causal networks underlying such systems. Based on this observation, we develop new machine learning techniques for inferring the causal interaction network connectivity structures underlying large, networked, noisy, complex dynamical systems, solely from the time series of their nodal states. In particular, our approach is to first train a type of machine learning architecture, known as the ‘reservoir computer’, to mimic the measured dynamics of an unknown network. We then use the trained reservoir computer as an in silico computational model of the unknown network to estimate how small changes in nodal states propagate in time across that network. Since small perturbations of network nodal states are expected to spread along the links of the network, the estimated propagation of nodal state perturbations reveals the connections of the unknown network.
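The two steps described above (train a reservoir computer to mimic the measured dynamics, then probe how small input perturbations propagate through the trained model) can be illustrated with a minimal Python sketch. It is not the thesis implementation. The toy system (a three-node, noise-driven linear map with coupling matrix `A`), the function names, and all hyperparameters (reservoir size, spectral radius, input scale, ridge parameter, washout length) are assumptions chosen only for illustration.

```python
import numpy as np

def simulate_network(A, steps, noise_std, rng):
    """Toy stand-in for measured data: x[t] = A x[t-1] + noise (assumed system)."""
    n = A.shape[0]
    x = np.zeros((steps, n))
    for t in range(1, steps):
        x[t] = A @ x[t - 1] + noise_std * rng.standard_normal(n)
    return x

def train_reservoir(x, n_res, rng, spectral_radius=0.5, input_scale=0.1,
                    ridge=1e-6, washout=100):
    """Drive a random tanh reservoir with x and fit a linear one-step-ahead readout."""
    n = x.shape[1]
    W = rng.uniform(-1.0, 1.0, (n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = rng.uniform(-input_scale, input_scale, (n_res, n))
    r = np.zeros((x.shape[0], n_res))
    for t in range(1, x.shape[0]):
        # r[t] has seen the inputs x[0], ..., x[t-1]
        r[t] = np.tanh(W @ r[t - 1] + W_in @ x[t - 1])
    R, Y = r[washout:], x[washout:]          # discard the initial transient
    # Ridge regression so that W_out @ r[t] approximates x[t]
    W_out = np.linalg.solve(R.T @ R + ridge * np.eye(n_res), R.T @ Y).T
    return W_in, W_out, R

def link_scores(W_in, W_out, R):
    """Time-averaged sensitivity |d x_hat_i(t) / d x_j(t-1)| of the trained model."""
    mean_gain = (1.0 - R**2).mean(axis=0)    # time average of tanh'(.) per reservoir node
    jacobian = W_out @ (mean_gain[:, None] * W_in)
    return np.abs(jacobian)

rng = np.random.default_rng(0)
# Assumed ground truth: A[i, j] != 0 means node j drives node i (chain 0 -> 1 -> 2)
A = np.array([[0.5, 0.0, 0.0],
              [0.4, 0.5, 0.0],
              [0.0, 0.4, 0.5]])
x = simulate_network(A, steps=5000, noise_std=1.0, rng=rng)
W_in, W_out, R = train_reservoir(x, n_res=100, rng=rng)
scores = link_scores(W_in, W_out, R)
print(np.round(scores, 2))                   # scores[i, j]: inferred influence of j on i
```

In this sketch, a large `scores[i, j]` suggests a link from node `j` to node `i`. The perturbation is never applied to the real system, only to the trained model, which is what makes the approach noninvasive.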
Our technique is noninvasive, but is motivated by the widely used invasive network inference method, whereby the temporal propagation of active perturbations applied to the network nodes is observed and employed to infer the network links (e.g., tracing the effects of knocking down multiple genes, one at a time, can be used to infer gene regulatory networks). We discuss how we can further apply this methodology to infer causal network structures underlying different time-series datasets and compare the inferred network with the ground truth whenever available. We demonstrate three practical applications of this network inference procedure: (1) inference of network link strengths from time-series data of coupled, noisy Lorenz oscillators, (2) inference of time-delayed feedback couplings in opto-electronic oscillator circuit networks designed in the laboratory, and (3) inference of the synaptic network from publicly available calcium fluorescence time-series data of C. elegans neurons. In all examples, we also explain how experimental factors like noise level, sampling time, and measurement duration systematically affect causal inference from experimental data. The results show that synchronization and strong correlation among the dynamics of different nodal states are, in general, detrimental for causal network inference. Features that break synchrony among the nodal states, e.g., coupling strength, network topology, dynamical noise, and heterogeneity of the parameters of individual nodes, help the network inference. In fact, we show in this thesis that, for parameter regimes where the network nodal states are not synchronized, we can often achieve perfect causal network inference from simulated and experimental time-series data, using machine learning techniques, in a wide variety of physical systems.
In cases where effects like observational noise, large sampling time, or small sampling duration hinder such perfect network inference, we show that it is possible to use specially designed surrogate time-series data to assign statistical confidence to individual inferred network links. Given the general applicability of our machine learning methodology in time-series prediction and network inference, we anticipate that such techniques can be used for better model-building, forecasting, and control of complex systems in nature and in the lab.

Item UNCOVERING PATTERNS IN COMPLEX DATA WITH RESERVOIR COMPUTING AND NETWORK ANALYTICS: A DYNAMICAL SYSTEMS APPROACH (2020) Krishnagopal, Sanjukta; Girvan, Michelle; Physics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
In this thesis, we explore methods of uncovering underlying patterns in complex data, and making predictions, through machine learning and network science. With the availability of more data, machine learning for data analysis has advanced rapidly. However, there is a general lack of approaches that might allow us to 'open the black box'. In the machine learning part of this thesis, we primarily use an architecture called Reservoir Computing for time-series prediction and image classification, while exploring how information is encoded in the reservoir dynamics. First, we investigate the ways in which a Reservoir Computer (RC) learns concepts such as 'similar' and 'different', and relationships such as 'blurring' and 'rotation', between image pairs, and generalizes these concepts to different classes unseen during training. We observe that the high dimensional reservoir dynamics display different patterns for different relationships. This clustering allows RCs to perform significantly better in generalization with limited training compared with state-of-the-art pair-based convolutional/deep Siamese Neural Networks.
Second, we demonstrate the utility of an RC in the separation of superimposed chaotic signals. We assume no knowledge of the dynamical equations that produce the signals, and require only that the training data consist of finite time samples of the component signals. We find that our method significantly outperforms the optimal linear solution to the separation problem, the Wiener filter. To understand how representations of signals are encoded in an RC during learning, we study its dynamical properties when trained to predict chaotic Lorenz signals. We do so by using a novel mathematical fixed-point-finding technique called directional fibers. We find that, after training, the high dimensional RC dynamics include fixed points that map to the known Lorenz fixed points, but the RC also has spurious fixed points, which are relevant to how its predictions break down.
While machine learning is a useful data processing tool, its success often relies on a useful representation of the system's information. In contrast, systems with a large number of interacting components may be better analyzed by modeling them as networks. While numerous advances in network science have helped us analyze such systems, tools that identify properties on networks modeling multi-variate time-evolving data (such as disease data) are limited. We close this gap by introducing a novel data-driven, network-based Trajectory Profile Clustering (TPC) algorithm for 1) identification of disease subtypes and 2) early prediction of subtype/disease progression patterns. TPC identifies subtypes by clustering patients with similar disease trajectory profiles derived from bipartite patient-variable networks. Applying TPC to a Parkinson’s dataset, we identify 3 distinct subtypes. Additionally, we show that TPC predicts disease subtype 4 years in advance with 74% accuracy.
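For reference, the "known Lorenz fixed points" mentioned in the second abstract can be computed in closed form. The short Python sketch below does so and checks that the Lorenz vector field vanishes at each point. The abstract does not state which parameter values were used, so the standard choice σ = 10, ρ = 28, β = 8/3 is an assumption, as are the function names.

```python
import numpy as np

def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (parameter values are assumed)."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def lorenz_fixed_points(sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Closed-form fixed points: the origin, plus a symmetric pair when rho > 1."""
    points = [np.zeros(3)]
    if rho > 1.0:
        c = np.sqrt(beta * (rho - 1.0))
        points.append(np.array([c, c, rho - 1.0]))
        points.append(np.array([-c, -c, rho - 1.0]))
    return points

for p in lorenz_fixed_points():
    # The vector field should be zero (up to rounding) at every fixed point
    print(p, np.allclose(lorenz_rhs(p), 0.0))
```

A fixed point of a trained reservoir that does not map onto one of these three points would be a spurious one in the sense used in the abstract.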