Making Predictions and Handling Errors in Reconstructed Biological Networks

Platig, John
Girvan, Michelle
In this thesis we present methods for applying techniques from complex network theory to analyze and interpret inferred biological interactions. With the advent of high-throughput technologies such as gene microarrays and genome-wide sequencing, it is now possible to measure the activity of every gene in a cancer cell population under different conditions. How to extract important interactions from these experiments remains an outstanding question. Here we present a method to identify these key interactions by focusing on short paths in a transcription factor network. We use a mutual information-based approach to infer the transcription factor network from gene expression microarrays, which measure perturbations in a Diffuse Large B Cell Lymphoma cell line. By focusing on the number of short paths between transcription factors and signature genes in the inferred network, we find a set of transcription factors whose biology is crucial to the continued survival of these lymphoma cells, and we show that a subset of these factors also has a distinct expression pattern in patient tumors. Because many networks of interest are reconstructed from data containing errors, we introduce two simple models of false and missing links to characterize the effects of network misinformation on three commonly used centrality measures: degree centrality, betweenness centrality, and dynamical importance. We show that all three measures are especially robust to both false and missing links when the network has a power law in the tail of its degree distribution.
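As a rough illustration of the kind of robustness test described above, the following sketch perturbs a small toy network with missing and false links and compares degree rankings before and after. The abstract does not specify the thesis's two error models, so the uniform random deletion and uniform random addition used here are assumptions, and every name (`drop_links`, `add_false_links`, `top_k_overlap`, the toy network itself) is hypothetical. Only raw degree is shown; betweenness centrality and dynamical importance are left out for brevity.

```python
import random
from itertools import combinations


def degree(edges, nodes):
    """Raw degree of every node in an undirected edge set."""
    deg = {n: 0 for n in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def drop_links(edges, fraction, rng):
    """Missing-link model (assumed): delete a uniformly random fraction of edges."""
    ordered = sorted(edges)
    keep = len(ordered) - int(round(fraction * len(ordered)))
    return set(rng.sample(ordered, keep))


def add_false_links(edges, nodes, fraction, rng):
    """False-link model (assumed): add random non-edges, fraction * |E| of them."""
    non_edges = [e for e in combinations(sorted(nodes), 2) if e not in edges]
    extra = min(len(non_edges), int(round(fraction * len(edges))))
    return set(edges) | set(rng.sample(non_edges, extra))


def top_k_overlap(deg_a, deg_b, k):
    """Fraction of the k highest-degree nodes shared by two degree tables."""
    def top(d):
        # Ties are broken by node label so the ranking is deterministic.
        return set(sorted(d, key=lambda n: (-d[n], n))[:k])
    return len(top(deg_a) & top(deg_b)) / k


# Toy "true" network (hypothetical): hub 0 linked to nodes 1..9, plus a chain 1-2-...-9.
nodes = range(10)
edges = {(0, i) for i in range(1, 10)} | {(i, i + 1) for i in range(1, 9)}

rng = random.Random(0)  # fixed seed so the perturbations are repeatable
observed_missing = drop_links(edges, 0.2, rng)
observed_false = add_false_links(edges, nodes, 0.2, rng)

true_deg = degree(edges, nodes)
overlap_missing = top_k_overlap(true_deg, degree(observed_missing, nodes), 3)
overlap_false = top_k_overlap(true_deg, degree(observed_false, nodes), 3)
```

Here `overlap_missing` and `overlap_false` give one simple summary of how well the top-ranked nodes survive each kind of error. Repeating the perturbation over many seeds and error fractions, and over networks with different degree distributions, would be the natural next step.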