Browsing by Author "Licamele, Louis"
Item: Indirect two-sided relative ranking: a robust similarity measure for gene expression data (2010-03-17)
Licamele, Louis; Getoor, Lise
Background: There is a large amount of gene expression data that exists in the public domain. This data has been generated under a variety of experimental conditions. Unfortunately, these experimental variations have generally prevented researchers from accurately comparing and combining this wealth of data, which still hides many novel insights.
Results: In this paper we present a new method, which we refer to as indirect two-sided relative ranking, for comparing gene expression profiles that is robust to variations in experimental conditions. This method extends the current best approach, which is based on comparing the correlations of the up and down regulated genes, by introducing a comparison based on the correlations in rankings across the entire database. Because our method is robust to experimental variations, it allows a greater variety of gene expression data to be combined, which, as we show, leads to richer scientific discoveries.
Conclusions: We demonstrate the benefit of our proposed indirect method on several datasets. We first evaluate the ability of the indirect method to retrieve compounds with similar therapeutic effects across known experimental barriers, namely vehicle and batch effects, on two independent datasets (one private and one public). We show that our indirect method is able to significantly improve upon the previous state-of-the-art method with a substantial improvement in recall at rank 10 of 97.03% and 49.44%, on each dataset, respectively. Next, we demonstrate that our indirect method results in improved accuracy for classification in several additional datasets. These datasets demonstrate the use of our indirect method for classifying cancer subtypes, predicting drug sensitivity/resistance, and classifying (related) cell types.
Even in the absence of a known (i.e., labeled) experimental barrier, the improvement of the indirect method in each of these datasets is statistically significant.

Item: KNOWLEDGE DISCOVERY FROM GENE EXPRESSION DATA: NOVEL METHODS FOR SIMILARITY SEARCH, SIGNATURE DETECTION, AND CONFOUNDER CORRECTION (2012)
Licamele, Louis; Getoor, Lise; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Gene expression microarray data is used to answer a variety of scientific questions. For example, it can be used for gaining a better understanding of a drug, segmenting a disease, and predicting an optimal therapeutic response. The amount of gene expression data publicly available is extremely large and continues to grow at an increasing rate. However, this rapid growth of gene expression data from laboratories across the world has not fully achieved its potential impact on the scientific community. This shortcoming is due to the fact that the majority of the data has been gathered under varying conditions, and there is no principled way for combining and fully utilizing related data. Even within a closely controlled gene expression experiment, there are confounding factors that may mask the true signatures when analyzed with current methods. Therefore, we are interested in three core tasks that we believe are important for improving the utilization of gene array data: similarity search, signature detection, and confounder correction. We have developed novel methods that address each of these tasks. In this work, we first address the similarity search problem. More specifically, we propose methods that overcome experimental barriers in pairwise gene expression similarity calculations. We introduce a method, which we refer to as indirect similarity, that, unlike previous approaches, uses all of the information in a database to better inform the similarity calculation of a pair of gene expression profiles.
We demonstrate that our method is more robust and better able to cope with experimental barriers such as vehicle and batch effects. We evaluate the ability of our method to retrieve compounds with similar therapeutic effects in two independent datasets. We evaluate the recall ability of our approach and show that our method results in an improvement of 97.03% and 49.44%, respectively, over existing state-of-the-art approaches. The second problem we focus on is signature detection. Gene expression experiments are performed to test a specific hypothesis. Generally, this hypothesis is that there is some genetic signature common in a group of samples. Current methods try to find the differentially expressed genes within a group of samples using a variety of methods; however, they are all parametric. We introduce a nonparametric approach to group profile creation which we refer to as the Weighted Influence Model - Rank of Ranks method. For every probe on the microarray, the average rank is calculated across all members of a group. These average ranks are then re-ranked to form the group profile. We demonstrate the ability of our group profile method to better understand a disease and the underlying mechanism common to its treatments. Additionally, we demonstrate the predictive power of this group profile to detect novel drugs that could treat a particular disease. This method leads to the detection of robust group signatures even with unknown confounding effects. The final problem that we address is the challenge of removing known (annotated) confounding effects from gene expression profiles. We propose an extension to our non-parametric gene expression profile method to correct for observed confounding effects. This correction is performed on ranked lists directly, and it provides a robust alternative to parametric batch profile correction methods.
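The rank-of-ranks step described in this abstract (rank each sample, average the ranks per probe, then re-rank the averages) can be illustrated with a minimal Python sketch. This is not the dissertation's implementation. The function names `rank_values` and `group_profile` are hypothetical, and several details are assumptions the abstract does not state: rank 1 is given to the most highly expressed probe, ties share their average rank, and the weighting implied by "Weighted Influence Model" is not modeled at all.

```python
from typing import List, Sequence


def rank_values(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks (1 = largest value); tied values share their average rank.

    The ranking direction and the tie rule are assumptions made for this sketch.
    """
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over all entries tied with the one at `pos`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def group_profile(samples: Sequence[Sequence[float]]) -> List[float]:
    """Build a group profile: per-sample ranks, per-probe average rank, then re-rank.

    Each inner sequence holds one sample's expression values, with probes in the
    same order in every sample.
    """
    per_sample_ranks = [rank_values(s) for s in samples]
    n_probes = len(samples[0])
    mean_ranks = [
        sum(r[p] for r in per_sample_ranks) / len(per_sample_ranks)
        for p in range(n_probes)
    ]
    # A smaller average rank means consistently higher expression, so negate
    # before re-ranking to keep rank 1 = most highly expressed probe.
    return rank_values([-m for m in mean_ranks])


if __name__ == "__main__":
    # Three made-up samples over three probes (illustrative values only).
    toy_group = [[5.0, 1.0, 3.0], [4.0, 2.0, 9.0], [7.0, 0.5, 6.0]]
    print(group_profile(toy_group))
```

Because only ranks enter the calculation, a single sample with an extreme raw value (the 9.0 in the second toy sample) shifts the group profile no more than any other change in ordering would, which is the kind of robustness a nonparametric profile is meant to provide.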
We evaluate our novel profile subtraction method on two real world datasets, comparing against several state-of-the-art parametric methods. We demonstrate an improvement in group signature detection using our method to remove confounding effects. Additionally, we show that in a dataset with the true group assignments removed and only the confounding effects labelled, our profile subtraction method allows for the discovery of the true groups. We evaluate the robustness of our methods using a gene expression profile generator that we developed.

Item: Predicting Protein-Protein Interactions Using Relational Features (2007-01-07)
Licamele, Louis; Getoor, Lise
Proteins play a fundamental role in every process within the cell. Understanding how proteins interact, and the functional units they are part of, is important to furthering our knowledge of the entire biological process. There has been a growing amount of work, both experimental and computational, on determining the protein-protein interaction network. Recently, researchers have had success looking at this as a relational learning problem. In this work, we further this investigation, proposing several novel relational features for predicting protein-protein interaction. These features can be used in any classifier. Our approach allows large and complex networks to be analyzed and is an alternative to using more expensive relational methods. We show that we are able to get an accuracy of 81.7% when predicting new links from noisy high throughput data.

Item: Social Capital in Friendship-Event Networks (2006-09-27)
Licamele, Louis; Getoor, Lise
In this paper, we examine a particular form of social network which we call a friendship-event network. A friendship-event network captures both the friendship relationship among a set of actors, and also the organizer and participation relationships of actors in a series of events.
Within these networks, we formulate the notion of social capital based on the actor-organizer friendship relationship and the notion of benefit, based on event participation. We investigate appropriate definitions for the social capital of both a single actor and a collection of actors. We ground these definitions in a real-world example of academic collaboration networks, where the actors are researchers, the friendships are collaborations, the events are conferences, the organizers are program committee members and the participants are conference authors. We show that our definitions of capital and benefit capture interesting qualitative properties of event series. In addition, we show that social capital is a better publication predictor than publication history.