Computer Science Theses and Dissertations
Permanent URI for this collection: http://hdl.handle.net/1903/2756
2 results
Search Results
Item: A Probabilistic Approach to Modeling Socio-Behavioral Interactions (2016)
Ramesh, Arti; Getoor, Lise; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
In our increasingly connected world, it is essential to build computational models that represent, reason about, and model the underlying characteristics of real-world networks. Data generated from these networks are often heterogeneous and interlinked, and exhibit rich multi-relational graph structures with unobserved latent characteristics. My work focuses on building computational models for representing and reasoning about rich, heterogeneous, interlinked graph data. In my research, I model socio-behavioral interactions and predict user behavior patterns in two important online interaction platforms: online courses and online professional networks. Structured data from these interaction platforms contain rich behavioral and interaction data, and provide an opportunity to design machine learning methods for understanding and interpreting user behavior. The data also include unstructured content, such as natural language text from forum posts and other online discussions. My research aims at constructing a family of probabilistic models for modeling social interactions involving both structured and unstructured data. In the early part of this thesis, I present a family of probabilistic models for online courses for: 1) modeling student engagement, 2) predicting student completion and dropouts, 3) modeling student sentiment toward various course aspects (e.g., content vs. logistics), 4) detecting coarse and fine-grained course aspects (e.g., grading, video, content), and 5) modeling evolution of topics in repeated offerings of online courses. These methods have the potential to improve student experience and focus limited instructor resources in ways that will have the most impact.
In the latter part of this thesis, I present methods to model multi-relational influence in online professional networks. I test the effectiveness of this model via experimentation on the professional network LinkedIn. My models can potentially be adapted to address a wide range of problems in real-world networks, including predicting user interests, user retention, personalization, and making recommendations.
Item: FEATURE GENERATION AND ANALYSIS APPLIED TO SEQUENCE CLASSIFICATION FOR SPLICE-SITE PREDICTION (2007-11-27)
Islamaj, Rezarta; Getoor, Lise; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Sequence classification is an important problem in many real-world applications. Sequence data often contain no explicit "signals," or features, to enable the construction of classification algorithms. Extracting and interpreting the most useful features is challenging, and hand construction of good features is the basis of many classification algorithms. In this thesis, I address this problem by developing a feature-generation algorithm (FGA). FGA is a scalable method for automatic feature generation for sequences; it identifies sequence components and uses domain knowledge, systematically constructs features, explores the space of possible features, and identifies the most useful ones. In the domain of biological sequences, splice-sites are locations in DNA sequences that signal the boundaries between genetic information and intervening non-coding regions. Only when splice-sites are identified with nucleotide precision can the genetic information be translated to produce functional proteins. In this thesis, I address this fundamental process by developing a highly accurate splice-site prediction model that employs our sequence feature-generation framework. The FGA model shows statistically significant improvements over state-of-the-art splice-site prediction methods.
So that biologists can understand and interpret the features FGA constructs, I developed SplicePort, a web-based tool for splice-site prediction and analysis. With SplicePort, the user can explore the features relevant to splicing and obtain splice-site predictions for sequences based on these features. For an experimental biologist trying to identify the critical sequence elements of splicing, SplicePort offers flexibility and rich motif-exploration functionality, which may help to significantly reduce the amount of experimentation needed. In this thesis, I present examples of the observed feature groups and describe efforts to detect biological signals that may be important for the splicing process. Naturally, FGA can be generalized to other biologically inspired classification problems, such as identifying tissue-specific regulatory elements, polyadenylation sites, and promoters, as well as to other sequence classification problems, provided we have sufficient knowledge of the new domain.