Theses and Dissertations from UMD
Permanent URI for this community: http://hdl.handle.net/1903/2
New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be a delay of up to four months before a given thesis/dissertation appears in DRUM.
More information is available at Theses and Dissertations at University of Maryland Libraries.
4 results
Search Results
Rich and Scalable Models for Text (2019)
Nguyen, Thang Dai; Boyd-Graber, Jordan; Resnik, Philip; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Topic models have become essential tools for uncovering hidden structures in big data. However, the most popular topic model algorithm, Latent Dirichlet Allocation (LDA), and its extensions suffer from slow performance on big datasets. Recently, the machine learning community has attacked this problem using spectral learning approaches such as the method of moments with tensor decomposition or matrix factorization. The anchor word algorithm by Arora et al. [2013] has emerged as a more efficient approach to solving a large class of topic modeling problems. The anchor word algorithm is fast, and it has a provable theoretical guarantee: given enough documents, it will converge to a global solution. In this thesis, we present a series of spectral models based on the anchor word algorithm that serve a broader class of datasets and provide richer, more flexible modeling capacity. First, we improve the anchor word algorithm by incorporating various rich priors in the form of appropriate regularization terms. Our new regularized anchor word algorithms produce higher-quality topics and can incorporate informed priors, making it possible to discover topics better aligned with external knowledge. Second, we enrich the anchor word algorithm with a metadata-based word representation for labeled datasets. Our new supervised anchor word algorithm runs very fast and predicts better than supervised topic models such as Supervised LDA on three sentiment datasets. Also, sentiment anchor words, which play a vital role in generating sentiment topics, provide better cues for understanding sentiment datasets than unsupervised topic models do.
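Roughly, in the anchor word algorithm of Arora et al. [2013], rows of a row-normalized word co-occurrence matrix are treated as points, anchors are chosen greedily as the points farthest from the span of those already picked, and the remaining words are then expressed as combinations of the anchors to recover topics. The sketch below illustrates only the greedy selection step, under stated simplifications: it uses a plain Gram-Schmidt projection with no random projection or cleanup pass, and it is not the regularized or supervised variants described above. The function names `row_normalize`, `dot`, and `find_anchors` and the toy matrix are hypothetical.

```python
import math

# Sketch only: a simplified stand-in for anchor selection, not the thesis's code.


def row_normalize(Q):
    """Scale each row of Q to sum to 1 so it reads as a conditional distribution."""
    out = []
    for row in Q:
        total = sum(row)
        out.append([x / total for x in row] if total > 0 else list(row))
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def find_anchors(Q, k):
    """Greedily choose up to k anchor rows of Q.

    Each step picks the row whose residual (its component orthogonal to the
    rows already chosen) is longest, then projects that direction out of every
    residual (Gram-Schmidt). Assumed simplification: no random projection.
    """
    residual = row_normalize(Q)
    anchors = []
    for _ in range(k):
        best = max(range(len(residual)), key=lambda i: dot(residual[i], residual[i]))
        norm = math.sqrt(dot(residual[best], residual[best]))
        if norm == 0:  # every remaining row already lies in the chosen span
            break
        anchors.append(best)
        direction = [x / norm for x in residual[best]]
        for i, r in enumerate(residual):
            c = dot(r, direction)
            residual[i] = [a - c * d for a, d in zip(r, direction)]
    return anchors


# Hypothetical toy counts (one row per word): words 0, 1, 3 are "pure";
# word 2 is an even mixture of words 0 and 1.
Q = [
    [10, 0, 0],
    [0, 10, 0],
    [5, 5, 0],
    [0, 0, 4],
]
print(find_anchors(Q, 3))  # [0, 1, 3]
```

The mixed word 2 is never selected, because once words 0 and 1 are chosen its residual is zero; this is the geometric intuition behind anchors as "vertices" of the word cloud.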
Lastly, we examine ALTO, an active learning framework with a static topic overview, and investigate the usability of supervised topic models for active learning. We develop a new dynamic active learning framework that combines the informativeness and representativeness of documents, using topics that are updated dynamically by our fast supervised anchor word algorithm. Experiments on three multi-class datasets show that our new framework consistently improves classification accuracy over ALTO.

EXPLORING THE DIMENSIONS OF GENDER AND STUDENT EPISTEMOLOGIES IN A REFORMED LEARNER-CENTERED ORGANISMAL BIOLOGY COURSE: A MIXED METHODS APPROACH (2019)
Klosteridis, Jennifer Hayes-; Hultgren, Francine; Croninger, Robert; Education Policy, and Leadership; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Gender and student epistemology play a role in how students interact with STEM content and develop knowledge in the classroom, and they may influence the retention of women in the sciences. Reform agencies have called for changes to the undergraduate biology curriculum to produce students with high-level quantitative and critical thinking skills. As educators seek to reform college biology courses to align with policymakers' recommendations, it remains important to consider how these dimensions influence student learning of reformed content and pedagogy. This mixed methods study explored the dimensions of gender and epistemology as they related to student learning in a reformed learner-centered organismal biology course at a large East Coast university. Pre-test and post-test epistemological survey results and qualitative interview data collected over two semesters by Hall (2013) were analyzed. The results indicated no significant relationship between gender and student epistemologies at pre-test or post-test on the MBEX I instrument or in 3 of the 4 epistemological clusters.
Both women and men experienced significant positive shifts on the instrument overall and in two clusters of the survey instrument. Specifically, women and men became more sophisticated in their view of the structure of biological sciences knowledge as composed of principles, and of how biological knowledge should be constructed rather than memorized. Qualitative findings, however, suggested that gender and level of epistemological sophistication played a role in how women and men experienced the reformed content and pedagogy in the course. Specifically, women expressed resistance to the inclusion of physical science content in the course, while most men were receptive to it. This study is unique in that it explored the interplay between gender and epistemology as it related to course content and pedagogical reform. By integrating the quantitative results and qualitative findings, the study concluded that the reformed learner-centered course succeeded in producing more epistemologically sophisticated men and women, who viewed biological knowledge as principles-based and came to believe that biological knowledge is built through a process of construction. The results also suggested that women responded more favorably to the active learning pedagogy. Gender may have contributed to resistance to the inclusion of other disciplinary perspectives and content in the course.
The results and findings add to the higher education curriculum reform and instruction literature by providing some insight into how student epistemology and gender may influence faculty efforts to develop courses that align with national reform efforts.

FEATURE LEARNING AND ACTIVE LEARNING FOR IMAGE QUALITY ASSESSMENT (2014)
Ye, Peng; Chellappa, Rama; Doermann, David; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

With the increasing popularity of mobile imaging devices, digital images have become an important vehicle for representing and communicating information. Unfortunately, digital images may be degraded at various stages of their life cycle. These degradations may lead to the loss of visual information, resulting in an unsatisfactory experience for human viewers and difficulties for image processing and analysis at subsequent stages. Visual information quality assessment plays an important role in numerous image/video processing and computer vision applications, including image compression, image transmission, and image retrieval. Image Quality Assessment (IQA) research has two branches: objective IQA and subjective IQA. For objective IQA, the goal is to develop a computational model that can accurately and automatically predict the quality of a distorted image with respect to human perception or other measures of interest. For subjective IQA, the goal is to design experiments for acquiring human subjects' opinions on image quality; such experiments are often used to construct image quality datasets and provide the ground truth for building and evaluating objective quality measures. In this thesis, we address both aspects of the IQA problem.
For objective IQA, our work focuses on the most challenging category of objective IQA tasks, general-purpose No-Reference IQA (NR-IQA), where the goal is to evaluate the quality of digital images without access to reference images and without prior knowledge of the types of distortions. First, we introduce a feature learning framework for NR-IQA. Our method learns discriminative visual features in the spatial domain instead of using hand-crafted features. It can therefore significantly reduce feature computation time compared to previous state-of-the-art approaches while achieving state-of-the-art prediction accuracy. Second, we present an effective method for extending existing NR-IQA models to "Opinion-Free" (OF) models, which do not require human opinion scores for training. In particular, we accomplish this by using Full-Reference (FR) IQA measures to train NR-IQA models. Unsupervised rank aggregation is applied to combine different FR measures into a synthetic score, which serves as a better "gold standard". Our method significantly outperforms previous OF-NRIQA methods and is comparable to state-of-the-art NR-IQA methods trained on human opinion scores. Unlike objective IQA, subjective IQA tests ask humans to evaluate image quality and are generally considered the most reliable way to evaluate the visual quality of digital images as perceived by the end user. We present a hybrid subjective test that combines Absolute Categorical Rating (ACR) tests and Paired Comparison (PC) tests via a unified probabilistic model and an active sampling method. Our method actively constructs a set of queries consisting of ACR and PC tests based on the expected information gain provided by each test, and it can effectively reduce the number of tests required to achieve a target accuracy. Our method can be used in conventional laboratory studies as well as crowdsourcing experiments.
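The abstract does not say which unsupervised rank aggregation method combines the FR measures. As a plainly named stand-in, the sketch below uses Borda-style mean-rank aggregation to turn several FR-IQA scores into one synthetic target per image, which an opinion-free NR-IQA model could then be trained on. The function names, the three measures, and their values are hypothetical, and every measure is assumed to be oriented so that higher means better quality (an error measure such as MSE would need to be negated first).

```python
# Sketch only: Borda-style aggregation as an assumed stand-in for the
# thesis's (unspecified) unsupervised rank aggregation.


def ranks(scores):
    """Rank positions within one measure: 0 for the lowest score (ties by index)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    positions = [0] * len(scores)
    for pos, i in enumerate(order):
        positions[i] = pos
    return positions


def borda_aggregate(measure_scores):
    """Mean rank of each image across measures (higher = judged better overall).

    measure_scores holds one list per FR measure, each with a higher-is-better
    score for every image, in the same image order.
    """
    n_images = len(measure_scores[0])
    totals = [0.0] * n_images
    for scores in measure_scores:
        for i, r in enumerate(ranks(scores)):
            totals[i] += r
    return [t / len(measure_scores) for t in totals]


# Hypothetical scores for three distorted images from three FR measures.
fr_scores = [
    [30.0, 25.0, 35.0],  # e.g. a PSNR-like measure (dB)
    [0.90, 0.80, 0.95],  # e.g. an SSIM-like measure
    [0.70, 0.90, 0.80],  # a third measure that disagrees on images 1 and 2
]
synthetic = borda_aggregate(fr_scores)
print(synthetic)  # image 2 receives the highest synthetic score
```

Working with ranks rather than raw values sidesteps the fact that different FR measures live on incompatible scales (decibels versus a 0 to 1 index), which is one reason rank aggregation is a natural fit for building a combined "gold standard".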
Experimental results show that our hybrid test outperforms state-of-the-art subjective IQA tests in a crowdsourced setting.

Cost-sensitive Information Acquisition in Structured Domains (2010)
Bilgic, Mustafa; Getoor, Lise C; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Many real-world prediction tasks require collecting information about the domain entities to achieve better predictive performance. Collecting this additional information is often costly: it involves acquiring the features that describe the entities and annotating the entities with target labels. For example, document collections need to be manually annotated for classification, and lab tests need to be ordered for medical diagnosis. Annotating the whole document collection or ordering all possible lab tests might be infeasible with limited resources. In this thesis, I explore effective and efficient ways of choosing the right features and labels to acquire under limited resources. For the problem of feature acquisition, we are given entities with missing features, and the task is to classify them with minimum cost. Acquiring features can reduce the likelihood of misclassification, but acquisition incurs costs as well. The objective is to acquire the set of features that best balances acquisition and misclassification costs. I introduce a technique that can reduce the space of possible feature sets to consider for acquisition by exploiting the conditional independence properties of the underlying probability distribution. For the problem of label acquisition, I consider two real-world scenarios. In the first, we are given a previously trained model and a budget determining how many labels we can acquire, and the objective is to determine the set of labels to acquire so that accuracy on the remaining entities is maximized.
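The feature-acquisition trade-off above (acquisition cost versus expected misclassification cost) can be illustrated with a generic myopic value-of-information calculation for a single discrete feature. This is a minimal sketch under stated assumptions, not the thesis's technique, which additionally prunes the space of candidate feature sets using conditional independence. The class prior, feature model, cost matrix, and the names `expected_cost`, `value_of_feature`, and `should_acquire` are hypothetical.

```python
# Sketch only: generic one-feature (myopic) value of information, assumed
# for illustration; not the thesis's feature-set reduction technique.


def expected_cost(mass, cost):
    """Cost of the best single prediction given a class mass vector.

    cost[p][y] is the cost of predicting class p when the true class is y.
    `mass` may be unnormalized (e.g. joint P(y, v)); the result then scales by P(v).
    """
    classes = range(len(mass))
    return min(sum(mass[y] * cost[p][y] for y in classes) for p in range(len(cost)))


def value_of_feature(prior, likelihood, cost):
    """Expected drop in misclassification cost from observing one discrete feature.

    prior[y] = P(y); likelihood[y][v] = P(feature = v | y).
    """
    before = expected_cost(prior, cost)
    after = 0.0
    for v in range(len(likelihood[0])):
        joint = [prior[y] * likelihood[y][v] for y in range(len(prior))]
        after += expected_cost(joint, cost)  # equals P(v) * E[cost | v]
    return before - after


def should_acquire(prior, likelihood, cost, acquisition_cost):
    """Myopic rule: acquire the feature only if it saves more than it costs."""
    return value_of_feature(prior, likelihood, cost) > acquisition_cost


prior = [0.5, 0.5]
likelihood = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical lab-test model
cost = [[0, 10], [10, 0]]              # symmetric misclassification cost
print(value_of_feature(prior, likelihood, cost))     # about 3.5
print(should_acquire(prior, likelihood, cost, 2.0))  # True
```

Here observing the feature cuts the expected misclassification cost from 5.0 to 1.5, so it is worth buying at a price of 2.0 but not at 4.0. Evaluating features one at a time like this can miss features that are only useful jointly, which is part of why reasoning over feature sets matters.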
I describe a system that can automatically learn and predict on which entities the underlying classifier is likely to make mistakes, and that suggests acquiring the labels of entities lying in a high-density, potentially misclassified region. In the second scenario, we are given a network of unlabeled entities, and our objective is to learn a classification model with the least expected future error while acquiring the minimum number of labels. I describe an active learning technique that exploits the relationships in the network both to select informative entities to label and to learn a collective classifier that uses the label correlations in the network.
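The abstract does not give the selection criterion used in the network setting. As a generic stand-in, the sketch below pairs a very simple collective classifier (iteratively averaging neighbors' class estimates, in the style of a relational-neighbor classifier) with uncertainty sampling: it queries the unlabeled node whose collective prediction has the highest entropy, breaking ties by degree. The adjacency-dict graph representation, the function names, the iteration count, and the tie-breaking rule are all assumptions, not the thesis's method.

```python
import math

# Sketch only: neighbor-averaging collective inference plus entropy-based
# querying, assumed for illustration.


def collective_predict(adj, labels, n_classes, iterations=10):
    """Class-probability estimates for every node via iterative neighbor averaging.

    adj maps each node to a list of neighbors; labels maps labeled nodes to a
    class index. Labeled nodes stay fixed; each unlabeled node is repeatedly
    set to the mean of its neighbors' current estimates (synchronous updates).
    """
    probs = {}
    for node in adj:
        if node in labels:
            probs[node] = [1.0 if c == labels[node] else 0.0 for c in range(n_classes)]
        else:
            probs[node] = [1.0 / n_classes] * n_classes
    for _ in range(iterations):
        updated = {}
        for node, nbrs in adj.items():
            if node in labels or not nbrs:
                updated[node] = probs[node]
            else:
                updated[node] = [
                    sum(probs[m][c] for m in nbrs) / len(nbrs) for c in range(n_classes)
                ]
        probs = updated
    return probs


def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)


def select_query(adj, labels, n_classes):
    """Return the unlabeled node with the most uncertain collective prediction.

    Ties are broken in favor of higher-degree nodes (a crude, assumed proxy
    for representativeness).
    """
    probs = collective_predict(adj, labels, n_classes)
    unlabeled = [n for n in adj if n not in labels]
    return max(unlabeled, key=lambda n: (entropy(probs[n]), len(adj[n])))


# Hypothetical path graph 0-1-2-3-4 with the two endpoints labeled.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
labels = {0: 0, 4: 1}
print(select_query(adj, labels, 2))  # 2, the node midway between the two labels
```

Because the classifier propagates label information along edges, the query choice already reflects the network structure: nodes near a labeled neighbor become confident, while the node equidistant from conflicting labels stays maximally uncertain and is queried first.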