Mathematics

Permanent URI for this community: http://hdl.handle.net/1903/2261

Search Results

  • Better Metrics to Automatically Predict the Quality of a Text Summary
    (MDPI, 2012-09-26) Rankel, Peter A.; Conroy, John M.; Schlesinger, Judith D.
    In this paper we demonstrate a family of metrics for estimating the quality of a text summary relative to one or more human-generated summaries. The improved metrics are based on features automatically computed from the summaries to measure content and linguistic quality. The features are combined using one of three methods—robust regression, non-negative least squares, or canonical correlation, an eigenvalue method. The new metrics significantly outperform the previous standard for automatic text summarization evaluation, ROUGE.
  • Complexity-Regularized Regression for Serially-Correlated Residuals with Applications to Stock Market Data
    (MDPI, 2014-12-23) Darmon, David; Girvan, Michelle
    A popular approach in the investigation of the short-term behavior of a non-stationary time series is to assume that the time series decomposes additively into a long-term trend and short-term fluctuations. A first step towards investigating the short-term behavior requires estimation of the trend, typically via smoothing in the time domain. We propose a method for time-domain smoothing, called complexity-regularized regression (CRR). This method extends recent work, which infers a regression function that makes residuals from a model “look random”. Our approach operationalizes non-randomness in the residuals by applying ideas from computational mechanics, in particular the statistical complexity of the residual process. The method is compared to generalized cross-validation (GCV), a standard approach for inferring regression functions, and shown to outperform GCV when the error terms are serially correlated. Regression under serially-correlated residuals has applications to time series analysis, where the residuals may represent short timescale activity. We apply CRR to a time series drawn from the Dow Jones Industrial Average and examine how both the long-term and short-term behavior of the market have changed over time.
  • Detection of co-eluted peptides using database search methods
    (Springer Nature, 2008-07-02) Alves, Gelio; Ogurtsov, Aleksey Y; Kwok, Siwei; Wu, Wells W; Wang, Guanghui; Shen, Rong-Fong; Yu, Yi-Kuo
    Current experimental techniques, especially those applying liquid chromatography mass spectrometry, have made high-throughput proteomic studies possible. However, the increase in throughput also raises concerns about the accuracy of identification or quantification. Most experimental procedures select, in a given MS scan, only a few of the most intense parent ions, each to be fragmented (MS2) separately, while most other minor co-eluted peptides that have similar chromatographic retention times are ignored and their information lost. We have computationally investigated the possibility of enhancing information retrieval during a given LC/MS experiment by selecting the two or three most intense parent ions for simultaneous fragmentation. A set of spectra is created by superimposing a number of MS2 spectra, each of which can be identified with high confidence by all search methods tested, to mimic the spectra of co-eluted peptides. The generated convoluted spectra were used to evaluate the capability of several database search methods (SEQUEST, Mascot, X!Tandem, OMSSA, and RAId_DbS) to identify true peptides from superimposed spectra of co-eluted peptides. Using these simulated spectra, we show that all the database search methods eventually gain in the number of true peptides identified by using the compound spectra of co-eluted peptides. Reviewed by Vlad Petyuk (nominated by Arcady Mushegian), King Jordan and Shamil Sunyaev. For the full reviews, please go to the Reviewers' comments section.
  • Simultaneous transcriptional profiling of Leishmania major and its murine macrophage host cell reveals insights into host-pathogen interactions
    (Springer Nature, 2015-12-29) Dillon, Laura A. L.; Suresh, Rahul; Okrah, Kwame; Corrada Bravo, Hector; Mosser, David M.; El-Sayed, Najib M.
    Parasites of the genus Leishmania are the causative agents of leishmaniasis, a group of diseases whose manifestations range from skin lesions to fatal visceral disease. The life cycle of Leishmania parasites is split between their insect vector and their mammalian host, where they reside primarily inside macrophages. Once intracellular, Leishmania parasites must evade or deactivate the host's innate and adaptive immune responses in order to survive and replicate. We performed transcriptome profiling using RNA-seq to simultaneously identify global changes in murine macrophage and L. major gene expression as the parasite entered and persisted within murine macrophages during the first 72 h of an infection. Differential gene expression, pathway, and gene ontology analyses enabled us to identify modulations in host and parasite responses during an infection. The most substantial and dynamic gene expression responses by both macrophage and parasite were observed during early infection. Murine genes related to both pro- and anti-inflammatory immune responses and glycolysis were substantially upregulated, and genes related to lipid metabolism, biogenesis, and Fc gamma receptor-mediated phagocytosis were downregulated. Upregulated parasite genes included those aimed at mitigating the effects of an oxidative response by the host immune system, while downregulated genes were related to translation, cell signaling, fatty acid biosynthesis, and flagellum structure. The gene expression patterns identified in this work yield signatures that characterize multiple developmental stages of L. major parasites and the coordinated response of Leishmania-infected macrophages in the real-time setting of a dual biological system.
    This comprehensive dataset offers a clearer and more sensitive picture of the interplay between host and parasite during intracellular infection, providing additional insights into how pathogens are able to evade host defenses and modulate the biological functions of the cell in order to survive in the mammalian environment.
  • Evolution of transcriptional networks in yeast: alternative teams of transcriptional factors for different species
    (Springer Nature, 2016-11-11) Muñoz, Adriana; Santos Muñoz, Daniella; Zimin, Aleksey; Yorke, James A.
    The diversity in eukaryotic life reflects a diversity in regulatory pathways. Nocedal and Johnson argue that the rewiring of gene regulatory networks is a major force for the diversity of life, and that changes in regulation can create new species. We have created a method (based on our new “ping-pong” algorithm) for detecting more complicated rewirings, where several transcription factors can substitute for one or more transcription factors in the regulation of a family of co-regulated genes. An example is illustrative. Hogues et al. reported a rewiring in which RAP1 in Saccharomyces cerevisiae substitutes for TBF1/CBF1 in Candida albicans in the regulation of ribosomal protein (RP) genes. There, one transcription factor substitutes for another on some collection of genes. Such a substitution is referred to as a “rewiring”. We agree with this finding of rewiring as far as it goes, but the situation is more complicated. Many transcription factors can regulate a gene, and our algorithm finds that in this example a “team” (or collection) of three transcription factors including RAP1 substitutes for TBF1 for 19 genes. The switch occurs for a branch of the phylogenetic tree containing 10 species (including Saccharomyces cerevisiae), while the remaining 13 species (including Candida albicans) are regulated by TBF1. To gain insight into more general evolutionary mechanisms, we have created a mathematical algorithm that finds such general switching events, and we prove that it converges. Of course, any such computational discovery should be validated by biological tests. For each branch of the phylogenetic tree and each gene module, our algorithm finds a subgroup of co-regulated genes and a team of transcription factors that substitutes for another team of transcription factors. In most cases the signal will be small, but in some cases we find a strong signal of switching. We report our findings for 23 Ascomycota fungi species.
  • Analysis and correction of compositional bias in sparse sequencing count data
    (Springer Nature, 2018-11-06) Kumar, M. Senthil; Slud, Eric V.; Okrah, Kwame; Hicks, Stephanie C.; Hannenhalli, Sridhar; Bravo, Héctor Corrada
    Count data derived from high-throughput deoxyribonucleic acid (DNA) sequencing is frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data is compositional, measuring relative and not absolute abundances of the assayed features. This compositional bias confounds inference of absolute abundances. Commonly used count data normalization approaches such as library size scaling, rarefaction, or subsampling cannot correct for compositional bias or any other relevant technical bias that is uncorrelated with library size. We demonstrate that existing techniques for estimating compositional bias fail with sparse metagenomic 16S count data and propose an empirical Bayes normalization approach to overcome this problem. In addition, we clarify the assumptions underlying frequently used scaling normalization methods in light of compositional bias, including scaling methods that were not designed directly to address it.
  • A statistical analysis of vaccine-adverse event data
    (Springer Nature, 2019-05-28) Ren, Jian-Jian; Sun, Tingni; He, Yongqun; Zhang, Yuji
    Vaccination has been one of the most successful public health interventions to date, and the U.S. FDA/CDC Vaccine Adverse Event Reporting System (VAERS) currently contains more than 500,000 reports of adverse events that occur after the administration of vaccines licensed in the United States. The VAERS dataset is huge, contains nominal variables of very high dimension, and is complex due to the listing of multiple vaccines and adverse symptoms in a single report. So far, no statistical analysis has attempted to identify across-the-board patterns in how all reported adverse symptoms are related to the vaccines.
  • A deficiency in SUMOylation activity disrupts multiple pathways leading to neural tube and heart defects in Xenopus embryos
    (Springer Nature, 2019-05-17) Bertke, Michelle M.; Dubiak, Kyle M.; Cronin, Laura; Zeng, Erliang; Huber, Paul W.
    The adenovirus protein Gam1 triggers the proteolytic destruction of the E1 SUMO-activating enzyme. Microinjection of an empirically determined amount of Gam1 mRNA into one-cell Xenopus embryos can reduce SUMOylation activity to undetectable, but nonlethal, levels, enabling an examination of the role of this post-translational modification during early vertebrate development.
  • Markov multi-state models for survival analysis with recurrent events
    (2019) Zhang, Tianhui; Yang, Grace; Mathematical Statistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Markov models are a major class within the system of multi-state models for the analysis of lifetime or event-time data. Applications abound, including the estimation of the lifetime of ultra-cold neutrons, the bias correction of the apparent magnitude distribution of the stars in a certain area of the sky, and the survival analysis of clinical trials. This thesis addresses some of the problems arising in the analysis of right-censored lifetime data. Clinical trials are used as examples to investigate these problems. A Markov model that takes a patient's disease development into account for the analysis of right-censored data was first constructed by Fix and Neyman (1951). The Fix-Neyman (F-N) model is a homogeneous Markov process with two transient and two absorbing states that describes a patient's status over a period of time during a cancer clinical trial. This thesis extends the F-N model by assuming the transition rates (hazard rates) to be both state- and time-dependent. Recurrent transitions between relapse and recovery are allowed in the extended model. By relaxing the condition of time-independent hazard rates, the extension increases the applicability of the Markov models. The extended models are used to compute model survival functions and cumulative hazard functions that account for right-censored observations, as is done in the celebrated Kaplan-Meier estimator. Using the Fix-Neyman procedure and the Kolmogorov forward equations, closed-form solutions are obtained for certain irreversible 4-state extended models, while numerical solutions are obtained for the model with recurrent events. The 4-state model is motivated by an Aplastic Anemia data set. The computational method works for general irreversible and reversible models with no restriction on the number of states. Simulations of right-censored Markov processes are performed by using a sequence of competing risks models.
    Simulated data are used for checking the performance of nonparametric estimators for various sample sizes. In addition, applying Aalen's (1978) results, the estimators are shown to have asymptotic normal distributions. A brief review of some of the literature relevant to this thesis is provided. References are readily available in the vast literature on survival analysis, including many textbooks. General Markov process models for survival analysis are described, e.g., in Andersen, Borgan, Gill and Keiding (1993).
  • Regression Analysis of Recurrent Events with Measurement Errors
    (2019) Ren, Yixin; Smith, Paul J; He, Xin; Mathematical Statistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Recurrent event data and panel count data are often encountered in longitudinal follow-up studies. The main difference between the two types of data is the observation process: continuous observation results in recurrent event data, whereas discrete observation leads to panel count data. In the statistical literature, regression analysis of both types of data has been well studied, and a typical assumption of those studies is that all covariates are accurately recorded. However, in many applications, it is common to have measurement errors in some of the covariates. For example, in a clinical trial, a medical index might have been measured multiple times, and dealing with the differences among those measurements is then an essential problem for statisticians. For recurrent event data, we present a class of semiparametric regression models that allow correlations between censoring time and the recurrent event process via frailty. An estimating equation based approach is developed to account for the presence of measurement errors in some of the covariates. Both large and finite sample properties of the proposed estimators are established. An example from the study of gamma interferon in chronic granulomatous disease is provided. For panel count data, we consider two situations in which the observation process is independent of or dependent on covariates. Estimating equations are developed for the estimation of the regression parameters in both cases. Simulation studies indicate that the proposed inference procedures perform well in practical situations. An example from a bladder cancer study is used to demonstrate the value of the proposed method.