Mathematics

Permanent URI for this community: http://hdl.handle.net/1903/2261

Search Results

  • USING STATISTICAL METHOD TO REVEAL BIOLOGICAL ASPECT OF HUMAN DISEASE: STUDY OF GLIOBLASTOMA BY USING COMPARATIVE GENOMIC HYBRIDIZATION (CGH) METHOD
    (2010) Wang, Yonghong; Smith, Paul; Mathematical Statistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Glioblastoma is a WHO grade IV tumor with a high mortality rate. To identify the underlying biological causes of this disease, a comparative genomic hybridization dataset generated from 170 patients' tumor samples was analyzed. Of the many available segmentation algorithms, I focused mainly on two widely accepted methods: Homogeneous Hidden Markov Models (HHMM) and Circular Binary Segmentation (CBS). Simulations show that CBS tends to give better segmentation results with a low false discovery rate. HHMM failed to identify many obvious breakpoints that CBS identified. On the other hand, HHMM succeeded in identifying many single-probe aberrations. Applying other statistical algorithms revealed distinct biological fingerprints of glioblastoma, including many signature genes and biological pathways. Survival analysis also revealed that several segments correlate with extended survival time in some patients. In summary, this work shows the importance of statistical models and algorithms in modern genomic research.
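To make the idea of segmenting a copy-number profile concrete, here is a minimal sketch of plain recursive binary segmentation in Python. It is not CBS as used in the thesis: real CBS scans circular arcs and assesses significance by permutation, whereas this sketch uses a fixed, hypothetical `threshold` on a scaled mean-difference statistic. The function names and the toy profile are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean


def best_split(values, min_size=2):
    """Return (index, statistic) for the split with the largest scaled mean difference."""
    n = len(values)
    best_idx, best_stat = None, 0.0
    for i in range(min_size, n - min_size + 1):
        left, right = values[:i], values[i:]
        # Mean difference scaled by segment sizes (unit variance assumed).
        stat = abs(mean(left) - mean(right)) / sqrt(1 / len(left) + 1 / len(right))
        if stat > best_stat:
            best_idx, best_stat = i, stat
    return best_idx, best_stat


def segment(values, threshold, offset=0, min_size=2):
    """Recursively split while the best statistic exceeds `threshold`.

    Returns breakpoint positions (indices into the original list) in increasing order.
    """
    if len(values) < 2 * min_size:
        return []
    idx, stat = best_split(values, min_size)
    if idx is None or stat <= threshold:
        return []
    return (
        segment(values[:idx], threshold, offset, min_size)
        + [offset + idx]
        + segment(values[idx:], threshold, offset + idx, min_size)
    )


# Toy log-ratio profile: a gained region of 6 probes inside a neutral background.
profile = [0.0] * 12 + [0.8] * 6 + [0.0] * 8
print(segment(profile, threshold=0.5))  # [12, 18]
```

Note that with `min_size=2` this sketch cannot isolate a single-probe aberration, which is the kind of event the abstract reports HHMM picking up.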
  • Anomaly Detection in Time Series: Theoretical and Practical Improvements for Disease Outbreak Detection
    (2009) Lotze, Thomas Harvey; Shmueli, Galit; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The automatic collection and increasing availability of health data provide a new opportunity for techniques to monitor this information. By monitoring pre-diagnostic data sources, such as over-the-counter cough medicine sales or emergency room chief complaints of cough, there exists the potential to detect disease outbreaks earlier than traditional laboratory disease confirmation results. This research is particularly important for a modern, highly connected society, where the onset of a disease outbreak can be swift and deadly, whether caused by a naturally occurring global pandemic such as swine flu or a targeted act of bioterrorism. In this dissertation, we first describe the problem and current state of research in disease outbreak detection, then provide four main additions to the field. First, we formalize a framework for analyzing health series data and detecting anomalies: using forecasting methods to predict the next day's value, subtracting the forecast to create residuals, and finally using detection algorithms on the residuals. The formalized framework indicates the link between the accuracy of the forecast method and the performance of the detector, and can be used to quantify and analyze the performance of a variety of heuristic methods. Second, we describe improvements for the forecasting of health data series. The application of weather as a predictor, cross-series covariates, and ensemble forecasting each provide improvements to forecasting health data. Third, we describe improvements for detection. This includes the use of multivariate statistics for anomaly detection and additional day-of-week preprocessing to aid detection. Most significantly, we also provide a new method, based on the CuScore, for optimizing detection when the impact of the disease outbreak is known. This method can provide an optimal detector for rapid detection, or for probability of detection within a certain timeframe. Finally, we describe a method for improved comparison of detection methods. We provide tools to evaluate how well a simulated data set captures the characteristics of the authentic series, as well as time-lag heatmaps, a new way of visualizing daily detection rates or displaying the comparison between two methods in a more informative way.
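The three-stage framework described in this abstract (forecast, subtract to get residuals, run a detector on the residuals) can be sketched as follows. This is only an illustration under stated assumptions: the 7-day moving-average forecaster and the one-sided CUSUM detector are generic stand-ins, not the forecasting methods or the CuScore-based detector developed in the dissertation, and the parameters `window`, `baseline_len`, `k`, and `h` are hypothetical choices.

```python
from statistics import mean, stdev


def moving_average_forecast(series, window=7):
    """One-step-ahead forecasts: day t is predicted by the mean of the previous `window` days."""
    forecasts = [None] * len(series)
    for t in range(window, len(series)):
        forecasts[t] = mean(series[t - window:t])
    return forecasts


def residuals(series, forecasts):
    """Observed minus forecast, for days that have a forecast."""
    return [y - f for y, f in zip(series, forecasts) if f is not None]


def cusum_alarms(resid, baseline_len=14, k=0.5, h=4.0):
    """One-sided upper CUSUM on residuals standardized against an initial baseline.

    Assumes the first `baseline_len` residuals are outbreak-free and not constant.
    Returns the residual indices at which the statistic exceeds `h`.
    """
    mu = mean(resid[:baseline_len])
    sigma = stdev(resid[:baseline_len])
    s, alarms = 0.0, []
    for i, r in enumerate(resid):
        z = (r - mu) / sigma
        s = max(0.0, s + z - k)
        if s > h:
            alarms.append(i)
            s = 0.0  # restart accumulation after an alarm
    return alarms


def detect_outbreak_days(series, window=7):
    """Run forecast -> residuals -> detector and map alarms back to day indices."""
    resid = residuals(series, moving_average_forecast(series, window))
    return [i + window for i in cusum_alarms(resid)]


# Synthetic daily counts: a small repeating pattern, then a jump of +10 from day 30 on.
counts = [100 + (day % 3) - 1 for day in range(40)]
counts = [c + 10 if day >= 30 else c for day, c in enumerate(counts)]
print(detect_outbreak_days(counts)[0])  # first alarm on day 30
```

Swapping in a better forecaster shrinks the baseline residual variance, which is the link between forecast accuracy and detector performance that the abstract points to.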
  • An EM Algorithm for Mixed-Type Multiple Outcome Regressions With Applications to a Prostate Cancer Study
    (2008-06-05) Rudd, JoAnn M.; Smith, Paul J.; Mathematical Statistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    We propose a joint model for binary and continuous responses using a latent variable for the binary response. The observed continuous response and the latent response are treated as correlated normals obeying a bivariate regression model. We develop an EM algorithm to find maximum likelihood estimates for the parameters. We perform the E-step analytically and use an iterative algorithm for the M-step. The algorithm is applied to a prostate cancer clinical trial whose goal was to assess therapeutic effects of diethylstilbestrol (DES) in advanced cancer patients and to assess possible excess cardiovascular mortality. Therapeutic effects were measured as prostatic acid phosphatase (PAP) levels at follow-up and whether the patient progressed to stage IV or died of cancer. The treatment reduced PAP levels but not the incidence of cancer mortality within a six-month time frame. Higher doses of DES were associated with increased risk of cardiovascular-related death.
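One common way to write down a joint model of this kind is sketched below. The notation is assumed for illustration and may differ from the thesis: $y_{i1}$ is the continuous response, $y_{i2}$ the binary response, $y_{i2}^{*}$ its latent counterpart, and $x_i$ the covariates. Fixing the latent variance at 1 is a standard identifiability convention and is also an assumption here.

```latex
\begin{aligned}
y_{i1} &= x_i^\top \beta_1 + \varepsilon_{i1}, \\
y_{i2}^{*} &= x_i^\top \beta_2 + \varepsilon_{i2}, \qquad y_{i2} = \mathbf{1}\{\, y_{i2}^{*} > 0 \,\}, \\
\begin{pmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \end{pmatrix}
&\sim N_2\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma^2 & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix} \right).
\end{aligned}
```

Under this formulation the E-step requires conditional moments of $y_{i2}^{*}$ given $y_{i1}$ and $y_{i2}$, which are moments of a truncated normal distribution and therefore available in closed form, consistent with the abstract's statement that the E-step is performed analytically.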