Theses and Dissertations from UMD

Permanent URI for this community: http://hdl.handle.net/1903/2

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay in the appearance of a given thesis/dissertation in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Search Results

  • USING STATISTICAL METHOD TO REVEAL BIOLOGICAL ASPECT OF HUMAN DISEASE: STUDY OF GLIOBLASTOMA BY USING COMPARATIVE GENOMIC HYBRIDIZATION (CGH) METHOD
    (2010) Wang, Yonghong; Smith, Paul; Mathematical Statistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Glioblastoma is a WHO grade IV tumor with a high mortality rate. To identify the underlying biological causes of this disease, a comparative genomic hybridization dataset generated from 170 patients' tumor samples was analyzed. Of the many available segmentation algorithms, I focused mainly on the two most widely accepted methods: Homogeneous Hidden Markov Models (HHMM) and Circular Binary Segmentation (CBS). Simulations show that CBS tends to give better segmentation results with a low false discovery rate. HHMM failed to identify many obvious breakpoints that CBS identified. On the other hand, HHMM succeeded in identifying many single-probe aberrations. Applying other statistical algorithms revealed distinct biological fingerprints of glioblastoma, including many signature genes and biological pathways. Survival analysis also revealed that several segments correlate with the extended survival time of some patients. In summary, this work shows the importance of statistical models and algorithms in modern genomic research.
  • Genetic Variation in Nitrogen and Phosphorus Levels in Broiler Excreta: Opportunity for Improving both Birds and the Environment
    (2010) Sasikala Appukuttan, Arun Kirshna; Siewerdt, Frank; Animal Sciences; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The increase in poultry meat consumption has resulted in intensified poultry farming operations with consequent concentration of excreta in major production areas. The nutrient content in the soil surrounding the poultry farms has increased as a result of the high content of nitrogen (N) and phosphorus (P) in the poultry excreta. The current study aimed to propose a strategy to reduce the N and P content in excreta through genetic selection of broilers for efficient nutrient utilization. The traits measured (on a dry matter basis) were the percentage of N in the excreta (PNE) and the percentage of P in the excreta (PPE). Individual 24-hr excreta samples were collected from 6-week-old birds. Excreta samples were collected from a commercial breeding farm at two different time periods from line A and line B birds, respectively, and analyzed for PNE and PPE. Analysis of excreta samples collected during the first period (197 bird samples belonging to 15 sire families) and second period (278 birds belonging to 25 sire families) suggested heritabilities of 0.08 and 0.16 for PNE and 0 and 0.20 for PPE, respectively. Phenotypic and genetic correlations between the measured traits from the two lines were very low; however, phenotypic correlation analysis of PNE and PPE with other traits of commercial interest showed some favorable as well as neutral associations. Blood samples collected from the birds were used for an association study of the excreta traits with four candidate genes. The candidate genes were selected based on the results of previous research. Some of the SNPs from the candidate genes were found to have additive and dominance effects on the excreta and production traits and were usually favorably associated with mutations in higher frequency in the populations. The results suggest that genetic selection of birds for PNE and PPE could improve the environment and the market value of the birds.
  • Anomaly Detection in Time Series: Theoretical and Practical Improvements for Disease Outbreak Detection
    (2009) Lotze, Thomas Harvey; Shmueli, Galit; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The automatic collection and increasing availability of health data provide a new opportunity for techniques to monitor this information. By monitoring pre-diagnostic data sources, such as over-the-counter cough medicine sales or emergency room chief complaints of cough, there exists the potential to detect disease outbreaks earlier than traditional laboratory disease confirmation results. This research is particularly important for a modern, highly connected society, where the onset of disease outbreak can be swift and deadly, whether caused by a naturally occurring global pandemic such as swine flu or a targeted act of bioterrorism. In this dissertation, we first describe the problem and current state of research in disease outbreak detection, then provide four main additions to the field. First, we formalize a framework for analyzing health series data and detecting anomalies: using forecasting methods to predict the next day's value, subtracting the forecast to create residuals, and finally using detection algorithms on the residuals. The formalized framework indicates the link between the forecast accuracy of the forecast method and the performance of the detector, and can be used to quantify and analyze the performance of a variety of heuristic methods. Second, we describe improvements for the forecasting of health data series. The application of weather as a predictor, cross-series covariates, and ensemble forecasting each provide improvements to forecasting health data. Third, we describe improvements for detection. This includes the use of multivariate statistics for anomaly detection and additional day-of-week preprocessing to aid detection. Most significantly, we also provide a new method, based on the CuScore, for optimizing detection when the impact of the disease outbreak is known. This method can provide an optimal detector for rapid detection, or for probability of detection within a certain timeframe.
    Finally, we describe a method for improved comparison of detection methods. We provide tools to evaluate how well a simulated data set captures the characteristics of the authentic series, as well as time-lag heatmaps, a new way of visualizing daily detection rates or displaying the comparison between two methods in a more informative way.
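The forecast, residual, and detection steps described in the abstract above can be illustrated with a minimal sketch. The moving-average forecaster, the one-sided threshold rule, and the names `forecast_residuals`, `alarm_days`, `window`, `n_baseline`, and `threshold` are illustrative assumptions, not the dissertation's actual methods.

```python
from statistics import mean, stdev


def forecast_residuals(series, window):
    """Residuals = observed value minus a moving-average forecast.

    The forecast for day t is the mean of the previous `window` days
    (an assumed stand-in for whatever forecaster is actually used).
    """
    return [series[t] - mean(series[t - window:t])
            for t in range(window, len(series))]


def alarm_days(series, window=7, n_baseline=28, threshold=3.0):
    """Return the indices of days whose residual exceeds the alarm limit.

    The limit is `threshold` times the standard deviation of the first
    `n_baseline` residuals, which are assumed to be outbreak-free.
    """
    resid = forecast_residuals(series, window)
    limit = threshold * stdev(resid[:n_baseline])
    return [window + i for i, r in enumerate(resid) if r > limit]


# Hypothetical daily counts with a spike on day index 10.
counts = [10, 12, 11, 10, 12, 11, 10, 12, 11, 10, 30, 11]
print(alarm_days(counts, window=3, n_baseline=6))  # day index 10 is flagged
```

Separating the forecaster from the detector in this way mirrors the abstract's point that detector performance depends on forecast accuracy: a better forecaster shrinks the baseline residuals and therefore lowers the alarm limit.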
  • ASSOCIATION OF SINGLE NUCLEOTIDE POLYMORPHISMS WITH PHENOTYPIC PRODUCTION TRAITS IN BROILER CHICKENS
    (2009) Liu, Xuan; Porter, Tom E; Animal Sciences; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This research investigated the association between SNPs and phenotypic production traits in fat and lean chicken broiler lines. In previous research, eleven SNPs in the promoter regions of four candidate genes were selected. In this study, significant associations were detected between AKR1B10 SNP1 and SDC1 SNP1 and fat yield. SDC1 SNP1 was significantly associated with fat weight. SOD3 SNP2 was associated with breast yield. Five sire-SNP interactions and one sex-SNP interaction were significant. There was a significant interaction between sex and SDC1 SNP3 on a muscle-related factor. GPC3 SNP1 interacted with time period on body weight from week 1 to week 9. QTLs on chromosomes 1, 3 and 4 for body fat were refined by incorporating these SNPs into QTL analysis. These genetic markers may be of great value for marker-assisted selection (MAS) for chickens with less abdominal fat, as well as serving as genetic markers for body fat accumulation in humans.
  • An EM Algorithm for Mixed-Type Multiple Outcome Regressions With Applications to a Prostate Cancer Study
    (2008-06-05) Rudd, JoAnn M.; Smith, Paul J.; Mathematical Statistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    We propose a joint model for binary and continuous responses using a latent variable for the binary response. The observed continuous response and the latent response are treated as correlated normals obeying a bivariate regression model. We develop an EM algorithm to find maximum likelihood estimates for the parameters. We perform the E-step analytically and use an iterative algorithm for the M-step. The algorithm is applied to a prostate cancer clinical trial whose goal was to assess therapeutic effects of diethylstilbestrol (DES) in advanced cancer patients and to assess possible excess cardiovascular mortality. Therapeutic effects were measured as prostatic acid phosphatase (PAP) levels at follow-up and whether the patient progressed to stage IV or died of cancer. The treatment reduced PAP levels but not the incidence of cancer mortality within a six-month time frame. Higher doses of DES were associated with increased risk of cardiovascular-related death.
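One ingredient of an analytic E-step for a latent-variable model of this kind can be sketched as follows. The sketch assumes a probit-style setup that the abstract does not spell out: the binary response equals 1 when the latent normal exceeds zero, and the latent variable has unit variance. All function and parameter names are hypothetical.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def latent_given_continuous(mu_z, mu_y, sd_y, rho, y):
    """Mean and sd of the latent Z given the observed continuous response y.

    Assumes (Y, Z) is bivariate normal with correlation rho and Var(Z) = 1.
    """
    cond_mean = mu_z + rho * (y - mu_y) / sd_y
    cond_sd = math.sqrt(1.0 - rho * rho)
    return cond_mean, cond_sd


def expected_latent(mean, sd, binary):
    """E[Z | Z > 0] when binary == 1, else E[Z | Z <= 0], for Z ~ N(mean, sd^2)."""
    a = mean / sd
    if binary == 1:
        return mean + sd * norm_pdf(a) / norm_cdf(a)
    return mean - sd * norm_pdf(a) / (1.0 - norm_cdf(a))


# Hypothetical parameter values for one patient.
m, s = latent_given_continuous(mu_z=0.2, mu_y=5.0, sd_y=2.0, rho=0.5, y=6.0)
print(expected_latent(m, s, binary=1))
```

A full E-step would also need the conditional second moments of the latent variable; they are omitted here to keep the sketch short.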
  • Time-Series Transcriptomic Analysis of a Systematically Perturbed Arabidopsis thaliana Liquid Culture System: A Systems Biology Perspective
    (2007-05-16) Dutta, Bhaskar; Klapa, Maria I; Chemical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Revealing the gene regulation network has been one of the main objectives of biological research. Studying such a complex, multi-scale and multi-parametric problem requires educated fingerprinting of cellular physiology at different molecular levels under systematically designed perturbations. Conventional biology lacked the means for holistic analysis of biological systems. In the post-genomic era, advances in robotics and biology led to the development of high-throughput molecular fingerprinting technologies. Transcriptional profiling analysis using DNA microarrays has been the most widely used among them. My Ph.D. thesis concerns the dynamic, transcriptional profiling analysis of a systematically perturbed plant system. Specifically, Arabidopsis thaliana liquid cultures were subjected to three different stresses, i.e., elevated CO2 stress, salt (NaCl) stress, and sugar (trehalose) stress, applied individually, while the latter two stresses were also applied in combination with the CO2 stress. The transcriptional profiling of these conditions involved carrying out 320 microarray hybridizations, thus generating a vast amount of transcriptomic data for the Arabidopsis thaliana liquid culture system. To upgrade the dynamic information content in the data, I developed a statistical analysis strategy that enables, at each time point of a time series, the identification of genes whose expression changes by a statistically significant amount due to the applied stress. Additional algorithms allow for further exploration of the dynamic significance analysis results to extract biologically relevant conclusions. All algorithms have been incorporated in a software suite called MiTimeS, written in C++. MiTimeS can be applied accordingly to analyze time-series data from any other high-throughput molecular fingerprint.
    The experimental design combined with existing multivariate statistical analysis techniques and MiTimeS revealed a wealth of biologically relevant dynamic information that had been unobserved before. Due to the high-throughput nature of the analysis, the study enabled the simultaneous identification and correlation of parallel-occurring phenomena induced by the applied stress. Comparison of stress responses indicated that the transcriptional response of the biological system to combined stresses is usually not the cumulative effect of individual responses. In addition to the significance of the study for the analysis of the particular plant system, the experimental and analytical strategies used provide a systems biology methodological framework for any biological system in general.
  • BIOINFORMATIC ANALYSIS OF THE FUNCTIONAL AND STRUCTURAL IMPLICATIONS OF ALTERNATIVE SPLICING.
    (2007-01-23) Melamud, Eugene; Moult, John; Molecular and Cell Biology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In higher eukaryotes, upon transcription of a gene, a complex set of reactions takes place to remove fragments of a sequence (introns) from transcribed RNA. A large macro-molecular machine (the spliceosome) recognizes the ends of introns, brings the ends into close proximity and catalyzes the splicing reaction. The selection of the location of the ends of introns (splice sites) determines the final message produced at the end of the process. In some cases, an alternative set of splice sites is chosen, and as a consequence a different message is produced. This phenomenon is known as alternative splicing. It is now realized that nearly every human gene undergoes alternative splicing, producing large variability in the types and number of transcripts produced. In this thesis, we examine the functional and structural consequences of alternative splicing on proteins, look into the mechanism of formation of complex splicing patterns, and examine the role of noise in the process.
  • COMPUTATIONAL ANALYSES OF MICROBIAL GENOMES - OPERONS, PROTEIN FAMILIES AND LATERAL GENE TRANSFER
    (2005-05-15) Yan, Yongpan; Moult, John; Cell Biology & Molecular Genetics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    As a result of recent successes in genome-scale studies, especially genome sequencing, large amounts of new biological data are now available. This naturally challenges the computational world to develop more powerful and precise analysis tools. In this work, three computational studies have been conducted, utilizing complete microbial genome sequences: the detection of operons, the composition of protein families, and the detection of lateral gene transfer events. In the first study, two computational methods, termed the Gene Neighbor Method (GNM) and the Gene Gap Method (GGM), were developed for the detection of operons in microbial genomes. GNM utilizes the relatively high conservation of order of genes in operons, compared with genes in general. GGM makes use of the relatively short gap between genes in operons compared with that otherwise found between adjacent genes. The two methods were benchmarked using biological pathway data and documented operon data. Operons were predicted for 42 microbial genomes. The predictions are used to infer possible functions for some hypothetical genes in prokaryotic genomes and have proven a useful adjunct to structure information in deriving protein function in our structural genomics project. In the second study, we have developed an automated clustering procedure to classify protein sequences in a set of microbial genomes into protein families. Benchmarking shows the clustering method is sensitive at detecting remote family members, and has a low level of false positives. The aim of constructing this comprehensive protein family set is to address several questions key to structural genomics. First, our study indicates that approximately 20% of known families with three or more members currently have a representative structure.
    Second, the number of apparent protein families will be considerably larger than previously thought: We estimate that, by the criteria of this work, there will be about 250,000 protein families when 1000 microbial genomes are sequenced. However, the vast majority of these families will be small. Third, it will be possible to obtain structural templates for 70 - 80% of protein domains with an achievable number of representative structures, by systematically sampling the larger families. The third study is the detection of lateral gene transfer events in microbial genomes. Two new high-throughput methods have been developed, and applied to a set of 66 fully sequenced genomes. Both make use of a protein family framework. In the High Apparent Gene Loss (HAGL) method, the number and nature of gene loss events implied by classical evolutionary descent are analyzed. The higher the number of apparent losses, and the smaller the evolutionary distance over which they must have occurred, the more likely that one or more genes have been transferred into the family. The Evolutionary Rate Anomaly (ERA) method associates transfer events with proteins that appear to have an anomalously low rate of sequence change compared with the rest of that protein family. The methods are complementary in that the HAGL method works best with small families and the ERA method best with larger ones. The methods have been parameterized against each other, such that they have high specificity (less than 10% false positives) and can detect about half of the test events. Application to the full set of genomes shows widely varying amounts of lateral gene transfer.
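The idea behind the Gene Gap Method in the abstract above, that genes in an operon tend to be separated by short gaps, can be illustrated with a minimal sketch. The 50-base-pair cutoff, the same-strand requirement, the tuple layout, and the name `group_by_gap` are assumptions made for illustration; the abstract does not give the thesis's actual criteria or parameters.

```python
def group_by_gap(genes, max_gap=50):
    """Group adjacent genes into candidate operons by intergenic gap.

    `genes` is a list of (name, start, end, strand) tuples sorted by start.
    Adjacent genes join the same group when they are on the same strand
    and the gap between them is at most `max_gap` (an assumed cutoff).
    Only groups with two or more genes are returned.
    """
    if not genes:
        return []
    groups = []
    current = [genes[0]]
    for prev, gene in zip(genes, genes[1:]):
        gap = gene[1] - prev[2]
        if gene[3] == prev[3] and gap <= max_gap:
            current.append(gene)
        else:
            groups.append(current)
            current = [gene]
    groups.append(current)
    return [[g[0] for g in grp] for grp in groups if len(grp) > 1]


# Hypothetical gene coordinates.
genes = [
    ("a", 1, 100, "+"),
    ("b", 120, 300, "+"),    # 20 bp after a, same strand: grouped with a
    ("c", 900, 1200, "+"),   # 600 bp after b: starts a new group
    ("d", 1210, 1500, "-"),  # close to c but on the opposite strand
]
print(group_by_gap(genes))  # [['a', 'b']]
```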