Biology Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/2749

  • ECOLOGICAL APPLICATIONS OF MACHINE LEARNING TO DIGITIZED NATURAL HISTORY DATA
    (2022) Robillard, Alexander John; Rowe, Christopher; Bailey, Helen; Marine-Estuarine-Environmental Sciences; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Natural history collections are a valuable resource for assessing biodiversity and species decline. Over the past few decades, digitization of specimens has increased the accessibility and value of these collections. As a result, the number and size of digitized data sets have outpaced the tools needed to evaluate them. To address this, researchers have turned to machine learning to automate data-driven decisions, and applications of deep learning to complex ecological problems are becoming more common. This dissertation contributes to this trend by addressing, in three distinct chapters, conservation, evolutionary, and ecological questions using deep learning models. In the first chapter we focus on the sale and distribution of hawksbill sea turtle derived products, which continue internationally in physical and online marketplaces despite current regulations prohibiting them. To curb the sale of illegal tortoiseshell, application of new technologies like convolutional neural networks (CNNs) is needed. Therein we describe a curated data set (n = 4,428) that was used to develop a CNN application we are calling “SEE Shell”, which can identify real and faux hawksbill derived products from image data. Built on MobileNetV2 using TensorFlow, SEE Shell was evaluated against a validation set (n = 665) and a test set (n = 649), where it achieved an accuracy of 82.6-92.2% depending on the certainty threshold used. We expect SEE Shell will give potential buyers more agency in their purchasing decisions, in addition to enabling retailers to rapidly filter their online marketplaces. In the second chapter we build on recent research that used geometric morphometrics, associated genetic data, and Principal Component Analysis to successfully delineate Chelonia mydas (green sea turtle) morphotypes from carapace measurements. Therein we demonstrate a similar, yet more rapid, approach to this analysis using computer vision models.
We applied a U-Net to isolate carapace pixels in images of juvenile C. mydas (n = 204) from multiple foraging grounds across the Eastern Pacific, Western Pacific, and Western Atlantic. These images were then sorted based on the general alignment (shape) and coloration of the pixels within the image using a pre-trained computer vision model (MobileNetV2). The dimensions of these data were then reduced and projected using Uniform Manifold Approximation and Projection (UMAP). The associated vectors were then compared to simple genetic distance using a Mantel test, and data points were labeled post hoc for exploratory analysis. We found clear congruence between carapace morphology and genetic distance between haplotypes, suggesting that our image data have biological relevance. Our findings also suggest that carapace morphotype is associated with specific haplotypes within C. mydas. Our cluster analysis (k = 3) corroborates past research suggesting that there are at least three morphotypes across the Eastern Pacific, Western Pacific, and Western Atlantic. Finally, in the third chapter we discuss the sharp increase in agricultural and infrastructure development around the Amazon and the paucity of widespread data available to support conservation management decisions there. To address these issues, we outline a more rapid and accurate tool for identifying fish fauna in the world's largest freshwater ecosystem. Current strategies for identification of freshwater fishes require high levels of training and taxonomic expertise for morphological identification, or genetic testing for species recognition at a molecular level. To overcome these challenges, we built an image masking model (U-Net) and a CNN to mask and classify Amazonian fish in photographs. Fish used to generate training data were collected and photographed in tributaries in seasonally flooded forests of the upper Morona River valley in Loreto, Peru in 2018 and 2019.
Species identifications in the training images (n = 3,068) were verified by expert ichthyologists. These images were supplemented with photographs taken of additional Amazonian fish specimens housed in the ichthyological collection of the Smithsonian’s National Museum of Natural History. We generated a CNN model that identified 33 genera of fishes with a mean accuracy of 97.9%. Wider availability of accurate freshwater fish image recognition tools, such as the one described here, will enable fishermen, local communities, and citizen scientists to more effectively participate in collecting and sharing data from their territories to inform policy and management decisions that impact them directly.
  • Three Variations of Precision Medicine: Gene-Aware Genome Editing, Ancestry-Aware Molecular Diagnosis, and Clone-Aware Treatment Planning
    (2021) Sinha, Sanju; Ruppin, Eytan; Mount, Steve; Biology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    During my Ph.D., I developed several computational approaches to advance precision medicine for cancer prevention and treatment. My thesis presents three such approaches, each addressing an emerging challenge by analyzing large-scale cancer omics data from both pre-clinical models and patient datasets. In the first project, we studied the cancer risk associated with CRISPR-based therapies. Therapeutics based on CRISPR technologies (for which the 2020 Nobel Prize in Chemistry was awarded) are poised to become widely applicable for treating a variety of human genetic diseases. However, preceding our work, two experimental studies had reported that genome editing by CRISPR-Cas9 can induce a DNA damage response mediated by p53 in primary cells, hampering their growth. This could lead to an undesired selection of cells with pre-existing p53 mutations. Motivated by these findings, we conducted the first comprehensive computational and experimental investigation of the risk of CRISPR-induced selection of cancer gene mutants across many different cell types and lineages. I further studied whether this selection depends on the Cas9/sgRNA-delivery method and/or the gene being targeted. Importantly, we asked whether other cancer driver mutations may also be selected during CRISPR-Cas9 gene editing and found that pre-existing KRAS mutants may also be selected for. In summary, we established that the risk of selection for pre-existing p53 or KRAS mutations is non-negligible, which calls for careful monitoring of these mutations in patients undergoing CRISPR-Cas9-based editing for clinical therapeutics. In the second project, we aimed to delineate some of the molecular mechanisms that may underlie the observed differences in cancer incidence across patients of different ancestries, focusing mainly on lung cancer.
We found that lung tumors from African American (AA) patients exhibit higher genomic instability, homologous recombination deficiency (HRD), and aggressive molecular features such as chromothripsis. We next demonstrated that these molecular differences extend to many other cancer types. The prevalence of germline HRD is also higher in tumors from AAs, suggesting that at least some of the somatic differences observed may have genetic origins. Importantly, our findings provide a therapeutic strategy to treat tumors from AAs with high HRD using agents such as PARP and checkpoint inhibitors, a strategy now being explored further by our experimental collaborators. In the third project, we developed a new computational framework that leverages single-cell RNA-seq from patients’ tumors to guide optimal combination treatments that can target multiple clones in the tumor. We first showed that our predicted viability profile of multiple cancer drugs significantly correlates with their targeted pathway activity at single-cell resolution, as one would expect. We then applied this framework to predict the response to monotherapy and combination treatment in cell lines, patient-derived cell lines, and, most importantly, a clinical trial of multiple myeloma patients. Following these validations, we charted the landscape of optimal combination treatments of existing FDA-approved drugs in multiple myeloma, providing a resource that could potentially guide combination trials. Taken together, these results demonstrate the power of multi-omics analysis of cancer data to identify potential cancer risks and strategies to mitigate them, to shed light on molecular mechanisms underlying cancer disparities in AA patients and point to possible ways to improve their treatment, and to guide the treatment of cancer patients based on single-cell transcriptomics of their tumors.
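
The first abstract above (Robillard, 2022) compares image-derived vectors with genetic distance using a Mantel test. The abstract does not say how that test was implemented, so the following is only a minimal, standard-library sketch of a generic permutation-based Mantel test, not the author's code. The function names (`upper_triangle`, `pearson`, `mantel_test`), the one-sided alternative, the permutation count, and the toy matrices `morph` and `genetic` are all assumptions made for illustration.

```python
import random
from math import sqrt


def upper_triangle(matrix, order=None):
    """Flatten the strict upper triangle of a square distance matrix.

    `order` optionally relabels rows and columns with the same permutation.
    """
    n = len(matrix)
    if order is None:
        order = list(range(n))
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """One-sided Mantel test; returns (observed r, permutation p-value).

    Rows and columns of dist_b are shuffled together, which breaks any
    pairing between the two matrices while keeping each matrix's structure.
    """
    rng = random.Random(seed)
    flat_a = upper_triangle(dist_a)
    observed = pearson(flat_a, upper_triangle(dist_b))
    order = list(range(len(dist_b)))
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(order)
        if pearson(flat_a, upper_triangle(dist_b, order)) >= observed:
            at_least_as_large += 1
    # Count the observed statistic itself as one of the permutations.
    p_value = (at_least_as_large + 1) / (permutations + 1)
    return observed, p_value


# Hypothetical toy data: five specimens placed on a line, with "genetic"
# distance set to exactly twice the "morphological" distance.
positions = [0, 1, 2, 3, 4]
morph = [[abs(a - b) for b in positions] for a in positions]
genetic = [[2 * abs(a - b) for b in positions] for a in positions]

r, p = mantel_test(morph, genetic)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

In this toy case the two matrices are proportional, so the observed correlation is 1 by construction; real morphological and genetic distance matrices would give a weaker correlation. Rows and columns are permuted jointly rather than shuffling individual cells because the entries of a distance matrix are not independent of one another.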