ECOLOGICAL APPLICATIONS OF MACHINE LEARNING TO DIGITIZED NATURAL HISTORY DATA

Date

2022

Abstract

Natural history collections are a valuable resource for assessing biodiversity and species decline. Over the past few decades, digitization of specimens has increased the accessibility and value of these collections. However, the number and size of these digitized data sets have outpaced the tools needed to evaluate them. To address this, researchers have turned to machine learning to automate data-driven decisions. In particular, applications of deep learning to complex ecological problems are becoming more common. This dissertation contributes to this trend by addressing, in three distinct chapters, conservation, evolutionary, and ecological questions using deep learning models. In the first chapter we focus on the trade in hawksbill sea turtle-derived products, which continues internationally in physical and online marketplaces despite current regulations prohibiting their sale and distribution. To curb the sale of illegal tortoiseshell, new technologies such as convolutional neural networks (CNNs) are needed. We describe a curated data set (n = 4,428) used to develop a CNN application called “SEE Shell”, which can identify real and faux hawksbill-derived products from image data. Built on MobileNetV2 using TensorFlow, SEE Shell was evaluated against a validation set (n = 665) and a test set (n = 649), where it achieved an accuracy of 82.6% to 92.2%, depending on the certainty threshold used. We expect SEE Shell will give potential buyers more agency in their purchasing decisions and enable retailers to rapidly filter their online marketplaces. In the second chapter we focus on recent research that used geometric morphometrics, associated genetic data, and Principal Component Analysis to delineate Chelonia mydas (green sea turtle) morphotypes from carapace measurements. We demonstrate a similar, yet more rapid, approach to this analysis using computer vision models.
We applied a U-Net to isolate the carapace pixels of juvenile C. mydas (n = 204) from multiple foraging grounds across the Eastern Pacific, Western Pacific, and Western Atlantic. These images were then sorted based on general alignment (shape) and coloration of the pixels within the image using a pre-trained computer vision model (MobileNetV2). The dimensions of these data were then reduced and projected using Uniform Manifold Approximation and Projection (UMAP). Associated vectors were then compared to simple genetic distance using a Mantel test. Data points were then labeled post hoc for exploratory analysis. We found clear congruence between carapace morphology and genetic distance between haplotypes, suggesting that our image data have biological relevance. Our findings also suggest that carapace morphotype is associated with specific haplotypes within C. mydas. Our cluster analysis (k = 3) corroborates past research suggesting that there are at least three morphotypes across the Eastern Pacific, Western Pacific, and Western Atlantic. Finally, in the third chapter we discuss the sharp increase in agricultural and infrastructure development around the Amazon and the paucity of widespread data available to support conservation management decisions there. To address these issues, we outline a more rapid and accurate tool for identifying fish fauna in the Amazon, the world's largest freshwater ecosystem. Current strategies for identifying freshwater fishes require high levels of training and taxonomic expertise for morphological identification, or genetic testing for species recognition at a molecular level. To overcome these challenges, we built an image masking model (U-Net) and a CNN to mask and classify Amazonian fish in photographs. Fish used to generate training data were collected and photographed in tributaries in seasonally flooded forests of the upper Morona River valley in Loreto, Peru, in 2018 and 2019.
Species identifications in the training images (n = 3,068) were verified by expert ichthyologists. These images were supplemented with photographs of additional Amazonian fish specimens housed in the ichthyological collection of the Smithsonian’s National Museum of Natural History. We generated a CNN model that identified 33 genera of fishes with a mean accuracy of 97.9%. Wider availability of accurate freshwater fish image recognition tools, such as the one described here, will enable fishermen, local communities, and citizen scientists to participate more effectively in collecting and sharing data from their territories to inform policy and management decisions that impact them directly.
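
The abstract reports that SEE Shell's accuracy depends on the certainty threshold used. The sketch below illustrates one way such a dependence can arise. It assumes that predictions scoring below the threshold are set aside as uncertain rather than counted as errors, which may differ from the procedure used in the dissertation. The function name `accuracy_at_threshold` and all scores and labels are hypothetical, chosen only for illustration.

```python
from typing import Sequence, Tuple


def accuracy_at_threshold(
    confidences: Sequence[float],
    predicted: Sequence[str],
    actual: Sequence[str],
    threshold: float,
) -> Tuple[float, float]:
    """Return (accuracy, coverage) for predictions with confidence >= threshold.

    Assumption: predictions below the threshold are excluded from the
    accuracy calculation instead of being scored as wrong.
    """
    if not (len(confidences) == len(predicted) == len(actual)):
        raise ValueError("inputs must have the same length")
    kept = [(p, a) for c, p, a in zip(confidences, predicted, actual) if c >= threshold]
    if not kept:
        # Nothing met the threshold, so accuracy is undefined.
        return float("nan"), 0.0
    correct = sum(1 for p, a in kept if p == a)
    return correct / len(kept), len(kept) / len(confidences)


# Made-up scores and labels for illustration only; not data from the dissertation.
confidences = [0.55, 0.62, 0.71, 0.80, 0.88, 0.93, 0.97, 0.99]
predicted = ["real", "faux", "real", "faux", "real", "real", "faux", "real"]
actual = ["faux", "faux", "faux", "faux", "real", "real", "faux", "real"]

for threshold in (0.5, 0.75, 0.9):
    accuracy, coverage = accuracy_at_threshold(confidences, predicted, actual, threshold)
    print(f"threshold={threshold:.2f} accuracy={accuracy:.3f} coverage={coverage:.3f}")
```

Under this assumption, raising the threshold trades coverage for accuracy: fewer images receive a confident label, but a larger share of those labels is correct.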
