APPLICATION OF MACHINE LEARNING AND BIOINFORMATICS TO IMPROVE SEAFOOD SAFETY ASSOCIATED WITH VIBRIO SPP.
Files
(RESTRICTED ACCESS)
Abstract
With the increasing availability of whole genome sequencing data for foodborne pathogens, bioinformatics and machine learning have reshaped food safety and public health practice, improving both accuracy and efficiency. At the same time, ongoing changes in climatic and environmental conditions are believed to have a profound impact on ocean and seafood safety. Vibrio spp., particularly Vibrio parahaemolyticus and Vibrio vulnificus, are the leading causes of seafood-associated illnesses and outbreaks. Projected changes in climate patterns could expand the geographic and seasonal distribution of Vibrio spp., enhance their pathogenicity, and contribute to the development of antimicrobial resistance (AMR), thereby increasing risks to public health. Therefore, the overarching goal of this dissertation was to explore the potential of bioinformatics and machine learning to improve seafood safety associated with Vibrio spp. under a changing climate. Specifically, regression models were developed using six machine learning algorithms to predict the concentrations of total and pathogenic V. parahaemolyticus and V. vulnificus in seawater and oyster samples from environmental conditions. Robust models were obtained for forecasting levels of total and pathogenic V. parahaemolyticus and V. vulnificus in seawater samples and levels of pathogenic V. parahaemolyticus in oyster samples. Moreover, by coupling pangenome analysis with machine learning classification models, we characterized and differentiated the genomic profiles of V. parahaemolyticus isolated from different sources (environmental, seafood, and clinical) in terms of survival, virulence, and AMR.
Apart from identifying significant survival- and AMR-related gene patterns, we also identified the genes most influential in differentiating seafood and clinical isolates; these genes encode key virulence factors (thermostable direct haemolysin (TDH), TDH-related haemolysin, type III secretion system, and alpha-haemolysin). In addition, we investigated the impact of different bioinformatics pipelines (pangenome, core genome multilocus sequence typing (cgMLST), and whole genome multilocus sequence typing (wgMLST)) on downstream analysis, namely machine learning models for source attribution of V. parahaemolyticus. cgMLST was identified as the optimal choice considering both pipeline efficiency and model accuracy. Overall, this dissertation advances the use of bioinformatics and machine learning techniques to improve seafood safety.