Supplementary materials for positive-unlabeled learning identifies vaccine candidate antigens in the malaria parasite Plasmodium falciparum

dc.contributor.author: Chou, Renee Ti
dc.contributor.author: Ouattara, Amed
dc.contributor.author: Adams, Matthew
dc.contributor.author: Berry, Andrea A.
dc.contributor.author: Takala-Harrison, Shannon
dc.contributor.author: Cummings, Michael P.
dc.date.accessioned: 2023-04-21T18:04:08Z
dc.date.available: 2023-04-21T18:04:08Z
dc.date.issued: 2023
dc.description: The research aims to identify and prioritize previously unknown vaccine antigen candidates with potentially high efficacy against the most prevalent malaria parasite Plasmodium falciparum. Positive-unlabeled random forest (PURF) was applied to learn from the small set of known Plasmodium falciparum antigens and the other proteins with unknown antigenic properties. The research notebook contains data and code generated in the study "Positive-unlabeled learning identifies vaccine candidate antigens in the malaria parasite Plasmodium falciparum." The notebook also includes instructions on installing the PURF package, retrieving protein variables and assembling machine learning input from the database, as well as code for experimental analysis and plotting. [en_US]
dc.description.abstract: Malaria vaccine development is hampered by extensive antigenic variation and complex life stages of Plasmodium species. Vaccine development has focused on a small number of antigens identified prior to availability of the P. falciparum genome. In this study, we implement a machine learning-based reverse vaccinology approach to predict potential new malaria vaccine candidate antigens. We assemble and analyze P. falciparum proteomic, structural, functional, immunological, genomic, and transcriptomic data, and use positive-unlabeled learning to predict potential antigens based on the properties of known antigens and remaining proteins. We prioritize candidate antigens based on model performance on reference antigens with different genetic diversity and quantify the protein properties that contribute the most to identifying top candidates. Candidate antigens are characterized by gene essentiality, gene ontology, and gene expression in different life stages to inform future vaccine development. This approach provides a framework for identifying and prioritizing candidate vaccine antigens for a broad range of pathogens. [en_US]
dc.description.sponsorship: National Science Foundation Award (DGE-1632976) [en_US]
dc.identifier: https://doi.org/10.13016/me1l-1ahr
dc.identifier.uri: http://hdl.handle.net/1903/29775
dc.language.iso: en_US [en_US]
dc.relation.isAvailableAt: Library Research & Innovative Practice Forum
dc.relation.isAvailableAt: Digital Repository at the University of Maryland
dc.relation.isAvailableAt: University of Maryland (College Park, Md)
dc.rights: Attribution-NonCommercial-ShareAlike 3.0 United States [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/3.0/us/ [*]
dc.subject: positive-unlabeled learning [en_US]
dc.subject: reverse vaccinology [en_US]
dc.subject: Plasmodium falciparum [en_US]
dc.subject: malaria vaccine antigen identification [en_US]
dc.title: Supplementary materials for positive-unlabeled learning identifies vaccine candidate antigens in the malaria parasite Plasmodium falciparum [en_US]
dc.type: Dataset [en_US]
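
The description above refers to positive-unlabeled (PU) learning, in which a model learns from a small set of known positives plus unlabeled items whose true class is unknown. The sketch below illustrates that general idea only. It is not the PURF package or the study's method: it swaps in generic PU bagging with a nearest-centroid scorer, uses only the Python standard library, and runs on synthetic two-dimensional points. The function names (`centroid`, `pu_bagging_scores`), the parameters, and the toy data are all hypothetical and chosen for illustration.

```python
import math
import random


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pu_bagging_scores(positives, unlabeled, n_rounds=200, seed=0):
    """Score each unlabeled vector in [0, 1]; higher means more positive-like.

    Generic PU bagging, NOT the PURF algorithm: in every round a bootstrap
    sample of the unlabeled set is treated as provisional negatives, a
    nearest-centroid scorer is fitted, and only out-of-bag unlabeled items
    are scored. Scores are averaged over the rounds in which an item was
    out of bag.
    """
    rng = random.Random(seed)
    pos_center = centroid(positives)
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(n_rounds):
        # Draw as many provisional negatives as there are known positives.
        bag = [rng.randrange(len(unlabeled)) for _ in positives]
        neg_center = centroid([unlabeled[i] for i in bag])
        in_bag = set(bag)
        for i, x in enumerate(unlabeled):
            if i in in_bag:
                continue  # score only items the round did not train on
            d_pos = math.dist(x, pos_center)
            d_neg = math.dist(x, neg_center)
            total = d_pos + d_neg
            totals[i] += 0.5 if total == 0 else d_neg / total
            counts[i] += 1
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]


# Synthetic toy data (an assumption for illustration, not study data):
# known positives and hidden positives cluster near (1, 1),
# true negatives cluster near (-1, -1).
rng = random.Random(1)
positives = [[rng.gauss(1, 0.2), rng.gauss(1, 0.2)] for _ in range(10)]
hidden_pos = [[rng.gauss(1, 0.2), rng.gauss(1, 0.2)] for _ in range(5)]
negatives = [[rng.gauss(-1, 0.2), rng.gauss(-1, 0.2)] for _ in range(45)]
unlabeled = hidden_pos + negatives  # indices 0-4 are the hidden positives

scores = pu_bagging_scores(positives, unlabeled)
ranked = sorted(range(len(unlabeled)), key=lambda i: scores[i], reverse=True)
print("Top-ranked unlabeled indices:", ranked[:5])
```

In this toy setting the hidden positives should rank above the true negatives, because they sit close to the labeled positives. For the actual analysis, the research notebook and the PURF package listed under Files are the authoritative sources.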

Files

Original bundle

Showing 5 of 6 files.

Name: README.txt
Size: 1.46 KB
Format: Plain Text
Description: README file

Name: main_notebook.zip
Size: 49.47 MB
Format: Unknown data format
Description: Main research notebook

Name: other_files.zip
Size: 229.34 MB
Format: Unknown data format
Description: Other files for notebook generation

Name: pf_reverse_vaccinology.sql.tar.gz
Size: 92.92 MB
Format: Unknown data format
Description: Plasmodium falciparum database

Name: purf_models.zip
Size: 69.75 MB
Format: Unknown data format
Description: PURF models