Supplementary materials for Plasmodium vivax antigen candidate prediction improves with the addition of Plasmodium falciparum data

dc.contributor.author: Chou, Renee Ti
dc.contributor.author: Ouattara, Amed
dc.contributor.author: Takala-Harrison, Shannon
dc.contributor.author: Cummings, Michael P.
dc.date.accessioned: 2023-11-25T14:58:34Z
dc.date.available: 2023-11-25T14:58:34Z
dc.date.issued: 2023
dc.description: Our research objective is to identify potential vaccine antigen candidates targeting the parasite P. vivax, the second most prevalent cause of malaria. To improve the performance of the autologous model for P. vivax, which is constrained by protein size and a small set of labeled antigens, we leverage heterologous data from P. falciparum. We used multiple models trained on various combinations of heterologous and autologous data using the positive-unlabeled random forest (PURF) algorithm. The research notebook contains both data and code generated in the study titled "Plasmodium vivax antigen candidate prediction improves with the addition of Plasmodium falciparum data." The notebook also provides guidance on extracting protein variables and assembling machine learning input from the database, along with code for conducting experimental analyses and creating plots.
dc.description.abstract: Intensive malaria control and elimination efforts have led to substantial reductions in malaria incidence over the past two decades. However, the reduction in Plasmodium falciparum malaria cases has led to a species shift in some geographic areas, with P. vivax predominating in many areas outside of Africa. Despite its wide geographic distribution, P. vivax vaccine development has lagged far behind that for P. falciparum, in part due to the inability to cultivate P. vivax in vitro, hindering traditional approaches for antigen identification. In a prior study, we used a positive-unlabeled random forest (PURF) machine learning approach to identify P. falciparum antigens for consideration in vaccine development efforts. Here we integrate systems data from P. falciparum (the better-studied species) to improve PURF models to predict potential P. vivax vaccine antigen candidates. We further show that inclusion of known antigens from the other species is critical for model performance, but the inclusion of unlabeled proteins from the other species can result in misdirection of the model toward predictors of species classification, rather than antigen identification. Beyond malaria, incorporating antigens from a closely related species may aid in vaccine development for emerging pathogens having few or no known antigens.
dc.description.sponsorship: National Science Foundation Award (DGE-1632976)
dc.identifier: https://doi.org/10.13016/dspace/vijt-jshg
dc.identifier.uri: http://hdl.handle.net/1903/31477
dc.language.iso: en_US
dc.relation.isAvailableAt: College of Computer, Mathematical & Physical Sciences [en_us]
dc.relation.isAvailableAt: Digital Repository at the University of Maryland [en_us]
dc.relation.isAvailableAt: Biology [en_us]
dc.relation.isAvailableAt: University of Maryland (College Park, MD) [en_us]
dc.rights: Attribution-NonCommercial-ShareAlike 3.0 United States [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/3.0/us/ [*]
dc.subject: positive-unlabeled learning
dc.subject: reverse vaccinology
dc.subject: Plasmodium vivax
dc.subject: malaria vaccine antigen identification
dc.title: Supplementary materials for Plasmodium vivax antigen candidate prediction improves with the addition of Plasmodium falciparum data
dc.type: Dataset
local.equitableAccessSubmission: Yes
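
The description above mentions models trained on "various combinations of heterologous and autologous data." As a rough illustration only, the sketch below enumerates what such combinations could look like. The component names (pv_positive, pv_unlabeled, pf_positive, pf_unlabeled) and the choice to always include the P. vivax data are assumptions made for this example; this record does not list the exact combinations the authors used.

```python
from itertools import product

# Hypothetical component names (not taken from the dataset itself).
# Assumption: the autologous (P. vivax) data are always part of the training set.
AUTOLOGOUS = ("pv_positive", "pv_unlabeled")
# Assumption: each heterologous (P. falciparum) component is optional.
HETEROLOGOUS_OPTIONS = ("pf_positive", "pf_unlabeled")


def training_combinations():
    """Return every training-set composition under the assumptions above."""
    combos = []
    # Each flag tuple says whether the matching heterologous component is included.
    for flags in product((False, True), repeat=len(HETEROLOGOUS_OPTIONS)):
        extra = tuple(
            name for name, keep in zip(HETEROLOGOUS_OPTIONS, flags) if keep
        )
        combos.append(AUTOLOGOUS + extra)
    return combos


if __name__ == "__main__":
    for combo in training_combinations():
        print(" + ".join(combo))
```

Under these assumptions the enumeration yields four compositions, ranging from P. vivax data alone to P. vivax data plus both P. falciparum components. The abstract's finding concerns exactly this distinction: adding known P. falciparum antigens helps, while adding unlabeled P. falciparum proteins can push the model toward predicting species rather than antigens.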

Files

Original bundle

Name: README.txt; Size: 1.65 KB; Format: Plain Text
Name: main_notebook.zip; Size: 50.8 MB; Format: Unknown data format
Name: other_files.zip; Size: 109.85 MB; Format: Unknown data format
Name: pfpv_reverse_vaccinology.sql.tar.gz; Size: 186.87 MB; Format: Unknown data format
Name: purf_models.zip; Size: 84.34 MB; Format: Unknown data format