Supplementary materials for Plasmodium vivax antigen candidate prediction improves with the addition of Plasmodium falciparum data



README.txt (1.65 KB)
No. of downloads: 16 (50.8 MB)
No. of downloads: 29 (109.85 MB)
No. of downloads: 10
pfpv_reverse_vaccinology.sql.tar.gz (186.87 MB)
No. of downloads: 22 (84.34 MB)
No. of downloads: 16

Intensive malaria control and elimination efforts have led to substantial reductions in malaria incidence over the past two decades. However, the reduction in Plasmodium falciparum malaria cases has led to a species shift in some geographic areas, with P. vivax predominating in many areas outside of Africa. Despite the wide geographic distribution of P. vivax, vaccine development for it has lagged far behind that for P. falciparum, in part because P. vivax cannot be cultivated in vitro, which hinders traditional approaches to antigen identification. In a prior study, we used a positive-unlabeled random forest (PURF) machine learning approach to identify P. falciparum antigens for consideration in vaccine development efforts. Here, we integrate systems data from P. falciparum (the better-studied species) to improve PURF models that predict potential P. vivax vaccine antigen candidates. We further show that inclusion of known antigens from the other species is critical for model performance, but that inclusion of unlabeled proteins from the other species can misdirect the model toward predictors of species classification rather than antigen identification. Beyond malaria, incorporating antigens from a closely related species may aid in vaccine development for emerging pathogens having few or no known antigens.


Our research objective is to identify potential vaccine antigen candidates targeting the parasite P. vivax, the second most prevalent cause of malaria. To improve the performance of the autologous model for P. vivax, which is limited by constraints in protein size and by a small set of labeled antigens, we leverage heterologous data from P. falciparum. We used multiple models trained on various combinations of heterologous and autologous data with the positive-unlabeled random forest (PURF) algorithm. The research notebook contains both the data and the code generated in the study titled "Plasmodium vivax antigen candidate prediction improves with the addition of Plasmodium falciparum data." The notebook also provides guidance on extracting protein variables and assembling machine learning input from the database, along with code for conducting the experimental analyses and creating plots.
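
As a rough illustration of what "combinations of heterologous and autologous data" can mean in practice, the following minimal Python sketch assembles a training set in which heterologous known antigens and heterologous unlabeled proteins can be included or excluded independently. It is not the code from the research notebook: the `Protein` record, its field names, the species codes "pv" and "pf", and the function `assemble_training_set` are all hypothetical and stand in for whatever the database actually provides.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Protein:
    """Hypothetical protein record; field names are assumptions."""
    protein_id: str
    species: str          # assumed codes: "pv" (P. vivax), "pf" (P. falciparum)
    known_antigen: bool   # True = labeled positive, False = unlabeled


def assemble_training_set(
    proteins: List[Protein],
    target: str = "pv",
    add_heterologous_positives: bool = True,
    add_heterologous_unlabeled: bool = False,
) -> List[Tuple[str, int]]:
    """Return (protein_id, label) pairs for positive-unlabeled training.

    Label 1 marks a known antigen; label 0 marks an unlabeled protein,
    which is NOT the same as a confirmed negative.
    """
    rows = []
    for p in proteins:
        if p.species == target:
            keep = True                          # autologous data is always kept
        elif p.known_antigen:
            keep = add_heterologous_positives    # antigens from the other species
        else:
            keep = add_heterologous_unlabeled    # unlabeled from the other species
        if keep:
            rows.append((p.protein_id, 1 if p.known_antigen else 0))
    return rows


# Toy records, invented purely for illustration.
toy = [
    Protein("pv_1", "pv", True),
    Protein("pv_2", "pv", False),
    Protein("pf_1", "pf", True),
    Protein("pf_2", "pf", False),
]
autologous_only = assemble_training_set(toy, add_heterologous_positives=False)
with_pf_antigens = assemble_training_set(toy)
```

The default arguments mirror the finding stated in the abstract: known antigens from the other species are added, while its unlabeled proteins are left out so that the model is less likely to learn species classification instead of antigen identification.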


Attribution-NonCommercial-ShareAlike 3.0 United States