Supplementary materials for positive-unlabeled learning identifies vaccine candidate antigens in the malaria parasite Plasmodium falciparum



README.txt (1.46 KB), downloads: 12
(file name not shown) (49.47 MB), downloads: 7
(file name not shown) (229.34 MB), downloads: 4
pf_reverse_vaccinology.sql.tar.gz (92.92 MB), downloads: 3
(file name not shown) (69.75 MB), downloads: 3

Malaria vaccine development is hampered by extensive antigenic variation and complex life stages of Plasmodium species. Vaccine development has focused on a small number of antigens identified prior to availability of the P. falciparum genome. In this study, we implement a machine learning-based reverse vaccinology approach to predict potential new malaria vaccine candidate antigens. We assemble and analyze P. falciparum proteomic, structural, functional, immunological, genomic, and transcriptomic data, and use positive-unlabeled learning to predict potential antigens based on the properties of known antigens and remaining proteins. We prioritize candidate antigens based on model performance on reference antigens with different genetic diversity and quantify the protein properties that contribute the most to identifying top candidates. Candidate antigens are characterized by gene essentiality, gene ontology, and gene expression in different life stages to inform future vaccine development. This approach provides a framework for identifying and prioritizing candidate vaccine antigens for a broad range of pathogens.


The research aims to identify and prioritize previously unknown vaccine antigen candidates with potentially high efficacy against the most prevalent malaria parasite, Plasmodium falciparum. Positive-unlabeled random forest (PURF) was applied to learn from the small set of known P. falciparum antigens (the positives) and the remaining proteins, whose antigenic properties are unknown (the unlabeled set). The research notebook contains the data and code generated in the study "Positive-unlabeled learning identifies vaccine candidate antigens in the malaria parasite Plasmodium falciparum." It also includes instructions for installing the PURF package, for retrieving protein variables and assembling the machine learning input from the database, and code for the experimental analysis and plotting.
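To illustrate the general idea of learning from positive and unlabeled examples, here is a minimal sketch of a generic PU bagging scheme. It is not the PURF package and does not reproduce its algorithm or API. It assumes single-feature decision stumps as base learners in place of full trees, and every function name (fit_stump, predict_stump, pu_bagging_scores) is hypothetical. In each round, a bootstrap sample of unlabeled proteins is treated as provisional negatives, a stump is trained against the known positives, and only the unlabeled items left out of that sample receive a vote.

```python
import random


def fit_stump(X, y):
    """Pick the (feature, threshold, sign) split with the highest training accuracy."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for sign in (1, -1):
                correct = sum(
                    (1 if sign * (row[j] - t) > 0 else 0) == label
                    for row, label in zip(X, y)
                )
                if best is None or correct > best[0]:
                    best = (correct, j, t, sign)
    if best is None:  # every feature is constant; fall back to a trivial stump
        return (0, 0.0, 1)
    return best[1:]


def predict_stump(stump, row):
    """Return 1 (positive-like) or 0 for a single feature vector."""
    j, t, sign = stump
    return 1 if sign * (row[j] - t) > 0 else 0


def pu_bagging_scores(positives, unlabeled, n_rounds=50, seed=0):
    """Score each unlabeled item by its mean out-of-bag vote across rounds.

    Items that were never out of bag get a score of None.
    """
    rng = random.Random(seed)
    votes = [[] for _ in unlabeled]
    k = len(positives)
    for _ in range(n_rounds):
        # Bootstrap unlabeled items and treat them as provisional negatives.
        sampled = [rng.randrange(len(unlabeled)) for _ in range(k)]
        in_bag = set(sampled)
        X = positives + [unlabeled[i] for i in sampled]
        y = [1] * k + [0] * k
        stump = fit_stump(X, y)
        # Only items left out of this round's sample receive a vote.
        for i, row in enumerate(unlabeled):
            if i not in in_bag:
                votes[i].append(predict_stump(stump, row))
    return [sum(v) / len(v) if v else None for v in votes]


# Toy data (synthetic, not from the study): feature 0 separates the classes.
toy_rng = random.Random(1)
known_pos = [[toy_rng.gauss(4, 0.5), toy_rng.gauss(0, 1)] for _ in range(20)]
hidden_pos = [[toy_rng.gauss(4, 0.5), toy_rng.gauss(0, 1)] for _ in range(10)]
true_neg = [[toy_rng.gauss(0, 0.5), toy_rng.gauss(0, 1)] for _ in range(40)]
toy_scores = pu_bagging_scores(known_pos, hidden_pos + true_neg)
```

In this toy setup, unlabeled items that resemble the known positives should tend to receive higher scores than the rest, which is the kind of ranking a candidate-prioritization step would build on. The actual study uses the PURF package and the protein variables stored in the accompanying database; consult the notebook for the real workflow.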


Attribution-NonCommercial-ShareAlike 3.0 United States