----------- Description -----------

Our research objective is to identify potential vaccine antigen candidates targeting the parasite Plasmodium vivax (P. vivax), the second most prevalent cause of malaria. The autologous model for P. vivax is constrained by the size of its protein data set and by its small set of labeled antigens. To improve its performance, we leverage heterologous data from Plasmodium falciparum (P. falciparum). Using the positive-unlabeled random forest (PURF) algorithm, we trained multiple models on various combinations of heterologous and autologous data.

The research notebook contains the data and code generated in the study titled "Plasmodium vivax antigen candidate prediction improves with the addition of Plasmodium falciparum data." The notebook also explains how to extract protein variables from the database and assemble the machine learning input. It includes code for conducting the experimental analyses and creating the plots.

------------ Instructions ------------

To view the research notebook, navigate to the main_notebook subfolder and open the HTML file index.html in a web browser. A PDF version of the notebook, main_notebook.pdf, is in the same folder.

To execute the code in the research notebook, open the corresponding R Markdown (.Rmd) files in RStudio.

The data generated by the notebook are stored in three subfolders: other_data (structured data), pickle_data (Python objects), and rdata (R objects).

The notebook is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-nc-sa/4.0/).

Copyright 2023 Renee Ti Chou and Michael P. Cummings
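You may want to inspect the Python objects in pickle_data without running the notebook. The following minimal Python sketch loads them. It makes two assumptions: every regular, non-hidden file in that folder is a standalone pickle, and the folder path is given relative to the current working directory. The function name load_pickle_folder is hypothetical and is not part of the notebook. Unpickling may require the same libraries, in compatible versions, that created the objects. Only unpickle files from a trusted source.

```python
import pickle
from pathlib import Path


def load_pickle_folder(folder="pickle_data", pattern="*"):
    """Load pickled objects from `folder` into a dict keyed by file name stem.

    Hypothetical helper (not part of the notebook). Assumes every regular,
    non-hidden file matching `pattern` is a standalone pickle. Only load
    pickles you trust: pickle.load can execute arbitrary code.
    """
    objects = {}
    for path in sorted(Path(folder).glob(pattern)):
        # Skip subdirectories and hidden files such as .DS_Store.
        if path.is_file() and not path.name.startswith("."):
            with path.open("rb") as fh:
                objects[path.stem] = pickle.load(fh)
    return objects


if __name__ == "__main__":
    # Assumed location: pickle_data relative to the current working directory.
    data = load_pickle_folder()
    for name, obj in data.items():
        print(name, type(obj).__name__)
```

If the files share an extension such as .pkl, you can narrow the pattern. For example, load_pickle_folder("pickle_data", "*.pkl") loads only those files. The extension actually used in pickle_data is not stated here.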