Copyright [2023] [Renee Ti Chou, Henry T. Hsueh, Laura M. Ensign, and Michael P. Cummings]

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

-----------
Description
-----------

The research notebook contains the code for the machine learning pipeline developed in this project, which uses a super learner-based methodology for multifunctional peptide engineering. The main pipeline consists of variable reduction and model training. Details of the code's functionality can be found in the Methods section of the paper "Machine learning-driven multifunctional peptide engineering for sustained ocular drug delivery."

------------
Instructions
------------

To open the research notebook and view the code, go to the subfolder "main_notebook" and click on "index.html" to open the HTML document or "main_notebook.pdf" to open the PDF document. No software installation is required to open the research notebook.

To run the code in the notebook, open the project file supplementary_data.Rproj and the Rmd files in RStudio (https://posit.co/download/rstudio-desktop/). Follow online tutorials (e.g., https://rmarkdown.rstudio.com/authoring_quick_tour.html) to run the code and to install or update the R or Python packages specified at the top of each Rmd file.

To install H2O.ai in R or Python, follow the instructions on the H2O documentation webpage (https://docs.h2o.ai/h2o/latest-stable/h2o-docs/downloading.html).

Reproduction information is included in the notebook.

-------------------
Small data set demo
-------------------

A demo of the main machine learning pipeline is provided in 06_small_data_set_demo.Rmd, which uses a Parkinson's data set (https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/). Open the Rmd file in RStudio to run the demo.

The pipeline outputs a set of base models, a super learner model trained on all base models, reduced super learner models trained using the iterative reduction procedure, and super learner models trained on homogeneous base models.

The runtime is ~750 s when using all threads on a MacBook with a 2.4 GHz 8-core Intel Core i9 and 32 GB of 2667 MHz DDR4 memory.

-------------------
System requirements
-------------------

The main machine learning pipeline was built using R, open-source software that runs on Linux (Debian, Fedora/Redhat, Ubuntu), macOS, and Windows. The code in the notebook has been tested under R version 4.2.2 (2022-10-31) and Python version 3.8.15, with RStudio version 2022.12.0+353, on macOS Big Sur version 10.16. See the sessionInfo() or session_info.show() output in the research notebook for further information on R or Python package versions. No non-standard hardware is required.

---------------------
Final property models
---------------------

The final models are stored as MOJO models. To perform model inference, first compute the peptide variables/features following the code in the research notebook. Then follow the R or Python snippet in the section "Saving and Importing MOJOs" in the H2O documentation (https://docs.h2o.ai/h2o/latest-stable/h2o-docs/save-and-load-model.html) to import the model and run inference on the data.

----------
References
----------

Max A. Little, Patrick E. McSharry, Eric J. Hunter, Lorraine O. Ramig (2008), 'Suitability of dysphonia measurements for telemonitoring of Parkinson's disease', IEEE Transactions on Biomedical Engineering.
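
The inference steps described under "Final property models" can be sketched in Python roughly as follows. This is a minimal sketch, not the authors' code: it assumes the h2o Python package and Java are installed, that the peptide variables have already been computed and saved as a CSV file, and that the H2O cluster runs on the same machine as the files. The helper names resolve_inputs and score_with_mojo and the file names in the usage comment are hypothetical.

```python
from pathlib import Path


def resolve_inputs(mojo_path, features_csv):
    """Return absolute paths, failing early if either file is missing."""
    mojo = Path(mojo_path).expanduser().resolve()
    features = Path(features_csv).expanduser().resolve()
    for p in (mojo, features):
        if not p.is_file():
            raise FileNotFoundError(f"File not found: {p}")
    return str(mojo), str(features)


def score_with_mojo(mojo_path, features_csv):
    """Import a MOJO into a local H2O cluster and predict on a feature table."""
    # Imported here so the path check above works without H2O installed.
    import h2o

    mojo, features = resolve_inputs(mojo_path, features_csv)
    h2o.init()                         # start or connect to a local H2O cluster
    model = h2o.import_mojo(mojo)      # load the MOJO as a generic model
    frame = h2o.import_file(features)  # columns should match the training variables
    return model.predict(frame)        # H2OFrame holding the predictions


# Example call (hypothetical file names; substitute your own):
# predictions = score_with_mojo("final_model.zip", "peptide_features.csv")
# print(predictions.head())
```

The H2O import is kept inside score_with_mojo so that a wrong path is reported before a cluster is started. The H2O documentation linked above remains the authoritative reference for the import and prediction calls, including the equivalent R snippet.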