A computational pipeline for translating spatial proteomics into prognostic biomarkers for GEP-NETs
Abstract
Spatial proteomics enables high-resolution characterization of tumor microenvironments by capturing protein expression within the spatial context of tissue. However, translating single-cell measurements into clinically useful prognostic tools remains challenging because these data are costly and complex. Using the csmpv R package, we developed a computational pipeline that generates composite biomarkers to predict patient survival in gastroenteropancreatic neuroendocrine tumors (GEP-NETs). This approach converts research-grade spatial data into practical biomarker strategies for clinical settings. The dataset comprised 24 patients with NETs of small-intestinal or pancreatic origin, profiled by imaging mass cytometry (1.5 million single cells). We constructed sample-level feature matrices by aggregating measurements within each region of interest (ROI), including cell type composition, cell type-specific protein expression, and spatial metrics such as cell–cell distances and local densities. These variables were filtered and collapsed to one observation per patient for survival analysis. To identify prognostic biomarkers, we applied the csmpv framework for survival modeling. For each cohort subset, defined by tumor origin and tissue site, variables were selected with the LASSO2 algorithm and then combined in a multivariable XGBoost-based model to produce a binary risk classifier. Model performance was evaluated by stratifying patients into predicted risk groups and comparing their survival with Kaplan–Meier analysis. Each feature was also evaluated on its own to assess whether the multivariable model added predictive value. Across cohorts, the composite biomarkers stratified survival more strongly than any single feature. These models reduced hundreds of variables to a small set of informative features, demonstrating how costly proteomics approaches can be translated into streamlined, clinically feasible analyses for patient risk stratification.
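Two steps of the pipeline can be illustrated compactly: collapsing ROI-level measurements to one row per patient, and checking whether predicted risk groups separate survival (Kaplan–Meier curves plus a two-group log-rank test). The Python sketch below covers both steps under stated assumptions. It is not the csmpv implementation, which is an R package, and it does not reproduce the LASSO2 or XGBoost steps. The averaging rule, the feature names (`tumor_frac`, `cd8_dist_um`), the toy data, and the helper names `collapse_to_patient`, `kaplan_meier`, and `logrank_test` are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def collapse_to_patient(roi_rows):
    """Collapse ROI-level rows to one feature dict per patient.

    Hypothetical aggregation rule: each feature is averaged over the
    patient's ROIs. Every row must carry the same feature keys.
    """
    grouped = defaultdict(list)
    for row in roi_rows:
        grouped[row["patient"]].append(row)
    patients = {}
    for pid, rows in grouped.items():
        features = [key for key in rows[0] if key != "patient"]
        patients[pid] = {f: mean(r[f] for r in rows) for f in features}
    return patients


def kaplan_meier(times, events):
    """Product-limit survival estimate as [(event_time, S(t))] pairs.

    events[i] is 1 for an observed event and 0 for a censored follow-up.
    """
    survival = 1.0
    curve = []
    for t in sorted({x for x, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def logrank_test(times, events, groups):
    """Two-group log-rank test (groups coded 0/1); returns (chi2, p), 1 df."""
    o_minus_e = 0.0
    variance = 0.0
    for t in sorted({x for x, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e and g == 1)
        # Observed minus expected events in group 1 at this event time.
        o_minus_e += d1 - d * n1 / n
        # Hypergeometric variance of d1 (undefined when only one subject is at risk).
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of chi-square with 1 df equals P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Toy data only; not from the GEP-NET cohort.
    rois = [
        {"patient": "P1", "tumor_frac": 0.5, "cd8_dist_um": 40.0},
        {"patient": "P1", "tumor_frac": 0.75, "cd8_dist_um": 60.0},
        {"patient": "P2", "tumor_frac": 0.25, "cd8_dist_um": 20.0},
    ]
    print(collapse_to_patient(rois))

    times = [2, 3, 5, 7, 8, 10, 12, 15]   # follow-up time (arbitrary units)
    events = [1, 1, 1, 0, 1, 0, 1, 0]     # 1 = event, 0 = censored
    high_risk = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical classifier output
    high = [i for i, g in enumerate(high_risk) if g == 1]
    print(kaplan_meier([times[i] for i in high], [events[i] for i in high]))
    print(logrank_test(times, events, high_risk))
```

Comparing the log-rank statistic from a composite classifier's labels with the statistics from single-feature splits mirrors the comparison described in the abstract. In practice, a maintained survival library (for example, lifelines in Python) would normally replace these hand-written estimators.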