Mixed Modeling Approaches for Characterizing Genetic Effects and Heritability Metrics in Longitudinal Phenotypes


Files

Zhang_umd_0117E_24931.pdf (2.18 MB)
(RESTRICTED ACCESS)



Advisor

Levy, Doron


Abstract

This dissertation develops mixed modeling approaches that integrate genetic and subject-specific random effects to quantify genetic effects and heritability metrics for both the baseline levels and the rates of change of longitudinal phenotypic trajectories. By disentangling these joint genetic effects, this work provides insight into both static (baseline) and dynamic (rate-of-change) genetic influences on longitudinal phenotypes.
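For concreteness, one possible form of such a model is written out below. It is an illustrative assumption, not the dissertation's exact specification: the genetic and subject-specific intercepts and slopes are taken as mutually independent (the full framework may include intercept-slope covariances), and K denotes a genetic relationship matrix computed from the genotyped variants. Here y_ij is the measurement of subject i at age or time t_ij, g0 and g1 are the genetic effects on baseline level and rate of change, and b0 and b1 are subject-specific deviations not explained by genetics. The genetic effects are correlated across subjects through K while b0 and b1 are specific to each subject, which produces the crossed random-effects structure. The ratios h_0^2 and h_1^2 are one natural pair of heritability metrics for the baseline level and the slope.

```latex
\begin{aligned}
y_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + (g_{0i} + b_{0i}) + (g_{1i} + b_{1i})\,t_{ij} + \varepsilon_{ij},\\
\mathbf{g}_0 &\sim N(\mathbf{0}, \sigma^2_{g_0}\mathbf{K}), \quad
\mathbf{g}_1 \sim N(\mathbf{0}, \sigma^2_{g_1}\mathbf{K}), \quad
b_{0i} \sim N(0, \sigma^2_{b_0}), \quad
b_{1i} \sim N(0, \sigma^2_{b_1}), \quad
\varepsilon_{ij} \sim N(0, \sigma^2_{\varepsilon}),\\
h^2_0 &= \frac{\sigma^2_{g_0}}{\sigma^2_{g_0} + \sigma^2_{b_0}}, \qquad
h^2_1 = \frac{\sigma^2_{g_1}}{\sigma^2_{g_1} + \sigma^2_{b_1}}.
\end{aligned}
```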

The first project introduces a mixed modeling framework to predict subject-level genetic effects on both the baseline levels and the slopes of longitudinal phenotypes. The inclusion of joint genetic effects, together with the crossed structure of genetic and subject-specific random effects, induces complex dependencies across repeated measurements. These complexities require new procedures for parameter estimation and for prediction of joint genetic effects. To address them, an Average Information Restricted Maximum Likelihood (AI-REML) algorithm is employed to estimate the variance components associated with the genetic and subject-specific random effects for both the baseline levels and the rates of change of longitudinal phenotypes. Theoretical results and comprehensive simulation studies validate the robustness and efficacy of the proposed method. Before the variance components are estimated with the AI-REML algorithm and the joint genetic effects are predicted, a preliminary step identifies genetic variants significantly associated with the longitudinal phenotype, which substantially reduces the dimensionality of the genetic data. The goal is to predict individual-specific genetic effects on the trajectory of a longitudinal phenotype. The resulting predictions can then be used to stratify participants and to provide age-specific probabilities of false-positive detection, reducing unnecessary follow-up diagnoses.
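The prediction step can be illustrated with a minimal Python sketch. It assumes the illustrative independent-effects model (genetic intercepts and slopes with covariance proportional to a genetic relationship matrix K, i.i.d. subject-specific intercepts and slopes, i.i.d. residual error) and treats the variance components as already known; in the dissertation they are estimated by AI-REML, which is not reproduced here. Given those components, the fixed effects follow from generalized least squares and the genetic effects from their best linear unbiased predictors, g0_hat = s2_g0 * K Z0' V^{-1} (y - X beta_hat), and analogously for the slope. The function names and all simulated values are hypothetical.

```python
import numpy as np


def build_covariance(subject, t, K, s2_g0, s2_g1, s2_b0, s2_b1, s2_e):
    """Marginal covariance V of all stacked measurements under the illustrative
    model: K-structured genetic and i.i.d. subject-specific random intercepts
    and slopes (assumed mutually independent) plus i.i.d. residual error."""
    n_obs, n_sub = len(subject), K.shape[0]
    Z0 = np.zeros((n_obs, n_sub))
    Z0[np.arange(n_obs), subject] = 1.0  # maps each measurement to its subject
    Z1 = Z0 * t[:, None]                 # same map, scaled by time for slopes
    I = np.eye(n_sub)
    V = (Z0 @ (s2_g0 * K + s2_b0 * I) @ Z0.T
         + Z1 @ (s2_g1 * K + s2_b1 * I) @ Z1.T
         + s2_e * np.eye(n_obs))
    return V, Z0, Z1


def predict_genetic_effects(y, X, subject, t, K, vc):
    """GLS fixed effects and BLUPs of genetic intercepts and slopes, given
    known variance components vc = (s2_g0, s2_g1, s2_b0, s2_b1, s2_e)."""
    s2_g0, s2_g1 = vc[0], vc[1]
    V, Z0, Z1 = build_covariance(subject, t, K, *vc)
    beta = np.linalg.solve(X.T @ np.linalg.solve(V, X), X.T @ np.linalg.solve(V, y))
    r = np.linalg.solve(V, y - X @ beta)  # V^{-1} (y - X beta_hat)
    g0_hat = s2_g0 * K @ Z0.T @ r
    g1_hat = s2_g1 * K @ Z1.T @ r
    return beta, g0_hat, g1_hat


# Usage on simulated data (all values hypothetical).
rng = np.random.default_rng(1)
n_sub, n_rep, n_snp = 60, 4, 300
subject = np.repeat(np.arange(n_sub), n_rep)
t = np.tile(np.arange(n_rep, dtype=float), n_sub)
X = np.column_stack([np.ones_like(t), t])
G = rng.binomial(2, 0.3, size=(n_sub, n_snp)).astype(float)
G = (G - G.mean(axis=0)) / (G.std(axis=0) + 1e-8)  # standardized genotypes
K = G @ G.T / n_snp                                 # genetic relationship matrix
vc = (0.5, 0.05, 0.3, 0.02, 0.2)
# Genetic effects with covariance s2 * K, drawn as G @ w with w ~ N(0, s2 / n_snp).
g0 = G @ rng.normal(0.0, np.sqrt(vc[0] / n_snp), n_snp)
g1 = G @ rng.normal(0.0, np.sqrt(vc[1] / n_snp), n_snp)
b0 = rng.normal(0.0, np.sqrt(vc[2]), n_sub)
b1 = rng.normal(0.0, np.sqrt(vc[3]), n_sub)
y = (X @ np.array([1.0, 0.1]) + (g0 + b0)[subject]
     + (g1 + b1)[subject] * t + rng.normal(0.0, np.sqrt(vc[4]), t.size))
beta, g0_hat, g1_hat = predict_genetic_effects(y, X, subject, t, K, vc)
```

Building V explicitly needs memory quadratic in the number of measurements, which is why this direct approach fits the moderate-size setting of the first project.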

The second project extends the framework by incorporating genome-wide variants (millions of variants) simultaneously to estimate heritability metrics for both baseline trait levels and rates of change in longitudinal trajectories. This effort addresses several key challenges: the computational demands of potentially large-scale studies, the complexity of high-dimensional genetic data, the interaction between joint subject-level genetic effects, and the crossed structure of genotypic and subject-specific random effects. To deal with these challenges, an AI-REML approach optimized for moderate-size studies is employed. For large-scale studies, where inverting the covariance matrix becomes computationally infeasible, a partitioned AI-REML approach is proposed; it reduces the computational burden by dividing the full sample into non-overlapping subsamples of equal size. This approach sacrifices some efficiency because the off-diagonal covariance information between subsamples is discarded. Meta-analysis across the subsamples is then conducted to obtain more accurate estimates of the variance components and the two heritability metrics. Alternatively, a restricted Haseman-Elston (REHE) regression method is used; it remains computationally feasible without partitioning while still estimating the variance components effectively, especially for large-scale data. Extensive simulation experiments compare and evaluate the performance of these two methods across various scenarios, providing insight into their relative strengths and applicability.
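The bookkeeping around the partitioned approach can be sketched as follows, under stated assumptions: subjects (not individual measurements) are split, so each subject's repeated measurements stay within one subsample; the per-subsample AI-REML fit is not shown; and the meta-analysis is taken to be a fixed-effect inverse-variance weighted average, which may differ from the scheme actually used in the dissertation. The heritability helper uses the illustrative ratios h0^2 = s2_g0 / (s2_g0 + s2_b0) and h1^2 = s2_g1 / (s2_g1 + s2_b1). All function names are hypothetical.

```python
import numpy as np


def partition_subjects(n_subjects, n_parts, seed=0):
    """Randomly split subject indices into n_parts non-overlapping subsamples
    whose sizes differ by at most one."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_subjects), n_parts)


def inverse_variance_meta(estimates, std_errors):
    """Fixed-effect (inverse-variance weighted) pooling of per-subsample
    estimates of one parameter; returns the pooled estimate and its SE."""
    est = np.asarray(estimates, dtype=float)
    w = 1.0 / np.asarray(std_errors, dtype=float) ** 2
    pooled = float(np.sum(w * est) / np.sum(w))
    return pooled, float(np.sqrt(1.0 / np.sum(w)))


def heritability_ratios(s2_g0, s2_g1, s2_b0, s2_b1):
    """Illustrative heritability metrics for baseline level and rate of change."""
    return s2_g0 / (s2_g0 + s2_b0), s2_g1 / (s2_g1 + s2_b1)


parts = partition_subjects(10, 3)
print([len(p) for p in parts])                      # [4, 3, 3]
print(inverse_variance_meta([1.0, 3.0], [1.0, 1.0]))  # pooled 2.0, SE about 0.707
print(heritability_ratios(1.0, 1.0, 1.0, 3.0))        # (0.5, 0.25)
```

Splitting by subject keeps each subsample's covariance matrix block-diagonal with respect to the other subsamples, which is exactly the cross-subsample information the partitioned approach gives up.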

The third project addresses the high variance of slope-related estimates in large-scale studies with few observations per subject. Under such sparse sampling, existing methods such as the AI-REML algorithm and the REHE regression method perform poorly and yield less reliable estimates. To overcome these limitations, we propose a two-stage estimation method designed for large-scale studies with sparse longitudinal data. This approach aims to improve the precision and reliability of estimates of the ratio of genetic contributions to the rate of change in longitudinal trajectories. In the first stage, a linear regression is fitted to each subject's measurements to estimate the fixed-effect coefficients and their variances, using an unbiased estimator of the error variance. In the second stage, linear mixed models are constructed that treat the estimated coefficients as responses and incorporate their variances as known measurement errors. The AI-REML algorithm is then applied to estimate the ratios of genetic contributions. Simulation studies compare and evaluate the performance of these three methods, demonstrating the robustness and effectiveness of the proposed two-stage estimation approach.
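The first stage of the two-stage method can be sketched in Python as below, assuming each subject's model has an intercept and a linear slope in time. For subject i, the OLS coefficients have sampling covariance s^2 (X_i' X_i)^{-1}, where s^2 = RSS / df is unbiased for the error variance. The function name stage_one and the pooled option (which pools residual sums of squares across subjects under an assumed common error variance) are illustrative choices, not the dissertation's documented procedure. The returned coefficients and covariance matrices would then serve as responses and known measurement-error covariances in the second-stage mixed model, which is not shown.

```python
import numpy as np


def stage_one(subject, t, y, pooled=False):
    """Per-subject OLS of y on (1, t).

    Returns subject ids, an (n_subjects, 2) array of intercept/slope estimates,
    and an (n_subjects, 2, 2) array of their covariances s^2 (X'X)^{-1}.
    With pooled=False, s^2 = RSS_i / (n_i - 2) per subject (needs n_i >= 3);
    with pooled=True, s^2 = sum(RSS_i) / sum(n_i - 2) (needs n_i >= 2).
    Each subject also needs at least two distinct time points."""
    min_n = 2 if pooled else 3
    ids = np.unique(subject)
    coefs, xtx_invs, rss, df = [], [], [], []
    for i in ids:
        m = subject == i
        Xi = np.column_stack([np.ones(m.sum()), t[m]])
        yi = y[m]
        if len(yi) < min_n:
            raise ValueError(f"subject {i} needs at least {min_n} measurements")
        xtx_inv = np.linalg.inv(Xi.T @ Xi)
        b = xtx_inv @ Xi.T @ yi
        r = yi - Xi @ b
        coefs.append(b)
        xtx_invs.append(xtx_inv)
        rss.append(r @ r)
        df.append(len(yi) - 2)  # residual degrees of freedom (2 coefficients)
    rss, df = np.array(rss), np.array(df)
    s2 = np.full(len(ids), rss.sum() / df.sum()) if pooled else rss / df
    covs = s2[:, None, None] * np.array(xtx_invs)
    return ids, np.array(coefs), covs


# Usage on a tiny hypothetical dataset with three measurements per subject.
subject = np.array([0, 0, 0, 1, 1, 1])
t = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
y = np.array([2.0, 2.5, 3.0, 0.0, 1.0, 0.0])
ids, coefs, covs = stage_one(subject, t, y)
print(coefs)
```

With only a few measurements per subject, a per-subject variance estimate rests on very few degrees of freedom, so the pooled option is one way to stabilize the measurement-error variances passed to the second stage.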

Collectively, these projects enhance the understanding and quantification of joint genetic effects and heritability metrics for both baseline levels and rates of change in longitudinal data analysis. The proposed methodologies and guidelines offer valuable tools for researchers addressing the challenges posed by large-scale studies and high-dimensional analyses, broadening the applicability of existing techniques to more complex scenarios. The findings from this dissertation improve the accuracy and reliability of statistical analyses for phenotypic traits influenced by genetic effects in the context of unbalanced serial measurements.

In the applications, this dissertation demonstrates the utility of the proposed methods by analyzing 6,948,674 genome-wide common variants to study the dynamics of prostate-specific antigen (PSA) trajectories in European white males from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial. For the first project, we first identified 253 genetic variants associated with longitudinal PSA measurements, substantially reducing the dimensionality of the genetic data. Using these selected variants, the AI-REML algorithm was employed to estimate genetic and subject-specific variance components for both baseline levels and rates of change. The results show significant genetic contributions to baseline PSA levels and to their progression over time, providing insight into the genetic factors that influence PSA variability among unaffected individuals. These findings have important implications for identifying individuals at higher risk of false-positive prostate cancer screening results when established PSA cutoffs are used. By incorporating joint genetic factors into PSA monitoring, this work highlights the potential to improve the accuracy and effectiveness of early prostate cancer detection and to develop personalized PSA screening guidelines. In the second project, the analysis revealed moderate genetic contributions to baseline PSA levels but significant genetic contributions to PSA velocity, with heritability increasing with age. Taken together, the methodologies developed in this dissertation provide researchers with tools to use genetic information to identify individuals likely to have a certain phenotypic profile, and to estimate the proportion of variation in the intercept and slope of the phenotype that can be explained by genetics (heritability).
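To make the false-positive interpretation concrete, the sketch below computes, for an unaffected individual, the probability that PSA exceeds a screening cutoff at each age along a predicted log-PSA trajectory. It assumes log(PSA) is normally distributed around a linear trajectory with constant standard deviation; the 4 ng/mL cutoff, the trajectory values, and the function names are illustrative assumptions, not parameters or results from the dissertation.

```python
from math import log
from statistics import NormalDist


def prob_exceed_cutoff(mean_log_psa, sd_log_psa, cutoff=4.0):
    """P(PSA > cutoff), assuming log(PSA) ~ Normal(mean_log_psa, sd_log_psa**2).
    For an unaffected individual this is the chance of a false-positive screen."""
    return 1.0 - NormalDist(mean_log_psa, sd_log_psa).cdf(log(cutoff))


def exceedance_by_age(ages, base_age, intercept, slope, sd_log_psa, cutoff=4.0):
    """Exceedance probability along a predicted log-PSA trajectory
    intercept + slope * (age - base_age); intercept and slope could be, e.g.,
    population means plus a subject's predicted genetic effects (hypothetical)."""
    return {a: prob_exceed_cutoff(intercept + slope * (a - base_age), sd_log_psa, cutoff)
            for a in ages}


# Two hypothetical subjects with the same baseline but different genetic slopes.
low = exceedance_by_age(range(55, 75, 5), 55, 0.2, 0.02, 0.6)
high = exceedance_by_age(range(55, 75, 5), 55, 0.2, 0.06, 0.6)
for age in low:
    print(age, round(low[age], 3), round(high[age], 3))
```

Under these assumptions the two subjects start with identical risk, and the gap in false-positive probability widens with age only through the difference in slope, which is the kind of age-dependent stratification the predicted slope effects are meant to support.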
