Accuracy and consistency in discovering dimensionality by correlation constraint analysis and common factor analysis

An important application of multivariate analysis is the estimation of the underlying dimensions of an instrument or set of variables. Estimation of dimensions is often pursued with the objective of finding the single factor or dimension to which each observed variable belongs or by which it is most strongly influenced. One approach is to estimate the loadings of observed variables on a pre-specified number of factors through common factor analysis (CFA) of the covariance or correlational structure of the observed variables. Another method, correlation constraint analysis (CCA), operates on the determinants of all 2×2 submatrices of the covariance matrix of the variables. CCA software also determines whether partialling out the effects of any observed variable affects observed correlations, making it the only exploratory method that specifically rules out (or identifies) observed variables as the cause of correlations among observed variables. CFA estimates the strengths of associations between the observed variables and the factors hypothesized to underlie or cause the observed correlations; CCA does not estimate factor loadings but can uncover mathematical evidence of the causal relationships hypothesized between factors and observed variables. These are philosophically and analytically diverse methods for estimating the dimensionality of a set of variables, and each can be useful in understanding the simple structure in multivariate data.

This dissertation studied the performance of these methods at uncovering the dimensionality of simulated data under conditions of varying sample size and model complexity, the presence of a weak factor, and correlated vs. independent factors. CCA was sensitive to these conditions (it performed significantly worse under them), omitting more factors and omitting and mis-assigning more indicators. CFA was also sensitive to all but one condition (whether factors were correlated or not) in terms of omitting factors; it was sensitive to all conditions in terms of omitting and mis-assigning indicators, and it also found extra factors depending on the number of factors in the population, the purity of factors, and the presence of a weak factor. This is the first study of CCA in data with these specific features of complexity, which are common in multivariate data.
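
The quantity CCA is described as operating on, the determinants of the 2x2 submatrices of the covariance matrix, can be illustrated with a minimal sketch. This is not the CCA software or the dissertation's procedure: it uses model-implied (population) covariances rather than sample data, performs no significance testing, and the function names `tetrad_differences` and `implied_cov`, the loadings, and the factor correlation of 0.5 are all assumptions chosen for illustration. It shows only that these determinants all vanish when a single factor underlies four indicators, and that some do not when two correlated factors do.

```python
import itertools
import numpy as np

def tetrad_differences(cov):
    """Determinants of 2x2 submatrices with disjoint row and column index pairs.

    For rows (i, j) and columns (k, l) the determinant is
    cov[i, k] * cov[j, l] - cov[i, l] * cov[j, k].
    """
    cov = np.asarray(cov, dtype=float)
    p = cov.shape[0]
    out = {}
    for i, j in itertools.combinations(range(p), 2):
        for k, l in itertools.combinations(range(p), 2):
            if {i, j} & {k, l}:
                continue  # skip submatrices that touch the diagonal
            out[(i, j, k, l)] = cov[i, k] * cov[j, l] - cov[i, l] * cov[j, k]
    return out

def implied_cov(loadings, factor_corr):
    """Model-implied covariance of standardized indicators (illustrative helper)."""
    lam = np.asarray(loadings, dtype=float)
    common = lam @ np.asarray(factor_corr, dtype=float) @ lam.T
    unique = np.diag(1.0 - np.diag(common))  # unique variances so each variance is 1
    return common + unique

# Hypothetical loadings: four indicators of one factor.
one_factor_cov = implied_cov([[0.8], [0.7], [0.6], [0.5]], [[1.0]])

# Hypothetical loadings: two indicators per factor, factors correlated 0.5.
two_factor_cov = implied_cov(
    [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.5]],
    [[1.0, 0.5], [0.5, 1.0]],
)

one = tetrad_differences(one_factor_cov)
two = tetrad_differences(two_factor_cov)

max_one = max(abs(v) for v in one.values())   # numerically zero
between = two[(0, 1, 2, 3)]                   # rows and columns split by factor
within = two[(0, 2, 1, 3)]                    # mixes within-factor covariances
```

In this toy setup the pattern of which determinants vanish differs between the one-factor and two-factor structures, which is the kind of constraint a method based on these determinants could exploit. With sample data the determinants would only be approximately zero, so a real analysis would need a statistical test that this sketch does not include.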