Principal Component Analysis (PCA) for High-Dimensional Data (PCA is Dead or Long Live PCA?)
Sample covariances and eigenvalues are famously inconsistent when the number d of variables is at least as large as the sample size n. However, when d >> n, genome-wide association studies (GWAS) that are apparently based on principal component analysis (PCA), and that use sample covariances and eigenvalues, are famously successful in detecting genetic signals while controlling the probability of false discoveries. So which is it: "PCA is dead" or "long live PCA"? "PCA is the worst of methods" or "PCA is the best of methods"? We reconcile the worst/best dichotomy by acknowledging that PCA is indeed inconsistent in many classical statistical settings, but in settings that are natural in genomic studies, PCA produces effective methods. The dichotomy can be explained in part by how the models are viewed and by the goal of the study being carried out. This is joint work with Fan Yang and Kam-Wah Tsui.