Student Seminar Series: Ashwini Maurya, A Well Conditioned and Sparse Estimate of Covariance and Inverse Covariance Matrix Using a Joint Penalty
We develop a method for estimating a well-conditioned and sparse covariance matrix from a sample of vectors drawn from a sub-Gaussian distribution in a high-dimensional setting. The proposed estimator minimizes a squared loss function plus a joint penalty: an l1-norm penalty on the entries and a sum-of-squared-deviations penalty on the sample eigenvalues. The joint penalty plays two important roles: (i) the l1 penalty on each entry of the covariance matrix reduces the effective number of parameters, so the resulting estimate is sparse, and (ii) the sum-of-squared-deviations penalty on the eigenvalues controls the over-dispersion in the eigenvalues of the sample covariance matrix. In contrast to some existing methods of covariance matrix estimation, where the interest is often in estimating a sparse matrix alone, the proposed method is flexible enough to estimate a covariance matrix that is simultaneously sparse and well conditioned. We extend the proposed approach to inverse covariance matrix estimation. Theoretical consistency of the proposed estimators is established in both the Frobenius and operator norms. We give an efficient algorithm for estimating the covariance and inverse covariance matrices that is fast and easily scalable to large-scale data analysis problems. An extensive simulation study over varying sample sizes and numbers of variables shows that the proposed estimator outperforms the graphical lasso, PDSCE, and Ledoit-Wolf estimates for various choices of structured covariance and inverse covariance matrices. We apply the proposed estimator to tumor tissue classification using gene expression data and compare its performance with other classification methods.
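To convey the flavor of such a joint-penalty objective, here is a minimal sketch, not the speaker's exact algorithm. It assumes the eigenvalue penalty measures deviations from a fixed target t (taken here as the mean sample eigenvalue, a simplification). For a symmetric matrix Sigma, the sum of squared deviations of its eigenvalues from t equals ||Sigma - t*I||_F^2, so the objective ||Sigma - S||_F^2 + gamma*||Sigma - t*I||_F^2 + lam*sum_{i!=j}|Sigma_ij| separates entrywise and has a closed-form minimizer:

```python
import numpy as np

def joint_penalty_covariance(S, lam, gamma, target=None):
    """Closed-form minimizer of
        ||Sigma - S||_F^2 + gamma * ||Sigma - target*I||_F^2
            + lam * sum_{i != j} |Sigma_ij|
    over symmetric Sigma, where the middle term equals the sum of
    squared deviations of Sigma's eigenvalues from `target`.
    This is an illustrative simplification, not the JPEN estimator itself.
    """
    p = S.shape[0]
    if target is None:
        target = np.trace(S) / p  # mean sample eigenvalue
    # Off-diagonal entries: soft-threshold at lam/2, then shrink by 1/(1+gamma).
    Sigma = np.sign(S) * np.maximum(np.abs(S) - lam / 2.0, 0.0) / (1.0 + gamma)
    # Diagonal entries: convex combination of sample variance and target,
    # which pulls the eigenvalues toward the target and improves conditioning.
    np.fill_diagonal(Sigma, (np.diag(S) + gamma * target) / (1.0 + gamma))
    return Sigma
```

The l1 term zeroes out small off-diagonal entries (sparsity), while the eigenvalue term shrinks the spectrum toward the target, reducing the condition number relative to the raw sample covariance.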