Statistical analysis for genomic studies involving measurement error, multiple populations, and limited sample size
Chair: Professor Kerby Shedden
Cognate Member: Assistant Professor Hui Jiang
Members: Associate Professor Ben Hansen, Professor Matthias Kretzler
Abstract: Genomic studies involve various types of high-dimensional data. Study designs are often complex, and data are difficult to collect. For example, the subjects may belong to distinct populations, the number of subjects is often small, and substantial measurement error is usually present. In this thesis, we consider three important issues that arise in this research setting. The impact of measurement error on parameter estimation has been extensively studied, but its effect on predictive performance has not. In part 1 of the thesis, we partially characterize the data-generating models that are most adversely impacted by measurement error. These results may help researchers judge whether improving data collection procedures or identifying more informative markers would have a greater impact on predictive performance. In part 2 of the thesis, we present a new approach for identifying the common and unique marker/outcome associations that are present in a genomic dataset consisting of several subpopulations. We show that the natural plug-in style estimates of overlap are biased, and we demonstrate a copula-based approach to reducing the bias. Part 3 of the thesis considers situations in which power for attributing effects to specific markers is low, but meaningful relationships between marker/outcome associations and other statistical properties of the markers can be identified.