Large and complex data sets are common in modern life. They are rich sources of information, and statisticians are developing new techniques to extract it. This dissertation contributes statistical methods for analyzing such challenging data sets.
The second chapter estimates the dissimilarity among effect sizes in a regression model. A natural summary is the ratio of the maximum magnitude to the minimum magnitude among the effects. Because this quantity is nonstandard, some standard techniques cannot be applied to it directly. We discuss procedures that improve the performance of point estimates and confidence intervals, and we apply them to data from the National Health and Nutrition Examination Survey (NHANES) from 2011 to 2012.
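To make the target quantity concrete, the following minimal sketch computes a naive plug-in version of the ratio from ordinary least squares coefficients. It is only an illustration: the function name effect_ratio, the simulated data, and the choice of OLS are assumptions made here, and this is not one of the improved procedures developed in the chapter.

```python
import numpy as np

def effect_ratio(X, y):
    """Naive plug-in estimate of max|beta_j| / min|beta_j| (intercept excluded).

    Hypothetical helper for illustration; not the dissertation's estimator.
    """
    # Add an intercept column and fit ordinary least squares.
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    # Compare magnitudes of the slope coefficients only.
    mags = np.abs(beta[1:])
    return mags.max() / mags.min()

# Simulated data (assumed): true slopes 1, 2, 4, so the true ratio is 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_beta = np.array([1.0, 2.0, 4.0])
y = X @ true_beta + rng.normal(size=500)

ratio = effect_ratio(X, y)
```

The plug-in ratio is always at least 1, and it becomes unstable when the smallest effect is close to zero, which is one reason standard techniques may not carry over directly.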
The third chapter investigates functional summaries that present a p by p covariance structure in an accessible and easily visualized form. The summaries reflect interpretable patterns in the data and are unaffected by relabeling of the variables. They allow us to visualize the differences between the covariance structures of two data sets, even when the data sets have different dimensions. Our summaries emphasize the degree to which each variable is predictable from the others, with a special focus on the number of variables required to predict another variable. To compare structures with different dimensions, we apply the functional summaries to two gene expression data sets: 108 normal heart tissue samples from the Cleveland Clinic Kaufman Center and 734 whole-blood RNA samples from the Estonian Biobank.
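As a rough illustration of "predictability from the others", the sketch below computes, for each variable, the R-squared of its linear regression on all remaining variables, using the standard identity R²_j = 1 - 1 / (S_jj P_jj) with P the inverse covariance matrix. Sorting the values gives a summary that does not depend on variable labels. The function name and the example matrix are assumptions made here, and this is not the chapter's functional summary, which also tracks how many predictors are needed.

```python
import numpy as np

def sorted_predictability(S):
    """Sorted R^2 of each variable regressed on all others.

    Hypothetical helper for illustration; S must be a positive definite
    covariance (or correlation) matrix.
    """
    P = np.linalg.inv(S)
    r2 = 1.0 - 1.0 / (np.diag(S) * np.diag(P))
    return np.sort(r2)

# Assumed example: correlation 0.5 ** |i - j| among three variables.
S = np.array([[1.00, 0.50, 0.25],
              [0.50, 1.00, 0.50],
              [0.25, 0.50, 1.00]])

summary = sorted_predictability(S)

# Relabeling the variables permutes rows and columns of S together;
# the sorted summary is unchanged.
perm = [2, 0, 1]
summary_relabeled = sorted_predictability(S[np.ix_(perm, perm)])
```

Because the output is a sorted profile rather than a matrix, profiles from data sets with different numbers of variables can be plotted on a common axis after rescaling the index.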
The fourth chapter studies a projection-based approach for exploring conditional correlation paths. We propose a graphical tool for exploring how the dependence structure changes from marginal correlations to partial correlations. The path is built by gradually adding information from the other variables until the partial correlations are reached. The proposed projection-based approach can also be applied to another type of conditional correlation matrix, one that is conditioned on linear statistics of the data, so that we can explore how the correlation matrix changes as the value of a linear statistic varies. We apply the approach to a gene expression data set of 108 normal heart tissue samples from the Cleveland Clinic Kaufman Center.
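The two endpoints of such a path can be sketched as follows: the marginal correlation matrix, and the partial correlation matrix obtained from the inverse covariance via the standard formula -P_ij / sqrt(P_ii P_jj). The function names and the example matrix are assumptions made here, and the intermediate steps of the projection-based path are not reproduced.

```python
import numpy as np

def marginal_correlations(S):
    """Correlation matrix implied by covariance matrix S."""
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)

def partial_correlations(S):
    """Partial correlation of each pair given all remaining variables."""
    P = np.linalg.inv(S)
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return R

# Assumed example: a chain x1 - x2 - x3 with correlation 0.5 ** |i - j|.
S = np.array([[1.00, 0.50, 0.25],
              [0.50, 1.00, 0.50],
              [0.25, 0.50, 1.00]])

marginal = marginal_correlations(S)
partial = partial_correlations(S)
```

In this chain example the first and third variables are marginally correlated (0.25) but have zero partial correlation once the middle variable is accounted for, which is the kind of change a path between the two matrices is meant to display.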