Model Selection Principles and False Discovery Rate Control

Basu, P (2016) Model Selection Principles and False Discovery Rate Control. PhD thesis, University of Southern California.

Full text not available from this repository.

Abstract

In this thesis we discuss problems in two cutting-edge topics of modern statistics, namely, high-dimensional statistical inference and large-scale multiple testing.

Model selection is indispensable to high-dimensional sparse modeling, where the best set of covariates must be selected among a sequence of candidate models. Most existing work implicitly assumes that the model is correctly specified or of fixed dimension. Yet model misspecification and high dimensionality are common in real applications. We investigate two classical principles of model selection, the Kullback-Leibler divergence principle and the Bayesian principle, in the setting of high-dimensional misspecified models. Asymptotic expansions of these principles reveal that the effect of model misspecification is crucial and should be taken into account, leading to the generalized AIC and generalized BIC in high dimensions. With a natural choice of prior probabilities, we suggest the generalized BIC with prior probability, which involves a logarithmic factor of the dimensionality in penalizing model complexity. We further establish the consistency of the covariance contrast matrix estimator in a general setting. Our results and new methods are supported by numerical studies.

The use of weights provides an effective strategy for incorporating prior domain knowledge in large-scale inference. Our work studies weighted multiple testing in a decision-theoretic framework. We develop oracle and data-driven procedures that aim to maximize the expected number of true positives subject to a constraint on the weighted false discovery rate. The asymptotic validity and optimality of the proposed methods are established. The results demonstrate that incorporating informative domain knowledge enhances the interpretability of results and the precision of inference. Simulation studies show that the proposed method controls the error rate at the nominal level, and the gain in power over existing methods is substantial in many settings. An application to a genome-wide association study is discussed.

Further, we discuss connections between the two topics, applications to areas of business, and possible future directions.
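
As a rough illustration of how weights can carry prior knowledge into multiple testing, the sketch below implements the standard p-value weighting variant of the Benjamini-Hochberg procedure. It is not the oracle or data-driven procedure developed in the thesis, which targets a weighted false discovery rate in a decision-theoretic framework. It is only a generic baseline, and the function name `weighted_bh` and its parameters are assumptions made for this example.

```python
import numpy as np


def weighted_bh(p_values, weights, alpha=0.05):
    """Benjamini-Hochberg step-up applied to weighted p-values p_i / w_i.

    Illustrative baseline only (hypothetical helper, not the thesis's method).
    Weights must be positive; they are rescaled to have mean 1.
    Returns a boolean array marking the rejected hypotheses.
    """
    p = np.asarray(p_values, dtype=float)
    w = np.asarray(weights, dtype=float)
    if p.ndim != 1 or p.shape != w.shape:
        raise ValueError("p_values and weights must be 1-D arrays of equal length")
    if np.any(w <= 0):
        raise ValueError("weights must be strictly positive")

    w = w / w.mean()          # normalize so the weights average to 1
    q = p / w                 # weighted p-values (may exceed 1)
    m = q.size

    order = np.argsort(q)                         # ascending weighted p-values
    thresholds = alpha * np.arange(1, m + 1) / m  # BH step-up thresholds
    below = q[order] <= thresholds

    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank meeting its threshold
        reject[order[: k + 1]] = True     # reject everything up to that rank
    return reject


# Equal weights reduce to the ordinary BH procedure.
p = [0.001, 0.02, 0.04, 0.5]
print(weighted_bh(p, [1, 1, 1, 1]))
# Up-weighting the third hypothesis shifts a rejection toward it.
print(weighted_bh(p, [0.5, 0.5, 2.5, 0.5]))
```

Rescaling the weights to average 1 keeps the overall testing budget fixed, so a larger weight on one hypothesis is paid for by smaller weights on the others.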

Affiliation: Indian School of Business
ISB Creators: Basu, P (ORCiD: UNSPECIFIED)
Item Type: Thesis (PhD)
Additional Information: The thesis was published by the author under the affiliation of the University of Southern California
Uncontrolled Keywords: Model Misspecification; High Dimensionality; Model Selection; Kullback-Leibler Divergence Principle; Bayesian Principle; AIC; BIC; Class Weights; Decision Weights; Multiple Testing With Groups; Prioritized Subsets; Value to Cost Ratio; Weighted P-value
Subjects: Operations Management
Depositing User: Mohan Dass
Date Deposited: 08 Apr 2019 11:20
Last Modified: 09 Apr 2019 09:38
URI: http://eprints.exchange.isb.edu/id/eprint/770
Publisher URL: http://digitallibrary.usc.edu/cdm/ref/collection/p...
