The Forest or the Trees? Tackling Simpson's Paradox with Classification and Regression Trees
Shmueli, G and Yahav, I (2014) The Forest or the Trees? Tackling Simpson's Paradox with Classification and Regression Trees. Working Paper. Indian School of Business.
Abstract
Prediction and variable selection are major uses of data mining algorithms, but they are rarely the focus in social science research, where the main objective is causal explanation. Ideal causal modeling is based on randomized experiments, but because experiments are often impossible, unethical or expensive to perform, social science research often relies on observational data for studying causality. A major challenge is to infer causality from such data. This paper uses the predictive tool of Classification and Regression Trees for detecting Simpson's paradox, which is related to causal inference. We introduce a new tree approach for detecting potential paradoxes in data that have either a few or a large number of potential confounding variables. The approach relies on the tree structure and the location of the cause vs. the confounders in the tree. We discuss theoretical and computational aspects of the approach and illustrate it using several real applications.
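For readers unfamiliar with Simpson's paradox, the following is a minimal sketch of what a reversal looks like in data. It is not the tree-based approach described in the abstract; it is a brute-force check against a single, already-known confounder. The counts, the group labels `treated` and `control`, and the function name `simpson_reversal` are all hypothetical and chosen only for illustration.

```python
# Illustrative only: invented counts, NOT data or code from the paper.
# Each entry is (successes, trials) for one group within one stratum
# of a hypothetical confounder.
counts = {
    "stratum_A": {"treated": (90, 100), "control": (800, 1000)},
    "stratum_B": {"treated": (300, 1000), "control": (20, 100)},
}


def rate(successes, trials):
    """Success proportion for one group."""
    return successes / trials


def simpson_reversal(counts):
    """Return True if the treated-minus-control difference has the same
    sign in every stratum but the opposite sign in the pooled data.
    A difference of exactly zero is treated as non-positive."""
    totals = {"treated": [0, 0], "control": [0, 0]}
    stratum_signs = []
    for groups in counts.values():
        for name, (successes, trials) in groups.items():
            totals[name][0] += successes
            totals[name][1] += trials
        diff = rate(*groups["treated"]) - rate(*groups["control"])
        stratum_signs.append(diff > 0)
    pooled_diff = rate(*totals["treated"]) - rate(*totals["control"])
    strata_agree = all(stratum_signs) or not any(stratum_signs)
    return strata_agree and (pooled_diff > 0) != stratum_signs[0]


reversal_found = simpson_reversal(counts)
```

In these invented counts the treated group does better within each stratum (0.90 vs. 0.80 and 0.30 vs. 0.20) yet worse when the strata are pooled (390/1100 vs. 820/1100), because the treated units are concentrated in the low-success stratum. A check like this requires naming the confounder in advance, which is the limitation that motivates searching over many potential confounders.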
Item Type: | Monograph (Working Paper) |
---|---|
Subjects: | Business Analytics |
Date Deposited: | 30 Oct 2014 08:22 |
Last Modified: | 26 Jul 2023 12:40 |
URI: | https://eprints.exchange.isb.edu/id/eprint/74 |