The Forest or the Trees? Tackling Simpson's Paradox with Classification and Regression Trees

Shmueli, G and Yahav, I (2014) The Forest or the Trees? Tackling Simpson's Paradox with Classification and Regression Trees. Working Paper. Indian School of Business.

Publisher URL: http://www.isb.edu

Abstract

Prediction and variable selection are major uses of data mining algorithms, but they are rarely the focus in social science research, where the main objective is causal explanation. Ideal causal modeling is based on randomized experiments, but because experiments are often impossible, unethical, or expensive to perform, social science research often relies on observational data for studying causality. A major challenge is to infer causality from such data. This paper uses the predictive tool of Classification and Regression Trees for detecting Simpson's paradox, which is related to causal inference. We introduce a new tree approach for detecting potential paradoxes in data that have either a few or a large number of potential confounding variables. The approach relies on the tree structure and the location of the cause vs. the confounders in the tree. We discuss theoretical and computational aspects of the approach and illustrate it using several real applications.
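
For readers unfamiliar with Simpson's paradox, the short Python sketch below shows the reversal the abstract refers to: an effect that holds within every level of a confounder can flip direction once the data are aggregated. It is not the authors' tree-based approach, which this record does not describe in detail. The counts, the labels "A"/"B" and "small"/"large", and the function names are all illustrative assumptions.

```python
# Illustrative counts only (not taken from the paper): (successes, trials)
# for each combination of treatment (candidate cause) and group (confounder).
counts = {
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}


def rate(successes, trials):
    """Success proportion for one cell."""
    return successes / trials


def aggregate_rate(treatment):
    """Success proportion for a treatment, ignoring the confounder."""
    successes = sum(v[0] for (t, _), v in counts.items() if t == treatment)
    trials = sum(v[1] for (t, _), v in counts.items() if t == treatment)
    return successes / trials


def simpson_reversal():
    """True if the A-vs-B comparison has the same direction in every group
    but the opposite direction once the groups are pooled."""
    groups = sorted({g for _, g in counts})
    within = [rate(*counts[("A", g)]) > rate(*counts[("B", g)]) for g in groups]
    overall = aggregate_rate("A") > aggregate_rate("B")
    return len(set(within)) == 1 and within[0] != overall


if __name__ == "__main__":
    print("A better within every group, B better overall:", simpson_reversal())
```

In this hypothetical table, treatment A has the higher success proportion in both groups, yet treatment B looks better when the groups are pooled, because the treatments are unevenly distributed across the groups.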

Item Type: Monograph (Working Paper)
Subjects: Business Analytics
Date Deposited: 30 Oct 2014 08:22
Last Modified: 26 Jul 2023 12:40
URI: https://eprints.exchange.isb.edu/id/eprint/74
