The Forest or the Trees? Tackling Simpson's Paradox with Classification and Regression Trees

Shmueli, G and Yahav, I (2014) The Forest or the Trees? Tackling Simpson's Paradox with Classification and Regression Trees. Working Paper. Indian School of Business, Hyderabad.

Prediction and variable selection are major uses of data mining algorithms, but they are rarely the focus in social science research, where the main objective is causal explanation. Ideal causal modeling is based on randomized experiments, but because experiments are often impossible, unethical or expensive to perform, social science research often relies on observational data for studying causality. A major challenge is to infer causality from such data. This paper uses the predictive tool of Classification and Regression Trees for detecting Simpson's paradox, which is related to causal inference. We introduce a new tree approach for detecting potential paradoxes in data that have either a few or a large number of potential confounding variables. The approach relies on the tree structure and the location of the cause vs. the confounders in the tree. We discuss theoretical and computational aspects of the approach and illustrate it using several real applications.
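
For readers unfamiliar with Simpson's paradox, the following minimal Python sketch shows the kind of reversal the abstract refers to: the direction of an effect in the aggregated data is opposite to its direction within every level of a confounder. It is not the tree-based method of the paper. It only compares aggregate and stratified success rates by hand. The counts are the widely cited kidney stone treatment example and are not taken from the paper; the function names `success_rates` and `is_simpson_reversal` are hypothetical and chosen for illustration.

```python
from collections import defaultdict

# (treatment, stone size) -> (successes, patients).
# Widely cited kidney stone example; NOT data from the working paper.
COUNTS = {
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}


def success_rates(counts):
    """Return (aggregate rate per treatment, per-stratum rates per treatment)."""
    totals = defaultdict(lambda: [0, 0])  # treatment -> [successes, patients]
    strata = defaultdict(dict)            # stratum -> {treatment: rate}
    for (treatment, stratum), (successes, patients) in counts.items():
        totals[treatment][0] += successes
        totals[treatment][1] += patients
        strata[stratum][treatment] = successes / patients
    aggregate = {t: s / n for t, (s, n) in totals.items()}
    return aggregate, dict(strata)


def is_simpson_reversal(counts, a="A", b="B"):
    """True if every stratum ranks a vs. b opposite to the aggregate ranking.

    Ties are not handled specially; this is an illustrative check only.
    """
    aggregate, strata = success_rates(counts)
    a_wins_overall = aggregate[a] > aggregate[b]
    a_wins_by_stratum = [rates[a] > rates[b] for rates in strata.values()]
    return all(wins != a_wins_overall for wins in a_wins_by_stratum)


if __name__ == "__main__":
    aggregate, strata = success_rates(COUNTS)
    print("aggregate:", aggregate)
    print("by stratum:", strata)
    print("reversal:", is_simpson_reversal(COUNTS))
```

Here treatment A has the higher success rate within both stone-size groups, yet treatment B has the higher rate once the groups are pooled, because stone size is associated with both the treatment assigned and the outcome.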

ISB Creators:
Shmueli, G
Item Type: Monograph (Working Paper)
Uncontrolled Keywords: CART, causality, data mining, conditional-inference trees, decision making, aggregation, variable selection
Subjects: Business Analytics
Depositing User: LRC ISB
Date Deposited: 30 Oct 2014 08:22
Last Modified: 02 Nov 2014 05:36
