Robustness in configurational causal modelling

This talk presents joint work with Michael Baumgartner. We describe a notion of robustness for configurational causal models (CCMs), present simulation results that validate this notion as a tool for model selection, and briefly compare it to notions of statistical robustness familiar from regression analytic models (RAMs).

Where RAMs relate variables and quantify net effects across varying background conditions, CCMs search for dependencies between values of variables and return models that satisfy the conditions of an INUS theory of causation. As such, CCMs are tools for case-study research: the unit of observation is a single case that exhibits some configuration of values of the measured variables. CCMs automate the recovery of causally interpretable dependencies from such data via cross-case comparisons. The basic idea is that causes make a difference to their effects, so causal structures can be recovered from data by comparing otherwise homogeneous cases in which putative cause and effect factors vary suitably.
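To make the cross-case comparison concrete, here is a minimal Python sketch of the kind of sufficiency check such comparisons rest on. It is an illustration only, not the actual CCM search algorithms; the toy data and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Case = Dict[str, int]  # a case maps factor names to observed values (crisp: 0/1)

def is_sufficient(cases: List[Case], antecedent: Dict[str, int],
                  outcome: Tuple[str, int]) -> bool:
    """True if every case exhibiting the antecedent configuration also
    exhibits the outcome (vacuously true if no case exhibits it)."""
    factor, value = outcome
    matching = [c for c in cases if all(c[f] == v for f, v in antecedent.items())]
    return all(c[factor] == value for c in matching)

# Toy data: four cases over factors A, B and outcome E.
cases = [
    {"A": 1, "B": 1, "E": 1},
    {"A": 1, "B": 0, "E": 0},
    {"A": 0, "B": 1, "E": 0},
    {"A": 0, "B": 0, "E": 0},
]

print(is_sufficient(cases, {"A": 1, "B": 1}, ("E", 1)))  # True: A*B is sufficient for E
print(is_sufficient(cases, {"A": 1}, ("E", 1)))          # False: A alone is not
```

In an INUS-style analysis, such sufficiency checks are only a building block: a factor value counts as causally relevant only if it is a non-redundant part of a minimally sufficient condition, so the search additionally minimizes the sufficient conjunctions it finds.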

CCMs impose stringent demands on the analysed data that are often not met in real-life data. The most important of these is causal homogeneity: unlike RAMs, CCMs require that the causal background of the observed cases be homogeneous in order to guarantee valid results. In addition, data may contain other sources of noise, such as measurement error, and may lack sufficient variation in the measured variables. These data deficiencies may prevent CCMs from finding any models at all. In response, CCM methodologists have developed model-fit parameters that measure how well a model accounts for the observed data, and these have been incorporated into CCM search algorithms so that the search may also return models that account for the data less than perfectly.
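The fit parameters are not named above; in the CCM literature the standard pair is consistency (the share of cases satisfying a model's antecedent that also exhibit the outcome) and coverage (the share of outcome cases the antecedent accounts for). A minimal crisp-set sketch under that assumption, with illustrative names:

```python
from typing import Dict, List, Tuple

Case = Dict[str, int]

def consistency(cases: List[Case], antecedent: Dict[str, int],
                outcome: Tuple[str, int]) -> float:
    """Share of antecedent-satisfying cases that also exhibit the outcome."""
    factor, value = outcome
    matching = [c for c in cases if all(c[f] == v for f, v in antecedent.items())]
    if not matching:
        return 1.0  # vacuously consistent; real implementations may treat this differently
    return sum(c[factor] == value for c in matching) / len(matching)

def coverage(cases: List[Case], antecedent: Dict[str, int],
             outcome: Tuple[str, int]) -> float:
    """Share of outcome-exhibiting cases that also satisfy the antecedent."""
    factor, value = outcome
    outcome_cases = [c for c in cases if c[factor] == value]
    if not outcome_cases:
        return 1.0
    return sum(all(c[f] == v for f, v in antecedent.items())
               for c in outcome_cases) / len(outcome_cases)
```

A consistency threshold of 1.0 admits only strictly sufficient conditions; lowering it to, say, 0.8 allows the search to return models whose antecedents fail to be followed by the outcome in up to 20% of the matching cases.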

Lowering model-fit requirements increases the underdetermination of models by data, making the problem of model selection harder. We performed simulations to investigate how lowering model-fit requirements affects the reliability of the results. These reveal that, given noisy data, the models obtained with the most stringent fit thresholds are often overfitted to the data, while lower model-fit settings produce increasingly ambiguous results: a correct model is found, but it is accompanied by many false models.

In RAMs, overfitting may be remedied by robustness testing, for example by testing whether particular observations have undue influence on model coefficients. While sensible in the context of RAMs, this idea, and similar notions of sampling robustness, cannot be transported to the CCM context, which assumes a case-study setting in which one's conclusions ought to be maximally sensitive to cross-case variation; that very sensitivity, however, also makes CCMs sensitive to noise. A different notion of robustness can be implemented in CCMs: the concordance of inferences across the many models derived under varying model-fit requirements. A robust model in this sense is one that agrees, and does not disagree, with many other models of the same data in the causal ascriptions it makes. Our simulation results demonstrate that this notion is a reliable criterion of model selection under considerable underdetermination of models by data.
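To illustrate the concordance idea, here is a toy Python sketch; it is not the scoring procedure used in the talk. It assumes that a model can be represented as a set of cause-effect ascriptions and that two models agree when the ascriptions of one are contained in the other's:

```python
from typing import FrozenSet, List, Tuple

# A model as a set of cause-effect ascriptions, e.g.
# frozenset({("A", "E"), ("B", "E")}) reads "A and B are causes of E".
Model = FrozenSet[Tuple[str, str]]

def agrees(m1: Model, m2: Model) -> bool:
    """Two models agree when one's ascriptions are contained in the other's."""
    return m1 <= m2 or m2 <= m1

def robustness_scores(models: List[Model]) -> List[int]:
    """Score each model by its agreements minus its disagreements with all
    other models in the set (e.g. all models recovered across fit thresholds)."""
    scores = []
    for i, m in enumerate(models):
        agree = sum(agrees(m, other) for j, other in enumerate(models) if j != i)
        disagree = (len(models) - 1) - agree
        scores.append(agree - disagree)
    return scores

# Hypothetical model set from re-analyses of the same data at different thresholds:
models = [
    frozenset({("A", "E")}),
    frozenset({("A", "E"), ("B", "E")}),
    frozenset({("A", "E"), ("B", "E"), ("C", "E")}),
    frozenset({("D", "E")}),  # an outlier that conflicts with the rest
]
print(robustness_scores(models))  # [1, 1, 1, -3]: the outlier scores lowest
```

On this toy scoring, models that concord with many re-analysis results rise to the top, which is the behaviour the simulations test at scale.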