Causal discovery lies at the heart of most scientific research today. Perhaps surprisingly, then, 'proper' causal discovery algorithms are still not applied as routinely in practice as one might expect. Arguably one of the obstacles is the perceived lack of robustness in the output: borderline decisions are propagated through the network, yet this ambiguity is not visible in the final causal model. Bayesian score-based approaches can provide some measure of confidence by outputting multiple high-scoring models, with the implied assumption that arcs present in many of them are more likely to be true. Another way is to augment individual relations with an explicit reliability measure. Methods like Cooper's LCD algorithm and the Trigger algorithm in genomics can already give such probabilistic estimates, but they apply only to very specific cases.

We introduce a new approach that uses a Bayesian score to obtain probability estimates on the input statements of a constraint-based procedure. These statements are processed in decreasing order of reliability until a single output model is obtained. A basic implementation already compares favorably to state-of-the-art methods such as FCI and Conservative PC. More interestingly, the resulting confidence measures for individual causal relations turn out to correspond fairly well to probability estimates p(X → Y | Data). Here we look at how these estimates are obtained and how they can be improved upon.
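To make the pipeline concrete, below is a minimal Python sketch of the reliability-ordered idea, restricted to recovering an undirected skeleton from linear-Gaussian data. It is an illustration under stated assumptions, not the paper's actual algorithm: the Bayesian score on statements is approximated here by a BIC comparison of regression models, and names such as p_dependent and reliability_ordered_skeleton are invented for this example.

```python
import numpy as np
from itertools import combinations


def bic(y, X):
    """BIC of a linear-Gaussian regression of y on the columns of X (plus intercept)."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X]) if X.size else np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    sigma2 = max(resid @ resid / n, 1e-12)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return loglik - 0.5 * X1.shape[1] * np.log(n)


def p_dependent(data, i, j, cond):
    """Crude stand-in for p(X_i dep X_j | Z, Data): compare the BIC scores of
    regressing X_j on Z versus on Z plus X_i, assuming equal prior odds."""
    Z = data[:, list(cond)] if cond else np.empty((len(data), 0))
    s_indep = bic(data[:, j], Z)
    s_dep = bic(data[:, j], np.column_stack([Z, data[:, i]]))
    m = max(s_indep, s_dep)
    w_dep, w_indep = np.exp(s_dep - m), np.exp(s_indep - m)
    return w_dep / (w_dep + w_indep)


def edge_statements(data, i, j, max_cond=1):
    """Two competing statements about the pair (i, j), each with a reliability:
    'edge present' (dependent given every conditioning set considered) and
    'edge absent' (independent given at least one conditioning set)."""
    others = [k for k in range(data.shape[1]) if k not in (i, j)]
    conds = [()] + [c for r in range(1, max_cond + 1) for c in combinations(others, r)]
    p_dep = [p_dependent(data, i, j, c) for c in conds]
    return [(min(p_dep), True, (i, j)), (1.0 - min(p_dep), False, (i, j))]


def reliability_ordered_skeleton(data, max_cond=1):
    """Process all statements in decreasing order of reliability; the first
    (most reliable) decision about each pair stands, later ones are ignored."""
    statements = []
    for i, j in combinations(range(data.shape[1]), 2):
        statements += edge_statements(data, i, j, max_cond)
    skeleton, decided = set(), set()
    for reliability, present, edge in sorted(statements, reverse=True):
        if edge in decided:
            continue
        decided.add(edge)
        if present:
            skeleton.add(edge)
    return skeleton


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.normal(size=2000)              # toy chain X0 -> X1 -> X2
    x1 = x0 + rng.normal(size=2000)
    x2 = x1 + rng.normal(size=2000)
    print(reliability_ordered_skeleton(np.column_stack([x0, x1, x2])))
```

On the toy chain X0 → X1 → X2 this sketch should return the edges (0, 1) and (1, 2) while dropping (0, 2), because the most reliable statement about the (0, 2) pair is the conditional independence given X1, which is processed before any weaker dependence claim about that pair.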
