Causal discovery lies at the heart of most scientific research today. Perhaps
surprisingly then, 'proper' causal discovery algorithms are still not as
routinely applied in practice as one might expect. Arguably one of the
obstacles is the perceived lack of robustness in the output: borderline
decisions are propagated through the network, but this ambiguity is not
apparent in the causal model. Bayesian score-based approaches can provide some
measure of confidence by outputting multiple high-scoring models with the
implied assumption that arcs present in many are more likely to be true.
Another way is to augment individual relations with an explicit reliability
measure. Methods like Cooper's LCD algorithm and the Trigger algorithm in
genomics can already give such probabilistic estimates, but they only apply
to very specific instances.
We introduce a new approach that utilizes a Bayesian score to obtain
probability estimates on the input statements used in a constraint-based
procedure. These statements are processed in decreasing order of reliability
until a single output model is obtained. A basic implementation already
compares favorably to state-of-the-art methods such as FCI and Conservative
PC.
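The core loop of the described procedure can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: statement scores, the conflict check, and all names (`statements`, `conflicts`, `build_model`) are invented for the example, and real constraint-based reasoning involves richer (in)dependence statements than simple arc claims.

```python
# Hypothetical sketch: rank input statements by a Bayesian reliability
# score, then process them in decreasing order of reliability, skipping
# any statement that conflicts with those already accepted, until a
# single output model remains. Claims and scores below are illustrative.

# Each statement: (reliability, claim), where a claim asserts or forbids
# a directed arc between two variables.
statements = [
    (0.95, ("X", "Y", "arc")),     # X -> Y, high confidence
    (0.80, ("Y", "Z", "arc")),     # Y -> Z
    (0.60, ("Y", "X", "arc")),     # Y -> X, conflicts with X -> Y
    (0.40, ("X", "Z", "no_arc")),  # no direct arc X -> Z
]

def conflicts(claim, accepted):
    """A claim conflicts if it would reverse an accepted arc, or
    contradicts an accepted statement on the same ordered pair."""
    a, b, kind = claim
    for (x, y, k) in accepted:
        if kind == "arc" and k == "arc" and (a, b) == (y, x):
            return True  # would create a 2-cycle
        if (a, b) == (x, y) and kind != k:
            return True  # arc vs. no_arc on the same pair
    return False

def build_model(statements):
    """Greedily accept statements in decreasing order of reliability."""
    accepted = []
    for reliability, claim in sorted(statements, reverse=True):
        if not conflicts(claim, accepted):
            accepted.append(claim)
    return accepted

model = build_model(statements)
# The low-reliability reversal Y -> X is rejected; the rest survive.
```

The key design point carried over from the text is the ordering: because borderline (low-reliability) statements are processed last, they can no longer overturn decisions supported by high-confidence statements, which is how ambiguity is kept from silently propagating through the network.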
More interestingly, the resulting confidence measures for individual causal
relations turn out to match the probabilistic estimates p(X → Y | Data)
fairly well.
Here we look at how these estimates are obtained and how they can be improved
upon.