Chance and causality
Judea Pearl (born 1936) has a great book, Causality (2nd edition, 2009). He also has the open-access Journal of Causal Inference, with its first issue in June 2013. Every scientist will find his paper Linear Models: A Useful “Microscope” for Causal Analysis enlightening. In some ways it restores to prominence the “path analysis” that Jan Tinbergen followed in the 1930s for econometrics.
I first heard about Pearl’s approach in 2007. This was rather late actually, and happened in the context of my own book on logic and inference, A logic of exceptions (ALOE). Inspired by Pearl’s book, I ventured on a book, Elementary Statistics and Causality (ESAC). A nice observation is that the truth table for “If it rains then the streets are wet”, or p implies q, is treated in mathematics with two propositions only (implication is called a binary operator), while causality requires a third factor, namely some other cause r that can make the streets wet when it doesn’t rain. Unfortunately, since 2007 I have had no time to work on ESAC, and now it is already 2013.
There is a fairly straightforward link between logic, probability, statistics and causality. Consider the truth table for p implies q on the left and the 2 x 2 table of statistical observations that we might use for a ChiSquare test on the right. Obviously p occurs before q. For epidemiology we can regard p as the true state of disease and q as the test result, with the hopeful statement “If I am sick then the test will show it”, with n21 false negatives and n12 false positives.
p implies q | p | not-p | | Total n | p | not-p
q | 1 | 1 | | q | n11 | n12
not-q | 0 | 1 | | not-q | n21 | n22
The statistical probability of the implication is Pr[p implies q] = 1 – n21 / n. I prefer to regard causality as having a validated theory that p implies q. This requires both a theory that explains the causal chain and empirical validation that n21 = 0.
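The link between the two tables can be sketched in a few lines: weight each cell's truth value by its observed count and divide by n. This is only a sketch, and the counts below are hypothetical numbers chosen for illustration, not data from any study.

```python
import math

# Truth value of "p implies q" in each cell (the left-hand table).
truth = {"n11": 1, "n12": 1, "n21": 0, "n22": 1}

# Hypothetical observed counts (the right-hand table); illustration only.
counts = {"n11": 40, "n12": 10, "n21": 5, "n22": 45}

n = sum(counts.values())

# Average truth value over all observations.
pr_weighted = sum(truth[cell] * counts[cell] for cell in counts) / n

# The closed form from the text: only the cell n21 contradicts the implication.
pr_formula = 1 - counts["n21"] / n

print(pr_weighted, pr_formula, math.isclose(pr_weighted, pr_formula))
```

Both routes give the same number, since n21 is the only cell in which the truth table has a 0.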
When we observe such empirical regularity then we are induced to look for a theoretical explanation. One might hold that empirical observations remain basic, but my impression is that the relation is only convincing to the human mind when there is also a theory (a story).
Note though that a high prevalence of not-p would enhance Pr[p implies q] without providing much information about where it really matters, namely p. For causality we would rather look at the conditional probability Pr[(p implies q) | p] = Pr[q | p] = n11 / (n11 + n21), which is the sensitivity of the test, and which would lead to Judea Pearl’s operator of setting p (the do-operator).
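A small numerical sketch of this point, assuming a hypothetical rare disease (the counts and the function names are made up for illustration). Conditioning on p means dividing by the total of the p column, n11 + n21.

```python
def pr_implication(n11, n12, n21, n22):
    """Unconditional Pr[p implies q] = 1 - n21 / n."""
    n = n11 + n12 + n21 + n22
    return 1 - n21 / n


def sensitivity(n11, n21):
    """Conditional Pr[q | p]: restrict attention to the p column."""
    return n11 / (n11 + n21)


# Hypothetical counts: 10 sick persons (p), 990 healthy persons (not-p).
n11, n12, n21, n22 = 5, 20, 5, 970

unconditional = pr_implication(n11, n12, n21, n22)
conditional = sensitivity(n11, n21)

print(f"Pr[p implies q] = {unconditional:.3f}")
print(f"Pr[q | p]       = {conditional:.3f}")
```

With these numbers the unconditional probability is close to 1 only because nearly everyone is healthy, while the test misses half of the sick.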
Alan Hajek in Probability, Logic, and Probability Logic (2001) suggests that it would be hard to translate from the Kolmogorov axioms for events to axioms for statements. I don’t see the problem. Each event gives an atomic sentence like “It rains”, and the translation is perfect. Hajek’s paper is still worthwhile, however, since he reviews the various possible philosophical foundations for probability, including the Dutch Book: a series of bets, each of which the agent regards as fair, but which collectively guarantee his loss. But that paper is again overly complex on conditionals, and one should quickly switch to Pearl.
Hajek (p23) contains the philosophical question: “If the premises of a valid argument are all certain, then so is the conclusion. Suppose, on the other hand, that the premises are not all certain, but probable to various degrees; can we then put bounds on the probability of the conclusion?”
- The logical inference is p & (p implies q) ergo q. The philosopher wonders about the probabilistic analog Pr[p] & Pr[p implies q] ergo Pr[q].
- Hajek gives the impression of getting lost here. He reviews the philosophical literature, but why not refer to the table above? Provided that the conjunction stands for multiplication, we find that Pr[p] * Pr[p implies q] = (n11 + n21) / n * (1 – n21 / n) generally differs from Pr[q] = (n11 + n12) / n. Because the inference is probabilistic, we have to include other possibilities than merely the 1 / 0 values.
- The proper translation becomes the conditional Pr[p] & Pr[(p implies q) | p] ergo Pr[p & q], which gives (n11 + n21) / n * n11 / (n11 + n21) = n11 / n. With the prevalence of the disease (your chance to be sick) and the sensitivity of the test, we can find only one cell, namely that you are both sick and test positive. We cannot conclude anything about the overall Pr[q]. For that we also need to know the specificity. It is a bit curious that philosophers make such heavy weather of an issue that epidemiology clarified long ago.
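The three statements in the list above can be checked with exact fractions. This is a sketch under assumed, hypothetical counts; the variable names are chosen here for illustration.

```python
from fractions import Fraction

# Hypothetical counts; rows q / not-q, columns p / not-p.
n11, n12, n21, n22 = 40, 10, 5, 45
n = n11 + n12 + n21 + n22

pr_p = Fraction(n11 + n21, n)            # prevalence
pr_q = Fraction(n11 + n12, n)            # overall share of positive tests
pr_implies = 1 - Fraction(n21, n)        # Pr[p implies q]
sens = Fraction(n11, n11 + n21)          # sensitivity, Pr[q | p]
spec = Fraction(n22, n12 + n22)          # specificity, Pr[not-q | not-p]

# Naive analog: Pr[p] * Pr[p implies q] does not give Pr[q].
naive = pr_p * pr_implies

# Conditional version: prevalence * sensitivity gives the single cell n11 / n.
joint = pr_p * sens

# Pr[q] needs the specificity too (law of total probability).
total = pr_p * sens + (1 - pr_p) * (1 - spec)

print(naive, pr_q, joint, total)
```

Using `Fraction` avoids rounding, so the equalities can be tested exactly rather than up to a tolerance.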
The example of rain and wet streets might be simplistic. A more relevant example can be the hypothesis that austerity in the government budget would cause economic revival at the end of the elected term. Generally governments impose austerity at the beginning of their term, so that they can stimulate the economy at the end of their term, and then be re-elected. This is the Nordhaus Political Business Cycle. Administrations will tend to claim that the recovery depends upon their stern austerity at the beginning rather than the stimulus at the end. Clearly, this relationship will not be the 1 / 0 situation of pure logic. Van Dalen & Swank (1996) subjected the matter to statistical analysis for Holland. One of my questions would be whether their R-square could be turned into such a Pr[p implies q] set-up, and whether it would be useful to do so for understanding and inference.
Due to Pearl’s hard work the issues grow clearer by the day. But chance and causality work together to confuse our understanding of chance and causality. New ideas erupt daily, and there is ever more to read and ponder. Some of the following is on my still-to-read list.
George Boole (1815-1864) invented symbolic logic and wrote the difficult An investigation of the laws of thought (1854) that extended his analysis towards probability. Some hold that John Maynard Keynes in his Treatise on Probability (1921) found some errors in Boole’s book, and that this caused Boole’s book to fade into obscurity. Now, however, there is David Miller, professor emeritus of Columbia, who wrote The last challenge problem: George Boole’s theory of probability (2009) and who holds that Keynes was wrong and that Boole was right. The world is small: Miller’s book is referred to by Dupuydt & Gill, while Richard Gill also wrote a favourable review of ALOE.
Miller: “There are some difficulties to overcome in order to convince persons that the subject of this book is worth knowing something about. George Boole’s theory of probability has had an extremely bad press for more than 100 years. It has almost universally been considered to be too complicated to understand, too difficult to calculate, and wrong in addition. As evidence that this is true, try and find a book on probability in, say, the 20th century which has any material on Boole’s theory of probability. This is quite remarkable treatment for a person as distinguished as George Boole. In fact, his theory is not wrong, it is quite easy to make the calculations after a few simple emendations, it can solve a number of important problems which cannot be handled by the orthodox theory of probability, and there are quite a few areas in which these problems arise. In the following pages we will try and convince skeptics that there are good reasons for believing that a book on the subject is justified.”
And of course there are Nassim Taleb with his black swans and the entertaining Fooled by randomness (2004), and James Franklin with his The science of conjecture. Evidence and probability before Pascal (2001). These two are on the books-read list but need to be pondered again. Both refer to Carneades (214/3–129/8 BC) and to Korax (Corax) of Syracuse, the legendary 5th century BC co-founder of rhetoric. These give a historical view of how probability entered our thoughts via legal applications, whereas modern law still has to adapt to the advances in probability theory. We may be lucky however that the misconceptions about black swans before the melt-down of the banks in 2007+ were not enshrined into legal statutes.
Taleb describes traders who may have a horizon of only five years, and his “black swans” of debt-induced financial collapse are more frequent than one would want to hold for true black swans. Take n = 10000 and n21 = 1: then Pr[p implies q] ~ 1, but n21 would be our black swan. To evaluate this, we would need a loss function, with a similar 2 x 2 table of costs and profits, and the standard ChiSquare would not really do. This would bring us to risk analysis. When n becomes large the standard statistical tests quickly report statistical significance but lose their informational value for the decision problem at hand. The loss function however would correct that. It would be enlightening to show this by some good cases. If you are new to these ideas you might enjoy Ziliak & McCloskey, The Cult of Statistical Significance (2008). They are weak on the loss function however.
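A rough sketch of what such a loss function could look like, using n = 10000 and n21 = 1 from the text. The split of the remaining observations over the other cells and the whole payoff table are assumptions invented for illustration; a real case would need actual costs and profits.

```python
# Counts: n = 10000 and n21 = 1 as in the text; the other cells are hypothetical.
counts = {"n11": 4999, "n12": 2500, "n21": 1, "n22": 2500}

# Hypothetical payoff per observation in each cell (profit > 0, loss < 0).
# The single black-swan cell n21 carries a very large loss.
payoff = {"n11": 1.0, "n12": 0.0, "n21": -10000.0, "n22": 0.0}

n = sum(counts.values())

# The implication looks almost perfectly validated ...
pr_implies = 1 - counts["n21"] / n

# ... but the average payoff per observation is dominated by the one bad cell.
expected_payoff = sum(counts[cell] * payoff[cell] for cell in counts) / n

print(f"Pr[p implies q] = {pr_implies:.4f}")
print(f"expected payoff = {expected_payoff:.4f}")
```

Under these assumed payoffs the probability is nearly 1 while the expected payoff is negative, which is the sense in which the frequency table alone does not settle the decision problem.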
Well, due to various causes, I will probably have little time to finish my ESAC book quickly. It is nice to see though that logic, inference, probability and statistics all reside under the umbrella of the beautiful subject of economics (though others will think conversely).
PS. ALOE uses x ≤ y for the truth values in p implies q. Perhaps you want to check that. Check also that Pr[p & q] ≤ Pr[p & q] + Pr[not-p & q] = Pr[q]. Thus there is something to say for a decision of q with some minimal probability merely from the observation that p & q, as in propositional logic. This angle provides a link with logic, but it raises the question whether it is so useful once we consider probability.
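Both checks can be done mechanically. The sketch below assumes truth values coded as 0 and 1 and, for the inequality, some hypothetical counts; the names are illustrative.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


# Check 1: with truth values coded 0 / 1, "x <= y" reproduces the truth table.
ordering_matches = all(
    implies(bool(x), bool(y)) == (x <= y)
    for x, y in product((0, 1), repeat=2)
)

# Check 2: Pr[p & q] <= Pr[p & q] + Pr[not-p & q] = Pr[q], for hypothetical counts.
n11, n12, n21, n22 = 40, 10, 5, 45
n = n11 + n12 + n21 + n22
pr_p_and_q = n11 / n
pr_q = (n11 + n12) / n
lower_bound_holds = pr_p_and_q <= pr_q

print(ordering_matches, lower_bound_holds)
```

The second check holds for any non-negative counts, since n12 cannot be negative.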