November 5, 2021

Critique of Bayesian Reason

What are the limits of Bayesian reasoning? Do they matter? What are the alternatives? Apologies to Kant.

Critique of Bayesian Reason

Epistemic status: pure insanity.

There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy.
- Hamlet (1.5.167-8)
For the main question is always this: what, and how much, can un­derstanding and reason cognize independently of all experience?
- Kant, Critique of Pure Reason

In this post I am going to assume readers know what bayesian reasoning is.

It has become fashionable in certain circles to attempt to reduce reasoning about the world to a simple[1] application of Bayes' Rule[2]. It is applied in problems as diverse as figuring out who to date and evaluating possible threats to the continuation of humanity. Not reasoning about the world in this way has been described as insane.

In fact a large fraction of the current generation of artificial intelligence research is basically the application of Bayes' Rule at vast scale. It's not totally surprising then that many of the same people assume that super-intelligent A.I must be right around the corner.

Given the weight assigned to Bayesian reasoning as a means of understanding the world, and given the importance of problems to which it is being applied, there are a few questions we ought to ask. What are the limits of this kind of reasoning? Is there any other kind of reasoning that could be valuable? Does any other kind of reason even exist?

Finding the limits of any system is often about finding examples where the system fails. This is sometimes called red-teaming[3]. Sometimes this requires destructive testing of the system under examination but I don't think we're going to find a contradiction in Bayes' theorem[4] so it won't come to that. That said, let's put Bayesian reasoning to a test.

We're going to try to decide the Collatz conjecture using Bayesian reasoning.

The Collatz conjecture is a really simple math problem that you will become very famous for deciding[5]. Here's how it goes, just follow the instructions to make a sequence of integers:

  1. Pick some positive integer n
  2. If n is even, the next term is n/2
  3. If n is odd, the next term is 3×n + 1

Our job is to decide whether this sequence always comes back to n = 1. Let's do this the Bayesian way by trying to form a belief about "this sequence always returns (converges) to n=1". Let's call the event 'the sequence converges to 1 for all starting points n' A. The (unconditional) probability that the sequence converges to 1 everywhere is P(A). In principle P(A) is equal to 1 or 0[6]. In practice we have a few choices which I'll go into later on.  

Being Bayesian we've decided we can't know anything about reality except through updating beliefs through observations. Let's call the observation "the sequence converges for a given starting point n" B, and the probability of that observation P(B) - i.e. the probability that for a randomly chosen n the sequence converges.

We're going to try to update our belief P(A|B) until it converges to 1 or 0 so we can decide the conjecture. We've got another term we need to deal with, P(B|A), which is "what is the probability of observing B given the sequence always converges to 1". Naturally if A it ought to be 1. If ¬A? Well, we'll get to that.

Unfortunately we can't ever hope to observe the entire sequence for all n, our friends the positive integers are infinite- we'll never get our needed P(B) this way. But many classes of events are impractical or impossible to observe in their entirity, which is one motivation to do Bayesian reasoning in the first place. We had better take a sample instead. What kinds of sampling can we do?

We could try making observations in the natural order, i.e. look at the starting points n=1, n=2, n=3, ... and so on[7], and update our belief as we go. Humans collectively have tested every n up to roughly 8 E+20 , i.e. eight hundred quintillion (at time of writing)[8]. All converge. I'll save you the arithmetic but taking this as our sample sets P(B) = 1, and P(B|A) = 1. This is very unsatisfying! This means that our belief P(A|B) = P(A). We have learned nothing from this observation, we're entirely beholden to how we choose P(A)! Shit.

Immediately the experienced Bayesian will say "Ah! Your sample simply has a statistical bias! We should pick a more unbiased sample." Alright. Which one would you prefer? Sampling comes down to choosing a finite set of n's to try, i.e. a subset of positive integers. Maybe the most unbiased thing to do is to pick such a subset at random. Here we have a problem. In fact we have an extremely serious problem.

The set of all subsets of positive integers is called the power set of integers. How big is that guy? Well, as you'd expect, it's infinite. How infinite? It has exactly 20 elements. That guy ℵ is pronounced "aleph", and the simplest way for me to explain to you what this means is "when you encounter hebrew letters in mathematics you are probably in serious trouble". The power set is even more infinite than the integers are infinite.

You can try to pick random samples from such a set[9] but the representational capacity of our mathematics falls short of anything but an approximation. Specifically, certain large classes of elements are guaranteed to be left out. In fact using our available representations almost all elements will be left out. Clearly this results in a biased sample.

The upshot of all this is there is simply no meaningful way to take samples to construct an unbiased estimate for P(B). It's not hard to do because it's computationally expensive, it's impossible with any representation we know about.

Fine, what if a super-intelligent A.I with an interest in human mathematics came along one day and said "oh, B, I know how to sample that unbiasedly, here's a nice subset of the integers for you". Great! Now we just need P(A) and P(B|A)!

How do we choose P(A)? As mentioned before it could be 0 or 1. If we assume it's 0, the whole exercise falls over - our belief is 0 forever regardless of what we observe. Not very Bayesian! Additionally, we will have a really hard time with P(B|A), now no longer 1, which we will now have serious problems defining whether or not there are a finite number or an infinite number of n which don't go to 1 - among countably infinite sets, the idea of 'measure' is all screwed up. If P(A) is 1 we just end up at P(B) = 1, again.

You might take the epistemically dubious step of taking the proportion of all sequences of this form which converge to 1 - unfortunately for you the general version of this problem is undecidable. If you want to select at random among all sequences of integers, you are in even worse shape than we were for trying to find a suitable P(B).

Now my dear Bayesian friends might say "But Anton, P(A) just represents my belief in A, it does not have to say anything about the objective truth of A - it's just the prior I'm updating". Fine. Unfortunately for my Bayesian friends I am uninterested in their beliefs about the truth of the Collatz conjecture (except as a fun topic of conversation) because it tells me nothing about the world as it exists, and only tells me about your cognitive biases.

If the aim of Bayesian reasoning is to achieve as accurate a picture of objective reality as we can given our limited capacity for observation and our cognitive distortions, treating P(A) as someone's belief (or perhaps enumeration of the beliefs of say, number theorists) doesn't help us one bit (see [1] again).

I would like to call this class of problem "Bayesian Undecidable".

There exists no Bayesian construction with accompanying stream of observations that will ever allow us to believe anything about the objective reality. This applies to any statement about any non-terminating sequence of integers, for example. This doesn't mean Bayesian reasoning is nowhere useful, just that its use (and hence the use of systems that rely on it as an operating principle) is rather constrained.

In fact we see that for this form of problem, general human reason must do something entirely different; there exist many proofs about sequences of integers created and verified by human mathematicians (sometimes via implementing the proof as a computer program), which do not rely on priors or observations in the Bayesian sense. What is the difference? Where does truth come from in these cases?

In my opinion, this is the question. Concretely; if we are not always doing Bayesian reasoning, what is it that we are doing exactly?

Mathematical reasoning underpins a significant proportion of our understanding of the world; specifically that part of our understanding that goes beyond empirical observation and instead seeks to explain the mechanisms. Bayesian reasoning appears to fall short of anything resembling explanations - it is capable only of empirical hypotheses and empirical observations, 'crawling hypothesis space' as certain people have put it.

Here's a fun thought experiment; is Baye's rule a quine? i.e. is it possible to arrive at Bayes' rule itself through Bayesian reasoning alone?

Mathematical reasoning seems to have other spooky properties. It's often the case that even if a proof of a mathematical theorem is wrong, the theorem itself turns out to be correct. Mathematical ideas often seem to spring forth out of nowhere (instead of careful observation and updating of some internal model), and indeed some of them are so alien that one can't imagine any combination of existing concepts that could have given rise to them.

So how did we get here? Where can we go? What are the limits to our systems of reason? These continue to be brightly burning questions. Update your priors accodingly.

[1] One might say simplified - such Bayesians often invent 'plausible sounding' values for the required prior probabilities (sometimes aided by an enumeration of anecdotes) and shove them through the reverend's formula to arrive at a probability. If the idea is to escape the cognitive distortions of reasoning by intuition, this is definitely not the way to go about it even if it feels nice to do a little arithmetic. Perhaps sensitivity analysis ought to be incorporated into the Bayesian's praxis.

[2] Bayes' {Theorem, Law} if you prefer!

[3] This post was also going to be titled "red-teaming bayesian reasoning" but I'm not really on blue team here so.

[4] If Kolmogorov didn't I doubt that I can. Hey that's an example of Bayesian reasoning!

[5] When you hear that the rate of production for fundamentally new theorems is low, or that very few mathematicians succeed in cracking long-standing problems, remember that you’re including the entire population of people who, it turns out, actively don’t want to solve long-standing problems in the first place. You're not even trying!

[6] Unless you're an intuitionist, in which case I don't know what you do. Perhaps this entire exercise is meaningless in your framework, let me know.

[7] Something interesting I don't go into depth here about is, in the absence of any kind of proof about this conjecture, the only way to know where we'll end up is to actually compute the sequence from that point. Dynamic programming etc. aside, this presents its own problem but not quite as bad as the problems of computing nonlinear dynamics.

[8] Many expressions which are periodic or even constant for small-ish numbers turn out to have counter-examples much further away. This should be unsurprising to the Bayesian since there are many many many more big numbers than there are small ones, and if nature sees fit to distribute counter-examples randomly it's far more likely (to the extent that the word likely means anything here) they'll be found further along than we can hope to look.

[9] Thanks to @MarthPerlman for showing me some constructions used in computational finance to sample from a given distribution among infinte sequences. They're neat! I do still think it's a problem that we can only sample from sets of measure zero, weak convergence not withstanding.