why i am not not a bayesian

Okay so today it’s Why I am not a Bayesian by Clark Glymour. It’s a dense paper, and I will cordon off a section of it for discussion, namely the final section dealing with the so-called problem of old evidence. Indeed, I’m not really even going to discuss that, per se. Rather, I’m going to focus on a single three-sentence passage in the paper. This one:

“How might Bayesians deal with the old evidence/ new theory problem? Red herrings abound: the prior probability of the evidence, Bayesians may object, is not really unity; when the evidence is stated as measured or observed values, the theory does not really entail that those exact values obtain; an ideal Bayesian would never suffer the embarrassment of a novel theory. None of these replies will do: the acceptance of old evidence may make the degree of belief in it as close to unity as our degree of belief in some bit of evidence ever is; although the exact measured value (of, for example, the perihelion advance) may not be entailed by the theory and known initial conditions, that the value of the measured quantity lies in a certain interval may very well be entailed, and that is what is believed anyway; and, finally, it is beside the point that an ideal Bayesian would never face a novel theory, for the idea of Bayesian confirmation theory is to explain scientific inference and argument by means of the assumption that good scientists are, about science at least, approximately ideal Bayesians, and we have before us a feature of scientific argument that seems incompatible with that assumption.”

Now most of this passage is uninteresting to me. Of the three proposed “Bayesian” strategies, the first two make no sense to me at all, and that Glymour spends so much time rebutting them seems to me a bit of a straw-man exercise. The third, however, I got very excited about, in particular this phrase: “an ideal Bayesian would never suffer the embarrassment of a novel theory”. This is why we read philosophy. To hear others formulate, in developed form, our own embryonic positions. But then, when I read “the idea of Bayesian confirmation theory is to explain scientific inference and argument by means of the assumption that good scientists are, about science at least, approximately ideal Bayesians, and we have before us a feature of scientific argument that seems incompatible with that assumption” I sort of lost the thread of Glymour’s argument. This was confirmed by the last page and a half of the paper, which I did not recognize as having been written by the same person that had so cogently characterized my embryonic thought.

So what I propose to do in this blog post is to develop my embryonic thought. Which is this:

The fact that an ideal Bayesian would never be embarrassed by a novel theory completely solves the old evidence problem. Moreover, science does approximate ideal Bayesian behavior…albeit slowly. Moreover, it is in communities of scientists, not in individual scientists, that this approximation is best. 

Now I’ll try to explain myself. According to the Bayesian, when one encounters new evidence, one updates one’s prior probabilities by eliminating all world-states incompatible with that evidence and raising credences in the remaining world-states proportionally, i.e. retaining the ratios of their prior probabilities. So if worlds A and B are consistent with evidence E and if my priors dictate a credence in A twice that of B before observing E, my credence in A will be twice my credence in B after observing E as well. The ideal Bayesian doesn’t need to “think about” her new credences at all. She just updates by the standard formula from her priors. That, at any rate, is how the story goes.
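For the concretely minded, here is a minimal sketch of that updating rule in Python. The worlds A, B and C and their prior values are made up for illustration, and `conditionalize` is my own name for the helper, not anything standard.

```python
from fractions import Fraction

def conditionalize(prior, consistent_with_evidence):
    """Drop the worlds ruled out by the evidence and renormalize the rest."""
    surviving = {w: p for w, p in prior.items() if w in consistent_with_evidence}
    total = sum(surviving.values())
    if total == 0:
        raise ValueError("evidence has prior probability zero")
    return {w: p / total for w, p in surviving.items()}

# Hypothetical prior: A is twice as credible as B; C is incompatible with E.
prior = {"A": Fraction(2, 5), "B": Fraction(1, 5), "C": Fraction(2, 5)}

# Evidence E is consistent with worlds A and B only.
posterior = conditionalize(prior, {"A", "B"})

print(posterior["A"] / posterior["B"])  # 2, the same ratio as before E
```

Nothing in the function "thinks": the posterior is fixed entirely by the prior and by which worlds survive.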

There are a couple of potential problems with this story. The first is that it requires us to have prior probabilities. So it is, for example, that Bayesian parameter estimation is perceived by some to be clunkier than some “other” methods of parameter estimation, such as maximum likelihood estimation, that do not appear, on the surface at any rate, to utilize prior distributions of the parameters directly. I don’t think this is really right. Generally some distribution is assumed if an estimation method is to be rigorous. (Perhaps it is a uniform one.) So it’s not the sense of “Bayesian” that advocates methods of estimation going by that name that I am concerned to defend here. If I take prior parameter distributions to be implicitly uniform wherever they have not been reflected upon, even my choice to employ maximum likelihood estimation becomes a genre of Bayesianism. The real problem I want to address here is the question of whether it’s anti-Bayesian to reflect on and change one’s priors after receiving evidence. In general I will conclude no, but with a caveat: I am not talking about changing your priors because you take the occurrence of e to show that e was likelier to be obtained as evidence than you had thought. Rather, I am considering a change brought about solely by using the topicality (rather than the actuality) of e as an impetus for undertaking first-time reflection on how one ought to update in cases where e is obtained as evidence. This is the sort of reflection that could, and in fact would, have been undertaken long before e was encountered, given time and computational resources.
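The remark about maximum likelihood can be checked on a toy case. The sketch below assumes a coin with unknown bias, made-up data (7 heads in 10 flips) and a grid of candidate biases. Under a uniform prior on that grid, the most probable bias a posteriori is the same as the maximum likelihood estimate.

```python
from math import comb

heads, flips = 7, 10                      # hypothetical data
grid = [i / 1000 for i in range(1001)]    # candidate values of the bias

def likelihood(theta):
    """Binomial probability of the data given bias theta."""
    return comb(flips, heads) * theta**heads * (1 - theta)**(flips - heads)

# Uniform prior over the grid, then Bayes' rule.
uniform_prior = 1 / len(grid)
unnormalized = [likelihood(t) * uniform_prior for t in grid]
evidence = sum(unnormalized)
posterior = [u / evidence for u in unnormalized]

mle = max(grid, key=likelihood)
map_estimate = grid[max(range(len(grid)), key=posterior.__getitem__)]

print(mle, map_estimate)
```

With a non-uniform prior the two estimates can come apart, which is the sense in which the maximum likelihood estimate quietly encodes a choice of prior.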

I’ll get at these issues by way of the following game. I will start giving you terms in some sequence or other s(n). After s(n-1) is revealed, you take a guess at s(n). Your score is the largest n for which your guess was wrong. The lower your score, the better. Here we go: [Personal note: I have an amusing mental block concerning this puzzle. In particular, no matter how many times I talk about it, I screw it up. It’s really sort of amusing.]

s(0) = 0

I am assuming you guess s(1) = 1.

s(1) = 1

Now you’re feeling pretty confident. You guess s(2) = 2.

s(2) = 2

Wonderful. Guess s(3) = 3, naturlich. Now comes this:

s(3) =

260121894356579510020490322708104361119152187501694578572

754183785083563115694738224067857795813045708261992057589

224725953664156516205201587379198458774083252910524469038

881188412376434119195104550534665861624327194019711390984

553672727853709934562985558671936977407000370043078375899

74206767840169672078462806292290321071616698672605489884

455142571939854994489395944960640451323621402659861930732

493697704776060676806701764916694030348199618814556251955

925669188308255149429475965372748456246288242345265977897

377408964665539924359287862125159674832209760295056966999

272846705637471375330192483135870761254126834158601294475

660114554207495899525635430682886346310849656506827715529

962567908452357025521862223581300167008345234432368219357

931847019565107297818043541738905607274280485839959197290

217266122912984205160675790362323376994539641914751755675

576953922338030568253085999774416757843528159134613403946

049012695420288383471013637338244845066600933484844407119

312925376946573543373757247722301815340326471775319845373

414786743270484579837866187032574059389242157096959946305

575210632032634932092207383209233563099232675044017017605

720260108292880423356066430898887102973807975780130560495

763428386830571906622052911748225105366977566030295740433

879834715185526028053338663571391010463364197690973974322

859942198370469791099563033896046758898657957111765666700

391567481531159439800436253993997312030664906013253113047

190288984918562037666691644687911252491937544258458950003

115616829743046411425380748972817233759553806617198014046

779356147936352662656833395097600000000000000000000000

0000000000000000000000000000000000000000000000000000

0000000000000000000000000000000000000000000000000000

000000000000000000000000000000000000000000000000000

I’m guessing that this is not what you were expecting s(3) to be. Clearly this is no polynomial of modest degree and smallish coefficients–we must think “outside the box” on this one.

So what would a Bayesian do? On paper, just condition her prior probabilities for the space of all possible five-term sequences on what she’s seen so far, namely s(0) – s(3). But, you might reason, since s(3) has well over a thousand digits, this would appear to be an unrealistic demand in at least two respects. First, there are just too many candidate sequences. Second, you feel pretty sure now that your data is specific enough that there is a unique “simple” pattern. So you feel pretty certain that, given enough time and paper, your probabilities for what s(4) might be would concentrate significantly on some single value K. You just don’t have any clue, right now, what value that is. These are two respects, then, in which you differ from the “ideal” Bayesian. First, you have fewer computational resources. Second, you are less clever. (I leave it open whether these amount to the same thing.) But how, if at all, do these facts bear on the status of “Bayesianism”?

I would say they do not bear on that status at all. But let’s solve the problem and discuss that after.

First, you may note that s(3) ends with a lot of trailing zeros. That means it has a lot of 2s and a lot of 5s in its prime factorization. So it seems likely that s(3) was arrived at by multiplication, and it becomes natural to factor s(3) into primes. Having done that (child’s play on a computer), one would notice some peculiar things. First, no prime factor greater than 720 appears. Second, every prime between 360 and 720 appears precisely once. Third, every prime between 240 and 360 appears precisely twice, between 180 and 240 thrice, etc. It’s not far from here to construct the hypothesis, easily verified, that the number s(3) is 720!.
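Here is a rough sketch of that verification in Python. Rather than factoring the long number typed out above, it goes the other way: it computes the exponent of each prime in 720! by Legendre's formula and checks that the pattern just described holds. The helper names are mine, chosen for illustration.

```python
import math

N = 720
s3 = math.factorial(N)

def trailing_zeros(n):
    """Count the zeros at the end of the decimal expansion of n."""
    count = 0
    while n % 10 == 0:
        n //= 10
        count += 1
    return count

def legendre_exponent(n, p):
    """Exponent of the prime p in n!, i.e. the sum of floor(n / p^k) over k >= 1."""
    exponent, power = 0, p
    while power <= n:
        exponent += n // power
        power *= p
    return exponent

def primes_up_to(limit):
    """Sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

exponents = {p: legendre_exponent(N, p) for p in primes_up_to(N)}

# Multiplying the prime powers back together recovers 720! exactly.
rebuilt = math.prod(p ** e for p, e in exponents.items())
print(rebuilt == s3)                                                 # True
print(trailing_zeros(s3))                                            # 178
print(all(exponents[p] == 1 for p in exponents if 360 < p <= 720))   # True
print(all(exponents[p] == 2 for p in exponents if 240 < p <= 360))   # True
print(all(exponents[p] == 3 for p in exponents if 180 < p <= 240))   # True
```

The "etc." in the pattern stops being so tidy once p drops to the square root of 720 or below, because higher powers of p then contribute as well.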

So now we have our sequence: 0, 1, 2, 720!.

But you may recognize as well that 720 = 6!. So here is our sequence: 0, 1, 2, 6!!. (Here n!! means (n!)!, the factorial applied twice, not the double factorial.) What next? Well, 6 = 3! and 2 = 2!!, so: 0, 1, 2!!, 3!!!. Is that all? Well, 1 = 1!, so: 0, 1!, 2!!, 3!!!. And of course the next number in the sequence is 4!!!!.
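The rule can be written down in a few lines of Python, reading n followed by k exclamation marks as the factorial applied k times. The function names are mine. The sketch checks the first four terms only, since 4!!!! is hopelessly beyond anything a computer could write out.

```python
import math

def iterated_factorial(n, times):
    """Apply the factorial to n the given number of times."""
    for _ in range(times):
        n = math.factorial(n)
    return n

def s(n):
    """The puzzle sequence: n with the factorial applied n times."""
    return iterated_factorial(n, n)

first_four = [s(n) for n in range(4)]
print(first_four[:3])                          # [0, 1, 2]
print(first_four[3] == math.factorial(720))    # True

# Do not call s(4): 4! = 24 and 24! already has 24 digits, so (24!)! alone
# is out of reach, and s(4) applies the factorial twice more after that.
```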

It’s now that Glymour would, I imagine, say something to the effect that what we just did was not Bayesian, or perhaps not even compatible with what is Bayesian. Before we encountered the evidence, i.e. s(0) – s(3), we did not have priors (because we had never considered the question) reflecting the inevitability of s(4) = 4!!!! in light of evidence s(0) – s(3). One might claim that our eventual considered response, then, that s(4) = 4!!!! with near certainty, constitutes a departure from Bayesian behavior. I don’t agree with that, and the reason is this: it is not merely the case that we now think that the probability of 4!!!!, conditioned on what s(0) – s(3) actually are, is near unity. No–we also believe that our former unreflective priors (if any) are irrational. We have therefore abandoned them in favor of better priors.

So we now hold that a rational priors function P will have it that P[0, 1!, 2!!, 3!!!, 4!!!!] is nearly equal to the sum, over all K, of P[0, 1!, 2!!, 3!!!, K]. We’ve remained dedicated to the Bayesian perspective throughout our deliberations; we’re just better educated, post-reflection, about what “rational priors” ought to look like. Now it’s true that we did not subscribe to such priors (perhaps to any) before we got the puzzle. Nor is it the case that the Bayesian perspective is what led us to them. Bayesianism is silent on the matter of what priors it’s rational to adopt. All Bayesianism tells us is how updated credences should relate to priors and evidence. (They should be arrived at via conditionalization of one’s priors on the evidence, whatever those may be.) And should you be a less than ideal agent, it doesn’t even say, as many have supposed, “never change your priors”. Changing your priors implies diachronic irrationality, true enough. But that’s because only one set of priors is correct, and it’s irrational to have incorrect priors! (This is not the usual view, but it is the correct view.) Obviously if you change your priors, then they were, at some point, incorrect, and to have held incorrect credences at some point in time is to have been diachronically irrational. (It is not necessarily to have been diachronically incoherent, which is a stronger claim. Changing your priors does imply diachronic incoherence, but that’s no reason to persist in holding onto bad unreflective priors.) But it’s not to have violated Bayesianism. What would constitute a violation of Bayesianism? Nothing less than to hold a credence that is distinct from the credence that would be held were one to condition one’s current priors on one’s total evidence.
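To make the claim about P concrete, here is a toy version in Python. Everything in it is an assumption for illustration: the three candidate rules, the rival "interpolating cubic" (the integer-valued cubic through the four observed terms), and above all the prior numbers, which are invented. The candidates for s(4) are kept as symbolic labels because 4!!!! cannot be written out.

```python
import math
from fractions import Fraction

F720 = math.factorial(720)
observed = (0, 1, 2, F720)

def cubic(n):
    # An assumed rival rule: the integer-valued cubic through the observed terms.
    return n + (F720 - 3) * math.comb(n, 3)

# Each candidate rule: (its first four terms, a symbolic label for its s(4)).
rules = {
    "identity": ((0, 1, 2, 3), "4"),
    "interpolating cubic": (tuple(cubic(n) for n in range(4)), "4*720! - 8"),
    "iterated factorial": (observed, "4!!!!"),
}

# Hypothetical reflective priors over the rules (illustrative numbers only).
prior = {
    "identity": Fraction(900, 1000),
    "interpolating cubic": Fraction(1, 1000),
    "iterated factorial": Fraction(99, 1000),
}

def posterior_over_s4(prior, rules, observed):
    """Condition on the observed terms; return credences over labels for s(4)."""
    surviving = {name: p for name, p in prior.items() if rules[name][0] == observed}
    total = sum(surviving.values())
    return {rules[name][1]: p / total for name, p in surviving.items()}

posterior = posterior_over_s4(prior, rules, observed)
print(posterior["4!!!!"])  # 99/100
```

The conditioning step is entirely mechanical. All the interesting judgment went into deciding which rules to list and how much weight each deserves, which is exactly the part Bayesianism says nothing about.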

To repeat: in our example, the evidence you acquired, namely s(0) – s(3), had a merely accidental, attention-focusing role in the adoption of your “improved” (i.e. reflective) priors…priors it was inevitable you would endorse once you had considered the matter closely, whether in response to evidence or not. If you had an infinity of years in time-suspended isolation to compose comprehensive priors on integer sequences, you would become that ideal creature that can’t be embarrassed by novel patterns. You would, without having ever encountered a pattern in “real life”, have come to “know all the patterns”, simply for having encountered them all in various thought experiments. That knowledge would be sitting there, in your priors. Maybe someone else doing the same would have different priors. The two of you would almost certainly notice a lot of the same patterns, but might not agree on their relative strengths. You can’t both be ideally rational, but you can both be coherent, in spite of your differences.

Bayesianism, then, given a set of priors, is just a mindless updating scheme. It’s fine so far as it goes, but the real work of science is in the priors, and Bayesian lore is silent (not wrong) on the issue of what they should be. It’s more or less agreed that “simplicity” should be sought after in the setting of one’s priors, but there’s a lot of question as to what “simplicity” comes to. For the actual practice of science, this question is probably not that important. (Scientists know simplicity when they see it.) For philosophy, however, this is where the action is. I think Glymour senses as much, but it’s just not right for him to encourage others (by example) to disown Bayesianism (“I am not a Bayesian”). Bayesianism isn’t a recipe for all rationality, but there’s no reason not to be one. It’s probably better to just say “I’m not not a Bayesian.”

So I won’t not say it.

Glymour’s concluding paragraph:

“None of these arguments is decisive against the Bayesian scheme of things, nor should they be, for in important respects that scheme is undoubtedly correct. But taken together, I think they do at least strongly suggest that there must be relations between evidence and hypotheses that are important to scientific argument and to confirmation but to which the Bayesian scheme has not yet penetrated.”

I don’t know what “relations between evidence and hypotheses” amount to in this context, but that is not how I would put it. The real problem that the Bayesian scheme doesn’t penetrate…makes no claims to penetrate now or ever, so there’s little point in saying “has not yet penetrated”…is that of the adoption of rational priors. Appealing to “simplicity” can only get us so far. Linear relations are simpler than quadratic, we may say, which are in their turn simpler than cubic, etc. There are other sorts of functions appearing in nature, however, both familiar (exponential, trigonometric) and not. Many distributions, both familiar (binomial, hypergeometric, Poisson, normal) and not. It’s well and good to order relations according to the number of unknown parameters (fewer = simpler) or to say that a relation is confirmed precisely when the number of data points exceeds the number of unknowns, and certainly we suspect that any comprehensive theory of the rationality of priors should reflect versions of these and other ad hoc principles, but none of this means that we have such a comprehensive theory, least of all that Bayesianism (of all things!) should be faulted for not laying it at our feet.

After all, the theory of logical consequence didn’t lay this at our feet either–no one wrote a paper called “Why I am not a Logical Consequentialist”.
