Cian Dorr: Against Counterfactual Miracles

Cian Dorr writes, in the recent paper Against Counterfactual Miracles,

“It is natural to suppose that if…say, you had blinked one more time than you actually did while reading the previous sentence–the state of the world in the distant past would still have been…as it…was. … But if determinism is true….”

There is something slightly paradoxical going on here. In evaluating a counterfactual we are, according to orthodoxy, called upon to alter the actual world as meagerly as possible while making the antecedent true, and then to judge the counterfactual according to whether the consequent is true after these alterations. But unless we change the laws that do the determining, we need to alter the past in order to account for the extra blink.

The problem is general, of course. It infects at least those counterfactuals whose antecedents aren’t chance events. All counterfactuals, if determinism is true. Is it? Dorr writes:

“determinism…is a live possibility, one that many physicists and philosophers of physics take quite seriously. So it is not a merely academic exercise to investigate which of our ordinary beliefs are consistent with it.”

I would urge an interpretation of counterfactuals on which the issue of determinism becomes a red herring. In particular, I would urge that when we imagine incorporating the truth of the antecedent into actuality, we do so in a way that fixes only the epistemic position of the speaker.

On a certain view, this seems like a non-starter. On the table is

A♠ 9♠ 8♣ 9♥ 3♠

John, holding A♦ A♥, goes all in, whereupon Matt goes into cardiac arrest and dies. John then says, rather insensitively, “if Matt had called, I would have taken his money”.

On just about any extant view, this counterfactual appears to be true if and only if John had a stronger hand than Matt, i.e. if Matt was not holding 9♣ 9♦. In particular, its truth appears not to be a function of John’s epistemic situation…it depends on facts about the world that John doesn’t know. Apparently, then, I am surrendering the view that counterfactuals have truth conditions, and/or that their meaning is closely related to truth, or to when or how they are true.

So what did John mean by “if Matt had called, I would have taken his money”? We can imagine saying to him in reply “you don’t know that, John.” To which he might say “Well, no…I don’t know it.” (Is this merely contextualism?) Why, then, did you say it? we might ask, and in reply we might get “I said it because, most of the time, when you’re someone like me playing against someone like Matt in situations like this with those cards up and you have aces full and go all in and the other guy calls, you take his money.” Most of the time, then. Not all. Indeed…not even this (ostending some counterfactual) time.
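To put a number on John’s “most of the time”, here is a minimal Python sketch that enumerates every two-card holding Matt could have had, given the board and John’s A♦ A♥, and tallies the showdowns. It assumes, unrealistically, that each unseen holding is equally likely (a real estimate would weight holdings by Matt’s likely calling range). The card encoding and the helper names (parse, rank5, best_hand, showdown_counts) are my own, made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical encoding: "As" = ace of spades, "9h" = nine of hearts, "Tc" = ten of clubs.
RANKS = "23456789TJQKA"


def parse(card):
    """Turn e.g. "As" into (14, "s")."""
    return (RANKS.index(card[0]) + 2, card[1])


def rank5(cards):
    """Return a tuple ordering 5-card poker hands; larger tuples win."""
    ranks = sorted((r for r, _ in cards), reverse=True)
    # Group ranks by multiplicity, then by rank, so trips/pairs come first.
    groups = sorted(Counter(ranks).items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    shape = [count for _, count in groups]
    ordered = [rank for rank, _ in groups]
    flush = len({suit for _, suit in cards}) == 1
    distinct = sorted(set(ranks), reverse=True)
    straight_high = 0
    if len(distinct) == 5:
        if distinct[0] - distinct[4] == 4:
            straight_high = distinct[0]
        elif distinct == [14, 5, 4, 3, 2]:  # the "wheel", A-2-3-4-5
            straight_high = 5
    if straight_high and flush:
        return (8, straight_high)
    if shape == [4, 1]:
        return (7, *ordered)
    if shape == [3, 2]:
        return (6, *ordered)
    if flush:
        return (5, *ranks)
    if straight_high:
        return (4, straight_high)
    if shape == [3, 1, 1]:
        return (3, *ordered)
    if shape == [2, 2, 1]:
        return (2, *ordered)
    if shape == [2, 1, 1, 1]:
        return (1, *ordered)
    return (0, *ranks)


def best_hand(cards):
    """Best 5-card hand available from 7 cards (brute force over all 21 subsets)."""
    return max(rank5(five) for five in combinations(cards, 5))


def showdown_counts(board, hero):
    """Count (wins, ties, losses) for hero against every unseen 2-card holding,
    assuming (hypothetically) that all holdings are equally likely."""
    board = [parse(c) for c in board]
    hero = [parse(c) for c in hero]
    seen = set(board + hero)
    unseen = [(r, s) for r in range(2, 15) for s in "cdhs" if (r, s) not in seen]
    hero_best = best_hand(hero + board)
    wins = ties = losses = 0
    for hole in combinations(unseen, 2):
        villain_best = best_hand(list(hole) + board)
        if hero_best > villain_best:
            wins += 1
        elif hero_best == villain_best:
            ties += 1
        else:
            losses += 1
    return wins, ties, losses


if __name__ == "__main__":
    print(showdown_counts(["As", "9s", "8c", "9h", "3s"], ["Ad", "Ah"]))
```

Run as a script, this prints (989, 0, 1): of the 990 possible holdings, only 9♣ 9♦ (four nines) beats aces full, and none ties. On this crude uniform weighting, “most of the time” is very nearly all of the time, which is exactly the kind of fact John is leaning on.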

Is this a breed of unfashionable internalism? I don’t mind being unfashionable. (In fact, I tend to prefer it.) I just don’t want to get in trouble later on.

Maybe I will get in trouble later on, but for now, I’m doing quite well, for by “if I had blinked twice (rather than once, say), then the past would have been the same” I just mean something like “most of the time, when someone like me blinks twice in a scenario epistemically similar to mine, the past turns out to be the same”. Which is clearly just wrong. All of this squares with my own intuition.

I realize it will not square with everyone’s. Here, though, is some therapy. There is a strong conversational norm against asserting that which you don’t know to be true. (It’s not quite lying, but it’s close.) On the orthodox view, John would be in violation of this norm in avowing “if Matt had called, I would have taken his money”. For he is clearly in violation of this norm in avowing “my hand was stronger than Matt’s” (he knows no such thing) and, on the orthodox view, these are truth-equivalent. But (I claim) our intuitions suggest that John is not in violation of this norm. What John’s words indicate is something a bit vaguer. Something along the lines of “I had a good hand” or “I wasn’t bluffing” or perhaps just “I think I had a better hand than Matt”.

Why “if Matt had called, I would have taken his money” is more assertible here than “I had a better hand than Matt” is something of a mystery, for if any counterfactual has truth conditions, this one does, and every semantics (that I know of) would agree on what they are. I believe the moral is that the two avowals aren’t truth-equivalent, and the only way I can imagine that being the case is if the counterfactual has no truth conditions at all.

Or: whatever I mean by “if it had been the case that F then it would have been the case that G”, it’s surely going to turn out to be something I think I know. (Otherwise, why exactly am I saying it?) So “given what I know” is implicit. It’s what I know that’s relevant to the fact that at least most of the pertinent F worlds are G worlds. In this case (thinks John), what I know is that I have cards that win most of the time in scenarios like these.

But I’m getting sidetracked. Let me try to get back to the paper. Dorr gives an example from Frank Jackson:

a. If I had jumped out of this tenth floor window, I would have been killed.

b. If I had jumped out of this tenth floor window, I would have done so only because someone had put a safety net in place.

On the similarity interpretation of counterfactuals, both a. and b. seem to have valid readings. (This is one of the big problems with the similarity interpretation…there are too many viable choices for a similarity metric.) The b. reading requires what Dorr, following Lewis, calls “backtracking”. In finding a similar world or worlds, one allows oneself to significantly alter the past. Presumably, worlds where there are some people who fear that I might be suicidal are more similar to ours than are worlds where I am suicidal. Now you have to decide whether backtracking is legitimate.

Which I think is fairly hopeless. “Similar” can mean too many different things. My own interpretation of counterfactuals is in no such trouble here. Since I am implicitly fixing my own epistemic position, b. has no viable reading. It would, in particular, be weaker to avow “If my epistemic position had been the same and I had jumped out of this tenth floor window then my epistemic position would have been different”.

Hmm. There’s some subtlety here. I want to allow an epistemic reading of something like “If I had looked at that card, I would now know who the messenger is”, which seems to be stronger than “If my epistemic position were the same and I had looked at that card, my epistemic position would be different”. Though this is to say different now, not different at the moment at which the counterfactual and the actual diverge. Hence “If my epistemic position were the same then and I had looked at that card, my epistemic position would be different now”. What it seems we want to fix is my epistemic situation at the moment of divergence.

Again, though…there are traps. How about “If I had known then what I know now, I would not have married Jane”. Clearly that is a valid counterfactual, but a satisfactory reading of it requires us not to fix my epistemic situation at the moment of divergence. Or does it? Maybe it doesn’t. My epistemic position is the same up to the moment of divergence…said divergence being the point at which I counterfactually learn then what I actually know now.

The apocryphal analysis is just as before. What I know now is that Jane is DKL-positive. (DKL is a dread virus that causes one to mishandle analysis of counterfactuals.) If I had known that then, it would have to have been (according to the apocryphal analysis) because I was also DKL-positive (you can only learn this sort of thing at a DKL-ics anonymous meeting, it seems), and so I would have married her anyway.

So…no backtracking! Not in any way that affects my epistemic situation prior to divergence, that is…though the divergence itself may involve a counterfactual epistemic position. What is distinctive about the epistemic perspective, then, is that I am free to backtrack the hidden variables (if determinism is true) as freely as others invoke counterfactual chance outcomes (should it be false). At any rate it doesn’t really matter whether the relevant variables are merely hidden (determinism) or generated on the fly (chance). And plainly it should not matter!

This is the sense in which determinism is a red herring.

Okay…Dorr wants ultimately, he says, to hang blame on the following:

Past: Necessarily, whenever x is normal at t, there is a true history-proposition p such that p would still have been true if x had blinked at t.

He writes:

“We will be tempted to dismiss Past on the basis of our reactions to sentences like (2):

(2) If determinism is true and x does not blink at t, then if x had blinked at t, that would have been because of a prior history of determining factors differing all the way back.

(2) sounds incontrovertible and is plausibly true on its most natural interpretation.”

Whereupon he draws an analogy between (2) and b. above and prescribes that all counterfactuals be parsed in the spirit of a. rather than b. The advantage of the epistemic perspective is now completely clear: (2) no longer reads as “incontrovertible” (mostly it just reads as improperly formulated, i.e. confused), and no such prescription is necessary.

Now we come to footnote 5, which I reproduce in full.

“5. Note that the following also sounds obviously true:

(2′) If determinism is true and x does not blink at t, then if x had blinked at t, a miracle would have to have occurred.

Although I hold that Past fails in ordinary contexts, I am inclined to think that (2′), like (2), is true in the context it most naturally evokes. Lewis’s dichotomy between “backtracking” and “standard” contexts is not particularly helpful here. I believe the explanation turns on subtle ways in which epistemic necessity modals (like “have to”) can serve to signal that certain other propositions, serving as premises from which the asserted content can be inferred, are to be taken for granted.”

There are several issues here. First…what is a miracle? It can’t be a counterexample to a strict law–strict laws don’t admit of counterexamples. It can’t be an exception to a ceteris paribus law–exceptions to ceteris paribus laws aren’t miraculous. I think I see a way to make sense of “miracle”, but it requires my favored metaphysics. A miracle is an event of probability zero. The idea here is that the universe is infinite, admits of densities, and that the density of any metaphysically possible event is positive. Events that are of zero density are not metaphysically possible. If they do occur, however (with density zero), then I’m willing to let those occurrences be “miracles”. I’d bet against long odds that there are no miracles. But…who knows.
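For readers who want a concrete picture of “occurs, yet has density zero”, here is a standard arithmetic analogy (my illustration, not the author’s or Dorr’s model of the universe): perfect squares occur infinitely often among the positive integers, but the proportion of squares among 1..n shrinks toward zero. The sketch assumes “density” means natural (asymptotic) density, i.e. the limiting proportion over ever-larger finite regions.

```python
from fractions import Fraction
from math import isqrt


def square_density(n):
    """Proportion of the integers 1..n that are perfect squares.
    isqrt(n) counts the squares 1, 4, 9, ... that are <= n."""
    return Fraction(isqrt(n), n)


if __name__ == "__main__":
    for exponent in (2, 4, 6, 8):
        n = 10 ** exponent
        print(n, square_density(n))
```

This prints 1/10, 1/100, 1/1000 and 1/10000 for n = 10², 10⁴, 10⁶ and 10⁸: squares keep occurring, but their density tends to zero, which is the sense of “miracle” being proposed.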

Of course there’s something funny in my terminology. If there are miracles, then they are actual but not metaphysically possible! Some better terminology is perhaps advisable, though if I am right and there are no miracles then the metaphysically possible and the actual coincide. Though…wouldn’t that be a relief? The notion of metaphysical possibility is rather vague in the hands of philosophers. (I don’t think anyone knows what in hell it means. Not to say this makes it any different from most extant philosophy!)

But getting back to (2′)…it’s bad enough that we had to worry about backtracking and standard contexts. Now we have some new ones, apparently? This is further evidence that the epistemic perspective is preferable.

The fourth section of the paper is fairly wild. Recall the counterfactual

“If Nixon had pressed the red button there would have been a nuclear war.”

This is often taken as a problem case for the Lewisian similarity analysis: worlds where there is a short in the wire preventing the signal getting through appear closer to ours than those where the mechanism functions properly and Armageddon follows. Lewis wants a similarity metric on which the counterfactual comes up true. So a “similar” world will be the same up to a time very close to the actual non-pressing of the red button, then a “small miracle” will occur and the red button will be pressed. Then we just follow that course according to physical law.

Lewis does explore the possibility of avoiding the small miracle with minuscule past differences. A naive solution would be to opt for some smallish differences in the past that eventually manifest in the pressing of the button. Lewis sees problems here. Dorr quotes him thus:

“…there is no guarantee whatever that [a world where the actual laws are true and where Nixon presses the button] can be chosen so that the differences diminish and eventually become negligible in the more and more remote past. Indeed, it is hard to imagine how two deterministic worlds anything like ours could possibly remain just a little bit different for very long. There are altogether too many opportunities for little differences to give rise to bigger differences.”

Dorr disagrees:

“But…Our best deterministic physical theories have continuous dynamics, which means that so long as the past is not infinite, we can always find a nomologically possible world that stays arbitrarily close to the actual world throughout any finite initial segment of history, just by choosing an initial state that is close enough to that of the actual world. (paragraph) This is worth making precise. … (follows a tedious page I’ll skip) … Of course, the fact that there are nomically possible worlds that stay very similar to actuality until shortly before t but diverge after t does not by itself establish that there are nomically possible worlds of the kind that Lewis was worried about–for example, worlds that stay very close to actuality until shortly before t and at which Nixon goes on to press the button at t.”
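Both Lewis’s worry and Dorr’s reply can be seen in a toy system. The doubling map x → 2x mod 1 on the unit circle is continuous and deterministic, but it doubles small differences at every step. The minimal Python sketch below is a discrete-time stand-in of my own choosing (Dorr’s claim concerns continuous-time physical dynamics); it asks how many steps two initial states stay within a hypothetical “macroscopic” threshold of 0.1 of each other. Exact rational arithmetic via fractions.Fraction avoids floating-point artifacts.

```python
from fractions import Fraction


def doubling(x):
    """One step of the doubling map x -> 2x mod 1 (a toy chaotic system)."""
    return (2 * x) % 1


def circle_gap(a, b):
    """Distance between two points on the unit circle [0, 1)."""
    d = abs(a - b)
    return min(d, 1 - d)


def first_macro_divergence(x0, eps, threshold=Fraction(1, 10), max_steps=200):
    """First step at which the trajectories starting from x0 and x0 + eps are
    more than `threshold` apart (hypothetical macro threshold), or None if
    that does not happen within max_steps."""
    x, y = x0, (x0 + eps) % 1
    for step in range(max_steps + 1):
        if circle_gap(x, y) > threshold:
            return step
        x, y = doubling(x), doubling(y)
    return None


if __name__ == "__main__":
    for k in (10, 20, 40):
        print(k, first_macro_divergence(Fraction(1, 3), Fraction(1, 2 ** k)))
```

For initial differences of 2⁻¹⁰, 2⁻²⁰ and 2⁻⁴⁰ this returns 7, 17 and 37: each halving of the initial difference buys exactly one more step of macroscopic agreement. So any finite horizon can be matched by starting close enough (Dorr’s point), but the required closeness shrinks exponentially with the horizon (the grain of truth in Lewis’s worry).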

Dorr argues that there are such worlds on the basis of an “Independence Conjecture”, paraphrased as “the macropresent screens off the macrofuture from the macropast”. This is something that doesn’t appear to be true in systems with particularly trivial dynamics. Consider, for example, a single object travelling through space, not interacting with any other objects. A macropresent view will tell us roughly where the object is, where it’s going, and how fast. But there could be indeterminacy here (“macro”), and seeing where it was in the past will cut down on the indeterminacy of our future estimates. For systems with sufficiently complex (“mixing”) dynamics, however, the Independence Conjecture looks plausible. Here the idea is that if we take a set of “macroscopically described future” orbits, such as those comprising “Nixon presses the red button”, and a set of “macroscopically described past” orbits, such as “close to the actual past”, then these sets ought to be at least approximately probabilistically independent. That is, with everything conditional on the present macrostate, the probability that Nixon presses the red button, conditional on having a past state close to the actual past, ought to be near the unconditional probability that Nixon presses the red button, and in particular non-zero.

Something along these lines ought to be true, but not everything…witness the fact that for orbits having past states “very very close” to the actual past state, the state at time t will be too close to actual for Nixon to press the red button (just what Dorr spent that page we skipped proving). On the other hand, merely “close to the actual past” may not be close enough to stay close up to a time shortly prior to t. What Dorr seems to require is that at the in-between level of closeness (“very close”) one already has the sought-for independence kicking in. In other words…independence kicks in just as fast as the escape from closeness to a single state does. This strikes me as loose talk, but I’ll let it slide just the same. (I wonder, though, whether he would do better simply to try to get Nixon to press the button by hand. So long as we are assuming continuous dynamics, we might as well assume smooth dynamics. Then we have derivatives, so that bulldozing the relevant particles and velocities around at will, while leaving others relatively fixed, by manipulating the distant past in the neighborhood of a point could come down to linear algebra.)
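The interplay between closeness and independence can be made exact in a toy model (my illustration, not Dorr’s formalism). Take the doubling map x_n = 2ⁿ·x_0 mod 1 with x_0 uniformly distributed; let “past close to actual” mean x_0 lies in a dyadic interval of width 2⁻ᵏ, and let the “macroscopic future event” be x_n ≥ 1/2, whose unconditional probability is 1/2. The minimal Python sketch below computes the conditional probability exactly.

```python
from fractions import Fraction


def prob_right_half(m, k, n):
    """P(x_n >= 1/2 | x_0 in [m/2**k, (m+1)/2**k)) for the doubling map
    x_n = 2**n * x_0 mod 1, with x_0 uniform on [0, 1) (Lebesgue measure).

    Key fact: x_n >= 1/2 exactly when the (n+1)-th binary digit of x_0 is 1."""
    level = n + 1
    if level <= k:
        # The conditioning interval already fixes that digit: probability 0 or 1.
        return Fraction((m >> (k - level)) & 1)
    # Otherwise split the interval into equal cells at the finer level and
    # count the cells whose last binary digit is 1.
    cells = range(m << (level - k), (m + 1) << (level - k))
    ones = sum(1 for j in cells if j & 1)
    return Fraction(ones, len(cells))


if __name__ == "__main__":
    m, k = 0b101, 3  # hypothetical "close to the actual past": x_0 in [5/8, 6/8)
    print([prob_right_half(m, k, n) for n in range(6)])
```

Here the conditional probabilities are 1, 0, 1 for the first three steps and exactly 1/2 from then on. For n < k the event is pinned down by closeness to the actual past; for n ≥ k conditioning on that closeness changes nothing at all. In this toy, at least, independence kicks in precisely at the step where the trajectory escapes closeness, which is the coincidence of timescales Dorr appears to need.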

At any rate, suppose we indulge all of Dorr’s fancy here. Does it get him what he wants? He writes:

“We can regiment Lewis’s time-relative notion of similarity between possible worlds using a metric d on M.”

M is the set of states of the world, by the way.

“The distance d(p, p’) represents the degree of dissimilarity between w at t and w’ at t’, when t instantiates p at w, t’ instantiates p’ at w’, and both w and w’ are nomically possible.”

I think Dorr means “w instantiates p at t”, etc., but I suppose a time could instantiate a state at a world, too, odd as it sounds to my ear.

The first thing that concerns me here is “We can regiment Lewis’s…notion of similarity…using a metric…”. Indeed, this sounds a lot like something Lewis explicitly rejects:

“We could…define exact distance measures…for…worlds. At worst, we might need a few numerical parameters. For instance, we might define one similarity measure for distribution of matter and another for distribution of fields, and we would then need to choose a weighting parameter to tell us how to combine these in arriving at the overall similarity of two worlds. All this would be easy work for those who like that sort of thing, and would yield an exact measure of something–something that we might be tempted to regard as the ‘similarity distance’ between worlds. … We must resist temptation. The exact measure thus defined cannot be expected to correspond well to our own opinions about comparative similarity. Some of the similarities and differences most important to us involve idiosyncratic, subtle, Gestalt properties.”

Lewis goes on to talk about facial similarity and its irreducibility to similarity in a simple metric based on pixels. Notwithstanding the fact that today’s digital passport readers seem to do fairly well, the point is well taken. But maybe I misunderstand Dorr. It may be that he advocates using “Gestalt” properties, primarily, when evaluating closeness of worlds, but breaking ties using metric closeness…at least, metric closeness up to divergence. This could still save the intuition that if he had blinked twice the past would have still been (approximately) the same…assuming there is any such intuition…while avoiding certain problems. But I will set this aside for the moment and assume that Dorr does intend to use the metric as a measure of similarity.

A technical point…is the set of worlds where Nixon presses the button closed in the topology generated by said metric? I think probably it is, as the complement of this set surely has to be open by Dorr’s own reasoning. Probably then it follows that the set of distances between the actual world and a “Nixon presses the button” world has a minimum value. It doesn’t follow that there is a unique world w of this sort, but a moment’s reflection reveals that this is extremely likely. Suppose so.

Now if we follow the Lewisian semantics for counterfactuals, we can take any distant future truth P holding in w, and the counterfactual “If Nixon had pressed the red button, then P” will come out true. So for example “If Nixon had pressed the red button then in March of 3001 a winged mutant named April May would have become the first human descendant to survive one hundred unaided falls onto land from above one hundred meters” would come out true. And that’s highly counterintuitive. Among the worlds where Nixon presses the button, there are worlds all over the map having this property, true enough, but they are hardly concentrated in one place, and there are vastly more that do not. What should it matter to us that the one world closest to actual in the metric we have chosen is such a world? That fact appears to be an accident.

Far from rescuing Lewisian semantics from miracles, what Dorr’s argument points out is how deeply implausible Lewisian semantics are when based on such a metric. Let’s look again at the pretty pictures Lewis draws.


The only reason to utter a counterfactual “if phi had been, psi would have been” is to point out a correlation between (“nearby”, if you like) phi worlds and psi worlds. If there’s no such correlation you shouldn’t say it…much less should it come out true. One might think that this situation is reflected in Lewis’s (D): among the most nearby phi worlds, some are psi worlds and some are not. But Lewis’s image pictures a discrete set of spheres, and if we buy into Dorr’s continuous variation assumption, such a picture is wrong. We get spheres of radius r for every real number r, so that, if phi is a closed set, we get (probably) a value of r for which the r-sphere meets phi in exactly one point, whereby we land in (B) or (C) irrespective of whether there is any correlation between phi worlds and psi worlds. But as Dorr teaches us here, there often won’t be any such correlation! Indeed, where phi and psi are macroscopically described events, the benign regions Lewis has drawn will need to be replaced by fractal regions that, quite often, will be approximately independent of fractal regions associated with different propositions. Looking at Lewis’s (B), it might seem reasonable to say “if the world had been phi then it would have been psi”. For if you asked (with apologies for the personification of worlds) the actual world to impersonate a phi world, it would plausibly (to some sort of intuition) gravitate mindlessly in the approximate direction of all of the phi worlds and, hitting a nearish one, find itself to be also a psi world something like every time. Suppose the regions are highly fractal, though; would the actual world gravitate to a tiny phi region .03032… units away or a much larger one .03033… units away? Even if we agree that it would gravitate to the nearest one…wouldn’t approximate independence imply that the psiness or non-psiness of the world it lighted on would be essentially random? An accident?
There are worlds near that one that are psi worlds, and worlds near that one that are not psi worlds. And if you seek to save Dorr here by saying “well, most of the worlds are psi worlds” then you are just agreeing that it comes down to conditional probability, not the accidental properties of that one special phi world that is closest to actual. Nothing Lewisian about that.
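The contrast between the accidental properties of the single closest phi world and the proportion of nearby phi worlds that are psi worlds fits in a few lines. This is a one-dimensional caricature, not Lewis’s or Dorr’s formalism: worlds are points on a line, the actual world sits at 0, distance from 0 stands in for metric dissimilarity, and phi worlds form intervals on which psi is constant. The distances echo the .03032/.03033 example above; the interval endpoints are otherwise made up.

```python
from fractions import Fraction

# Hypothetical toy: each phi-region is (start, end, is_psi), all lying at
# positive distance on one side of the actual world at 0.
PHI_REGIONS = [
    (Fraction("0.03032"), Fraction("0.030321"), False),  # tiny phi region, not psi
    (Fraction("0.03033"), Fraction("0.05"), True),       # much larger phi region, psi
]


def lewis_verdict(regions):
    """Closest-world reading: is the nearest phi-world a psi-world?"""
    _, _, is_psi = min(regions, key=lambda region: region[0])
    return is_psi


def psi_share(regions, radius):
    """Proportion (by length) of the phi-worlds within `radius` of the actual
    world that are psi-worlds; None if no phi-world lies within `radius`."""
    phi = psi = Fraction(0)
    for start, end, is_psi in regions:
        length = max(Fraction(0), min(end, radius) - start)
        phi += length
        if is_psi:
            psi += length
    return psi / phi if phi else None


if __name__ == "__main__":
    print(lewis_verdict(PHI_REGIONS))
    print(psi_share(PHI_REGIONS, Fraction("0.05")))
```

The closest-world reading returns False, because the nearest phi world happens to sit in the tiny non-psi sliver, while 19670/19671 of the phi worlds within 0.05 of actuality are psi worlds. A conditional-probability reading tracks the second number; the closest-world reading tracks the accident.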

On the picture Dorr paints for us we’d need to replace the clean Lewisian pictures by images more like:


(Sorry for the lame graphics. Anything beyond Microsoft Paint is beyond me as well.)

Somewhere at the center of those concentric spheres I’ve tried so feebly to draw is the actual world–a random dart toss at this rainbow-colored fractal. If you want to think in terms of the earlier example, think maybe cool colors for Nixon doesn’t press the red button and reddish/purplish colors for Nixon presses the red button, with nuclear war occurring at all but magenta. Say the actual world lies in a greenish area (Nixon doesn’t press). The dart landed on green, but we can ask about the truth of “If the dart had landed on a reddish color, it wouldn’t have been magenta.” If there’s a minuscule patch of magenta somewhat nearby and no closer reddish patch, Lewis might say that that counterfactual is false, regardless of how much magenta there is in the image (even somewhat nearby magenta…only the closest reddish patch counts). Indeed, one might just as correctly say “if the dart had landed on a reddish patch it would have landed here”, pointing at the nearest reddish patch, irrespective of the fact that the pointed-to patch is orders of magnitude smaller than other nearish patches of red.

This is no longer compelling.

In favor of counterfactual miracles?

All right, so the treatment I have been recommending goes something like this. When I utter “If F had been, G would have been” and F is an outcome of a chance event in the past that did not occur, then what I am suggesting is that G has a highish probability conditional on F, what I know about the actual state of the world just prior to the chance event (the one that did not eventuate in F actually, but might have), and perhaps also the results of chance events after the time t of that chance event that do not lie causally downstream of F. (So I can say “if Manfred had played, we would be champions” even though this counterfactual championship requires a subsequent very unlikely upset in a distant, causally isolated venue, provided it actually occurred.) I utter such a thing, to the extent that I am doing “communication” with those words, as a way of imparting information to listeners…information about what I know about the actual state of the world just prior to the chance event, perhaps, or information gleaned from what I take to be a perceptive take on what (usually) follows from what.
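This treatment can be sketched as a conditional probability computation. In the minimal Python sketch below, the probabilities are invented for illustration, the two chance events (our match and the distant upset) are assumed independent, and “held fixed” means conditioning on the actual outcome of chance events that are not causally downstream of the antecedent. It shows why holding the distant upset fixed matters for “if Manfred had played, we would be champions”.

```python
from fractions import Fraction
from itertools import product

# Hypothetical numbers, chosen only for illustration.
P_WIN_IF_MANFRED_PLAYS = Fraction(4, 5)  # our match, given the antecedent
P_DISTANT_UPSET = Fraction(1, 20)        # causally isolated match elsewhere


def worlds_given_manfred_plays():
    """Yield (outcome, probability) for the two independent chance events,
    given the antecedent 'Manfred played'."""
    for we_win, upset in product((True, False), repeat=2):
        p_win = P_WIN_IF_MANFRED_PLAYS if we_win else 1 - P_WIN_IF_MANFRED_PLAYS
        p_upset = P_DISTANT_UPSET if upset else 1 - P_DISTANT_UPSET
        yield {"we_win": we_win, "upset": upset}, p_win * p_upset


def counterfactual_support(consequent, held_fixed):
    """P(consequent | antecedent, held_fixed), where held_fixed records the
    actual outcomes of chance events not causally downstream of the antecedent."""
    total = favourable = Fraction(0)
    for world, p in worlds_given_manfred_plays():
        if all(world[key] == value for key, value in held_fixed.items()):
            total += p
            if consequent(world):
                favourable += p
    return favourable / total


def champions(world):
    """We are champions only if we win and the distant upset happens."""
    return world["we_win"] and world["upset"]


if __name__ == "__main__":
    print(counterfactual_support(champions, {"upset": True}))  # upset held fixed
    print(counterfactual_support(champions, {}))               # nothing held fixed
```

Holding the actual upset fixed gives 4/5, a “highish” probability that licenses the counterfactual; refusing to hold it fixed gives 1/25, and the counterfactual would sound absurd. On these made-up numbers, the choice of what to hold fixed does all the work.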

Dorr has some cases that he presents in the next section that don’t fall so nicely in this category, however.

“Suppose that, on the phone to Mary at t, Fred speaks the truth by saying “If I were there right now, I would give you a hug.” On the operative interpretation of the counterfactual, how do we think Fred would have got to be with Mary at t? Would he have been whisked there quickly by a recent, antithermodynamic puff of wind, or would he have got there by a less showy method, requiring a somewhat earlier divergence from the approximate course of actual history? The latter option seems better. If we choose the puff of wind, we will need to combine it, rather artificially, with further unusual goings-on in Fred’s brain to ensure that he arrives still in a mood to give Mary a hug…”

Hilarious. I particularly enjoy the phrase “rather artificially”, given how jaw-droppingly artificial the whole “puffs of wind” notion is in the first place. (If you are already having Fred blown across a continent by an easterly gust of wind, does it qualify as a stretch to have him sleep through it?)

Here’s a problem with metric similarity: among all ways of getting Fred to Mary, the “antithermodynamic puffs of wind” may do it with the most delayed deviation (and hence the greatest similarity of initial conditions) from the actual. Probably you could get Fred across a continent in just a few minutes using them, and on Dorr’s continuous dynamics view there ought to be states near the actual state, say, a half hour ago that do this.

Dorr now wants to distance himself from metric similarity, Lewis, or both, and I don’t blame him. Fred’s Ripley’s moment may be close in the metric, but that doesn’t make it closest to actuality. It doesn’t “get the Gestalt”.

I asked my wife (she learned Lewis’s semantics for counterfactuals from George Schumm, who was apparently rather animated about it) what she thought about this. At the risk of misinterpreting her (which is likely) she thinks it’s important not to fix too much…you only fix what’s relevant. In particular you have to have a rather plastic notion of similarity, presumably different for each counterfactual utterance. For the case of Fred and Mary, the important thing is that Fred’s and Mary’s general moods be fixed in the inner spheres…probably also their identities and the semi-normalcy of their current experiences, blah blah blah…and not much else. So “in nomically accessible worlds where our needs and desires are as they actually are and we are together, I give you a hug” or something. (“So romantic”, Mary no doubt replied.)

Let’s see if my own view is in any trouble here. To avoid the possibility of running several issues together, I am going to change to third person and put the situation in the past. So let’s say I utter “If Ted had been there, he would have hugged Mary.” I think it would be within your rights to say something like “Ted was 3000 miles away at the time…so what are you saying? Are you saying that if Ted had gotten on a plane that morning and they were together then, he would have hugged her?” Dorr rightly notes in a footnote that in most worlds of this sort where they are together at t, the circumstances that led me to utter “If Ted had been there, he would have hugged Mary” aren’t operating at all. (He may have said it because she missed him so much, which she wouldn’t, were he there.)  I think I might reply: “well, I suppose if he had gotten on a plane, unbeknownst to her, and were just then knocking on her hotel room door during the same sort of phone conversation, then, sure, when she answered, he would have hugged her.” Then you might ask “so are you saying that if he had been there then he would have come in secret and been in the hallway talking to her on a cell phone?” And I would have to confess that, no, that isn’t what I meant.

What to do? Do I follow my wife and say that I can just change the circumstances for different counterfactual utterances? That seems ugly. I would much rather have a uniform treatment. But I despair of one. Consider this: it’s New Year’s Eve. Ted’s flight was cancelled, so he can’t be with Mary. On the way home from the airport, he got in a fender bender and slammed his mouth on the steering wheel. It’s shortly before midnight and Mary laments that he can’t kiss her at the stroke of midnight. Well, notes Ted, I couldn’t kiss you anyway…my lips have been smashed. But I would give you a hug…. What do we make of that? Surely the closest worlds where Ted is there with Mary are worlds in which Ted’s flight wasn’t cancelled. But if his flight wasn’t cancelled and he’s there, he doesn’t hug Mary…he kisses her! Notwithstanding that, what Ted says seems incontrovertibly “true”. (For those who favor a truth value semantics for counterfactuals, anyway.)

I think we have some notion of what Dorr would say, as in a different context he writes: “…we are free to hold fixed both approximate history up to, say, one day before t, and also the facts about whatever Mary said just before t that inspired Fred’s impulse to give her a hug.” In the current case, then, Dorr would perhaps say that we are free to hold fixed both approximate history up to the point where Ted’s flight was cancelled and the fact that he later smashed his lip…of course now he has to do it in the cab to her hotel room (for example) rather than on his drive back home.

This however I cannot abide. We can make the situation even worse: perhaps Mary has said she would not kiss Ted now because, after his flight was cancelled, he got a haircut, breaking his promise to Mary not to make unnecessary expenditures in the state of New York (where, in Mary’s mind, corrupt politicians skim income tax dollars). How are you going to fix that fact if the flight isn’t cancelled? We could probably make it worse still…probably we could make holding fixed both the approximate past and some future event require a genuine miracle. Even if not a miracle, though, surely it requires implausible coincidence. It can’t be part of “if I were there, I would give you a hug” that Ted has worked out in his mind what would have happened if his flight weren’t cancelled and he’d wound up with Mary with a differently acquired lip smash, because such a thing would never occur to anyone unless they were writing a philosophy paper. On the contrary, Ted is probably thinking “man, if I were with Mary, my lip wouldn’t have been smashed”. Granted, he’s also thinking “man, if I were with Mary, I would give her a hug”, though if he puts this all together he won’t think “if I were with Mary, my lip wouldn’t have been smashed and I would give her a hug”…rather he would think “if I were with Mary, my lip wouldn’t have been smashed and I would kiss her.”

I think the upshot of this is that counterfactuals where the antecedent isn’t a chance event at t, occurrence of which implies (macroscopic or epistemic) divergence from the actual at t, but rather a consequence of an earlier (vaguely formulated or perhaps unformulated) divergence, are a different beast. Strictly speaking, they should probably be discouraged in favor of utterances such as “this may sound quite strange, Mary…given that you are not, unfortunately, actually here with me, but I am, at this very instant, experiencing a most discernible impulse…to give you a hug.” Hugh Grant, I think, would do it that way, and so should we, to whatever extent we want to come off as charming, anyway.

Quick Summary

Let’s look at it this way. On one view, the evolution of the world requires what one might think of as some real time random number generation. (I.e. there are chance events.) You can think of these random numbers as coming from Godly coin tosses, or whatnot. On the deterministic view, this isn’t the case…every outcome is determined by initial data. It changes nothing, however, if the universe evolves in identical fashion, with the random numbers not coming from Godly coin tosses but from a random number table. (The table is part of the initial data.) What’s the difference, when it comes to how we should analyze counterfactuals? Nothing whatsoever, clearly. Where F is a non-actual “outcome” at t, “if F had been then G would have been” means

a. “if such-and-such Godly coin toss had landed differently…” or

b. “if such and such entry in the random number table had been different…” or,

c. “…if the initial data had been different…”.

Nobody has trouble seeing a. and b. as essentially equivalent from the standpoint of counterfactual analysis. Here’s why: it’s easy enough to know which counterfactual world we’re in, in these cases. We are in the world where everything is the same except for the result of that one Godly coin toss, or that one entry in the random number table. In case c. we don’t know for sure what counterfactual world we are in, and the reason for this is that, as Lewis points out, there are too many opportunities for small differences to give rise to large ones. If the initial data had been different, everything about the past would have been different. If determinism is true, there aren’t nomically accessible worlds where everything (including what it is I think I know that led me to assert the counterfactual) about the past is fixed but F happens. So in saying “if F had been”, says Lewis, we are saying “if there had been a miracle, and F had been”.
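Here is a toy sketch in Python of why a. and b. come to the same thing. The world model and the names run_world, godly_tosses and table_lookup are hypothetical illustrations of my own and are not taken from the literature. A seeded generator stands in for the Godly coin, and a list written out in advance stands in for the random number table. Intervening on toss K and editing table entry K produce the same counterfactual history.

```python
import random

FLIP = {"H": "T", "T": "H"}

def run_world(toss, steps):
    """Toy world: its history is just the sequence of outcomes it consumes."""
    return [toss(t) for t in range(steps)]

def godly_tosses(seed, flip_at=None):
    """Case a: outcomes produced on the fly by chance (a seeded RNG stands in).
    If flip_at is given, that one toss lands the other way."""
    rng = random.Random(seed)
    def toss(t):
        outcome = rng.choice("HT")
        return FLIP[outcome] if t == flip_at else outcome
    return toss

def table_lookup(table):
    """Case b: outcomes read off a table that is part of the initial data."""
    return lambda t: table[t]

SEED, STEPS, K = 7, 12, 5

# Write the table out in advance, using the same generator the "coin" uses.
rng = random.Random(SEED)
table = [rng.choice("HT") for _ in range(STEPS)]

actual_a = run_world(godly_tosses(SEED), STEPS)
actual_b = run_world(table_lookup(table), STEPS)

# Counterfactual a: toss K lands differently. Counterfactual b: table entry K differs.
cf_a = run_world(godly_tosses(SEED, flip_at=K), STEPS)
cf_table = list(table)
cf_table[K] = FLIP[cf_table[K]]
cf_b = run_world(table_lookup(cf_table), STEPS)

print(actual_a == actual_b, cf_a == cf_b)                    # True True
print([t for t in range(STEPS) if cf_a[t] != actual_a[t]])   # [5]
```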

Dorr, in this paper, wants to get out of this by making c. look like b. That is, he wants the data to be so fine-grained that the “Nth-and-beyond” digits of it, as N gets ridiculously large, act much like numbers read off of a random number table, at least insofar as they don’t have visible effects prior to some t and have huge effect after that. (Just like pseudo-chance events whose outcomes are determined by lookup.) I doubt that’s the way it works…I don’t think the world deals with that much data (infinite data, it seems, from Dorr’s explanations) at every update. Still, it does seem to reduce situation c. to situation b.
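Here is a minimal sketch of how c. can be made to look like b. It assumes the doubling map x → 2x mod 1 as a toy deterministic dynamics. The model and the names are my own illustration, not Dorr's. Under this map the state at step t is just the initial condition's binary expansion shifted t places. The far digits of the initial data therefore behave exactly like entries in a random number table. Altering digit N leaves the coarse-grained record untouched before step N and changes it at step N. The difference then persists in anything downstream, which here is a hypothetical running tally.

```python
import random
from fractions import Fraction
from itertools import accumulate

def position(bits, t, precision=20):
    """State at step t under x -> 2x mod 1: the binary fraction 0.b_t b_(t+1) ...,
    truncated to `precision` digits."""
    window = bits[t:t + precision]
    return sum((Fraction(b, 2 ** (i + 1)) for i, b in enumerate(window)), Fraction(0))

def macro_history(bits, steps):
    """Coarse-grained record: 1 if the system is in the upper half of [0, 1) at step t."""
    return [int(position(bits, t) >= Fraction(1, 2)) for t in range(steps)]

STEPS, N = 60, 30
rng = random.Random(2024)
initial_data = [rng.randrange(2) for _ in range(100)]  # fine-grained initial condition

altered = list(initial_data)
altered[N] ^= 1  # change only the N-th binary digit of the initial data

actual = macro_history(initial_data, STEPS)
counterfactual = macro_history(altered, STEPS)

# Downstream consequences: a running tally of upper-half visits.
tally_actual = list(accumulate(actual))
tally_cf = list(accumulate(counterfactual))

print(actual[:N] == counterfactual[:N])  # True: no visible effect before step N
print(actual[N] != counterfactual[N])    # True: the altered digit surfaces at step N
print(all(a != c for a, c in zip(tally_actual[N:], tally_cf[N:])))  # True: it persists
```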

However, we found a problem. The whole reason for wanting to evaluate the truth of a counterfactual “if F had been, G would have been” at a “nearby” F world is that (recall) my whole reason for asserting the conditional in the first place was to pass information I have about the actual state of the world at t. (And insight I may think I have about what follows from what.) If I evaluate at a far away world, some of what I know will no doubt be no longer the case. Indeed, some of what I know (not F, for example) will surely be no longer the case, but the idea is to make there be as little of that as possible. But…and here is the problem…sensitivity to initial conditions may make it the case that, even for nearby worlds, F-ness and G-ness may be essentially independent. At the very least, there are G worlds and not-G worlds near to the “nearest” F world, and in cases where there’s reason to think the closest ones are one or the other, it might not be the one you’d want. Consider:

“If the Blazers hadn’t drafted Sam Bowie, they would have won multiple NBA titles in the nineties.”

The idea here is that Michael Jordan was second on the Blazers’ draft board, that they thought long and hard about picking him, and that if they had, things would probably have gone very well for them in the nineties. But on the metric similarity view, the nearest world in which the Blazers don’t draft Sam Bowie probably isn’t a world where they draft Michael Jordan…it’s more likely a world where they draft Sam Perkins. Because, well…such a world can closely match the actual world all the way through the recording of the pick’s first name on the card that is about to be handed to the commissioner! Whether the Blazers then go on to win multiple NBA titles in such a world is anybody’s guess. Suffice it to say it’s less likely with Perkins than with Jordan, and it’s the relative likelihood of titles with Jordan that justifies the avowal. As is the case almost everywhere in philosophy, but most especially on the view we’re discussing, where the local neighborhood is teeming with possible futures encompassing just about everything under the sun, everything comes down to probability.

So why don’t I want to fix more than the speaker’s epistemic situation? In normal assertions we do. If John says “Matt wasn’t holding 9♣ 9♦”, his assertion is true or false according to whether or not Matt was holding 9♣ 9♦, and those truth conditions are surely the better part of the meaning of John’s assertion. John doesn’t mean by “Matt wasn’t holding 9♣ 9♦” that in most cases epistemically similar to his, the person referred to as “Matt” isn’t holding 9♣ 9♦; he means that in this case, Matt isn’t holding 9♣ 9♦!

That’s just the problem, though. When John utters a counterfactual, there is no obvious candidate for “this” case, i.e. the actual case. We aren’t interested in the actual case. When John says “if Matt had called, he would have lost”, he isn’t claiming that Matt loses in “that” case, ostending by “that” a particular counterfactual world. On the other hand, John will probably admit that he was “wrong”…he may even say that what he avowed was “false”…if we turn over Matt’s cards, revealing 9♣ 9♦. Why is that? Perhaps it is part of the meaning of “if Matt had called he’d have lost” that Matt doesn’t have 9♣ 9♦.

Hmm. I was hoping not to get into trouble, but I fear I am. What I’d like to do is keep this in the realm of philosophy of probability, but I feel it creeping into philosophy of language, where I will be unceremoniously flayed. Obviously all I can ever hope to impart via any utterance is some aspect of my epistemic position, yet we do hold most utterances accountable to how the world is apart from what I know. There doesn’t seem to be any principled reason why counterfactuals would be unique in this regard. On the other hand, I don’t want my utterance of a counterfactual to be held accountable to how another world is…whether it be a near one or a far one. Surely my avowal of “if F had been, then G would have been”, when F is a non-actual chance event at t, will go by the probability of G conditional on F and whatever else I know about the state of the world at t, but I may disavow if I subsequently learn more about the state of the world at t. Part of the meaning of the utterance, then, may be that I would not disavow were I to know more. Namely, I am expressing confidence that I know enough of the relevant stuff that I need not disavow should I know more. That’s the way it works for ordinary assertions, after all. When I say “P is the case” I don’t just intend that P is likely given my epistemic situation. If that were all I meant, I should not then admit I was wrong when it turns out that P is not the case. Rather, I intend to suggest that P really is the case…so that, in particular, I should continue to avow P should I know more. Or…well, not exactly. There is always the chance that I will be misled, even though I am right. It’s perhaps not part of what I am claiming that this won’t happen. Though if I say “I know that P is the case”, maybe then it is.

Clearly I’m just rambling by now so I will just quit.





doomsday status update: refuted, presumed dead

Since the last post was about the Doomsday Argument, I thought it would be a good idea to remind everyone that the Doomsday Argument is quite dead. The correct resolution…SIA (for self-indicating assumption)…has been around for a while, but a thorny objection to SIA, explicated in a certain type of thought experiment having several extant versions, including Kierland and Monton’s replicating worlds and Nick Bostrom’s “Presumptuous Philosopher gedanken”, has prevented this solution from gaining universal acceptance. Bostrom’s thought experiment (which I will concentrate on, as it seems to be one of the more clearly elaborated) is formidable and had not been correctly explained away as of about a year ago, when I uploaded the final version of my SB (Sleeping Beauty) survey to the PhilSci archive.

The relevant refutation is hinted at in footnote 4. It hadn’t to my knowledge appeared previously in the literature, though it does bear resemblance to a 2005 suggestion of Cian Dorr (again appearing in a footnote to an unpublished Sleeping Beauty paper, A Challenge for Halfers). I’ve been writing about it off and on on this blog but will give a slightly more developed treatment here. Philosophers have ignored this solution, but it isn’t, as you shall see, especially susceptible to counter-attack. About all one could hope to do would be to undermine the understanding of modality on which it is arguably based. You don’t need to buy into that understanding to buy into the refutation, but the understanding is itself so robust and so immune to paradox that I’m going to couch the refutation inside of it anyway. Indeed, I have for the past year been intending to write at some length about it. I’m apparently lazy though and it’s not clear when this will happen, so before too much time passes I will at least say a bit.

Here is how Nick Bostrom introduces the Doomsday Argument (he has a setup to this, but I am skipping to the meat, as it’s pretty clear what’s going on):

“Now we modify the thought experiment a bit. We still have the hundred cubicles but this time they are not painted blue or red. Instead they are numbered from 1 to 100. The numbers are painted on the outside. Then a fair coin is tossed (by God perhaps). If the coin falls heads, one person is created in each cubicle. If the coin falls tails, then persons are only created in cubicles 1 through 10.

You find yourself in one of the cubicles and are asked to guess whether there are ten or one hundred people? Since the number was determined by the flip of a fair coin, and since you haven’t seen how the coin fell and you don’t have any other relevant information, it seems you should believe with 50% probability that it fell heads (and thus that there are a hundred people).

Moreover, you can use the self-sampling assumption to assess the conditional probability of a number between 1 and 10 being painted on your cubicle given how the coin fell. For example, conditional on heads, the probability that the number on your cubicle is between 1 and 10 is 1/10, since one out of ten people will then find themselves there. Conditional on tails, the probability that you are in number 1 through 10 is one; for you then know that everybody is in one of those cubicles.

Suppose that you open the door and discover that you are in cubicle number 7. Again you are asked, how did the coin fall? But now the probability is greater than 50% that it fell tails. For what you are observing is given a higher probability on that hypothesis than on the hypothesis that it fell heads. The precise new probability of tails can be calculated using Bayes’ theorem. It is approximately 91%. So after finding that you are in cubicle number 7, you should think that with 91% probability there are only ten people.”
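Bostrom’s 91% can be checked with exact arithmetic. Here is a short sketch; the helper name posterior_tails is mine.

```python
from fractions import Fraction

def posterior_tails(prior_heads, like_heads, like_tails):
    """Bayes' theorem for the two coin hypotheses."""
    joint_heads = prior_heads * like_heads
    joint_tails = (1 - prior_heads) * like_tails
    return joint_tails / (joint_heads + joint_tails)

# Self-sampling: P(cubicle 7 | heads) = 1/100, P(cubicle 7 | tails) = 1/10.
p = posterior_tails(Fraction(1, 2), Fraction(1, 100), Fraction(1, 10))
print(p, float(p))  # 10/11 0.9090909090909091
```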

It’s the second paragraph that’s troublesome. More naturally you would assign probability 100/110 to 100 cubicles. Not everyone is as likely to be in a cubicle at all if there are only ten of them! There has to be potential for at least 100 people, so it makes this easier to think about if God creates 100 people (including you) either way and just doesn’t awaken ninety of them in case of tails. Now you might find yourself asleep (in which case, technically, you don’t really “find” yourself at all), but if you find yourself in a cubicle instead you simply condition on that fact.
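The same sort of arithmetic, in a sketch that follows the sleep version just described, recovers the 100/110. The second step, where you also learn that your cubicle is number 7, is added here to show where this leads. The self-indicating boost toward heads exactly cancels the Doomsday-style shift toward tails.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Sleep version: 100 people are created either way, one per cubicle;
# on tails only the people in cubicles 1-10 are awakened.
p_awake_heads, p_awake_tails = Fraction(1), Fraction(10, 100)
p_heads_awake = half * p_awake_heads / (half * p_awake_heads + half * p_awake_tails)
print(p_heads_awake)  # 10/11, i.e. 100/110

# Also learn "my cubicle is number 7": a 1/100 chance either way, and cubicle 7
# is awake under both heads and tails.
p_awake7_heads = Fraction(1, 100)
p_awake7_tails = Fraction(1, 100)
p_heads_awake7 = half * p_awake7_heads / (half * p_awake7_heads + half * p_awake7_tails)
print(p_heads_awake7)  # 1/2
```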

If you don’t like the sleep angle, here’s another perspective. Presumably, God can have only finitely many templates for persons. For convenience, let’s say he has 1000, and let’s say he tattoos your template number on your foot. So you find yourself in a cubicle. You look at your foot and see “457” tattooed there. What is your information? Now your information is that there’s a 457 in one of the cubicles, and that confirms 100…for conditional on there being a 457 in one of the cubicles, 100 cubicles is about ten times likelier than 10 cubicles. Or let’s say God doesn’t tattoo the number…that obviously makes no difference. Now the evidence is just that there’s someone just like me in at least one of the cubicles and, conditional on that, 100 cubicles is again about ten times likelier than 10 cubicles.
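The “about ten times” can be checked as well. This sketch adds an assumption that is not in the text: each person’s template is drawn independently and uniformly from the 1000. The function name is hypothetical.

```python
from fractions import Fraction

TEMPLATES = 1000  # God's template count, as in the text

def p_someone_has_template(n_people, templates=TEMPLATES):
    """P(at least one of n_people was made from a given template, e.g. #457),
    assuming (an added assumption) independent, uniform template draws."""
    return 1 - Fraction(templates - 1, templates) ** n_people

like_100 = p_someone_has_template(100)
like_10 = p_someone_has_template(10)
ratio = like_100 / like_10
posterior_100 = like_100 / (like_100 + like_10)  # equal priors on 100 vs 10 cubicles
print(round(float(ratio), 2), round(float(posterior_100), 3))  # 9.56 0.905
```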

Or, just imagine that the experiment is repeated over and over again. Most of the time you find yourself in a cubicle, it will be with 99 others and not with just 9 others. For if we got everybody together after doing this a huge number of times, and everybody at the party was in a cubicle just as often during a tails run as during a heads run, we’d have a bit of a problem! (Problem being that it’s not possible.)
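Here is a quick simulation of the repeated experiment. It is seeded so the result is reproducible, and the function name is hypothetical. The share of cubicle occupancies that fall in heads runs settles near 100/110.

```python
import random

def occupant_share_in_heads_runs(runs, seed=0):
    """Repeat the cubicle experiment; return the fraction of all cubicle occupancies
    that happen during heads runs (100 occupants) rather than tails runs (10)."""
    rng = random.Random(seed)
    heads_occupancies = tails_occupancies = 0
    for _ in range(runs):
        if rng.random() < 0.5:  # fair coin lands heads
            heads_occupancies += 100
        else:
            tails_occupancies += 10
    return heads_occupancies / (heads_occupancies + tails_occupancies)

print(occupant_share_in_heads_runs(100_000))  # close to 100/110, about 0.909
```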

Bostrom writes:
“After hearing about (the Doomsday Argument), many people think they know what is wrong with it. But these objections tend to be mutually incompatible, and often they hinge on some simple misunderstanding. Be sure to read the literature before feeling too confident that you have a refutation.”

Okay…so there is one sort of thing in the literature that comes up. First, Bostrom calls the sort of reasoning I was doing just now invocation of a “self-indicating assumption”:

(SIA) Given the fact that you exist, you should (other things equal) favor hypotheses according to which many observers exist over hypotheses on which few observers exist.

Bostrom does not find this assumption compelling:

“SIA may seem quite dubious as a methodological prescription or a purported principle of rationality. Why should reflecting on the fact that you exist rationally compel you to redistribute your credence in favor of hypotheses that say that there are many observers at the expense of those that claim that there are few?… our view is that SIA is no less implausible ultimo facie. Probably the most positive thing that can be said on its behalf is that it is one way of getting rid of the counterintuitive effects of the Doomsday argument…”

Since I view SIA as a commonplace, about all I can suggest is that you ignore premature reports of its dubiousness and work through the grounds for it yourself…or, barring that, carefully read the paragraphs I wrote above. SIA is not invented there as an expedient; rather, as one reasons through some natural assumptions and their obvious consequences, SIA comes out in the wash.

At any rate, the fact that Bostrom doesn’t understand what’s so natural about SIA appears to have made the mind of this “Top one hundred Global Thinker” an environment in which the following tricky thought experiment purportedly telling against it could arise:

“It is the year 2100 and physicists have narrowed down the search for a theory of everything to only two remaining plausible candidate theories, T1 and T2 (using considerations from super-duper symmetry). According to T1 the world is very, very big but finite and there are a total of a trillion trillion observers in the cosmos. According to T2, the world is very, very, very big but finite and there are a trillion trillion trillion observers. The super-duper symmetry considerations are indifferent as between these two theories. Physicists are preparing a simple experiment that will falsify one of the theories. Enter the presumptuous philosopher: “Hey guys, it is completely unnecessary for you to do the experiment, because I can already show to you that T2 is about a trillion times more likely to be true than T1!” (whereupon the philosopher runs the argument that appeals to SIA).”

I’m cutting and pasting this stuff, by the way, from a reply to Olum, who apparently attempted to refute Doomsday with SIA. Good idea, but Olum’s grasp of SIA wasn’t so sharp either, for he “bit the bullet” and agreed that SIA commits one to the presumptuous philosopher’s counter-intuitive leap in the above thought experiment.

That the thought experiment is importantly different from the cubicles case that led one to SIA in the first place, however, is clear. In the cubicles case, whether there was a hundred-verse or a ten-verse was contingent. If we repeat the experiment many times, the coin God tosses will land heads roughly half of the time and it will land tails roughly half of the time, so that most of the people who wind up in cubicles will be in hundred-verses rather than ten-verses. In Bostrom’s gedanken, meanwhile, presumption intuitions require that T1 is either necessarily true or necessarily false.

Why do presumption intuitions require necessity? Well, if the matter of T1 vs. T2 were taken to be contingent, the thought experiment would collapse into something analogous to the cubicles thought experiment…half of the world instances in an infinite-repetition multiverse would be T1 worlds, or trillion^2-verses, and half would be T2 worlds, or trillion^3-verses. Therefore, if we were good self-indicators and took ourselves to have been selected uniformly at random from the sequence of all observers in the multiverse of which our world was a part, we would consider it a trillion times more likely that T2 were the case and that we inhabited a trillion^3-verse. Say what you might about us so-called “presumptuous” philosophers–our recommendations would, in such a case, be vindicated in proportion to our numbers.

On a more natural hearing, however, the choice between T1 and T2 is a matter of necessity, and our credence of 1/2 in each merely epistemic. Therefore, assuming that our world is part of an infinite-repetition multiverse, we take it that there is a 1/2 chance that every world in the multiverse is a T1 world (in which case every observer from the multiverse inhabits a trillion^2-verse), and a 1/2 chance that every world in the multiverse is a T2 world (in which case every observer from the multiverse inhabits a trillion^3-verse). So, if we are good self-indicators and take ourselves to have been sampled uniformly at random from the stream of observers in the multiverse of which our world is a part, we will take it that there is a 1/2 chance that the multiverse is a T1 multiverse, in which case T1 is true for us, and a 1/2 chance that the multiverse is a T2 multiverse, in which case T2 is true for us–thus defusing Bostrom’s thought experiment.
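The contrast between the two readings can be put in numbers. This sketch works under the text’s idealizations: uniform sampling from all observers in the multiverse, and credence 1/2 between the theories. The function names are mine.

```python
from fractions import Fraction

N_T1 = 10**24  # observers if T1: a trillion trillion
N_T2 = 10**36  # observers if T2: a trillion trillion trillion
P_T2 = Fraction(1, 2)

def credence_t2_contingent(p=P_T2):
    """Each world is independently T1 or T2 (chance p of T2). Sampling uniformly from
    all observers in the multiverse weights each kind of world by its population."""
    w1, w2 = (1 - p) * N_T1, p * N_T2
    return w2 / (w1 + w2)

def credence_t2_necessary(p=P_T2):
    """One theory holds in every world; p is merely epistemic. Under either hypothesis
    every observer lives under that theory, so being an observer has likelihood 1."""
    like_t1 = like_t2 = Fraction(1)
    return p * like_t2 / (p * like_t2 + (1 - p) * like_t1)

c = credence_t2_contingent()
print(c / (1 - c))               # 1000000000000: T2 favored a trillion to one
print(credence_t2_necessary())   # 1/2: no presumptuous shift
```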

One might complain that I have cheated by turning the world, stipulated by Bostrom to be finite, into an infinite multiverse. On this reading, “world” means something like “everything that ever was or ever will be”. I can’t speak here for Bostrom of course, but the idea that “everything that ever was or ever will be” could be finite could derive only from a bankrupt (i.e. inadequate to the demands of philosophy) sense of modality. As David Lewis understood so well, philosophy requires some form of modal realism. On the other hand, “there is only one world…the real (i.e. actual) world”–as Bertrand Russell understood so well. These pincers constrain modal thinking rather a lot.

Fortunately, it’s perfectly possible to satisfy both Lewis and Russell. What most philosophers call “the actual world”, the thing that Bostrom invites us to view as finite in this thought experiment, is a small portion of “everything that ever was or ever will be”. (The alternative seems to be that something arose from nothing and will one day go away again, this time, oddly, for good.) It refers, more or less, to a local environment of some sort…perhaps circumscribed by an information-destroying event horizon (Big Bang, Big Crunch…no particular cosmology is implicit here). What philosophers call “counterfactual worlds” meanwhile are very real entities (they existed or will exist or exist now, somewhere). What we use them for mostly, I would claim, is to explicate “objective chance”. When I say that the objective chance that P is one half, what I mean, roughly, is that half of the nearby (in some similarity respect) counterfactual worlds are P worlds and half of them aren’t. (Lewis’s notion that at small similarity distance either all of the nearby worlds might be P worlds or all might be not P worlds…the basis for his treatment of counterfactuals…hasn’t aged particularly well in a scientific environment more conscious of sensitivity to initial conditions. More about this at some point…not now.)

Now, some philosophers may be troubled by the idea of sampling uniformly at random from an infinite sequence…this may smack of trying to take “half of an infinite set”. They should not be worried. These worlds are not, like Lewis’s worlds, completely separated from ours in time and space. Probably you could get from one to the other, albeit in pieces (Big Bangs leave you worse for wear, generally, and squeezing through those tiny string dimensions may leave you a lot thinner), but however rough the trip, the worlds would nevertheless be juxtaposed in time and space, and that juxtaposition allows one to speak of orders, densities (i.e. frequencies) and, therefore, fractional parts.
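Here is a minimal sketch of why the juxtaposition matters. The integer labeling of worlds is hypothetical. The point is that “the fraction of P worlds” in an infinite collection is only defined relative to an ordering. The same worlds, listed in two different orders, have limiting frequency 1/2 or 2/3. A fixed spatiotemporal order is what licenses talk of half of them.

```python
from fractions import Fraction
from itertools import count, islice

def prefix_frequency(pred, worlds, n):
    """Fraction of the first n worlds, in the given order, that satisfy pred."""
    return Fraction(sum(1 for w in islice(worlds, n) if pred(w)), n)

def is_p(world):
    return world % 2 == 0  # hypothetical labeling: even-numbered worlds are P worlds

def two_p_worlds_per_other():
    """The same worlds, enumerated differently: two P worlds, then one not-P world."""
    evens, odds = count(0, 2), count(1, 2)
    while True:
        yield next(evens)
        yield next(evens)
        yield next(odds)

print(prefix_frequency(is_p, count(0), 3000))                  # 1/2
print(prefix_frequency(is_p, two_p_worlds_per_other(), 3000))  # 2/3
```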

Another possible complaint is that I have employed just-so metaphysics to “conveniently” allow myself to apply SIA when and only when it doesn’t offend my intuitions to do so. The charge is, I think, not serious…I’ll concede that I have let my intuitions guide me here, and that, in the end, whatever metaphysics I adopt should be independently plausible. In this case it is; it is natural, in retrospect, to take oneself to have been sampled uniformly at random from a sequence of “real” observers…i.e. observers who are part of “everything that ever was or ever will be”, and not merely “epistemically possible”. After all, we want our counterparts to find vindication for their credences in proportion to their numbers. (The sentence I just typed may be the most important sentence in the whole of philosophy of probability.)

Indeed, this is what got Doomsday going in the first place, as most philosophers have identified “everything that ever was or ever will be” with “the actual” and classified everything else as “counterfactual”. The more robust modal realism I have proposed identifies “the actual” with something like “everything I might observe with a good telescope, a good microscope, a good space ship and maybe a good time machine (as traditionally conceived in antiquated science fiction, at any rate)”, identifies “the metaphysically possible” with “everything that ever was or will be (actually, was and will be…infinitely many times over)”, and exiles that which is amenable to coherent description but which is never (presumably because it cannot be, with the available stuff) instantiated at all…i.e. that which is merely logically possible.

What makes this preferable is that, now, there is a modal distinction between real but not actual (i.e. counterfactual) observers and unreal observers. What our intuitions rail against in Bostrom’s thought experiment is that T2 might be an utter fiction. We can live with assigning low credence to an event that occurs when we know that our luck was merely bad…that if things had turned out differently we would have fared better and, indeed, for most agents in the same epistemic situation, things do turn out differently.

As a curiosity, I’ll just mention that the modal views I have proposed dissolve Pascal’s wager, McGee’s Airtight Dutch Book, etc. For we don’t think it’s the case that God and heaven are real for some tiny proportion of counterfactual agents. (If we did think that, it would be hard to deny that we should adopt an orthodox lifestyle.) Rather, we think that there is a tiny chance that God and heaven are real for everyone. Moreover, any Godly sacrifices we might have been comfortable with only on account of their finitude get multiplied to infinity after all, given that everything that ever was or will be was and will be infinitely many times over.

What are the prospects for philosophers paying any attention to the above? Recent history suggests “not so good”. True, Bostrom has in the past taken time to engage with upstart Doomsday naysayers even when they’ve had little or nothing compelling to say, but only after they managed to get their papers into good journals. George F. Sowers, for example, who wrote in Mind:

“Consider a situation where you are confronted with two large urns. You are informed that one urn holds 10 balls…and the other holds 1,000,000 balls…. You are equipped with a stopwatch and a marker. You first choose one of the urns as your subject. It doesn’t matter which urn is chosen. You start the stopwatch. Each minute you reach into the urn and withdraw a ball. The first ball withdrawn you mark with the number one and set aside. The second ball you mark with the number two. In general, the n th ball withdrawn you mark with the number n. After an arbitrary amount of time has elapsed, you stop the watch and the experiment…. Will there be a probability shift? … If the number drawn exceeds 10, then we can conclude that (the urn has 1,000,000 balls)…. So long as the number drawn is less than 10, however, there is no probability shift….”

While that’s all obviously true, Sowers neglected to say what would happen if the last number drawn was equal to 10. Odd, because he just got done saying “If this thing happens, there’s a shift in a certain direction. If another thing happens, there’s no shift.” You can’t have a unidirectional shift, so obviously something is missing. This isn’t rocket science…it isn’t even earth science, actually. It’s just trichotomy. Numbers can be greater than 10, they can be less than 10…or they can be equal to 10. If the number drawn is equal to 10, you get a shift the other way, because if it’s 10, it’s probably 10 because you ran out of numbers at 10. Sowers wants to set up an analogy between his urn scheme and the Doomsday thought experiment, but there is no Doom scenario analogous to observing a 10 in his urn scheme. The analogy might be more apt if one got put to sleep after 10 in the small urn case, but since in that case no observation would be recorded much of the time, there would still be work to do.
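Sowers’ setup leaves the stopping rule open, so this sketch assumes one. The watch is stopped after T minutes, with T uniform on 1 to 100, which is a hypothetical choice, and you picked either urn with probability 1/2. Drawing stops early if the urn runs out, so the last number is min(T, urn size). All three branches of the trichotomy then show up.

```python
from fractions import Fraction

SMALL, LARGE = 10, 1_000_000
MINUTES = 100  # hypothetical: the watch is stopped after T minutes, T uniform on 1..MINUTES

def p_last_number(k, urn_size, minutes=MINUTES):
    """P(the last ball drawn is marked k | urn size). Drawing stops early if the
    urn runs out, so the last number is min(T, urn_size)."""
    hits = sum(1 for t in range(1, minutes + 1) if min(t, urn_size) == k)
    return Fraction(hits, minutes)

def p_small_urn(k):
    """Posterior that you chose the 10-ball urn (equal priors), given last number k."""
    s, l = p_last_number(k, SMALL), p_last_number(k, LARGE)
    return s / (s + l)

for k in (7, 10, 11):
    print(k, p_small_urn(k))
# 7 1/2     (less than 10: no shift)
# 10 91/92  (equal to 10: strong shift toward the small urn)
# 11 0      (greater than 10: certainly the large urn)
```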

According to Bostrom’s reply, Sowers makes some even crazier moves later. So far as I can tell Bostrom’s reply is very apt…I’m convinced that Sowers has no refutation of Doomsday to speak of. Be warned, however, that Bostrom’s paper degenerates in its final section into his favored selection effects (later double halfer) stuff…stuff that leads to gratuitous violations of reflection (as shown by Cian Dorr and some others), endorses the Monty Hall fallacy, etc. (So much the worse for it.)

At any rate, to recapitulate (and perhaps elaborate on) the above: my position is that self-indication is a commonplace, and that its employment by rational agents is a truism. This post is not an attempt to make a knockdown argument in favor of self-indication, but to answer Bostrom’s Presumptuous Philosopher objection to it. It is true that I can’t relate to Bostrom’s attitudes about self-indication…I find self-indication neither surprising nor counter-intuitive, and I certainly don’t believe it’s motivated solely by a desire to cancel the Doomsday Argument. But, although I have discussed why I think it “comes out in the wash” independently, I don’t know how to convince people like Bostrom that self-indication is proper for rational agents. I don’t have an original argument for it (the sketches of arguments I alluded to have been developed more fully by others), and therefore I don’t have an original argument against Doomsday. What I do have is an original refutation of Bostrom’s preferred take on the Presumptuous Philosopher example. That’s what’s here.

I should perhaps also mention (even though I claim not to write about Sleeping Beauty anymore) that my primary disagreement with the philosophical community about SB was its failure to notice that Lewis is a self-indicator, and so immune to any argument for thirding that establishes only SIA. This renders most extant arguments for thirding useless against Lewis. Indeed, if you go back up and look at the three informal “arguments” I gave in favor of self-indication after quoting Bostrom’s presentation of DA, you will see that they are, in essence, Horgan’s thirding argument, Dorr’s argument (the one with the skylight) against Roger White’s and other double halfer schemes, and the frequency argument for thirding, respectively. Lewis self-indicates but employs a (fairly standard) sample weight bias correction technique. Halfer schemes that respect SIA (Lewis’s and that of Patrick Hawley) are the proper target for thirders, but thirders have ignored them completely, which is tantamount to begging the question.

But, I should probably not get started about SB.

this isn’t the worst paper in the history of philosophy

Notwithstanding that, Robert Northcott’s paper A Dilemma for the Doomsday Argument is (and by a comfortable margin) the worst paper I have reviewed (i.e. trashed) on this blog. In the paper, the following Doom scenario is presented:

“Imagine that a large asteroid…is…heading…towards Earth…astronomers calculate that it has a 0.5 probability of colliding with us, the uncertainty being due to measurement imprecision regarding its exact path…any collision would be certain to destroy all human life..what is now the rational expectation of humanity’s future duration…a 0.5 probability that humanity will last just the few days…(‘Doom-Now’), and a 0.5 probability that it will last however long it would have otherwise. What does DA (Doomsday Argument) say? Either it is deemed to modify these empirical probabilities; or it is not.”

Doomsday practitioners of course would revise the probability of Doom-Now upward significantly. For convenience, let’s say that humanity would last another million years otherwise, with a total number of people numbering 1 quadrillion. If we take ourselves to have been sampled uniformly at random from the total population, past, present, and future, our birth rank of 60 billion or so looks very, very unlikely conditioned on Doom-Later. It looks more plausible conditioned on Doom-Now. So DA would revise P(Doom-Now) upwards, close to 1. That’s how the Doomsday argument works. But Northcott writes:

“…according to DA, a priori considerations show that the expected duration for humanity is much greater than just a few days. The probability of Doom-Now should accordingly be modified downwards.”

First of all, the expected duration just is half a million years, which is already much greater than a few days. So the first sentence makes no sense. Second, the DA advocates in favor of Doom-Now…not against it. (That’s why it’s called the Doomsday argument.)  A footnote here sheds no light whatsoever:

“True, DA reasoning implies that the single most likely total number of humans is the current number, i.e. 60 billion. But although the mode of the distribution is thus 60 billion, the mean is much greater. Thus, Doom-Now is not favoured.”

Obviously Doom-Now is favored by DA. I mean, of course the expected number of humans is around half of a quadrillion. That’s not DA reasoning, that’s just what it is. Not relevant though because DA assumes that the world is sampled by objective chance and then you are sampled uniformly from the observers in that world. The fact that some unlikely worlds have enormous population doesn’t dilute the worlds that have smallish population. That’s the whole point of DA. Nor is the gaffe a typo…Northcott goes on thinking that DA revises the probability of Doom-Now downward for the rest of the paper. Part of the paper involves miracles. I won’t go into that but the publication of this paper in Ratio is proof that they do happen.
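Here is a short sketch that works out the Doomsday numbers stipulated earlier. The 60 billion and 1 quadrillion figures are the text’s. Given a total of N humans, birth rank is taken to be uniform on 1 to N. The sketch shows both the upward revision of Doom-Now and the large expected total, which coexist without conflict.

```python
from fractions import Fraction

prior_doom_now = Fraction(1, 2)  # the astronomers' estimate
N_NOW = 60 * 10**9               # total humans if Doom-Now (about those born so far)
N_LATER = 10**15                 # total humans if Doom-Later (the text's stipulation)

# DA: given a total of N, your birth rank (about 60 billion) has probability 1/N.
joint_now = prior_doom_now * Fraction(1, N_NOW)
joint_later = (1 - prior_doom_now) * Fraction(1, N_LATER)
posterior_doom_now = joint_now / (joint_now + joint_later)

# The prior expected total is large, but that does not dilute the Doom-Now world.
expected_total = prior_doom_now * N_NOW + (1 - prior_doom_now) * N_LATER

print(float(posterior_doom_now))  # about 0.99994: revised sharply upward
print(float(expected_total))      # 500030000000000.0, about half a quadrillion
```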

The next passage is not to be believed:

“…an unbiased combined estimate (of the mean) can be achieved via inverse-variance weighting. Roughly, the higher an estimate’s variance, the more uncertain that estimate is, and so the less weight we should put on it. In the DA case, how we balance competing DA and empirical estimates of a probability turns – and must turn – on exactly this issue….Some toy numbers will illustrate. By assumption, the empirical estimate of the asteroid collision’s probability, and thus of Doom-Now’s, is very certain. Suppose that the density function of that estimate is a normal distribution with a mean of 0.5 and, representing the scientists’ high degree of certainty, a small standard deviation of 0.001. Next, suppose initially that for DA the equivalent figures are a mean of 0.001 and the same small standard deviation of 0.001. In this case, because the two variances are the same, so an unbiased estimate of the mean would be midway between the two component estimates of it, i.e. midway between 0.5 and 0.001, i.e. approximately 0.25.”

This is so wrongheaded in so many different ways I don’t really know where to start. So I will start with what is worst. Yes, there is a method in statistical meta-analysis of taking a weighted average of two estimators to get an unbiased third estimator that has minimum variance among all weighted averages, and yes, it takes the inverse variances as weights. But, first, the estimators you start with have to be unbiased themselves. The two “estimators” considered here can’t both be unbiased estimators of the same parameter, because they have different means, and for an estimator to be unbiased just is for its mean to equal the true value of the parameter being estimated. Perhaps more troubling, however, is that it’s not at all clear what they could be estimating…the only thing around to estimate is the objective chance that the asteroid hits the earth, which is either zero or one–unless it’s something like “the credence an ideal rational agent would have in this epistemic position”. Surely, though, that would at least have to be mentioned. The next thing that is wrongheaded is what was mentioned before: the latter mean shouldn’t be .001 but rather something like .999…DA wants to raise the probability of Doom-Now. Finally…assuming that p is in fact the credence of an ideal rational agent in the current epistemic situation and assuming that both the scientists and DA are trying to estimate p, does it not strike the author as odd that the scientists are so damn certain that p is very close to 1/2 and DA is so damn certain that p is very close to .001? The suggested compromise is, perhaps it bears mentioning, 250 standard deviations away from the mean for both parties. Normally this would be a moment where the meta-statistician might say “hmm…maybe these estimators aren’t estimating the same thing….”.
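Here is the arithmetic of Northcott’s first toy scenario, as a sketch: inverse-variance weights 1/sd², the pooled mean, and how far that mean lands from each input, measured in that input’s own standard deviation. The helper name is hypothetical, and the code happily produces a number even though, as argued above, the two inputs cannot be unbiased estimates of one quantity.

```python
import math

def inverse_variance_combine(estimates):
    """Pool (mean, sd) pairs with weights 1/sd**2; return (pooled mean, pooled sd).
    Only meaningful if each input is an unbiased estimate of the SAME quantity."""
    weights = [1 / sd**2 for _, sd in estimates]
    total = sum(weights)
    mean = sum(w * m for w, (m, _) in zip(weights, estimates)) / total
    return mean, math.sqrt(1 / total)

scientists = (0.5, 0.001)  # Northcott's toy empirical estimate (mean, sd)
da = (0.001, 0.001)        # Northcott's toy DA estimate (mean, sd)
pooled, pooled_sd = inverse_variance_combine([scientists, da])
print(pooled)  # about 0.2505
for name, (m, sd) in [("scientists", scientists), ("DA", da)]:
    print(name, abs(pooled - m) / sd)  # roughly 250 standard deviations, both times
```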

Not that it matters much at this point, but the most amusing passage is this:

“we can calculate a second scenario, with new toy numbers…this time, suppose that for DA the equivalent figures are still a mean of 0.001 but now, say, a standard deviation of 0.1….”

Sorry, but it’s not possible for a credence estimator (which must take on values in [0,1]) to have a mean of .001 and a standard deviation of .1.
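The reason is a one-line bound: on [0,1], x² ≤ x, so E[X²] ≤ E[X], and hence Var(X) ≤ m(1 − m) for mean m. A sketch in Python (the helper name is mine):

```python
import math

def max_sd_on_unit_interval(mean):
    """Largest possible standard deviation of a random variable confined to [0, 1]
    with the given mean. Since x*x <= x on [0, 1], E[X^2] <= E[X], so
    Var(X) <= mean * (1 - mean); a two-point variable on {0, 1} attains the bound."""
    if not 0 <= mean <= 1:
        raise ValueError("mean must lie in [0, 1]")
    return math.sqrt(mean * (1 - mean))

print(max_sd_on_unit_interval(0.001))  # about 0.0316, well short of 0.1
```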

Don’t get me wrong. My beef is not with the author, who I assume is not perpetrating a hoax but just sincerely trying to say something important. The question, for me, is not how did this mess come to be written, but how did this mess come to be published in a respectable journal? Ratio isn’t some obscure, fly-by-night outfit. Ernest Sosa, for example, is on the editorial board. In fact, it seems there is a list of 49 “Most popular journals” used by PhilPapers to identify when someone is a “professional author” in philosophy, and Ratio is on it!

So perhaps congratulations are in order…by slipping this awful manuscript past the editors of this journal, this author is (if he wasn’t before) now and forevermore a “professional author” of philosophy, meaning that PhilPapers editors shall be obliged to archive every stupid thing he ever writes, even if no one on earth or in heaven will touch it.

Okay…but back to my question. How did this ridiculous manuscript come to be published? The question is intended for the editors of Ratio…I’m inviting a reply. What was the process? Of course any public reply would obviously be just “we sent it off for double-blind refereeing, got a positive report and it was voted in by the editors…blah, blah, blah”. So, well…never mind, I guess.

But come on guys….my job isn’t supposed to be this easy.

incoherence will out

Several months ago I had a post (rational credences–still private!) about Rachael Briggs’s paper Distorted Reflection and Anna Mahtani’s reply to it. Unfortunately, I had read Mahtani’s paper, which capitalizes on a single unfortunate sentence on the last page of Briggs’s paper, first. The sentence is this: “I’ll adopt Lewis’s assumption that the agent stands no chance of mistaking her evidence, so that if the agent learns E, then all the suppositional worlds must be ones where E is true.” Having now read the papers more carefully…granted, still not so carefully…Briggs’s paper is great. One of the best I’ve read. Mahtani’s paper meanwhile, in focusing on that one sentence and not the overall thrust of Briggs’s paper, is something of an uncharitable hatchet job–all it shows is that that one sentence doesn’t characterize what Briggs is consistently doing throughout the rest of her paper.

All Briggs was really saying here was that any agent who actually fails to conditionalize actually regards as fair a sequence of bets yielding a net loss in every possible world, yet there are agents who actually fail to reflect but, despite being Dutch Booked, do not actually regard as fair any sequence of bets yielding a net loss in every possible world. For them, there are worlds where the bets actually accepted make money. (That said, it’s a Dutch Book after all: in those worlds they do go on to accept a further bet that loses whatever they gained and then some.) Very nice insight.

One might think there is a further distinction to be made…if a violator of reflection condones, in moments of lucidity, the counterfactual violations of conditionalization that lead to the reflection violation, they should be held accountable. Although there is a sense in which they may be diachronically coherent…their credences over time may mirror epistemic frequencies precisely after all…there is a finer sense in which they are coherent by virtue of luck. (Even a random updater would turn out to be coherent some of the time.)

There’s a stronger sense of coherence, possession of which implies that (so long as you are able to maintain lucidity, of course) you’ll be coherent almost surely. As far as Dutch Books go, the lucidity requirement is confusing…after all, to lose one’s lucidity may just be what constitutes irrationality. So perhaps the real issue is one of consent. On such a view a legitimate Dutch Book is a seduction establishing incoherence, whereas an illegitimate Dutch Book is a rape establishing nothing of the kind.

To make this point rather tastelessly:

Rose’s credence in “Bill’s a worthy mate” conditional on Bill giving her flowers is near zero, but she knows that if Bill actually gives her flowers then her credence in “Bill’s a worthy mate” will rise to nearly 1. Quai’s credence in “Bill’s a worthy mate” conditional on Bill slipping a lude into her drink without her consent is near zero, but she knows that if Bill actually slips a lude into her drink without her consent, her credence in “Bill’s a worthy mate” will rise to nearly 1…for all her ensuing resistance will establish to the contrary, at any rate. Both fail reflection, but I would be inclined to maintain that Rose is irrational, whereas Quai is not. It’s not irrational to be a potential victim of a crime. It is irrational to be a potential victim of an ill-advised series of gambles.

But the terrain here is trickier than that…one could maintain that Rose’s case (as culpability for her credences goes) is the same as Quai’s…it’s not her fault that she’s affected as she is by flowers. Flowers have this scent that acts very much like a drug for Rose….blah blah blah. Obviously now we’ve gotten out of the realm of philosophy of probability, where I have no special claim to know what I am talking about.

That’s sort of my point about Briggs’s paper, though…I think she took it just about as far as you can take it from within the confines of philosophy of probability. Yes, she appeared to leave something out…she let many reflection violators off the hook. She did this by killing all of the Dutch Books for reflection. Some of the Dutch Books for reflection are probably okay. Namely, those in which all bets are agreed to with lucidity or with culpability for non-lucidity. Or something like that, but there’s a problem with this approach. The only way I see to define lucidity here is that the betting behavior should agree with that planned earlier in some sort of original moment of lucidity. But that makes the whole discussion sort of moot….surely no one would devise, in a moment of lucidity, a credal scheme susceptible to reflection violations. The whole point of a reflection violation is that it’s supposed to be a violation of diachronic norms….rationality over time. You destroy that idea if you only accept Dutch Books that would be sanctioned in the original moment of lucidity, for now you are just talking about synchronic rationality.

I still think that the best way to define diachronic coherence is the way I did in my last post. But I now recognize that my stronger definition…coherence of a credal scheme…isn’t really a diachronic notion. And probably isn’t very useful. I am also a realist. Philosophers are not going to like the way I defined diachronic coherence in the first place…because they aren’t comfortable with the frequency theory, for example. Dutch Books seem to be something that philosophers can live with, and though they are clunkier, there isn’t anything wrong with them. Within the constraints of analyzing credal coherence from the confines of Dutch Book lore, I would say Briggs absolutely killed it. (Caveat: she says that results similar to these appeared in 2007 in work by someone called Jonathan Weisberg. I haven’t looked at his paper yet. Maybe he deserves a slice of the credit for what’s here. I shall look soon, anyway.)

So are there ways to put reflection violators who aren’t coerced back on the hook? I’m sure there are. I just doubt now that this can be done from within the confines of “philosophy of probability”. What makes the question so murky is that you want to put the hammer on voluntary, but not involuntary, deviations from what I have described as a coherent credal scheme. But who would deviate voluntarily? No one. Textbook cases of incoherence just are involuntary. It’s the same with adopting a credence function in accord with the probability axioms….everyone wants to but no one does. We don’t have the computational resources. Our violations are involuntary…that doesn’t make them coherent!

The alternative is to let all reflection violators off the hook…unless and until they violate conditionalization. I believe that’s the spirit of Briggs’s paper. Is it so bad? I’m getting used to it. After all, it’s not so much coherence as accuracy we’re after in the adoption of our credence functions. Sure, some self-doubters might get lucky and remain coherent in the short term. But if their self-doubts are well-founded, their incoherence will eventually out and, if they aren’t, their accuracy will suffer for having them. This is the credal analog of what Briggs was talking about when she said “there is a sense in which self-doubting agents can’t be right about everything.” Where beliefs are concerned, this comes to light right away. Where credences are concerned, you may need to wait long enough for “the law of large numbers to kick in,” as they say. Self-doubters are either wrong to self-doubt, in which case accuracy will suffer in the long run, or they’re right to self-doubt, and coherence (hence accuracy…the only reason to be coherent anyway) will suffer in the long run.

Wait…did I just put the reflection violators back on the hook from within the confines of philosophy of probability? Damn. Well, it sounded like it would be hard at the time.

Oh and obviously if you are coerced, well, your coherence/accuracy may suffer then, too. We’re all on that particular hook. Because while crime may not pay, it sure as hell hurts.

you can’t buy philosophical acumen

Believe me man, I wish it were possible–I’d sell you mine for a cool $M. We’d both be better off.

So I’m reading Probability and the Doomsday Argument (Mind, 1993) by William Eckhardt. Now, this guy is not a philosopher, it seems. Rather, he’s a (somewhat famous) futures trader. In fact he seems to be the same William Eckhardt I once read about in a book called “Rise of the Market Wizards” or something like that.

[[Personal aside: a friend with money wanted me to trade for him so he gave me this book in about 1994 to entice me….I ended up trying Datek with my own cash for about a year right around the crash in ’00 or so but found that I was very bad at it; moreover the market did not seem to be behaving the way it had for the previous ten years (buy the dip doesn’t work too well in a free fall). I did make some money but it wasn’t for me…more for a heinous person living in those dark days with those dark goals. Now I view the phenomenon of highly talented people going into finance as the archetypal social ill of our age. Hedge funds owned by billion-heirs employing dozens of minor geniuses tasked with making them yet richer still whilst producing nothing whatsoever…makes you proud to be a human, does it not? Yes, it does not. By all manner of means.]]

At any rate, Eckhardt wants to discredit the Doomsday argument. Last sentence of his brief paper: “There may exist a plethora of reasons for supposing the human race to be doomed, but our own birth rank in the total human population cannot reasonably be counted among them.”

I applaud this conclusion, but how did he get to it? Well, he describes the usual Doomsday argument, where you assume that you were sampled uniformly from the pool of all humans, past, current and future, then look at your birth order n and see how that reflects on

Dm(d) = there will be exactly d humans, ever.

With such a sampling scheme, lower values of d are favored relative to higher values, except of course for values below n, which are ruled out. Indeed, the lower d is, the more likely it is to be ruled out by the revelation that birth rank is n, which accounts for why lower values get a bigger boost when they aren’t ruled out.
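To see the effect numerically, here is a toy version with a made-up prior over four candidate values of d and birth rank n = 8, using the uniform-sampling likelihood 1/d for d ≥ n (and 0 otherwise). The prior and the function name are hypothetical.

```python
from fractions import Fraction

def doomsday_posterior(prior, n):
    """Posterior over the total population d given birth rank n, assuming
    P(rank = n | d) = 1/d for n <= d and 0 otherwise."""
    unnorm = {d: p * (Fraction(1, d) if n <= d else 0) for d, p in prior.items()}
    z = sum(unnorm.values())
    return {d: w / z for d, w in unnorm.items()}

# Hypothetical prior: four values of d, equally likely.
prior = {5: Fraction(1, 4), 10: Fraction(1, 4), 100: Fraction(1, 4), 1000: Fraction(1, 4)}
post = doomsday_posterior(prior, n=8)
print(post)  # d = 5 is ruled out; d = 10 rises from 1/4 to 100/111
```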

Here’s a cryptic though apparently crucial passage in Eckhardt’s counter-argument:

“However, the sampling arrangement in this example cannot truly be analogous to that of the Doomsday argument. In sampling equiprobable from a pool, only part of which currently exists, it is essential that one not invariably succeed in obtaining a sample item. Equiprobability entails that in some instances the sample ought to be one of the nonexistent items, in which case the procedure ought to yield a null result. A procedure that invariably yields an existent item cannot be equiprobable sampling, since in that case nonexistent members of the pool could not be receiving appropriate weight. Yet the sampling procedure employed in the Doomsday argument invariably yields a result-a human rank current at the time of the argument’s discovery. Hence, this cannot be equiprobable sampling from an ensemble only part of which currently exists.”

I have no firm idea what the above might mean. If Kirk wakes the members of Khan’s 200-member crew uniformly at random (one per day for 200 days), will Joachim, at the time of his awakening, opine that “these wakenings cannot possibly have been generated uniformly at random, for not one of us who has been selected for awakening is still asleep”?

Fortunately, Eckhardt wastes no time getting to his novelty–a model no one else could dream up: “…let Samp(r) represent a sampling procedure that is the restriction to {1,2,…,d} of a distribution that is independent of d.” Okay, so if this is the way my birth rank is decided then there is an N such that I am 99% likely to have a birth rank somewhere in {1,2,…,N}. (What if there are 2N people? If we get them together and they are all 99% confident that their birth rank is in {1,2,…,N}, we have a problem.) High birth ranks are pretty rare by this model’s lights. But if I am born in isolation and learn that there are 10^12 people, total, past, present and future, should my birth rank distribution not be uniform on {1,2,…,10^12}? What if everyone is born in isolation and has experiences just like mine?
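A sketch of the 2N problem, assuming for illustration that Samp is a geometric distribution with parameter 0.01. Eckhardt does not commit to any particular distribution; this choice and the names below are mine.

```python
Q = 0.01  # hypothetical Samp: geometric, P(r) = Q * (1 - Q)**(r - 1) for r = 1, 2, ...

def samp_cdf(k, d):
    """P(rank <= k) when the geometric is restricted to {1, ..., d} and renormalized."""
    k = min(k, d)
    return (1 - (1 - Q) ** k) / (1 - (1 - Q) ** d)

# Smallest N such that the unrestricted distribution puts at least 99% on {1, ..., N}.
N = 1
while 1 - (1 - Q) ** N < 0.99:
    N += 1

d = 2 * N
print(N, samp_cdf(N, d))  # each of the d people is at least 99% sure that rank <= N
print(N / d)              # yet exactly half of them have rank <= N
```

Restricting to {1,…,2N} only raises the share of mass on {1,…,N}, so everyone at the gathering is at least 99% confident of something that is true of exactly half of them.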

Contra Eckhardt, almost everyone else has it that, conditional on d, your prior over birth ranks has to be uniform. The Doomsday argument hinges rather on what your credences over values of d should be. Objective chance, or objective chance times number of observers (renormalized)? Eckhardt’s idea that one should abandon uniformity conditional on d is a bit on the audacious side, given that, conditional on d, there are just as many people with high birth rank (i.e. > d/2) as with low.

Worse, Eckhardt’s model is prone to gratuitous violations of reflection. Long after Doomsday, there’s a banquet in heaven. All humans (no present and future, just past now) are invited. Go ahead and mingle. Count everybody there. Now you know d. What are your priors concerning your birth rank now? These are your priors, mind you. Before evidence you gathered from the fossil record and availability of Nintendo while you were alive. Uniform, I’m guessing. Otherwise, you must just think there’s something special about you. Reflection, then, says that they should be uniform now.

So how do we solve the Doomsday paradox? Well, first you need to assume that the total number of humans d is a finite-expectation random variable. You have a pool of birth-able human templates of size N >> E(d). From this pool, you choose d human templates at random (I see no reason not to allow repeats, though they will be rare…allowing repeats avoids problems arising from the rare case that d > N). Okay…now suppose you are a template in the pool. The probability of being birthed at all is E(d)/N. Conditional on being birthed, your posterior distribution on d is given by

Q(d=k) = [ k P(d=k) ] / E(d)

and your birth rank distribution, conditional on d, is uniform on {1,2,…,d}. Now conditional on birth rank n, we have

Q(d=k|n) = [Q(d=k)Q(n|d=k)] / SUM_{j>=n} Q(d=j) Q(n|d=j) = … = P( d = k | d >= n),

which is just what you’d expect. (No paradox…all a birth rank of n tells you is that d is at least n.)
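The cancellation can be checked exactly on a small example. The sketch below uses a made-up three-point prior P(d), forms Q(d=k) = k P(d=k) / E(d), conditions on birth rank n with likelihood 1/k, and compares the result with P(d=k | d ≥ n). The prior and the function names are hypothetical.

```python
from fractions import Fraction

def birthed_posterior(prior, n):
    """Q(d = k | n) in the template model: Q(d = k) = k * P(d = k) / E(d), and
    Q(n | d = k) = 1/k for n <= k, 0 otherwise."""
    expected_d = sum(k * p for k, p in prior.items())
    q = {k: k * p / expected_d for k, p in prior.items()}
    joint = {k: q[k] * (Fraction(1, k) if n <= k else 0) for k in prior}
    z = sum(joint.values())
    return {k: w / z for k, w in joint.items()}

def conditional_on_at_least(prior, n):
    """P(d = k | d >= n)."""
    tail = sum(p for k, p in prior.items() if k >= n)
    return {k: (p / tail if k >= n else Fraction(0)) for k, p in prior.items()}

prior = {10: Fraction(1, 2), 100: Fraction(3, 10), 1000: Fraction(1, 5)}  # hypothetical P(d)
for n in (1, 8, 50, 500):
    # The factor k from being birthed cancels the 1/k from uniform birth rank.
    assert birthed_posterior(prior, n) == conditional_on_at_least(prior, n)
```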

This isn’t the only place Eckhardt falls prey to a reflection violation…or a failure to recognize the role of finite expectation. Consider for example his treatment of the shooting room paradox (which he, financial guru that he is, reworked to his comfort zone) in his book Paradoxes in Probability Theory:

“The Betting Crowd game consists of one or more rounds. For each round a certain number of players enter a region and they each bet even money against double sixes on a single roll of the dice (so they all win or lose together). If the players win, they leave the region and a number of new players are brought in that equals ten times the number of players that have won so far. The dice are rolled again. The rounds continue until the house wins on double sixes at which point the game is over. This guarantees that 90 percent of all those who play lose. Two trains of thought collide: (1) since double sixes occur less than 3 percent of the time and a player stands to win about 97 percent of the time, the bet is highly favorable; (2) since 90 percent of all players are destined to lose, the bet is highly unfavorable.”
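A quick check of the “90 percent” claim, assuming a first round of a single player (the description leaves the starting size open, and it cancels anyway). Round sizes grow as 1, 10, 110, 1210, …, so whenever the house wins on round k ≥ 2, the losers are 10/11 of everyone who played, a bit over 90 percent. The function names are mine.

```python
from fractions import Fraction

def round_sizes(rounds, first_round=1):
    """Players in each of the first `rounds` rounds: after each win, the next round
    brings in ten times as many players as have won so far."""
    sizes, played_so_far = [], 0
    for k in range(rounds):
        n = first_round if k == 0 else 10 * played_so_far
        sizes.append(n)
        played_so_far += n
    return sizes

def loser_share(last_round):
    """Share of all players who lose when the house rolls double sixes on `last_round`."""
    sizes = round_sizes(last_round)
    return Fraction(sizes[-1], sum(sizes))

print(round_sizes(4))                              # [1, 10, 110, 1210]
print([str(loser_share(k)) for k in range(1, 6)])  # ['1', '10/11', '10/11', '10/11', '10/11']
```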

Here is Eckhardt’s “solution”:

“A player ought to reason thus: 90 percent of all players will lose, but I have less than a 3 percent chance of belonging to that losing majority. This is no paradox; each player is prospectively likely to be in the minority, since he or she is prospectively likely to win and winning itself causes there to be enough subsequent players to guarantee the winner is in the minority.”

Let’s call this argument “NO PARADOX!” (NOPE!). Actually let’s not; it’s not that funny. Essentially, though, the argument is that there is no paradox “just because”. But there is a paradox here. Let’s change things slightly. You don’t actually learn whether you’ve won at the time you play. (You see them roll the dice, but the outcome is obscured.) You can’t see how many are playing at the same time, either. And, again, let’s have a banquet at the end of the experiment, to which all who’ve played are invited. Feel free to mingle, feel free to count the guests. How does Eckhardt like his odds now? Still 35/36? If he says “yes” well, let’s just say I’d love to be making side bets in that room! Just me and 10^N Eckhardt clones, me gettin’ rich at their expense fast.

But oh no…that’ll never happen. Nobody’s savvier when the money’s on the line, and old Bill will be flopping at the banquet faster than you can say arbitrage schmarbitrage. One tenth all the way, baby, and voilà…there goes reflection.

Now, maybe it’s just me, but in my book violations of reflection are damn paradoxical. Accordingly, if your analysis of a paradox is saddled with a violation of reflection then your analysis has failed to resolve the paradox, protestations of NOPE! to the contrary notwithstanding. (Okay, fine, I couldn’t resist.)

So how do we resolve the paradox? Well, you’re not gonna like this. The number of players in the game has to be (cue yawns, you’ve already guessed it if you’ve been reading this blog) bounded in expectation.

I know, I know…groans all around. I’ll spare you the proof. But hey…don’t shoot the messenger. Don’t bet against him either…in this or any other Crowd. This is that rare occasion in which yours isn’t the smart money.
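Not the proof, but a numerical sketch of why bounded expectation matters. Assume, hypothetically, that the game is capped at some number of rounds (if the house hasn’t won by then, the last round’s players win), which makes the expected number of players finite. A player drawn in proportion to head count across all runs of the game then lost with probability E[losers]/E[players], and since each round’s players lose with chance 1/36 whatever the round’s size, that ratio comes out exactly 1/36 for every cap: banquet odds and table odds agree. Without a cap, each extra round multiplies the expected head count by roughly (35/36)·11, about 10.7, so the expectation is infinite.

```python
from fractions import Fraction

P_DOUBLE_SIX = Fraction(1, 36)  # chance that a roll of two fair dice shows double sixes

def expected_players_and_losers(max_rounds):
    """Expected head count and expected number of losers for a Betting Crowd game that is
    forcibly ended after max_rounds rounds (hypothetical cap: if the house has not won by
    then, the last round's players simply win). The first round has a single player."""
    players = losers = Fraction(0)
    played_so_far = 0
    for k in range(max_rounds):
        n = 1 if k == 0 else 10 * played_so_far  # ten times everyone who has won so far
        reach = (1 - P_DOUBLE_SIX) ** k            # chance that this round is played at all
        players += reach * n
        losers += reach * n * P_DOUBLE_SIX         # this round's players lose with chance 1/36
        played_so_far += n
    return players, losers

for cap in (5, 10, 20):
    players, losers = expected_players_and_losers(cap)
    print(cap, float(players), losers / players)  # the ratio is 1/36 for every cap
```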

“beauty and the bystander” or “da’id-lew-is, whose self is beastly dead”

Found a couple of rather dated SB papers today. The better of the two was A Devastating Example for the Halfer Rule by Vincent Conitzer. Conitzer rediscovers (again) what’s so bad about double halfing. (It violates reflection.) Of course, this has been known for well over a decade, since the outstanding paper A Challenge for Halfers by Cian Dorr. (The title of Conitzer’s paper is rather funny to me as I have been referring to Dorr’s paper as a “devastating” blow to double halfing for some time. It’s bad enough that Titelbaum and others are ripping Dorr off here. I don’t think we need another devastating shot! Double halfing is quite dead by now.) Conitzer’s paper is definitely clever and he reinvents the wheel more than once. I particularly like the following passage, in which Conitzer is discussing a variant of the problem:

“…it is now the Halfer Rule that runs afoul of the Reflection Principle: if Beauty is certain that her credence on Monday (or, for that matter, Tuesday) will be 1/3, then why is it not 1/3 already on Sunday? In fact, it seems to me that this violation of the Reflection Principle is more serious than the thirder’s alleged violation of it in the original Sleeping Beauty problem, for the following reason. In the original problem, it would be unreasonable to say that the fact that the thirder will end up having a credence of 1/3 on Tuesday implies that she should already have a credence of 1/3 on Sunday. After all, she does not always wake up on Tuesday, and if she were capable of, in her sleep, recognizing that she has not been awoken, she would assign credence 1 in Heads then. That is why the purported violation focuses on the Monday credence in Heads, not the Tuesday one. But it seems illegitimate to consider Monday separately from Tuesday, because Beauty cannot distinguish them. Thus, it seems debatable whether the thirder really violates the Reflection Principle (more precisely, whether she violates any version of this principle by which we would care to abide).”

This is of course another case of reinventing (or at least starting to reinvent). If Conitzer had read Stopping to Reflect by Seidenfeld, Schervish and Kadane, he would know precisely which “version of this principle by which we would care to abide”…namely that in which reflection between now and later requires (in particular) that later be a stopping time. For the thirder, Monday is not a stopping time. (The thirder cannot conspire to say stop! on and only on Monday!) Of course if he has read the aforementioned paper (or any of my unpublished ones) then he’s merely guilty of forgetting to cite someone. Somehow, though, I doubt this. Conitzer is not a philosopher, and I suspect that he just isn’t very familiar with the literature. If that is so, and I suspect it is, then he’s really a good deal cleverer than 98% of the philosophers writing about Sleeping Beauty. (Of course that’s a high crime, and probably implies that his paper will never see the light of day.)

The other paper, When Beauties Disagree: Why Halfers Should Affirm Robust Perspectivalism, by John Pittard, rediscovers another old fact, namely that Lewisian halfers can disagree with each other about credences even when they’re in the same room, trust each other, can communicate, had equal priors at some point in their pasts and have the same uncentered evidence. In my survey I put it this way:

“…consider fellow Lewisian Sleeping Gorgeous, who gets awakened once if tails, twice if heads. Gorgeous has credence 1/3 in heads upon learning Monday. She and Beauty, who we can take to have been awakened in the same room, agree about how to determine credences, can talk to each other about their evidence, trust each others’ judgments and yet find themselves on opposite sides of objective chance concerning a future toss of a fair coin.”

Conitzer writes: “It should be noted that it would be trivial to turn these disagreeing participants into a money pump by arbitrage of their different credences.” That is worth mentioning, but it betrays an ignorance as to how Lewisian halfing works. It’s sample weight dilution, a standard technique for correction of bias…in this case oversampling of the tails world. From the perspective of Beauty, her tails money is diluted by a factor of two. So if we’re to regard betting as a zero-sum game, the money-pumping Conitzer also needs to dilute his enjoyment of her stake by a factor of two if tails. Similarly for Gorgeous if heads–and the money pump disappears. Is that odd? Yes! No one ever said that Lewisian halfing isn’t odd. It’s a useful fiction for sampling bias correction; it isn’t meant to be taken literally, as a “guide to life”, as Lewis characterizes credences elsewhere. At any rate you can read all about it in my survey–it’s probably the only good reading of Lewis. Everyone else trashes his position without tact, insight or mercy. Or insight. (O, it’s only Da’id-Lew-is whose self is beastly dead. An impossible person!)
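Here is one way to cash out that bookkeeping, as a hypothetical sketch rather than anyone’s official accounting. A bookie sells Beauty a $1 bet on heads and Gorgeous a $1 bet on tails, both at price p; each will pay up to 2/3. Undiluted, the bookie nets 2p - 1 in both worlds, a sure gain for any p above 1/2. With Beauty’s stake halved if tails and Gorgeous’s halved if heads, the net is (3/2)p - 1, which is never positive at a price they would accept. The names are mine.

```python
from fractions import Fraction

BEAUTY_HEADS = Fraction(2, 3)    # Lewisian Beauty's credence in heads after learning Monday
GORGEOUS_HEADS = Fraction(1, 3)  # Lewisian Gorgeous's credence in heads after learning Monday

def bookie_net(price, dilute):
    """Bookie's net (heads world, tails world) after selling Beauty a $1 bet on heads and
    Gorgeous a $1 bet on tails, each at `price`. With dilute=True, Beauty's stake counts
    half if tails and Gorgeous's counts half if heads (a hypothetical reading)."""
    half = Fraction(1, 2) if dilute else Fraction(1)
    heads = (price - 1) + half * price  # pays out to Beauty, keeps Gorgeous's diluted stake
    tails = half * price + (price - 1)  # keeps Beauty's diluted stake, pays out to Gorgeous
    return heads, tails

# Highest price both will pay: Beauty values the heads bet at 2/3, Gorgeous the tails bet at 1 - 1/3.
top = min(BEAUTY_HEADS, 1 - GORGEOUS_HEADS)
print(bookie_net(top, dilute=False))  # 1/3 in each world: a sure gain
print(bookie_net(top, dilute=True))   # 0 in each world: no pump
```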

Pittard meanwhile appears to spend upwards of 25 pages attempting to defend the literal, guide-to-life view of Lewisian halfing–without crediting Lewis! Which seems a far greater offense to his memory than trashing his position. At any rate, I confess that I jumped to the end of Pittard’s paper, hoping to find the serious mistake that would justify not reading carefully…and found it. I will now explain what it is.

Pittard thinks he has a “hard case” for Lewisians. It relates to the following generalization of the so-called “reflection principle”:

EXPERT REFLECTION: If I know that S is an expert on p relative to myself, then my credence for p conditional on S’s credence for p being x should also be x.

Here, to be an “expert” on p relative to me, S needs to know all I do about p and perhaps more. (I condone S’s priors and know that any worlds I have eliminated, S too has eliminated–perhaps more.) Pittard claims that many philosophers have endorsed EXPERT REFLECTION. There is something already quite odd about this. EXPERT REFLECTION clearly has naive reflection as a consequence, and there are known counterexamples to naive reflection. Future time slices of me, in particular, are experts relative to me about everything, so any instance in which reflection between now and later fails will be an instance in which EXPERT REFLECTION fails. On the other hand, there is, as Conitzer puts it, “a version of this principle by which we would care to abide”. And it’s called (cue drumroll) The Bounded Martingale Stopping Theorem! (I love repeating myself. And I have to do it, apparently, as philosophers seldom read memos–or emails, I have discovered–from sarcastic mathematicians.) This theorem fixes naive reflection by adding a condition to the antecedent…namely that later be a “stopping time”; in particular the agent needs to be able to recognize when later has arrived. (So the agent could say stop! at that time; i.e. could recognize that her location in time matched that described by her former self.)
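A small exact illustration of the stopping condition, with numbers of my own choosing: a coin is biased 3/4 toward heads if H and 3/4 toward tails if not, prior credence in H is 1/2, and the coin is flipped at most 8 times. Conditioning makes credence a bounded martingale, so its expectation at any stopping time equals the prior. At the peak, which can only be identified in hindsight, it does not. All names and parameters are hypothetical.

```python
from fractions import Fraction
from itertools import product

PRIOR = Fraction(1, 2)          # credence in H before any flips
BIAS_IF_H = Fraction(3, 4)      # hypothetical chance of heads if H
BIAS_IF_NOT_H = Fraction(1, 4)  # hypothetical chance of heads if not H
T = 8                           # at most T flips, so every rule below is bounded

def likelihood(bias, heads):
    return bias if heads else 1 - bias

def credence_path(flips):
    """Credence in H before any flips and after each flip, by conditionalization."""
    c, path = PRIOR, [PRIOR]
    for heads in flips:
        lh, ln = likelihood(BIAS_IF_H, heads), likelihood(BIAS_IF_NOT_H, heads)
        c = c * lh / (c * lh + (1 - c) * ln)
        path.append(c)
    return path

def seq_prob(flips):
    """Probability of a flip sequence under the prior mixture."""
    ph = pn = Fraction(1)
    for heads in flips:
        ph *= likelihood(BIAS_IF_H, heads)
        pn *= likelihood(BIAS_IF_NOT_H, heads)
    return PRIOR * ph + (1 - PRIOR) * pn

def expected_credence_at(rule):
    """Expected credence at the time picked by `rule`, which is handed the whole path."""
    return sum(seq_prob(s) * rule(credence_path(s)) for s in product([True, False], repeat=T))

def stop_at_threshold(path):
    # A stopping time: whether to stop at t depends only on credences up to t.
    for c in path:
        if c >= Fraction(9, 10) or c <= Fraction(1, 10):
            return c
    return path[-1]

def peak(path):
    # Not a stopping time: you cannot say stop! at the peak without seeing the future.
    return max(path)

print(expected_credence_at(stop_at_threshold))  # 1/2, matching the prior
print(expected_credence_at(peak) > PRIOR)       # True
```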

So, if EXPERT REFLECTION wants to be true, it’s wanting for a corresponding condition in the antecedent. Namely, if we want reflection between me and expert to hold, then expert has to recognize that his identity matches that described by me. (I don’t have a catchy name for such an expert…stopping expert doesn’t seem to work very well, though, arguably, neither does stopping time–philosophers have, after all, steadfastly refused to recognize the significance of the latter.) If we make this change, then EXPERT REFLECTION is saved. There is no need to discredit it. Indeed, to discredit it is misleading. It also reflects ignorance of the literature. Yet this is exactly what Pittard does.

Pittard is worried about a case where Beauty (here a Lewisian) tries to figure out a bystander’s expected credence in heads during one of her awakenings. The bystander knows what day it is and knows whether Beauty is awake. So for Beauty, bystander’s expected credence in heads is 3/8: if Monday (which Beauty assigns probability 3/4), bystander has no evidence bearing on the toss and his credence is 1/2; if Tuesday (which Beauty assigns probability 1/4), bystander knows tails. And 3/4 × 1/2 = 3/8. Pittard thinks that bystander is an expert relative to Beauty about heads, but does not want Beauty to adopt his credences. So he thinks he needs to attack EXPERT REFLECTION. He does this by constructing an unrelated (and disanalogous) counterexample in which an expert fails to recognize that his identity matches that described by me.
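The 3/8 is easy to verify. A sketch with exact fractions, using Lewis’s waking credences (1/2 for heads, 1/4 each for tails-Monday and tails-Tuesday); the variable and function names are mine.

```python
from fractions import Fraction

# Lewisian Beauty's credences on waking, over (coin, day) cells.
beauty = {("H", "Mon"): Fraction(1, 2), ("T", "Mon"): Fraction(1, 4), ("T", "Tue"): Fraction(1, 4)}

def bystander_heads(day):
    """Bystander's credence in heads, knowing the day and that Beauty is awake."""
    return Fraction(1, 2) if day == "Mon" else Fraction(0)  # awake on Tuesday means tails

beauty_heads = sum(p for (coin, _), p in beauty.items() if coin == "H")
expected_bystander = sum(p * bystander_heads(day) for (_, day), p in beauty.items())
print(beauty_heads, expected_bystander)  # 1/2 3/8
```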

This is hopelessly misguided. With the requisite added hypothesis, EXPERT REFLECTION cannot fail (it’s just a theorem), and if bystander were an expert here then certainly he’d be the sort of expert that satisfies the hypothesis (he can obviously self-identify under the sort of description Beauty has in mind), so reflection between Beauty and bystander would be valid. If, that is, bystander were an expert relative to Beauty! He is not! (His priors don’t match hers.) Indeed, this is already in Lewis, which Pittard clearly did not read closely enough. Lewis says, quite clearly, that upon learning Monday Beauty learns something relevant to heads…namely that she is not in the future. In other words, she eliminates something upon learning Monday. What she eliminates is half of the tails world. (This is what “robust perspectivalism” really is…the view that one can, contra strict Bayesianism, dilute worlds in SB situations without eliminating them. Pittard has proposed a rather clever name for what Lewis does, but his paper is not a good exposition of it.) I’ve been calling this dilution. Bystander doesn’t do this upon waking Monday–so from Beauty’s perspective, bystander knows less, conditional on Monday, than she herself does. As much is already reflected in bystander’s faulty (from Beauty’s perspective) priors, before learning Monday. So the real reason that reflection between Beauty and bystander fails is that bystander is not an expert on heads relative to Beauty. It’s not because there’s some deep problem with EXPERT REFLECTION. That principle is easily fixed, as I have indicated. Nor is any of this really new.

In fact, it’s getting old. Time to do your homework. The paper is called Stopping to Reflect. The theorem is called the Bounded Martingale Stopping Theorem. Get your head around it or get out of the way. I’m not kidding. Editors, meanwhile, when you see a paper about “reflection” (or for that matter “expert reflection”), you need to send the paper to an actual expert…not merely someone who answers to that description.
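The stopping theorem can be illustrated with a toy computation. This is an assumed setup for illustration, not the construction in Stopping to Reflect: a coin is either biased (heads with probability 4/5) or fair, with prior 1/2 each, and the agent watches at most six flips, stopping once her credence in “biased” leaves [1/5, 4/5]. Because each decision to stop looks only at flips already seen, her expected credence at the stopping time equals her prior exactly. Picking the moment at which her credence peaks requires seeing the whole sequence, so it is not a stopping time, and there the expectation exceeds the prior.

```python
from fractions import Fraction as F
from itertools import product

# Toy setup (an assumption): the coin is biased (heads w.p. 4/5) or fair.
PRIOR = F(1, 2)
HEADS_PROB = {"biased": F(4, 5), "fair": F(1, 2)}
HORIZON = 6  # at most six flips are observed, so the stopping time is bounded

def likelihood(flips, hyp):
    """Probability of this exact flip sequence under one hypothesis."""
    p = HEADS_PROB[hyp]
    out = F(1)
    for f in flips:
        out *= p if f == "H" else 1 - p
    return out

def marginal(flips):
    """Prior-weighted probability of the flip sequence."""
    return PRIOR * likelihood(flips, "biased") + (1 - PRIOR) * likelihood(flips, "fair")

def posterior(flips):
    """Credence in "biased" after seeing `flips`."""
    return PRIOR * likelihood(flips, "biased") / marginal(flips)

def stop_index(flips):
    # A genuine stopping rule: stopping after n flips depends only on the
    # first n flips. Stop once the credence leaves [1/5, 4/5].
    for n in range(HORIZON + 1):
        if not F(1, 5) <= posterior(flips[:n]) <= F(4, 5):
            return n
    return HORIZON

def expected_posterior_at_stop():
    # Average over full sequences; the credence at the stopping time
    # depends only on the prefix actually observed.
    return sum(marginal(w) * posterior(w[:stop_index(w)])
               for w in product("HT", repeat=HORIZON))

def expected_posterior_at_peak():
    # NOT a stopping time: locating the peak requires the whole sequence.
    return sum(marginal(w) * max(posterior(w[:n]) for n in range(HORIZON + 1))
               for w in product("HT", repeat=HORIZON))

print(expected_posterior_at_stop())            # 1/2
print(expected_posterior_at_peak() > F(1, 2))  # True
```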

Pittard’s example meanwhile, while something of a rehash, is perfectly fine as an indictment of naive EXPERT REFLECTION. It goes something (not much, but I’m allowed to rehash too) like this:

Tex tosses a fair coin. If heads, he tells his best friend the outcome. If tails, he tells no one. Tex has only two friends. They and I are each indifferent as to which friend (Chip or Dale) is best friend. If Chip gets the heads message, his credence in heads will of course shoot up to 1. Otherwise, it will drop to 1/3. Not zero, since Chip doesn’t know that he is best friend. Not 1/2, as he doesn’t know that he isn’t. Now…after all has played out, from my vantage point Chip is clearly an expert. He knows all that I know, and in fact more, as he knows either (a) that he is best friend and the coin landed heads or (b) that he is not best friend or the coin landed tails. So I would do well to adopt his expected credence in heads as my own. And that’s okay, because it’s 1/2–same as mine. It’s 1/2 because there is a 3/4 probability that he didn’t hear anything, in which case his credence in heads is 1/3, and a 1/4 probability that he heard heads, in which case his credence in heads is 1. And, well, you know it from there. By the same token, Dale is an expert…and his expected credence is 1/2 as well. So no trouble for EXPERT REFLECTION yet.
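Chip’s and Dale’s numbers can be checked by brute enumeration. The sketch below assumes, as the story stipulates, that the four (coin, best friend) combinations are equally likely; the function names are illustrative only.

```python
from fractions import Fraction as F
from itertools import product

# The four (coin, best friend) combinations, each with probability 1/4.
WORLDS = {w: F(1, 4) for w in product(["heads", "tails"], ["Chip", "Dale"])}

def hears_heads(friend, world):
    """Tex tells his best friend the outcome only when it is heads."""
    coin, best = world
    return coin == "heads" and best == friend

def credence_in_heads(friend, heard):
    """friend's credence in heads after conditioning on whether he was told."""
    consistent = {w: p for w, p in WORLDS.items() if hears_heads(friend, w) == heard}
    total = sum(consistent.values())
    return sum(p for w, p in consistent.items() if w[0] == "heads") / total

def expected_credence(friend):
    """My expectation, from outside, of friend's credence in heads."""
    return sum(p * credence_in_heads(friend, hears_heads(friend, w))
               for w, p in WORLDS.items())

print(credence_in_heads("Chip", True), credence_in_heads("Chip", False))  # 1 1/3
print(expected_credence("Chip"), expected_credence("Dale"))  # 1/2 1/2
```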

Trouble arises when we consider reflection between me and best friend (under that description). Best friend has expected credence 2/3 in heads. Moreover, best friend is an expert relative to me. After all, best friend is either Chip or Dale, and we already agreed that they’re both experts. But it’s clearly false that I ought to adopt 2/3 as my credence in heads. So reflection between me and best friend (under that description) fails. How did this happen? We’ve already had this discussion. Best friend fails to recognize that his identity matches that of the description. So there was never any reason to think that reflection between me and best friend (so described) should be valid.
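Enumerating the four equally likely (coin, best friend) combinations recovers the 2/3 figure for best friend under that description. In this minimal sketch (illustrative names), whoever is best friend is told the outcome exactly when the coin lands heads, and then conditions on what he heard without knowing that he is best friend.

```python
from fractions import Fraction as F
from itertools import product

# The four (coin, best friend) combinations, each with probability 1/4.
WORLDS = {w: F(1, 4) for w in product(["heads", "tails"], ["Chip", "Dale"])}

def credence_in_heads(friend, heard):
    """friend's credence in heads, conditioning on whether he was told heads."""
    def told(world):
        return world[0] == "heads" and world[1] == friend
    consistent = {w: p for w, p in WORLDS.items() if told(w) == heard}
    total = sum(consistent.values())
    return sum(p for w, p in consistent.items() if w[0] == "heads") / total

def best_friend_expected_credence():
    """Expected credence in heads of whoever is best friend, so described."""
    total = F(0)
    for (coin, best), p in WORLDS.items():
        heard = coin == "heads"  # best friend is told exactly when it is heads
        total += p * credence_in_heads(best, heard)
    return total

print(best_friend_expected_credence())  # 2/3
```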

It will perhaps be instructive to see how this is essentially John Collins’s Prisoner example. Recall that the Prisoner will, with probability 1/2, be executed at midnight. (Otherwise he will be left alone in his cell.) He has no clock, so as midnight approaches his credence in executed drops. We don’t know how far it drops because we don’t know if his internal clock is running fast or slow. But it will drop at least a little by, say, 11:59, as he won’t be entirely sure at that point whether or not midnight has passed. For example, let’s say his internal clock is indifferent, at 11:59, between “before midnight” and “after midnight”. Then Prisoner’s credence in executed will have dropped to 1/3. Now at 6 o’clock the Prisoner christens his 11:59 time slice best friend and decides that the expected value of best friend’s credence in executed is 1/3. (A back-of-envelope calculation suggests that anything bigger than 1 – ln 2 is plausible here, assuming the Prisoner’s internal clock is unbiased.) Moreover, best friend is an expert relative to the Prisoner. He knows everything the Prisoner does, plus the fact that he lived long enough for his internal clock to assume whatever state it has. Reflection between Prisoner and best friend fails, however. Usually we say that this is because 11:59 is not a stopping time. Prisoner has no idea, at 11:59, that it’s 11:59–so he couldn’t, for example, say “stop” at that time. Put another way, best friend doesn’t recognize that he fits the description best friend…much as in Pittard’s example.
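The 1/3 figure follows from a short calculation, sketched here under the stated assumption that the Prisoner’s 11:59 clock state is equally compatible with “before midnight” and “after midnight”, independently of whether he will be executed. Being alive before midnight is compatible with either fate; being alive after midnight requires having been spared.

```python
from fractions import Fraction as F

# Assumptions from the example: execution has probability 1/2, and the
# 11:59 clock state is equally compatible with "before midnight" and
# "after midnight", independently of whether he will be executed.
P_EXECUTED = F(1, 2)
P_AFTER_MIDNIGHT = F(1, 2)

# Alive and due for execution: only possible if it is still before midnight.
alive_and_executed = P_EXECUTED * (1 - P_AFTER_MIDNIGHT)
# Alive and spared: possible whether it is before or after midnight.
alive_and_spared = 1 - P_EXECUTED

credence_executed = alive_and_executed / (alive_and_executed + alive_and_spared)
print(credence_executed)  # 1/3
```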

Seen from this vantage, the Prisoner is already a counterexample to naive “expert reflection”; there wasn’t really any need for Pittard to have supplied another. On the other hand, it was probably good that he did. It just would have been better if he had realized what it accomplishes, and why.

Why I Am Not Not a Bayesian

Okay, so today it’s Why I am not a Bayesian by Clark Glymour. It’s a dense paper, and I will cordon off a section of it for discussion, namely the final section dealing with the so-called problem of old evidence. Indeed, I’m not really even going to discuss that, per se. Rather, I’m going to focus on a single three-sentence passage in the paper. This one:

“How might Bayesians deal with the old evidence/ new theory problem? Red herrings abound: the prior probability of the evidence, Bayesians may object, is not really unity; when the evidence is stated as measured or observed values, the theory does not really entail that those exact values obtain; an ideal Bayesian would never suffer the embarrassment of a novel theory. None of these replies will do: the acceptance of old evidence may make the degree of belief in it as close to unity as our degree of belief in some bit of evidence ever is; although the exact measured value (of, for example, the perihelion advance) may not be entailed by the theory and known initial conditions, that the value of the measured quantity lies in a certain interval may very well be entailed, and that is what is believed anyway; and, finally, it is beside the point that an ideal Bayesian would never face a novel theory, for the idea of Bayesian confirmation theory is to explain scientific inference and argument by means of the assumption that good scientists are, about science at least, approximately ideal Bayesians, and we have before us a feature of scientific argument that seems incompatible with that assumption.”

Now most of this passage is uninteresting to me. Of the three proposed “Bayesian” strategies, the first two make no sense to me at all, and that Glymour spends so much time rebutting them seems to me a bit of a straw-man exercise. The third, however, I got very excited about, in particular this phrase: “an ideal Bayesian would never suffer the embarrassment of a novel theory”. This is why we read philosophy. To hear others formulate, in developed form, our own embryonic positions. But then, when I read “the idea of Bayesian confirmation theory is to explain scientific inference and argument by means of the assumption that good scientists are, about science at least, approximately ideal Bayesians, and we have before us a feature of scientific argument that seems incompatible with that assumption” I sort of lost the thread of Glymour’s argument. This was confirmed by the last page and a half of the paper, which I did not recognize as having been written by the same person that had so cogently characterized my embryonic thought.

So what I propose to do in this blog post is to develop my embryonic thought. Which is this:

The fact that an ideal Bayesian would never be embarrassed by a novel theory completely solves the old evidence problem. Moreover, science does approximate ideal Bayesian behavior…albeit slowly. And it is in communities of scientists, not in individual scientists, that this approximation is best.

Now I’ll try to explain myself. According to the Bayesian, when one encounters new evidence, one updates one’s prior probabilities by eliminating all world-states incompatible with that evidence and raising credences in the remaining world-states proportionally, i.e. retaining the ratios of their probabilities. So if worlds A and B are consistent with evidence E and my priors dictate a credence in A twice that in B before observing E, my credence in A will still be twice my credence in B after observing E. The ideal Bayesian doesn’t need to “think about” her new credences at all. She just updates by the standard formula from her priors. That, at any rate, is how the story goes.
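The updating rule just described is short enough to write out. A minimal sketch with made-up prior numbers for three worlds A, B and C, where evidence E rules out C; the function name `conditionalize` is illustrative.

```python
from fractions import Fraction as F

def conditionalize(prior, evidence):
    """Drop the worlds the evidence rules out and rescale the rest to sum to 1."""
    kept = {w: p for w, p in prior.items() if w in evidence}
    total = sum(kept.values())
    return {w: p / total for w, p in kept.items()}

# Hypothetical priors: credence in A is twice credence in B.
prior = {"A": F(2, 5), "B": F(1, 5), "C": F(2, 5)}
posterior = conditionalize(prior, evidence={"A", "B"})  # E rules out C

print(posterior)  # {'A': Fraction(2, 3), 'B': Fraction(1, 3)}
print(posterior["A"] / posterior["B"])  # 2
```

Nothing in the function looks at the content of A or B: the ratio survives because both are divided by the same normalizing constant.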

There are a couple of potential problems with this story. The first is that it requires us to have prior probabilities. So it is, for example, that Bayesian parameter estimation is perceived by some to be clunkier than some “other” methods of parameter estimation, such as maximum likelihood estimation, that do not appear, on the surface at any rate, to utilize prior distributions of the parameters directly. I don’t think this is really right. Generally some distribution is assumed if an estimation method is to be rigorous. (Perhaps it is a uniform one.) So it’s not Bayesianism in the sense of advocating methods of estimation that go by the name “Bayesian” that I am concerned to defend here. If I take prior parameter distributions to be implicitly uniform wherever they have not been reflected upon, even my choice to employ maximum likelihood estimation becomes a genre of Bayesianism. The real problem I want to address here is the question of whether it’s anti-Bayesian to reflect on and change one’s priors after receiving evidence. In general I will conclude no, but with a caveat: I am not talking about changing your priors because you take the fact that e was obtained as evidence that e was likelier to be obtained than you had thought. Rather, I am considering a change brought about solely by using the topicality (rather than the actuality) of e as an impetus for undertaking first-time reflection on how one ought to update in cases where e is obtained as evidence. The sort of reflection that could, and in fact would, have been undertaken, long before e was encountered, given time and computational resources.
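The claim that maximum likelihood estimation becomes a genre of Bayesianism under an implicit uniform prior can be seen in a toy case. The sketch assumes a coin whose bias is restricted to a grid of hundredths and data of 7 heads in 10 flips (both hypothetical). With a uniform prior the posterior is proportional to the likelihood, so the posterior mode and the maximum likelihood estimate coincide.

```python
from fractions import Fraction as F

GRID = [F(i, 100) for i in range(101)]  # hypothetical candidate coin biases

def likelihood(theta, heads, flips):
    """Probability of a particular sequence with `heads` heads in `flips` flips."""
    return theta ** heads * (1 - theta) ** (flips - heads)

def mle(heads, flips):
    return max(GRID, key=lambda t: likelihood(t, heads, flips))

def map_estimate(prior, heads, flips):
    # The posterior is proportional to prior times likelihood; the
    # normalizing constant does not affect which grid point is largest.
    return max(GRID, key=lambda t: prior[t] * likelihood(t, heads, flips))

uniform = {t: F(1, len(GRID)) for t in GRID}
print(mle(7, 10), map_estimate(uniform, 7, 10))  # 7/10 7/10
```

Swapping in a non-uniform `prior` dictionary is where the two estimates can come apart, which is exactly where the prior does visible work.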

I’ll get at these issues by way of the following game. I will start giving you terms in some sequence or other s(n). After s(n-1) is revealed, you take a guess at s(n). Your score is the largest n for which your guess was wrong. The lower your score, the better. Here we go: [Personal note: I have an amusing mental block concerning this puzzle. In particular, no matter how many times I talk about it, I screw it up. It’s really sort of amusing.]

s(0) = 0

I am assuming you guess s(1) = 1.

s(1) = 1

Now you’re feeling pretty confident. You guess s(2) = 2.

s(2) = 2

Wonderful. Guess s(3) = 3, natürlich. Now comes this:

s(3) = [a number of well over a thousand digits, ending in a long run of zeros]
I’m guessing that this is not what you were expecting s(3) to be. Clearly this is no polynomial of modest degree and smallish coefficients–we must think “outside the box” on this one.

So what would a Bayesian do? On paper, just condition her prior probabilities for the space of all possible five-term sequences on what she’s seen so far, namely s(0) – s(3). But, you might reason, since s(3) has well over a thousand digits, this would appear to be an unrealistic demand in at least two respects. First, there are just too many candidate sequences. Second, you feel pretty sure now that your data is specific enough that there is a unique “simple” pattern. So you feel pretty certain that, given enough time and paper, your probabilities for what s(4) might be would concentrate significantly on some single value K. You just don’t have any clue, right now, what value that is. These are two respects, then, in which you differ from the “ideal” Bayesian. First, you have fewer computational resources. Second, you are less clever. (I leave it open whether these amount to the same thing.) But how, if at all, do these facts bear on the status of “Bayesianism”?

I would say they do not bear on that status at all. But let’s solve the problem and discuss that after.

First, you may note that s(3) ends with a lot of trailing zeros. That means it has a lot of 2s and a lot of 5s in its prime factorization. So, it seems likely that s(3) was arrived at by multiplication, and it becomes natural to factor s(3) into primes. Having done that (child’s play on a computer), one would notice some peculiar things. First, no prime factor greater than 720 appears. Second, every prime between 360 and 720 appears precisely once. Third, every prime between 240 and 360 appears precisely twice, between 180 and 240 thrice, etc. It’s not far from here to construct the hypothesis, easily verified, that the number s(3) is 720!.
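The factorization pattern just described is easy to confirm once the hypothesis is on the table. A sketch using only the standard library: build 720! and count how often each prime below 800 divides it (helper names are illustrative). The exponent of 2 comes out as 716 and that of 5 as 178, so the number ends in 178 zeros.

```python
import math

def is_prime(n):
    """Trial division; adequate for the small primes needed here."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))

def multiplicity(p, n):
    """Exponent of the prime p in the factorization of n."""
    count = 0
    while n % p == 0:
        n //= p
        count += 1
    return count

s3 = math.factorial(720)  # the hypothesis under test
exponents = {p: multiplicity(p, s3) for p in range(2, 800) if is_prime(p)}

# The patterns noted in the text.
assert all(e == 0 for p, e in exponents.items() if p > 720)
assert all(e == 1 for p, e in exponents.items() if 360 < p <= 720)
assert all(e == 2 for p, e in exponents.items() if 240 < p <= 360)
assert all(e == 3 for p, e in exponents.items() if 180 < p <= 240)

print(exponents[2], exponents[5])  # 716 178
```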

So now we have our sequence: 0, 1, 2, 720!.

But you may recognize as well that 720 = 6!. So here is our sequence: 0, 1, 2, 6!! (writing n!! for (n!)!, the factorial applied twice, not the usual double factorial). What next? Well, 6 = 3! and 2 = 2!!, so: 0, 1, 2!!, 3!!!. Is that all? Well, 1 = 1!, so: 0, 1!, 2!!, 3!!! and of course the next number in the sequence is 4!!!!.
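Reading n!! as (n!)! and n!!! as ((n!)!)!, the sequence has a compact description: s(n) is n with the factorial applied n times. A minimal sketch; the function name `s` mirrors the text’s notation and is otherwise illustrative.

```python
import math

def s(n):
    """Apply the factorial n times to n, so s(3) = ((3!)!)! = (6!)! = 720!."""
    value = n
    for _ in range(n):
        value = math.factorial(value)
    return value

print([s(n) for n in range(3)])     # [0, 1, 2]
print(s(3) == math.factorial(720))  # True
# s(4) = (((4!)!)!)!: already 4!! = 24! is a 24-digit number.
print(math.factorial(math.factorial(4)))  # 620448401733239439360000
```

The same function would in principle give s(4), but taking the factorial of 24! and then once more is far beyond any computer, which is why the answer has to be stated rather than computed.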

It’s now that Glymour would, I imagine, say something to the effect that what we just did was not Bayesian, or perhaps not even compatible with what is Bayesian. Before we encountered the evidence, i.e. s(0) – s(3), we did not have priors (because we had never considered the question) reflecting the inevitability of s(4) = 4!!!! in light of evidence s(0) – s(3). One might claim that our eventual considered response, then, that s(4) = 4!!!! with near certainty, constitutes a departure from Bayesian behavior. I don’t agree with that, and the reason is this: it is not merely the case that we now think that the probability of 4!!!!, conditioned on what s(0) – s(3) actually are, is near unity. No–we also believe that our former unreflective priors (if any) are irrational. We have therefore abandoned them in favor of better priors.

So we now hold that a rational priors function P will have it that P[0, 1!, 2!!, 3!!!, 4!!!!] is nearly equal to the sum, over all K, of P[0, 1!, 2!!, 3!!!, K]. We’ve remained dedicated to the Bayesian perspective throughout our deliberations; we’re just better educated, post-reflection, about what “rational priors” ought to look like. Now it’s true that we did not subscribe to such priors (perhaps to any) before we got the puzzle. Nor is it the case that the Bayesian perspective is what led us to them. Bayesianism is silent on the matter of what priors it’s rational to adopt. All Bayesianism tells us is how updated credences should relate to priors and evidence. (They should be arrived at via conditionalization of one’s priors on the evidence, whatever those may be.) And should you be a less-than-ideal agent, it doesn’t even say, as many have supposed, “never change your priors”. Changing your priors implies diachronic irrationality, true enough. But that’s because only one set of priors is correct, and it’s irrational to have incorrect priors! (This is not the usual view, but it is the correct view.) Obviously if you change your priors, then they were, at some point, incorrect, and to have held incorrect credences at some point in time is to have been diachronically irrational. (It is not necessarily to have been diachronically incoherent, which is a stronger claim. Changing your priors does imply diachronic incoherence, but that’s no reason to persist in holding onto bad unreflective priors.) But it’s not to have violated Bayesianism. What would constitute a violation of Bayesianism? Nothing less than to hold a credence that is distinct from the credence that would be held were one to condition one’s current priors on one’s total evidence.
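Written as a formula, the claim about rational priors is a claim about a ratio. This is simply a restatement of the first sentence of the paragraph in standard conditional-probability notation: the conditional credence is near unity exactly when the numerator nearly exhausts the sum in the denominator.

```latex
P\bigl(s(4) = 4!!!! \mid s(0), \dots, s(3)\bigr)
  = \frac{P[0, 1!, 2!!, 3!!!, 4!!!!]}{\sum_{K} P[0, 1!, 2!!, 3!!!, K]}
  \approx 1
```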

To repeat: in our example, the evidence you acquired, namely s(0) – s(3), played merely an accidental, attention-focusing role in the adoption of your “improved” (i.e. reflective) priors…priors it was inevitable you would endorse once you had considered the matter closely, whether in response to evidence or not. If you had an infinity of years in time-suspended isolation to compose comprehensive priors on integer sequences, you would become that ideal creature that can’t be embarrassed by novel patterns. You would, without having ever encountered a pattern in “real life”, have come to “know all the patterns”, simply for having encountered them all in various thought experiments. That knowledge would be sitting there, in your priors. Maybe someone else doing the same would have different priors. They’d almost certainly each notice a lot of the same patterns, but might not agree on their relative strengths. They can’t both be ideally rational, but they can both be coherent, in spite of their differences.

Bayesianism, then, given a set of priors, is just a mindless updating scheme. It’s fine so far as it goes, but the real work of science is in the priors, and Bayesian lore is silent (not wrong) on the issue of what they should be. It’s more or less agreed that “simplicity” should be sought after in the setting of one’s priors, but there’s a lot of disagreement as to what “simplicity” comes to. For the actual practice of science, this question is probably not that important. (Scientists know simplicity when they see it.) For philosophy, however, this is where the action is. I think Glymour senses as much, but it’s just not right for him to encourage others (by example) to disown Bayesianism (“I am not a Bayesian”). Bayesianism isn’t a recipe for all rationality, but there’s no reason not to be one. It’s probably better to just say “I’m not not a Bayesian.”

So I won’t not say it.

Glymour’s concluding paragraph:

“None of these arguments is decisive against the Bayesian scheme of things, nor should they be, for in important respects that scheme is undoubtedly correct. But taken together, I think they do at least strongly suggest that there must be relations between evidence and hypotheses that are important to scientific argument and to confirmation but to which the Bayesian scheme has not yet penetrated.”

I don’t know what “relations between evidence and hypotheses” amount to in this context, but that is not how I would put it. The real problem that the Bayesian scheme doesn’t penetrate…makes no claims to penetrate now or ever, so there’s little point in saying “has not yet penetrated”…is that of the adoption of rational priors. Appealing to “simplicity” can only get us so far. Linear relations are simpler than quadratic, we may say, which are in their turn simpler than cubic, etc. There are other sorts of functions appearing in nature, however, both familiar (exponential, trigonometric) and not. There are many distributions, too, both familiar (binomial, hypergeometric, Poisson, normal) and not. It’s well and good to order relations according to the number of unknown parameters (fewer = simpler) or to say that a relation is confirmed precisely when the number of data points exceeds the number of unknowns, and certainly we suspect that any comprehensive theory of the rationality of priors should reflect versions of these and other ad hoc principles, but none of this means that we have such a comprehensive theory, still less that Bayesianism (of all things!) should be faulted for not laying it at our feet.
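The two ad hoc principles just mentioned can be put to work on the puzzle sequence in a toy way. This sketch deliberately restricts attention to polynomials, and its function names are hypothetical. It tries degrees in order of their number of unknowns and accepts a fit only if it also matches at least one surplus term. On 0, 1, 2 it confirms the line p(n) = n. Once 720! arrives, every polynomial is either refuted or has as many unknowns as data points, so nothing is confirmed. That is one concrete sense in which such principles only get us so far.

```python
import math
from fractions import Fraction

def predict(points, x):
    """Value at x of the unique polynomial of degree < len(points) through
    `points` (Lagrange interpolation in exact rational arithmetic)."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

def simplest_confirmed_degree(values):
    """Try degrees in order of their number of unknowns (degree + 1). A degree-d
    fit to the first d + 1 terms counts as confirmed only if it also matches
    every later term, and there must be at least one later term."""
    points = list(enumerate(values))
    for d in range(len(points) - 1):
        fit, surplus = points[: d + 1], points[d + 1 :]
        if all(predict(fit, x) == y for x, y in surplus):
            return d
    return None  # every polynomial is refuted or merely fits without surplus

print(simplest_confirmed_degree([0, 1, 2]))                       # 1
print(simplest_confirmed_degree([0, 1, 2, math.factorial(720)]))  # None
```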

After all, the theory of logical consequence didn’t lay this at our feet either–no one wrote a paper called “Why I am not a Logical Consequentialist”.