Cian Dorr: Against Counterfactual Miracles

Cian Dorr writes, in the recent paper Against Counterfactual Miracles,

“It is natural to suppose that if…say, you had blinked one more time than you actually did while reading the previous sentence–the state of the world in the distant past would still have been…as it…was. … But if determinism is true….”

There is something slightly paradoxical going on here. In evaluating a counterfactual, we are, according to orthodoxy, put upon to alter the actual world as meagerly as possible while making the antecedent true, then judge the truth of the counterfactual according to whether the consequent is true subsequent to these alterations. But unless we change the laws underwriting determination, we need to alter the past in order to account for the extra blink.

The problem is general, of course. It infects at least those counterfactuals whose antecedents aren’t chance events. All counterfactuals, if determinism is true. Is it? Dorr writes:

“determinism…is a live possibility, one that many physicists and philosophers of physics take quite seriously. So it is not a merely academic exercise to investigate which of our ordinary beliefs are consistent with it.”

I would urge an interpretation of counterfactuals on which the issue of determinism becomes a red herring. In particular, I would urge that when we imagine incorporating truth of the antecedent into actuality, we do so in a way that only fixes the epistemic position of the speaker.

On a certain view, this seems like a non-starter. On the table is

A♠ 9♠ 8♣ 9♥ 3♠

John, holding A♦  A♥, goes all in, whereupon Matt goes into cardiac arrest and dies. John then says, rather insensitively, “if Matt had called, I would have taken his money”.

On just about any extant view, this counterfactual appears to be true if and only if John had a stronger hand than Matt, i.e. if Matt was not holding 9♣ 9♦. In particular, its truth appears not to be a function of John’s epistemic situation…it depends on facts about the world that John doesn’t know. Apparently, then, I am surrendering the view that counterfactuals have truth conditions, and/or that their meaning is closely related to truth, or to when or how they are true.

So what did John mean by “if Matt had called, I would have taken his money”? We can imagine saying to him in reply “you don’t know that, John.” To which he might say “Well, no…I don’t know it.” (Is this merely contextualism?) Why then did you say it, we should then ask, to which we might get “I said it because, most of the time, when you’re someone like me playing against someone like Matt in situations like this with those cards up and you have aces full and go all in and the other guy calls, you take his money.” Most of the time, then. Not all. Indeed…not even this (ostending some counterfactual) time.
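John’s “most of the time” can be made concrete for this particular board. What follows is only a rough sketch: it assumes that Matt’s two hole cards are equally likely to be any two of the 45 unseen cards (which ignores everything the betting might reveal), and the names rank5 and best7 are my own hypothetical helpers, not anything standard.

```python
from collections import Counter
from itertools import combinations

RANKS = "23456789TJQKA"
DECK = [r + s for r in RANKS for s in "shdc"]
# Hand categories that are fixed by the multiset of rank multiplicities alone.
SHAPE_CATEGORY = {(4, 1): 7, (3, 2): 6, (3, 1, 1): 3, (2, 2, 1): 2, (2, 1, 1, 1): 1}

def rank5(cards):
    """Score a 5-card hand as a comparable tuple; higher means stronger."""
    vals = sorted((RANKS.index(c[0]) for c in cards), reverse=True)
    counts = Counter(vals)
    # Ranks ordered by multiplicity, then by rank (trips before the pair, etc.).
    groups = sorted(counts, key=lambda v: (counts[v], v), reverse=True)
    shape = tuple(sorted(counts.values(), reverse=True))
    flush = len({c[1] for c in cards}) == 1
    straight = len(counts) == 5 and vals[0] - vals[4] == 4
    if vals == [12, 3, 2, 1, 0]:  # the wheel A-2-3-4-5 plays five-high
        straight, groups = True, [3, 2, 1, 0, -1]
    if straight and flush:
        category = 8
    elif flush:
        category = 5
    elif straight:
        category = 4
    else:
        category = SHAPE_CATEGORY.get(shape, 0)
    return (category, groups)

def best7(cards):
    """Best 5-card score obtainable from 7 cards (board plus hole cards)."""
    return max(rank5(hand) for hand in combinations(cards, 5))

board = ["As", "9s", "8c", "9h", "3s"]
john = ["Ad", "Ah"]
unseen = [c for c in DECK if c not in board + john]

john_score = best7(board + john)
wins = losses = ties = 0
for matt in combinations(unseen, 2):
    matt_score = best7(board + list(matt))
    if john_score > matt_score:
        wins += 1
    elif john_score < matt_score:
        losses += 1
    else:
        ties += 1

print(wins, losses, ties)  # prints: 989 1 0
```

Of the 990 possible holdings, exactly one (the two remaining nines) beats John. That single holding is what the orthodox truth condition turns on, and the other 989 are what John is actually reporting.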

Is this a breed of unfashionable internalism? I don’t mind being unfashionable. (In fact, I tend to prefer it.) I just don’t want to get in trouble later on.

Maybe I will get in trouble later on, but for now, I’m doing quite well, for “if I had blinked twice (rather than once, say), then the past would have been the same” just means something like “most of the time, when someone like me blinks twice in a scenario epistemically similar to mine, the past turns out to be the same”. Which is clearly just wrong. All of this squares with my own intuition.

I realize it will not square with everyone’s. Here, though, is some therapy. There is a strong conversational norm against asserting that which you don’t know to be true. (It’s not quite lying, but it’s close.) On the orthodox view, John would be in  violation of this norm in avowing “if Matt had called, I would have taken his money”. For he is clearly in violation of this norm in avowing “my hand was stronger than Matt’s” (he knows no such thing) and, on the orthodox view, these are truth-equivalent. But (I claim) our intuitions suggest that John is not in violation of this norm. What John’s words indicate is something a bit vaguer. Something along the lines of “I had a good hand” or “I wasn’t bluffing” or perhaps just “I think I had a better hand than Matt”.

Why “if Matt had called, I would have taken his money” is more assertible here than “I had a better hand than Matt” is something of a mystery, for if any counterfactual has truth conditions, this one does, and every semantics (that I know of) would agree on what they are. I believe the moral is that the two avowals aren’t truth-equivalent, and the only way I can imagine that being the case is if the counterfactual has no truth conditions at all.

Or: whatever I mean by “if it had been the case that F then it would have been the case that G”, it’s surely going to turn out to be something I think I know. (Otherwise, why exactly am I saying it.) So “given what I know” is implicit. It’s what I know that’s relevant to the fact that at least most of the pertinent F worlds are G worlds. In this case (thinks John) the fact that I have cards that win most of the time in scenarios like these.

But I’m getting sidetracked. Let me try to get back to the paper. Dorr gives an example from Frank Jackson:

a. If I had jumped out of this tenth floor window, I would have been killed.

b. If I had jumped out of this tenth floor window, I would have done so only because someone had put a safety net in place.

On the similarity interpretation of counterfactuals, both a. and b. seem to have valid readings. (This is one of the big problems with the similarity interpretation…there are too many viable choices for a similarity metric.) The b. reading requires what Dorr, following Lewis, calls “backtracking”. In finding a similar world or worlds, one allows oneself to significantly alter the past. Presumably, worlds where there are some people who fear that I might be suicidal are more similar to ours than are worlds where I am suicidal. Now you have to decide whether backtracking is legitimate.

Which I think is fairly hopeless. “Similar” can mean too many different things. My own interpretation of counterfactuals is in no such trouble here. Since I am implicitly fixing my own epistemic position, b. has no viable reading. It would, in particular, be weaker to avow “If my epistemic position had been the same and I had jumped out of this tenth floor window then my epistemic position would have been different”.

Hmm. There’s some subtlety here. I want to allow an epistemic reading of something like “If I had looked at that card, I would now know who the messenger is”, which seems to be stronger than “If my epistemic position were the same and I had looked at that card, my epistemic position would be different”. Though this is to say different now, not different at the moment where the counterfactual and the actual diverge. Hence “If my epistemic position were the same then and I had looked at that card, my epistemic position would be different now”. What it seems we want to fix is my epistemic situation at the moment of divergence.

Again, though…there are traps. How about “If I had known then what I know now, I would not have married Jane”. Clearly that is a valid counterfactual, but a satisfactory reading of it requires us not to fix my epistemic situation at the moment of divergence. Or does it? Maybe it doesn’t. My epistemic position is the same up to the moment of divergence…said divergence being the point at which I counterfactually learn then what I actually know now.

The apocryphal analysis is just as before. What I know now is that Jane is DKL-positive. (DKL is a dread virus that causes one to mishandle analysis of counterfactuals.) If I had known that then, it would have to have been (according to the apocryphal analysis) because I was also DKL-positive (you can only learn this sort of thing at a DKL-ics anonymous meeting, it seems), and so I would have married her anyway.

So…no backtracking! Not in any way that affects my epistemic situation prior to divergence, that is…though the divergence itself may involve a counterfactual epistemic position. What is distinctive about the epistemic perspective, then, is that I am free to backtrack the hidden variables, (if determinism is true) as freely as others evoke counterfactual chance outcomes (should it be false). At any rate it doesn’t really matter whether the relevant variables are merely hidden (determinism) or generated on the fly (chance). And plainly it should not matter!

This is the sense in which determinism is a red herring.

Okay…Dorr wants ultimately, he says, to hang blame on the following:

Past: Necessarily, whenever x is normal at t, there is a true history-proposition p such that p would still have been true if x had blinked at t.

He writes:

“We will be tempted to dismiss Past on the basis of our reactions to sentences like (2):

(2) If determinism is true and x does not blink at t, then if x had blinked at t, that would have been because of a prior history of determining factors differing all the way back.

(2) sounds incontrovertible and is plausibly true on its most natural interpretation.”

Whereupon he strikes an analogy between (2) and b. above and prescribes that all counterfactuals will be parsed in the spirit rather of a. The expedience of the epistemic perspective is now completely clear, as (2) no longer reads as “incontrovertible” (mostly it just reads as improperly formulated, i.e. confused) and no such prescription is necessary.

Now we come to footnote 5, which I reproduce in full.

“5. Note that the following also sounds obviously true:

(2′) If determinism is true and x does not blink at t, then if x had blinked at t, a miracle would have to have occurred.

Although I hold that Past fails in ordinary contexts, I am inclined to think that (2′), like (2), is true in the context it most naturally evokes. Lewis’s dichotomy between “backtracking” and “standard” contexts is not particularly helpful here. I believe the explanation turns on subtle ways in which epistemic necessity modals (like “have to”) can serve to signal that certain other propositions, serving as premises from which the asserted content can be inferred, are to be taken for granted.”

There are several issues here. First…what is a miracle? It can’t be a counterexample to a strict law–strict laws don’t admit of counterexamples. It can’t be an exception to a ceteris paribus law–exceptions to ceteris paribus laws aren’t miraculous. I think I see a way to make sense of “miracle”, but it requires my favored metaphysics. A miracle is an event of probability zero. The idea here is that the universe is infinite, admits of densities, and that the density of any metaphysically possible event is positive. Events that are of zero density are not metaphysically possible. If they do occur, however (with density zero), then I’m willing to let those occurrences be “miracles”. I’d bet against long odds that there are no miracles. But…who knows.

Of course there’s something funny in my terminology. If there are miracles, then they are actual but not metaphysically possible! Some better terminology is perhaps advisable, though if I am right and there are no miracles then the metaphysically possible and the actual coincide. Though…wouldn’t that be a relief? The notion of metaphysical possibility is rather vague in the hands of philosophers. (I don’t think anyone knows what in hell it means. Not to say this makes it any different from most extant philosophy!)

But getting back to (2′)…it’s bad enough that we had to worry about backtracking and standard contexts. Now we have some new ones, apparently? This is further evidence that the epistemic perspective is preferable.

The fourth section of the paper is fairly wild. Recall the counterfactual

“If Nixon had pressed the red button there would have been a nuclear war.”

This is often taken as a problem case for the Lewisian similarity analysis: worlds where there is a short in the wire preventing the signal getting through appear closer to ours than those where the mechanism functions properly and Armageddon follows. Lewis wants a similarity metric on which the counterfactual comes up true. So a “similar” world will be the same up to a time very close to the actual non-pressing of the red button, then a “small miracle” will occur and the red button will be pressed. Then we just follow that course according to physical law.

Lewis does explore the possibility of avoiding the small miracle with minuscule past differences. A naive solution would be to opt for some smallish differences in the past that eventually manifest in the pressing of the button. Lewis sees problems here. Dorr quotes him thus:

“…there is no guarantee whatever that [a world where the actual laws are true and where Nixon presses the button] can be chosen so that the differences diminish and eventually become negligible in the more and more remote past. Indeed, it is hard to imagine how two deterministic worlds anything like ours could possibly remain just a little bit different for very long. There are altogether too many opportunities for little differences to give rise to bigger differences.”

Dorr disagrees:

“But…Our best deterministic physical theories have continuous dynamics, which means that so long as the past is not infinite, we can always find a nomologically possible world that stays arbitrarily close to the actual world throughout any finite initial segment of history, just by choosing an initial state that is close enough to that of the actual world. (paragraph) This is worth making precise. … (follows a tedious page I’ll skip) … Of course, the fact that there are nomically possible worlds that stay very similar to actuality until shortly before t but diverge after t does not by itself establish that there are nomically possible worlds of the kind that Lewis was worried about–for example, worlds that stay very close to actuality until shortly before t and at which Nixon goes on to press the button at t.”
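Both halves of this exchange show up in even the simplest chaotic system. The sketch below is a toy, not anything from Dorr’s paper: it uses the logistic map (my choice of stand-in for “continuous dynamics”), and the starting point 0.3 and the threshold 0.1 for “no longer close” are arbitrary assumptions. Little differences do blow up, as Lewis says, but the smaller the initial difference, the longer the two histories stay close, as Dorr says.

```python
def logistic(x):
    """One step of the logistic map, a standard toy case of chaotic but continuous dynamics."""
    return 4.0 * x * (1.0 - x)

def steps_until_divergence(x0, delta, threshold=0.1, max_steps=1000):
    """Count steps until the orbits started at x0 and x0 + delta differ by more than threshold."""
    x, y = x0, x0 + delta
    for step in range(max_steps + 1):
        if abs(x - y) > threshold:
            return step
        x, y = logistic(x), logistic(y)
    return None  # the two orbits stayed close for the whole run

# Shrinking the initial difference buys more time before the histories part ways.
for delta in (1e-3, 1e-6, 1e-9, 1e-12):
    print(delta, steps_until_divergence(0.3, delta))
```

Since one step of this map can stretch a gap by at most a factor of four, an initial difference of 1e-12 cannot reach 0.1 in fewer than 19 steps, whatever else happens. That is the finite-horizon closeness Dorr is relying on.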

Dorr argues that there are such worlds on the basis of an “Independence Conjecture”, paraphrased as “the macropresent screens off the macrofuture from the macropast”. This is something that doesn’t appear to be true in systems with particularly trivial dynamics. Consider for example, a single object travelling through space, not interacting with any other objects. A macropresent view will tell us roughly where the object is, where it’s going, how fast. But there could be indeterminacy here (“macro”). Seeing where it was in the past will cut down on the indeterminacy of our future estimates. For systems with sufficiently complex (“mixing”) dynamics, however, the Independence Conjecture looks plausible. Here the idea is that if we take a set of “macroscopically described future” orbits, such as those comprising “Nixon presses the red button” and a set of “macroscopically described past” orbits, such as “close to the actual past” then these sets ought to be, at least approximately, probabilistically independent, so that the probability (everything conditional on the present macrostate), conditional on having a past state close to the actual past, of having a future state in which Nixon presses the red button, ought to be near the absolute probability that Nixon presses the red button, in particular non-zero.

Something along these lines ought to be true, but not everything…witness the fact that for orbits having past states “very very close” to the actual past state, the state at time t will be too close to actual for Nixon to press the red button (just what Dorr spent that page we skipped proving). On the other hand merely “close to the actual past” may not be close enough to stay close up to a time shortly prior to t. What Dorr seems to require is that at the in-between level of closeness (“very close”) one already has the sought-for independence kicking in. In other words…independence kicks in just as fast as the escape from closeness to a single state does. This strikes me as loose talk, but I’ll let it slide just the same. (I wonder though if he could do better to try to just manually get Nixon to press the button. So long as we are assuming continuous dynamics, we might as well assume smooth dynamics. Then we have derivatives, so that bulldozing the relevant particles and velocities around at will while leaving others relatively fixed by manipulating the distant past in the neighborhood of a point could come down to linear algebra.)

At any rate, suppose we indulge all of Dorr’s fancy here. Does it get him what he wants? He writes:

“We can regiment Lewis’s time-relative notion of similarity between possible worlds using a metric d on M.”

M is the set of states of the world, by the way.

“The distance d(p, p’) represents the degree of dissimilarity between w at t and w’ at t’, when t instantiates p at w, t’ instantiates p’ at w’, and both w and w’ are nomically possible.”

I think Dorr means “w instantiates p at t”, etc. but I suppose a time could instantiate a state at a world, too, odd as it sounds to my ear.

The first thing that concerns me here is “We can regiment Lewis’s…notion of similarity…using a metric…”. Indeed, this sounds a lot like something Lewis explicitly rejects:

“We could…define exact distance measures…for…worlds. At worst, we might need a few numerical parameters. For instance, we might define one similarity measure for distribution of matter and another for distribution of fields, and we would then need to choose a weighting parameter to tell us how to combine these in arriving at the overall similarity of two worlds. All this would be easy work for those who like that sort of thing, and would yield an exact measure of something–something that we might be tempted to regard as the ‘similarity distance’ between worlds. … We must resist temptation. The exact measure thus defined cannot be expected to correspond well to our own opinions about comparative similarity. Some of the similarities and differences most important to us involve idiosyncratic, subtle, Gestalt properties.”

Lewis goes on to talk about facial similarity and its irreducibility to similarity in a simple metric based on pixels. Notwithstanding the fact that today’s digital passport readers seem to do fairly well, the point is well taken. But maybe I misunderstand Dorr. It may be that he advocates using “Gestalt” properties, primarily, when evaluating closeness of worlds, but breaking ties using metric closeness…at least, metric closeness up to divergence. This could still save the intuition that if he had blinked twice the past would have still been (approximately) the same…assuming there is any such intuition…while avoiding certain problems. But I will set this aside for the moment and assume that Dorr does intend to use the metric as a measure of similarity.

A technical point…is the set of worlds where Nixon presses the button closed in the topology generated by said metric? I think probably it is, as the complement of this set surely has to be open by Dorr’s own reasoning. Probably then it follows that the set of distances between the actual world and a “Nixon presses the button” world has a minimum value. It doesn’t follow that there is a unique world w of this sort, but a moment’s reflection reveals that this is extremely likely. Suppose so.

Now if we follow the Lewisian semantics for counterfactuals, we can take any distant future truth P holding in w and the counterfactual “If Nixon had pressed the red button, then P” will come out true. So for example “If Nixon had pressed the red button then in March of 3001 a winged mutant named April May would have become the first human descendant to survive one hundred unaided falls onto land from above one hundred meters” would come out true. And that’s highly counterintuitive. Among the worlds where Nixon presses the button, there are worlds all over the map having this property, true enough, but they are hardly concentrated in one place, and there are vastly more that do not. What should it matter to us that the one world closest to actual in the metric we have chosen is such a world? That fact appears to be an accident.

Far from rescuing Lewisian semantics from miracles, what Dorr’s argument points out is how deeply implausible Lewisian semantics are when based on such a metric. Let’s look again at the pretty pictures Lewis draws.


The only reason to utter a counterfactual “if phi had been, psi would have been” is to point out a correlation between (“nearby” if you like) phi worlds and psi worlds. If there’s no such correlation you shouldn’t say it…much less should it come out true. One might think that this situation is reflected in Lewis’s (D): among the most nearby phi worlds, some are psi worlds and some are not. But Lewis’s image pictures a discrete set of spheres, and if we buy into Dorr’s continuous variation assumption, such a picture is wrong. We get spheres of radius r for every real number r, so that, if phi is a closed set, we get (probably) a value of r for which the r-sphere meets phi in exactly one point, whereby we land in (B) or (C) irrespective of whether there is any correlation between phi worlds and psi worlds. But as Dorr teaches us here, there often won’t be any such correlation! Indeed, where phi and psi are macroscopically described events, the benign regions Lewis has drawn will need to be replaced by fractal regions that, quite often, will be approximately independent of fractal regions associated with different propositions. Looking at Lewis’s (B), it might seem reasonable to say “if the world had been phi then it would have been psi”. For if you asked (with apologies for the personification of worlds) the actual world to impersonate a phi world, it would plausibly (to some sort of intuition) gravitate mindlessly in the approximate direction of all of the phi worlds and, hitting a nearish one, find itself to be also a psi world something like every time. Suppose the regions are highly fractal, though; would the actual world gravitate to a tiny phi region .03032… units away or a much larger one .03033… units away? Even if we agree that it would gravitate to the nearest one…wouldn’t approximate independence imply that the psiness or non-psiness of the world it lighted on would be essentially random? An accident?
There are worlds near that one that are psi worlds, and worlds near that one that are not psi worlds. And if you seek to save Dorr here by saying “well, most of the worlds are psi worlds” then you are just agreeing that it comes down to conditional probability, not the accidental properties of that one special phi world that is closest to actual. Nothing Lewisian about that.
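The contrast between “the nearest phi world” and “most phi worlds” is easy to see in a toy model. Everything in the sketch below is an assumption made for illustration: worlds are points scattered uniformly on a line (a crude stand-in for finely interleaved regions), the actual world sits at 0.5, and phi and psi are assigned independently with probabilities 0.1 and 0.5.

```python
import random

def sample_verdicts(seed, n_worlds=2000, p_phi=0.1, p_psi=0.5):
    """Return (psi-status of the nearest phi world, share of phi worlds that are psi)."""
    rng = random.Random(seed)
    actual = 0.5
    # Each world: a position on the line plus independently assigned phi and psi flags.
    worlds = [(rng.random(), rng.random() < p_phi, rng.random() < p_psi)
              for _ in range(n_worlds)]
    phi_worlds = [(abs(pos - actual), psi) for pos, phi, psi in worlds if phi]
    nearest_is_psi = min(phi_worlds)[1]  # the "closest world" verdict
    share_psi = sum(psi for _, psi in phi_worlds) / len(phi_worlds)
    return nearest_is_psi, share_psi

results = [sample_verdicts(seed) for seed in range(200)]
nearest_true = sum(nearest for nearest, _ in results)
mean_share = sum(share for _, share in results) / len(results)
print("closest-world verdict 'true' in", nearest_true, "of", len(results), "models")
print("average share of phi worlds that are psi:", round(mean_share, 3))
```

Across the 200 random models the share of phi worlds that are psi barely moves, while the closest-world verdict flips back and forth like a coin. The first number tracks the conditional probability, and the second tracks only the accident of which phi world happens to be nearest.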

On the picture Dorr paints for us we’d need to replace the clean Lewisian pictures by images more like:


(Sorry for the lame graphics. Anything beyond Microsoft Paint is beyond me as well.)

Somewhere at the center of those concentric spheres I’ve tried so feebly to draw is the actual world–a random dart toss at this rainbow colored fractal. If you want to think in terms of the earlier example, think maybe cool colors for Nixon doesn’t press the red button and reddish/purplish colors for Nixon presses the red button, with nuclear war occurring at all but magenta. Say the actual world lies in a greenish area (Nixon doesn’t press). The dart landed on green, but we can ask about the truth of “If the dart had landed on a reddish color, it wouldn’t have been magenta.” If there’s a minuscule patch of magenta somewhat nearby and no closer reddish patch, Lewis might say that that counterfactual is false, regardless of how much magenta there is in the image (even somewhat nearby magenta…only the closest reddish patch counts). Indeed, one might just as correctly say “if the dart had landed on a reddish patch it would have landed here”, pointing at the nearest reddish patch, irrespective of the fact that the pointed-to patch is orders of magnitude smaller than other nearish patches of red.

This is no longer compelling.

In favor of counterfactual miracles?

All right, so the treatment I have been recommending goes something like this. When I utter “If F had been, G would have been” and F is a possible outcome, which did not occur, of a chance event at some past time t, then what I am suggesting is that G has a highish probability conditional on F, on what I know about the actual state of the world just prior to the chance event (the one that did not eventuate in F actually, but might have), and perhaps also on the results of chance events after t that do not lie causally downstream of F. (So I can say “if Manfred had played, we would be champions” even though this counterfactual championship requires a subsequent very unlikely upset in a distant, causally isolated venue, provided it actually occurred.) I utter such a thing, to the extent that I am doing “communication” with those words, as a way of imparting information to listeners…information about what I know about the actual state of the world just prior to the chance event, perhaps, or information gleaned from what I take to be a perceptive take on what (usually) follows from what.

Dorr has some cases that he presents in the next section that don’t fall so nicely in this category, however.

“Suppose that, on the phone to Mary at t, Fred speaks the truth by saying “If I were there right now, I would give you a hug.” On the operative interpretation of the counterfactual, how do we think Fred would have got to be with Mary at t? Would he have been whisked there quickly by a recent, antithermodynamic puff of wind, or would he have got there by a less showy method, requiring a somewhat earlier divergence from the approximate course of actual history? The latter option seems better. If we choose the puff of wind, we will need to combine it, rather artificially, with further unusual goings-on in Fred’s brain to ensure that he arrives still in a mood to give Mary a hug…”

Hilarious. I particularly enjoy the phrase “rather artificially”, given how jaw-droppingly artificial the whole “puffs of wind” notion is in the first place. (If you are already having Fred blown across a continent by an easterly gust of wind, does it qualify as a stretch to have him sleep through it?)

Here’s a problem with metric similarity: among all ways of getting Fred to Mary, the  “antithermodynamic puffs of wind” may do it with the most delayed deviation (and hence the greatest similarity of initial conditions) from the actual. Probably you could get Fred across a continent in just a few minutes using them, and on Dorr’s continuous dynamics view there ought to be states near the actual state say a half hour ago that do this.

Dorr now wants to distance himself from metric similarity, Lewis, or both, and I don’t blame him. Fred’s Ripley’s moment may be close in the metric, but that doesn’t make it closest to actuality. It doesn’t “get the Gestalt”.

I asked my wife (she learned Lewis’s semantics for counterfactuals from George Schumm, who was apparently rather animated about it) what she thought about this. At the risk of misinterpreting her (which is likely) she thinks it’s important not to fix too much…you only fix what’s relevant. In particular you have to have a rather plastic notion of similarity, presumably different for each counterfactual utterance. For the case of Fred and Mary, the important thing is that Fred’s and Mary’s general moods be fixed in the inner spheres…probably also their identities and the semi-normalcy of their current experiences, blah blah blah…and not much else. So “in nomically accessible worlds where our needs and desires are as they actually are and we are together, I give you a hug” or something. (“So romantic”, Mary no doubt replied.)

Let’s see if my own view is in any trouble here. To avoid the possibility of running several issues together, I am going to change to third person and put the situation in the past. So let’s say I utter “If Ted had been there, he would have hugged Mary.” I think it would be within your rights to say something like “Ted was 3000 miles away at the time…so what are you saying? Are you saying that if Ted had gotten on a plane that morning and they were together then, he would have hugged her?” Dorr rightly notes in a footnote that in most worlds of this sort where they are together at t, the circumstances that led me to utter “If Ted had been there, he would have hugged Mary” aren’t operating at all. (He may have said it because she missed him so much, which she wouldn’t, were he there.)  I think I might reply: “well, I suppose if he had gotten on a plane, unbeknownst to her, and were just then knocking on her hotel room door during the same sort of phone conversation, then, sure, when she answered, he would have hugged her.” Then you might ask “so are you saying that if he had been there then he would have come in secret and been in the hallway talking to her on a cell phone?” And I would have to confess that, no, that isn’t what I meant.

What to do? Do I follow my wife and say that I can just change the circumstances for different counterfactual utterances? That seems ugly. I would much rather have a uniform treatment. But I despair of one. Consider this: it’s New Year’s Eve. Ted’s flight was cancelled, so he can’t be with Mary. On the way home from the airport, he got in a fender bender and slammed his mouth on the steering wheel. It’s shortly before midnight and Mary laments that he can’t kiss her at the stroke of midnight. Well, notes Ted, I couldn’t kiss you anyway…my lips have been smashed. But I would give you a hug…. What do we make of that? Surely the closest worlds where Ted is there with Mary are worlds in which Ted’s flight wasn’t cancelled. But if his flight wasn’t cancelled and he’s there, he doesn’t hug Mary…he kisses her! Notwithstanding that, what Ted says seems incontrovertibly “true”. (For those who favor a truth value semantics for counterfactuals, anyway.)

I think we have some notion of what Dorr would say, as in a different context he writes: “…we are free to hold fixed both approximate history up to, say, one day before t, and also the facts about whatever Mary said just before t that inspired Fred’s impulse to give her a hug.” In the current case, then, Dorr would perhaps say that we are free to hold fixed both approximate history up to the point where Ted’s flight was cancelled and the fact that he later smashed his lip…of course now he has to do it in the cab to her hotel room (for example) rather than on his drive back home.

This however I cannot abide. We can make the situation even worse: perhaps Mary has said she would not kiss Ted now because, after his flight was cancelled, he got a haircut, breaking his promise to Mary not to make unnecessary expenditures in the state of New York (where, in Mary’s mind, corrupt politicians skim income tax dollars). How are you going to fix that fact if the flight isn’t cancelled? We could probably make it worse still…probably we could make holding fixed both the approximate past and some future event require a genuine miracle. Even if not a miracle, though, surely it requires implausible coincidence. It can’t be part of “if I were there, I would give you a hug” that Ted has worked out in his mind what would have happened if his flight weren’t cancelled and he’d wound up with Mary with a differently acquired lip smash because such a thing would never occur to anyone unless they were writing a philosophy paper. On the contrary, Ted is probably thinking “man, if I were with Mary, my lip wouldn’t have been smashed”. Granted, he’s also thinking “man, if I were with Mary, I would give her a hug”, though if he puts this all together he won’t think “if I were with Mary, my lip wouldn’t have been smashed and I would give her a hug”…rather he would think “if I were with Mary, my lip wouldn’t have been smashed and I would kiss her.”

I think the upshot of this is that counterfactuals where the antecedent isn’t a chance event at t, occurrence of which implies (macroscopic or epistemic) divergence from the actual at t, but rather a consequence of an earlier (vaguely formulated or perhaps unformulated) divergence, are a different beast. Strictly speaking, they should probably be discouraged in favor of utterances such as “this may sound quite strange, Mary…given that you are not, unfortunately, actually here with me, but I am, at this very instant, experiencing a most discernible impulse…to give you a hug.” Hugh Grant, I think, would do it that way, and so should we, to whatever extent we want to come off as charming, anyway.

Quick Summary

Let’s look at it this way. On one view, the evolution of the world requires what one might think of as some real time random number generation. (I.e. there are chance events.) You can think of these random numbers as coming from Godly coin tosses, or whatnot. On the deterministic view, this isn’t the case…every outcome is determined by initial data. It changes nothing, however, if the universe evolves in identical fashion, with the random numbers not coming from Godly coin tosses but from a random number table. (The table is part of the initial data.) What’s the difference, when it comes to how we should analyze counterfactuals? Nothing whatsoever, clearly. Where F is a non-actual “outcome” at t, “if F had been then G would have been” means

a. “if such-and-such Godly coin toss had landed differently…” or

b. “if such and such entry in the random number table had been different…” or,

c. “…if the initial data had been different…”.

Nobody has trouble seeing a. and b. as essentially equivalent from the standpoint of counterfactual analysis. Here’s why: it’s easy enough to know which counterfactual world we’re in, in these cases. We are in the world where everything is the same except for the result of that one Godly coin toss, or that one entry in the random number table. In case c. we don’t know for sure what counterfactual world we are in, and the reason for this is that, as Lewis points out, there are too many opportunities for small differences to give rise to large ones. If the initial data had been different, everything about the past would have been different. If determinism is true, there aren’t nomically accessible worlds where everything (including what it is I think I know that led me to assert the counterfactual) about the past is fixed but F happens. So in saying “if F had been”, says Lewis, we are saying “if there had been a miracle, and F had been”.
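Here is a minimal sketch of why a. and b. come to the same thing. Everything in it is a toy assumption for illustration: the "world" is a one-dimensional random walk, and the function names, the seed, and the choice of t are all made up. One version draws its tosses as it goes; the other reads the same tosses off a table that is part of the initial data.

```python
import random

def make_table(steps, seed):
    """A 'random number table' fixed in advance as part of the initial data."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(steps)]

def evolve_with_live_tosses(steps, seed):
    """Toy world whose chance events are generated in real time."""
    rng = random.Random(seed)
    state, history = 0, []
    for _ in range(steps):
        toss = rng.randint(0, 1)          # a fresh coin toss at each step
        state += 1 if toss else -1
        history.append(state)
    return history

def evolve_from_table(table):
    """Toy deterministic world: the same tosses, looked up instead of generated."""
    state, history = 0, []
    for toss in table:
        state += 1 if toss else -1
        history.append(state)
    return history

table = make_table(20, seed=7)
live = evolve_with_live_tosses(20, seed=7)

# Counterfactual of type b.: flip one table entry, hold everything else fixed.
t = 10
altered = list(table)
altered[t] = 1 - altered[t]
counterfactual = evolve_from_table(altered)
```

In this toy the two evolutions are step-for-step identical, and the altered table gives a history that matches the actual one before t and diverges at t. That is the sense in which it is easy to say which counterfactual world we are in for cases a. and b.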

Dorr, in this paper, wants to get out of this by making c. look like b. That is, he wants the data to be so fine-grained that the “Nth-and-beyond” digits of it, as N gets ridiculously large, act much like numbers read off of a random number table, at least insofar as they don’t have visible effects prior to some t and have huge effect after that. (Just like pseudo-chance events whose outcomes are determined by lookup.) I doubt that’s the way it works…I don’t think the world deals with that much data (infinite data, it seems, from Dorr’s explanations) at every update. Still, it does seem to reduce situation c. to situation b.
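A toy picture of "far-out digits that do nothing until t" is the doubling map x → 2x mod 1, a standard textbook example of sensitivity to initial conditions. It is my illustration, not anything taken from Dorr's paper, and the names and the choice N = 30 are arbitrary. The coarse-grained state at each step is just whether x is in the upper half of the unit interval, and exact fractions are used so no rounding sneaks in.

```python
import math
from fractions import Fraction

def coarse_history(x0, steps):
    """Record, at each step of x -> 2x mod 1, whether x lies in [1/2, 1)."""
    x, bits = x0, []
    for _ in range(steps):
        bits.append(1 if x >= Fraction(1, 2) else 0)
        x = (2 * x) % 1
    return bits

N = 30
x_actual = Fraction(1, 3)                       # binary 0.010101...

# Keep the first N binary digits of the initial data, swap out everything beyond.
head = Fraction(math.floor(x_actual * 2 ** N), 2 ** N)
x_variant = head + Fraction(2, 3) / 2 ** N      # new tail is binary 0.101010...

actual = coarse_history(x_actual, N + 10)
variant = coarse_history(x_variant, N + 10)
```

The two initial conditions differ by less than one part in a billion, the coarse histories agree for the first N steps, and from step N on they disagree at every step. So in this toy the digits beyond the Nth behave like table entries that are first consulted at time N.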

However, we found a problem. The whole reason for wanting to evaluate the truth of a counterfactual “if F had been, G would have been” at a “nearby” F world is that (recall) my whole reason for asserting the conditional in the first place was to pass information I have about the actual state of the world at t. (And insight I may think I have about what follows from what.) If I evaluate at a far away world, some of what I know will no doubt be no longer the case. Indeed, some of what I know (not F, for example) will surely be no longer the case, but the idea is to make there be as little of that as possible. But…and here is the problem…sensitivity to initial conditions may make it the case that, even for nearby worlds, F-ness and G-ness may be essentially independent. At the very least, there are G worlds and not-G worlds near to the “nearest” F world, and in cases where there’s reason to think the closest ones are one or the other, it might not be the one you’d want. Consider:

“If the Blazers hadn’t drafted Sam Bowie, they would have won multiple NBA titles in the nineties.”

The idea here is that Michael Jordan was second on the Blazers’ draft board, that they thought long and hard about picking him, and if they had, things would probably have gone very well for them in the nineties. But on the metric similarity view, the nearest world in which the Blazers don’t draft Sam Bowie probably isn’t a world where they draft Michael Jordan…it’s more likely a world where they draft Sam Perkins. Because, well…such a world can closely match the actual world all the way through the recording of the pick’s first name on the card that is about to be handed to the commissioner! Whether then the Blazers go on to win multiple NBA titles in such a world is anybody’s guess. Suffice it to say it’s less likely with Perkins than with Jordan, and it’s its relative likelihood with Jordan that justifies the avowal. As is the case almost everywhere in philosophy, but most especially on the view we’re discussing, where the local neighborhood is teeming with possible futures encompassing just about everything under the sun, everything comes down to probability.

So why don’t I want to fix more than the speaker’s epistemic situation? In normal assertions we do. If John says “Matt wasn’t holding 9♣ 9♦”,  his assertion is true or false according to whether or not Matt was holding 9♣ 9♦, and those truth conditions are surely the better part of the meaning of John’s assertion. John doesn’t mean by “Matt wasn’t holding 9♣ 9♦” that in most cases epistemically similar to his, the person referred to as “Matt” isn’t holding 9♣ 9♦; he means that in this case, Matt isn’t holding 9♣ 9♦!

That’s just the problem, though. When John utters a counterfactual, there is no obvious candidate for “this” case, i.e. the actual case. We aren’t interested in the actual case. When John says “if Matt had called, he would have lost”, he isn’t claiming that he lost in “that” case, ostending by “that” a particular counterfactual world. On the other hand John will probably admit that he was “wrong”…he may even say that what he avowed was “false”…if we turn over Matt’s cards, revealing 9♣ 9♦. Why is that? Perhaps it is part of the meaning of “if Matt had called he’d have lost” that Matt doesn’t have 9♣ 9♦.

Hmm. I was hoping not to get into trouble, but I fear I am. What I’d like to do is keep this in the realm of philosophy of probability, but I feel it creeping into philosophy of language, where I will be unceremoniously flayed. Obviously all I can ever hope to impart via any utterance is some aspect of my epistemic position, yet we do hold most of them accountable to how the world is apart from what I know. There doesn’t seem to be any principled reason why counterfactuals would be unique in this regard. On the other hand, I don’t want my utterance of a counterfactual to be held accountable to how another world is…whether it be a near one or a far. Surely my avowal of “if F had been, then G would have been” when F is a chance event at t that isn’t actual will go by probability of G conditional on F and whatever else I know about the state of the world at t, but I may disavow if I subsequently learn more about the state of the world at t. Part of the meaning of the utterance then, may be that I would not disavow were I to know more. Namely, I am expressing a confidence that I know enough of the relevant stuff that I need not disavow should I know more. That’s the way it works for ordinary assertions, after all. When I say “P is the case” I don’t just intend that P is likely given my epistemic situation. If so I should not then admit I was wrong when it turns out that P is not the case. Rather I intend to suggest that P really is the case…so that in particular, I should continue to avow P should I know more. Or…well, not exactly. There is always the chance that I will be misled, even though I am right. It’s perhaps not part of what I am claiming that this won’t happen. Though if I say “I know that P is the case”, maybe then it is.

Clearly I’m just rambling by now so I will just quit.





why i am not not a bayesian

Okay so today it’s Why I am not a Bayesian by Clark Glymour. It’s a dense paper, and I will cordon off a section of it for discussion, namely the final section dealing with the so-called problem of old evidence. Indeed, I’m not really even going to discuss that, per se. Rather, I’m going to focus on a single three-sentence passage in the paper. This one:

“How might Bayesians deal with the old evidence/ new theory problem? Red herrings abound: the prior probability of the evidence, Bayesians may object, is not really unity; when the evidence is stated as measured or observed values, the theory does not really entail that those exact values obtain; an ideal Bayesian would never suffer the embarrassment of a novel theory. None of these replies will do: the acceptance of old evidence may make the degree of belief in it as close to unity as our degree of belief in some bit of evidence ever is; although the exact measured value (of, for example, the perihelion advance) may not be entailed by the theory and known initial conditions, that the value of the measured quantity lies in a certain interval may very well be entailed, and that is what is believed anyway; and, finally, it is beside the point that an ideal Bayesian would never face a novel theory, for the idea of Bayesian confirmation theory is to explain scientific inference and argument by means of the assumption that good scientists are, about science at least, approximately ideal Bayesians, and we have before us a feature of scientific argument that seems incompatible with that assumption.”

Now most of this passage is uninteresting to me. Of the three proposed “Bayesian” strategies, the first two make no sense to me at all, and that Glymour spends so much time rebutting them seems to me a bit of a straw-man exercise. The third, however, I got very excited about, in particular this phrase: “an ideal Bayesian would never suffer the embarrassment of a novel theory”. This is why we read philosophy. To hear others formulate, in developed form, our own embryonic positions. But then, when I read “the idea of Bayesian confirmation theory is to explain scientific inference and argument by means of the assumption that good scientists are, about science at least, approximately ideal Bayesians, and we have before us a feature of scientific argument that seems incompatible with that assumption” I sort of lost the thread of Glymour’s argument. This was confirmed by the last page and a half of the paper, which I did not recognize as having been written by the same person that had so cogently characterized my embryonic thought.

So what I propose to do in this blog post is to develop my embryonic thought. Which is this:

The fact that an ideal Bayesian would never be embarrassed by a novel theory completely solves the old evidence problem. Moreover, science does approximate ideal Bayesian behavior…albeit slowly. Moreover, it is in communities of scientists, not in individual scientists, that this approximation is best. 

Now I’ll try to explain myself. According to the Bayesian, when one encounters new evidence, one updates one’s prior probabilities by eliminating all world-states incompatible with that evidence and raising credences in other world-states uniformly, i.e. retaining the ratios of their likelihoods. So if worlds A and B are consistent with evidence E and if my priors dictate a credence in A twice that of B before observing E, I will have credence A equal to twice credence B after observing E as well. The ideal Bayesian doesn’t need to “think about” her new credences at all. She just updates by the standard formula from her priors. That, at any rate, is how the story goes.
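The updating rule just described can be sketched in a few lines. This is only an illustration: the function name conditionalize, the world labels, and the particular prior are made up, and the sketch assumes the evidence leaves at least one world-state standing.

```python
from fractions import Fraction

def conditionalize(prior, compatible):
    """Eliminate world-states ruled out by the evidence, then renormalize the rest."""
    surviving = {w: p for w, p in prior.items() if compatible(w)}
    total = sum(surviving.values())   # assumed nonzero: the evidence is not impossible
    return {w: p / total for w, p in surviving.items()}

# Hypothetical prior: credence in A is twice credence in B; C will be ruled out by E.
prior = {"A": Fraction(2, 6), "B": Fraction(1, 6), "C": Fraction(3, 6)}
posterior = conditionalize(prior, lambda w: w in {"A", "B"})
```

Here the posterior puts 2/3 on A and 1/3 on B, so A is still exactly twice as credible as B. Nothing in the function "thinks about" the new credences; it only rescales what the prior already said.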

There are a couple of potential problems with this story. The first is that it requires us to have prior probabilities. So it is, for example, that Bayesian parameter estimation is perceived by some to be clunkier than some “other” methods of parameter estimation, such as maximum likelihood estimation, that do not appear, on the surface at any rate, to utilize prior distributions of the parameters directly. I don’t think this is really right. Generally some distribution is assumed if an estimation method is to be rigorous. (Perhaps it is a uniform one.) So it’s not the sense of “Bayesianism” that advocates using methods of estimation that go by the name “Bayesian” that I am concerned to defend here. If I take prior parameter distributions to be implicitly uniform wherever they have not been reflected upon, even my choice to employ maximum likelihood estimation becomes a genre of Bayesianism. The real problem I want to address here is the question of whether it’s anti-Bayesian to reflect on and change one’s priors after receiving evidence. In general I will conclude no, but with a caveat: I am not talking about changing your priors because you take the fact that e was obtained as evidence to show that e was likelier to be obtained than you had thought. Rather, I am considering a change brought about solely by using the topicality (rather than the actuality) of e as an impetus for undertaking first-time reflection on how one ought to update in cases where e is obtained as evidence. The sort of reflection that could, and in fact would, have been undertaken, long before e was encountered, given time and computational resource.

I’ll get at these issues by way of the following game. I will start giving you terms in some sequence or other s(n). After s(n-1) is revealed, you take a guess at s(n). Your score is the largest n for which your guess was wrong. The lower your score, the better. Here we go: [Personal note: I have an amusing mental block concerning this puzzle. In particular, no matter how many times I talk about it, I screw it up. It’s really sort of amusing.]

s(0) = 0

I am assuming you guess s(1) = 1.

s(1) = 1

Now you’re feeling pretty confident. You guess s(2) = 2.

s(2) = 2

Wonderful. Guess s(3) = 3, naturlich. Now comes this:

s(3) = [an integer with well over a thousand digits, not reproduced here]

I’m guessing that this is not what you were expecting s(3) to be. Clearly this is no polynomial of modest degree and smallish coefficients–we must think “outside the box” on this one.

So what would a Bayesian do? On paper, just condition her prior probabilities for the space of all possible five term sequences on what she’s seen so far, namely s(0) – s(3). But, you might reason, since s(3) has well over a thousand digits, this would appear to be an unrealistic demand in at least two respects. First, there are just too many candidate sequences. Second, you feel pretty sure now that your data is specific enough that there is a unique “simple” pattern. So you feel pretty certain that, given enough time and paper, your probabilities for what s(4) might be would concentrate significantly on some single value K. You just don’t have any clue, right now, what value that is. These are two respects, then, in which you differ from the “ideal” Bayesian. First, you have fewer computational resources. Second, you are less clever. (I leave it open whether these amount to the same thing.) But how, if at all, do these facts bear on the status of “Bayesianism”?

I would say they do not bear on that status at all. But let’s solve the problem and discuss that after.

First, you may note that s(3) ends with a lot of trailing zeros. That means it has a lot of 2s and a lot of 5s in its prime factorization. So, it seems likely that s(3) was arrived at by multiplication, and it becomes natural to factor s(3) into primes. Having done that (child’s play on a computer), one would notice some peculiar things. First, no prime factor greater than 720 appears. Second, every prime between 360 and 720 appears precisely once. Third, every prime between 240 and 360 appears precisely twice, between 180 and 240 thrice, etc. It’s not far from here to construct the hypothesis, easily verified, that the number s(3) is 720!.
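The factorization pattern is easy to check by machine. The sketch below goes in the reverse direction from the puzzle-solver: it assumes the hypothesis s(3) = 720! and confirms that the hypothesis predicts the observations. It uses Legendre's formula for the exponent of a prime in n!; the helper names are my own.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def exponent_in_factorial(n, p):
    """Legendre's formula: the exponent of prime p in n! is the sum of floor(n / p^k)."""
    exponent, power = 0, p
    while power <= n:
        exponent += n // power
        power *= p
    return exponent

n = 720
exponents = {p: exponent_in_factorial(n, p) for p in primes_up_to(n)}

once = all(e == 1 for p, e in exponents.items() if 360 < p <= 720)
twice = all(e == 2 for p, e in exponents.items() if 240 < p <= 360)
thrice = all(e == 3 for p, e in exponents.items() if 180 < p <= 240)

# Trailing zeros come from paired factors of 2 and 5.
trailing_zeros = min(exponents[2], exponents[5])
digits = str(math.factorial(n))
```

All three pattern checks come out true, and the product of the prime powers reproduces math.factorial(720) exactly, so the hypothesis fits every observation listed above.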

So now we have our sequence: 0, 1, 2, 720!.

But you may recognize as well that 720 = 6!. So here is our sequence: 0, 1, 2, 6!!. What next? Well, 6 = 3! and 2 = 2!!, so: 0, 1, 2!!, 3!!!. Is that all? Well, 1 = 1!, so: 0, 1!, 2!!, 3!!! and of course the next number in the sequence is 4!!!!.
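The pattern can be written down directly. Note that "!!" here means the factorial applied twice, not the double factorial, as the step from 720 = 6! to 6!! shows. The function names below are my own, and the sketch only evaluates terms up to s(3).

```python
import math

def iterated_factorial(n, times):
    """Apply the factorial to n the given number of times."""
    value = n
    for _ in range(times):
        value = math.factorial(value)
    return value

def s(n):
    """The nth term: n followed by n factorial signs."""
    return iterated_factorial(n, n)

first_terms = [s(0), s(1), s(2)]
```

Here first_terms is [0, 1, 2] and s(3) equals math.factorial(720). Don't try s(4): it would require the factorial of 24!!!, which is hopelessly beyond any computer.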

It’s now that Glymour would, I imagine, say something to the effect that what we just did was not Bayesian, or perhaps not even compatible with what is Bayesian. Before we encountered the evidence, i.e. s(0) – s(3), we did not have priors (because we had never considered the question) reflecting the inevitability of s(4) = 4!!!! in light of evidence s(0) – s(3). One might claim that our eventual considered response, then, that s(4) = 4!!!! with near certainty, constitutes a departure from Bayesian behavior. I don’t agree with that, and the reason is this: it is not merely the case that we now think that the probability of 4!!!!, conditioned on what s(0) – s(3) actually are, is near unity. No–we also believe that our former unreflective priors (if any) are irrational. We have therefore abandoned them in favor of better priors.

So we now hold that a rational priors function P will have it that P[0, 1!, 2!!, 3!!!, 4!!!!] is nearly equal to the sum, over all K, of P[0, 1!, 2!!, 3!!!, K]. We’ve remained dedicated to the Bayesian perspective throughout our deliberations; we’re just better educated, post-reflection, about what “rational priors” ought to look like. Now it’s true that we did not subscribe to such priors (perhaps to any) before we got the puzzle. Nor is it the case that the Bayesian perspective is what led us to them. Bayesianism is silent on the matter of what priors it’s rational to adopt. All Bayesianism tells us is how updated credences should relate to priors and evidence. (They should be arrived at via conditionalization of one’s priors on the evidence, whatever those may be.) And should you be a less than ideal agent, it doesn’t even say, as many have supposed, “never change your priors”. Changing your priors implies diachronic irrationality, true enough. But that’s because only one set of priors is correct, and it’s irrational to have incorrect priors! (This is not the usual view, but it is the correct view.) Obviously if you change your priors, then they were, at some point, incorrect, and to have held incorrect credences at some point in time is to have been diachronically irrational. (It is not necessarily to have been diachronically incoherent, which is a stronger claim. Changing your priors does imply diachronic incoherence, but that’s no reason to persist in holding onto bad unreflective priors.) But it’s not to have violated Bayesianism. What would constitute a violation of Bayesianism? Nothing less than to hold a credence that is distinct from the credence that would be held were one to condition one’s current priors on one’s total evidence.

To repeat: in our example, the role of the evidence you acquired, namely s(0) – s(3), had merely an accidental, attention-focusing role in the adoption of your “improved” (i.e. reflective) priors…priors it was inevitable you would endorse once you had considered the matter closely, whether in response to evidence or not. If you had an infinity of years in time-suspended-isolation to compose comprehensive priors on integer sequences, you would become that ideal creature that can’t be embarrassed by novel patterns. You would, without having ever encountered a pattern in “real life”, have come to “know all the patterns”, simply for having encountered them all in various thought experiments. That knowledge would be sitting there, in your priors. Maybe someone else doing the same would have different priors. They’d almost certainly each notice a lot of the same patterns, but might not agree on their relative strengths. They can’t both be ideally rational, but they can both be coherent, in spite of their differences.

Bayesianism, then, given a set of priors, is just a mindless updating scheme. It’s fine so far as it goes, but the real work of science is in the priors, and Bayesian lore is silent (not wrong) on the issue of what they should be. It’s more or less agreed that “simplicity” should be sought after in the setting of one’s priors, but there’s a lot of question as to what “simplicity” comes to. For the actual practice of science, this question is probably not that important. (Scientists know simplicity when they see it.) For philosophy, however, this is where the action is. I think Glymour senses as much, but it’s just not right for him to encourage others (by example) to disown Bayesianism (“I am not a Bayesian”). Bayesianism isn’t a recipe for all rationality, but there’s no reason not to be one. It’s probably better to just say “I’m not not a Bayesian.”

So I won’t not say it.

Glymour’s concluding paragraph:

“None of these arguments is decisive against the Bayesian scheme of things, nor should they be, for in important respects that scheme is undoubtedly correct. But taken together, I think they do at least strongly suggest that there must be relations between evidence and hypotheses that are important to scientific argument and to confirmation but to which the Bayesian scheme has not yet penetrated.”

I don’t know what “relations between evidence and hypotheses” amount to in this context, but that is not how I would put it. The real problem that the Bayesian scheme doesn’t penetrate…makes no claims to penetrate now or ever, so there’s little point in saying “has not yet penetrated”…is that of the adoption of rational priors. Appealing to “simplicity” can only get us so far. Linear relations are simpler than quadratic, we may say, which are in their turn simpler than cubic, etc. There are other sorts of functions appearing in nature, however, both familiar (exponential, trigonometric) and not. Many distributions, both familiar (binomial, hypergeometric, Poisson, normal) and not. It’s well and good to order relations according to the number of unknown parameters (fewer = simpler) or to say that a relation is confirmed precisely when the number of data points exceeds the number of unknowns, and certainly we suspect that any comprehensive theory of the rationality of priors should reflect versions of these and other ad hoc principles, but none of this means that we have such a comprehensive theory, least of all that Bayesianism (of all things!) should be faulted for not laying it at our feet.

After all, the theory of logical consequence didn’t lay this at our feet either–no one wrote a paper called “Why I am not a Logical Consequentialist”.

presumptuous philosophers

Here’s a ubiquitous gaffe in modern philosophy of probability, exemplified by a thought experiment of Nick Bostrom:

“It is the year 2100 and physicists have narrowed down the search for a theory of everything to only two remaining plausible candidate theories, T1 and T2 (using considerations from super-duper symmetry). According to T1 the world is very, very big but finite, and there are a total of a trillion trillion observers in the cosmos. According to T2, the world is very, very, very big but finite, and there are a trillion trillion trillion observers. The super-duper symmetry considerations seem to be roughly indifferent between these two theories. The physicists are planning on carrying out a simple experiment that will falsify one of the theories. Enter the presumptuous philosopher: ‘Hey guys, it is completely unnecessary for you to do the experiment, because I can already show to you that T2 is about a trillion times more likely to be true than T1.’”

The presumptuous philosopher is attempting to apply a “self indicating assumption”. The self indicating assumption itself is quite sound. According to it, you take yourself to have been sampled uniformly at random from a pool of observers in which worlds are represented in proportion to their objective chance. (Meaning that when an observer is selected, the probability it came from world w is proportional to the product of the objective chance of w and the number of observers w has.) That’s maybe a bit vague, but it’s clear enough for most purposes. Credences, then, are expected long run frequencies. (But not quite literally! As we shall see.)

Bostrom’s presumptuous philosopher isn’t doing that, though. Note: T1 or T2 is the “theory of everything”. So either T1 is a necessary truth or T2 is a necessary truth. That’s what a “theory of everything” is. It’s not the case that half of the worlds are T1 worlds and half of the worlds are T2 worlds. If that were the case, the presumptuous philosopher wouldn’t be presumptuous at all: intuition would square with his recommendation, as his counterparts would be vindicated in proportion to their numbers. No…what our intuitions rail against is the fact that T2 may, for all we know, be a total fiction: not merely un-actual, but impossible. Indeed, we think there is a 50% chance of that. The presumptuous philosopher is screwing up. But does that mean that self indication doesn’t work in general?

Not at all. It just doesn’t apply in this case. Why? Because the choice between T1 and T2 is a matter of epistemic probability, not objective chance. Self indicating assumptions only apply to cases of objective chance. Recall: in self indication, we take ourselves to have been sampled uniformly at random from a pool of observers in which the proportion of world w members is in proportion to the product of world w’s objective chance and the number of observers w has. Bostrom’s presumptuous philosopher attempts to substitute epistemic probability for objective chance. That’s a mistake, but you don’t throw the baby out with the bathwater.

So: if T1 is the correct theory of everything then the frequency of T1 observers is 1. If T2 is the correct theory of everything then the frequency of T1 observers is 0. Now, when computing an expected frequency you do use epistemic probabilities (obviously), so the expected frequency of T1 observers in the pool of all observers is 1/2. It’s not one over a trillion. If T1 is true then there aren’t any T2 observers at any other worlds. Because if it’s true it’s not just true…it’s a necessary truth. If it weren’t necessary, our intuitions wouldn’t rail against the presumptuous philosopher’s solution.
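The two computations can be set side by side in a short sketch. The function name self_indication_weights and the dictionary layout are made up for illustration, and the code does nothing more than encode the weighting rule stated above (objective chance times number of observers).

```python
from fractions import Fraction

def self_indication_weights(worlds):
    """worlds maps a name to (objective chance, number of observers).
    Returns the probability that a sampled observer comes from each world."""
    raw = {w: Fraction(chance) * observers for w, (chance, observers) in worlds.items()}
    total = sum(raw.values())
    return {w: r / total for w, r in raw.items()}

TRILLION = 10 ** 12

# The presumptuous philosopher's move: treat the 50/50 epistemic split as objective chance.
presumptuous = self_indication_weights({
    "T1": (Fraction(1, 2), TRILLION ** 2),
    "T2": (Fraction(1, 2), TRILLION ** 3),
})

# The reading defended here: whichever theory is true has objective chance 1.
if_t1_true = self_indication_weights({"T1": (1, TRILLION ** 2), "T2": (0, TRILLION ** 3)})
if_t2_true = self_indication_weights({"T1": (0, TRILLION ** 2), "T2": (1, TRILLION ** 3)})

# Epistemic probabilities enter only when taking the expectation over the two cases.
expected_t1_frequency = Fraction(1, 2) * if_t1_true["T1"] + Fraction(1, 2) * if_t2_true["T1"]
```

On the first computation T2 comes out exactly a trillion times likelier than T1. On the second, the expected frequency of T1 observers is 1/2.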

Would they? Those of some might, I will grant. But then, bad probabilistic intuitions are not to be heeded. That they are so common is a rhetorical problem here, but I don’t really see that anything can be done against this fact. There’s a possible objection I can see. No one would think of it, but it’s clever. And wrong. It’s both clever and wrong. “Hi, I’m Cleverus Wrongly, and I’m here to attack your thesis.”

CW: But imagine a Sleeping Beauty experiment where Beauty gets 2 awakenings if sin e^(A(googol)) is positive where A is the Ackermann function and 1 awakening otherwise. If sin e^(A(googol)) is positive then it is necessarily positive. And if it is negative it is necessarily negative. So the frequency of positive awakenings is 1 or 0 with equal epistemic probabilities. And that means by your argument your credence in sin e^(A(googol)) being negative must be 1/2. But wait…imagine now that it’s the other way around: Beauty gets 2 awakenings if sin e^(A(googol)) is negative and 1 awakening otherwise. Still 1/2! But now imagine that we toss a coin to determine which of the two experiments is run! What’s Beauty’s credence, at wakeup, in “there is just one awakening”? Well, it would be 1/2 upon learning the result of the toss, either way. So it’s 1/2. But the toss of the coin is contingent! So it has to be 1/3! Ha ha! There’s no difference between contingent and necessary after all! Ha ha ha!

Answer: Not really. There is a difference. In the case of the first experiment you put out there, where Beauty gets 2 awakenings if sin e^(A(googol)) is positive and 1 awakening otherwise, she should have credence 1/3 in sin e^(A(googol)) being negative. The (correct) reason why doesn’t commit one to a sanction of Bostrom’s “presumptuous philosopher”:

“…the answer 1/3 is exactly correct only when the expected value of the total quantity of conscious, minimally rational life in the universe is independent of the coin toss. In ordinary cases the effect of this factor is negligible: but in the limiting case where Beauty knows that she is the only rational being in the universe, and that her conscious life will be twice as long if the coin lands Tails, her credence in Heads when she wakes up should be 1/2. Generating the thirder result even in cases like this would, as Bostrom (2002) points out, require an implausible skewing of prior credences in favour of more populous worlds.”

This is from footnote #1 of Cian Dorr’s awesome marginalized paper A Challenge for Halfers. Dorr is a thirder, but subscribes to the one-half solution in a case in which not only awakenings but “total quantity of consciousness” is doubled if tails. I almost agree. Which is to say that I don’t agree, but only because the result of the toss is contingent. If it were necessary, I would endorse Dorr’s analysis.

Here is the right way to look at it. Imagine, in parallel, two hypothetical streams of consciousness. The first is from the “sin e^(A(googol)) is positive” multiverse, the second from the “sin e^(A(googol)) is negative” multiverse. The streams are hopping from world to world at regular intervals (say one minute intervals) under the auspices of self indication…that is, worlds are chosen in proportion to the product of their objective chance and quantity of consciousness, then slices of consciousness are chosen uniformly at random from that world’s pool of consciousness slices. As we pan right, we see two experimental awakenings in the first stream for every one in the second (because awakenings in the first are doubled, but not quantity of consciousness). It’s in this sense that the “long run frequency” of “sin e^(A(googol)) is negative” awakenings is 1/3.
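The arithmetic behind "panning right" can be sketched as follows. This is my own simplification, under the assumption (as in the text) that the two streams get equal epistemic weight and are scanned over the same number of intervals; the function name and the per-interval counts are hypothetical.

```python
from fractions import Fraction

def share_of_events(target_per_interval, other_per_interval):
    """Fraction of all relevant events that fall in the target stream when the
    two streams are scanned side by side over the same number of intervals."""
    return Fraction(target_per_interval, target_per_interval + other_per_interval)

# Beauty: two awakenings in the "positive" stream for every one in the "negative" stream.
negative_awakening_share = share_of_events(1, 2)

# Bostrom: every slice in one stream is a T1 slice, every slice in the other a T2 slice,
# and no attempt is made to weight the streams by how populous their worlds are.
t1_slice_share = share_of_events(1, 1)
```

This gives 1/3 for the "negative" awakenings and 1/2 for the T1 slices, matching the two verdicts in the text.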

This method of computing frequencies…treating epistemic probabilities in parallel, objective chances in series…I will call The Dorr Method. (Update: I guess it’s not really correct to call it that, because it’s not what Dorr does. Even so, Dorr does something similar. But perhaps I should not read too much into a single footnote of his written over a decade ago, and stop speculating as to what he intended at that time.) I’m wholly convinced that it’s correct. Notice that it’s not quite what I described earlier (expected long run frequency), though the latter gives the right result much of the time–including the Bostrom example, in which one stream consists entirely of T1 slices and the other consists entirely of T2 slices. “But wait,” one might say, “the T2 stream is longer.” Well, no. Not really. They’re both infinite. A multiverse encompasses infinitely many worlds. I’m viewing a world as something finite in time (a single expansion/contraction cycle, on one cosmological view), and I take time to be infinite. “But in that case the time scales of the streams don’t match.” Actually I’m not sure what this objection means, but I will respond by saying “no attempt is made to match time scales” (if that means anything to anybody). One final objection: “if there are infinitely many worlds how can you speak of the objective chance of a world?” Good point, but although I take the set of worlds to be infinite I hazard that the set of equivalence classes of worlds under the indiscriminability relation is finite. Objective chance is a function over those equivalence classes.

The Dorr Method should really be seen as a rational constraint on credences. If you don’t adopt it, you’re going to get in trouble somewhere. Either you’re going to wind up being a double halfer in SB, which puts you at odds with diachronic constraints such as conditionalization and reflection, or you will be a Lewisian halfer vulnerable to the Doomsday argument (cf. Sadistic Scientist argument of Meacham), or you will be a thirder vulnerable to the Presumptuous Philosopher argument.

None of these concessions are especially tenable.

ten reasons to care less about titelbaum’s sleeping beauty survey

Michael Titelbaum has written an excellent survey on Sleeping Beauty, called Ten Reasons to Care About the Sleeping Beauty Problem. Check it out:

It impressed me, though there are some things I don’t like about it. A few, anyway. Would be nice to get ten. Ten? Maybe. It’s worth a try:

1. Titelbaum gives us the following thought experiment:

“Some scientists will flip a fair coin: If it comes up heads, they’ll put nine black balls in a bag and one white; if it comes up tails, they’ll put in one black and nine white. The bag will then be passed around the room, and each person will draw one ball without replacement. Everyone in the room is informed of the experimental protocol, but no one is allowed to see the outcome of the coin flip or the ball anyone else has drawn. You draw your ball and see that it’s black. This should increase your confidence in heads from 1/2 to 9/10. But now suppose you have no non-indexical way of picking yourself out from among the ten people. You all look the same (so you can’t uniquely describe yourself by appearance), the room is cylindrical (so you can’t pick yourself out by absolute position), etc. Then, the only way to describe your new evidence is ‘I picked a black ball.’ But that’s centered evidence, and it has increased your rational credence in the uncentered proposition that the coin came up heads.”

One may ask…why does your confidence in heads increase from 1/2 to 9/10? Conditionalization, of course. You conditionalize on “I got a black ball.” But as Titelbaum notes, this is centered evidence…you can’t tell the difference between you and the others. But if one commits to parsing I rigidly, everything works out. Here is the idea. Number the participants 1 to 10. I don’t know what number I am. But if I am committed to a rigid parsing of I, then “I got a black ball” means either “1 got a black ball” or “2 got a black ball” or … or “10 got a black ball.” I don’t know which it means, but it doesn’t really matter. It means some one of those ten uncentered propositions, and conditionalization on any one of the ten yields a credence of 9/10 in heads. It’s not far from here to the correct thirder argument from evidence, where Beauty conditions on “I am awakened today.” Here the crucial indexical is today. Commit to parsing today rigidly, and everything works out. For thirders, anyway. On a rigid parsing, “I am awakened today” means either “I am awakened Monday,” which is uninformative, or “I am awakened Tuesday,” which is very informative indeed. (It implies tails.) It would have been nice if this survey could have gleaned some of that.
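For anyone who wants to check the 9/10 figure, here is a minimal sketch of the conditionalization for each rigid reading of “I got a black ball.” The function names are mine, not Titelbaum’s, and the sketch assumes that under heads the lone white ball (and under tails the lone black ball) is equally likely to land at any of the ten positions in the drawing order.

```python
from fractions import Fraction

N = 10                      # participants, numbered 1..N
PRIOR_HEADS = Fraction(1, 2)

def p_black_given_heads(i):
    # Heads: nine black, one white. Participant i draws black unless the
    # white ball lands at position i (assumed uniform over 1..N).
    return sum(Fraction(1, N) for w in range(1, N + 1) if w != i)

def p_black_given_tails(i):
    # Tails: one black, nine white. Participant i draws black only if the
    # black ball lands at position i.
    return sum(Fraction(1, N) for b in range(1, N + 1) if b == i)

def posterior_heads(i):
    # Bayes' rule on the uncentered proposition "participant i got a black ball".
    h = PRIOR_HEADS * p_black_given_heads(i)
    t = (1 - PRIOR_HEADS) * p_black_given_tails(i)
    return h / (h + t)

posteriors = {i: posterior_heads(i) for i in range(1, N + 1)}
for i, p in posteriors.items():
    print(f"{i} got a black ball -> credence in heads = {p}")
```

Every one of the ten uncentered readings gives the same posterior, which is why not knowing which number you are does no harm.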

2. Titelbaum writes “Conditionalization systematically fails for cases involving self-locating degrees of belief. So does the Reflection Principle…”. This is a terrible, terrible mischaracterization. The so-called Reflection Principle is just the bounded martingale stopping theorem, as is well understood by philosophers who understand stochastic analysis. (See Stopping to Reflect by Schervish et al.) There are cases where the hypotheses are not met, surely. But there aren’t “violations” of reflection. Not of correct formulations of it, anyway. Take for example the fact that Beauty has credence 1/3 on Monday morning. She did know that this would be the case Sunday night, surely, yet did not update then. Is this a violation of reflection? No, because “On Monday morning” is not a stopping time. That’s one of the hypotheses of the theorem. Reflection between now and later requires, in general, that the agent be able to recognize when “later” has arrived. Beauty does not recognize that it’s Monday morning on Monday morning, so it isn’t a stopping time. As for the claim that conditionalization fails, see #1 above. (Commit to rigid parsing of your indexicals and average. It doesn’t fail!)

3. Titelbaum writes:

“A Dutch Strategy can be assembled against thirders in the Sleeping Beauty Problem. Hitchcock (2004) proposes that on Sunday night, the bookie sells Beauty a bet that costs $15 and pays $30 if the coin comes up heads. Since Beauty is 1/2 confident in heads on Sunday night, she will accept this bet as fair. The bookie also tells Beauty on Sunday night that when she awakens Monday morning, he will sell her a bet for $20 that pays $30 if the coin comes up tails. If Beauty plans on being a thirder, she is certain that she will accept this bet as fair on Monday morning. (Notice that the bookie places this bet only once – on Monday morning – however the coin flip comes out.)”

Not really. Beauty will not accept the $20 for $30 bet on Monday as the mere fact that it is offered tells her it is Monday and changes her credence to 1/2. Titelbaum thinks he means something here, but I don’t know what. What I do know is that this Dutch Book example should not appear in his paper. Or anywhere. (Except of course on this blog, where it’s thematically appropriate.)
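For reference, the arithmetic behind the quoted Dutch Strategy, and behind my reply to it, can be sketched as follows. The dollar amounts come from the quoted passage; the function names are mine, and the Monday credence of 1/2 in tails is the assumption I am defending above, not something Titelbaum grants.

```python
from fractions import Fraction

def fair_price(payout, credence):
    # Price at which a bet paying `payout` on an event has zero expected
    # value for an agent with the given credence in that event.
    return payout * credence

def net_if_both_accepted(heads):
    # Sunday bet: costs $15, pays $30 on heads.
    sunday = (30 if heads else 0) - 15
    # Monday bet: costs $20, pays $30 on tails.
    monday = (0 if heads else 30) - 20
    return sunday + monday

# If Beauty takes both bets she loses the same amount either way.
print("net on heads:", net_if_both_accepted(True))
print("net on tails:", net_if_both_accepted(False))

# The Monday bet looks fair only at credence 2/3 in tails.
print("fair price at 2/3 in tails:", fair_price(30, Fraction(2, 3)))
# If the offer itself tells her it is Monday and her credence in tails
# goes to 1/2, the bet is worth less than its $20 price.
print("fair price at 1/2 in tails:", fair_price(30, Fraction(1, 2)))
```

So the sure loss materializes only if Beauty still prices the second bet with a thirder’s credence at the moment it is offered.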

4. This isn’t really about Sleeping Beauty, but we don’t “quit certainties”. Well, okay, we do. It’s called forgetting. That’s the only way, though. My certainty that it’s midnight now and my certainty one minute from now that it was midnight a minute ago are the same certainty. I didn’t quit the one and adopt the other. I mean you can do your accounting that way if you want to, but it’s hopelessly inefficient and gains you nothing. In short…madness.

5. Titelbaum screws up self-indication. (All philosophers do. It’s frustrating.) He writes, for example:

“If the Self-Indication Assumption is correct, our very existence is evidence for hypotheses positing many (populated) universes over hypotheses that posit just one.”

Not true. Here is the way self-indication works. I am sampled, according to self-indication, uniformly at random from the pool of like observers in the space of metaphysically possible worlds where the worlds occur in proportion to their objective chance. The emphasized phrase is what is important here. Self-indication does not say that I am sampled from a pool of like observers in a hypothetical space of epistemically possible worlds where worlds occur in proportion to their epistemic probability. You can look at the situation this way. Suppose that the universe expands and contracts in perpetuity. Each cycle is a “metaphysically possible world”. Indeed, those just are the metaphysically possible worlds. Now…when I am wondering whether they come with one or zillions of branches at a time, I am not thinking that some cycles have one and some have zillions. On the contrary, I think (and you do too, I hope) that either they always come with one branch or they always come with zillions. Whether there are or aren’t zillions of parallel branches (or “universes”) in the world is not a matter of contingency. When I say I am indifferent to the proposition that there are zillions of parallel universes I am not saying that I think half of the expansions involve zillions of parallel universes and half don’t. What I am saying is that either they all do or they all don’t, and I am indifferent between those choices. Remember…a credence is an expected frequency. I have credence one half in the proposition that, in the sequence of observers like me, the frequency of observers populating single-branch worlds is 1, and credence one half in the proposition that, in the sequence of observers like me, the frequency of observers populating single-branch worlds is 0. Therefore my credence in the proposition that I populate a single-branch world is the expectation of this frequency…namely 1/2.
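To make the contrast concrete, here is a rough sketch of the two calculations. The first is the expected-frequency computation just described. The second is the observer-weighted reading I am rejecting, in which epistemic hypotheses get reweighted by how many observers they contain; the branch count of one million is a made-up stand-in for “zillions”, and the function name is mine.

```python
from fractions import Fraction

# Each entry: (epistemic probability of the hypothesis,
#              frequency of single-branch observers if it is true).
hypotheses = [
    (Fraction(1, 2), Fraction(1)),  # every cycle has one branch
    (Fraction(1, 2), Fraction(0)),  # every cycle has zillions of branches
]

# Credence as the expectation of the frequency across hypotheses.
credence_single = sum(p * freq for p, freq in hypotheses)
print("expected-frequency credence in single-branch:", credence_single)

def observer_weighted(prior_single, n_single, n_many):
    # The rejected reading: weight each epistemic hypothesis by its
    # observer count before normalizing.
    w_single = prior_single * n_single
    w_many = (1 - prior_single) * n_many
    return w_single / (w_single + w_many)

ZILLIONS = 10**6  # hypothetical branch count, for illustration only
rejected = observer_weighted(Fraction(1, 2), 1, ZILLIONS)
print("observer-weighted credence in single-branch:", rejected)
```

The first number stays at one half no matter how many branches the other hypothesis posits; the second collapses toward zero, which is the “our very existence is evidence” conclusion quoted above.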

6. Titelbaum mishandles quantum mechanics. The correct model is that when there is a branching then, yes, both branches exist in some multi-verse, but the probability of consciousness taking the one branch is 70% (in his example). Think of it as the “least effort branch” or whatnot. At any rate this isn’t really that important. I mean it is, but really it’s not about Sleeping Beauty, it’s about finding a place for consciousness without violating the causal closure of the physical or lapsing into epiphenomenalism. So maybe the comment doesn’t belong here. But, my point, I guess, is that Titelbaum’s section on quantum mechanics didn’t belong in his paper, either, because it doesn’t say anything that is particularly sensible.

7. What about Cian Dorr’s beautiful paper A Challenge for Halfers? The marginalization of that awesome paper is one of the great travesties of Sleeping Beauty lore. Titelbaum has seventy-nine papers in his bibliography. Dorr’s paper is better and more central to the discussion than at least seventy-five of them.

8. Nothing good about Jacob Ross? Too bad. Difficult terrain, though. Oh well.

9. Nothing good about Pust’s attack on conditionalization on centered evidence? Damn.

10. I’ll close with one of the best passages in Titelbaum’s paper:

“So it looks like the halfer must maintain that upon learning it’s Monday, Beauty should be more than 1/2 confident in heads. That’s the unwelcome consequence. After all, it’s Monday evening, the coin hasn’t been flipped yet (because of Elga’s first modification), the coin is fair, and Beauty is certain of all this. So how can her rational credence in heads be other than 1/2? David Lewis recognized this unwelcome consequence of his own halfer view. He also was famously committed to the Principal Principle, which requires an agent to assign credences equal to chance values of which she is certain unless she has what Lewis called ‘inadmissible’ evidence. An obvious example of inadmissible evidence is information causally downstream from the outcome of the chance process in question. For example, if you watch a fair die roll come up 3, you’re allowed to change your credence in that outcome while remaining certain that its objective chance was 1/6. But Beauty doesn’t seem to have any information causally downstream from the coin flip on Monday evening. So Lewis cast about for a ‘novel and surprising’  kind of inadmissible evidence that would authorize a greater-than-1/2 Monday-evening credence in heads.”

I would love to have written that myself. This not-so-subtle diss of the greatest philosopher of the second half of the 20th century may well be dead on. We don’t know, because, alas, the greatest philosopher of the second half of the 20th century is dead. I have preferred to treat Lewis with more charity, imagining that he came to his position not for having “cast about” for an unlikely save, but for coherent reasons involving “sample weight dilution”. My reading may be too generous: Titelbaum may be more on the right track. Certainly it at least needs to be said, but (possibly because my admiration for Lewis gets in my way) I do find myself resisting….

At any rate those are ten things I would have done differently, had I written the survey.