Alan Hajek’s paper *Fifteen Arguments against Hypothetical Frequentism* is an attack on the following thesis:

(HF) The probability of an attribute A in a reference class B is *p* if and only if the limit of the relative frequency of occurrences of A within B would be *p* if B were infinite.

I’ll defend the thesis, more or less, as it seems to be somewhat defensible. My viewpoint (since I am an ergodic theorist) will be taken from ergodic theory. So I take it that, more or less, the actual world is an orbit in some dynamical system or other. Since we are doing probabilities, I will take the system to be an ergodic measure-preserving transformation, call it T, of [0,1]. The space average, which is just Lebesgue measure on the interval, represents “real probability”, while the time average (relative frequency) merely converges to it (by the ergodic theorem) almost surely. Why ergodic? Well, I tend to the view that our world is representative of everything that is metaphysically possible. It’s an orbit in which every metaphysical attribute is instantiated, in other words, and, almost surely, with the right asymptotic frequency.

Now maybe this isn’t what most philosophers can mean by “actual world”. The actual world, for some, is perhaps something like “our hundred-billion-year universal run, Big Bang to Big Crunch”. Not at all, for me. It’s everything that is, was or will be the case, anywhere. And by “anywhere”, I don’t limit us to “you can get there from here by spaceship/time machine”. My hope and sincere belief (perhaps naive) is that there are even causal links between seemingly independent (from the perspective of, say, decision theory) sub-domains of this vast actual world, albeit encrypted beyond the ability of anyone to, say, send a message through The Big Crunch (to the next expansion or the next sector on God’s hard drive or whatever–you can call these different worlds if you like, I choose not to).
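For the skeptical reader, the content of the ergodic theorem is easy to see numerically. Here is a minimal sketch of my own (nothing from Hajek’s paper): I use an irrational circle rotation as a stand-in ergodic transformation of [0,1], and check that the time average of an indicator function along an orbit approaches its space average, i.e. the Lebesgue measure of the interval.

```python
import math

def time_average(alpha, a, b, n_steps):
    """Relative frequency of visits to [a, b) along the orbit of 0
    under the circle rotation T(x) = x + alpha (mod 1)."""
    x, hits = 0.0, 0
    for _ in range(n_steps):
        hits += a <= x < b
        x = (x + alpha) % 1.0
    return hits / n_steps

# Rotation by an irrational alpha is ergodic for Lebesgue measure,
# so the time average should approach the space average b - a.
alpha = math.sqrt(2) % 1.0
print(time_average(alpha, 0.2, 0.5, 100_000))  # close to 0.3
```

The rotation is chosen only because it is numerically stable; any ergodic T would do, and the orbit stands in for my vast “actual world”.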

Did I cheat? I have avoided half of Hajek’s arguments already, in that I removed the “Hypothetical” from “Hypothetical Frequentism”. Very well, but this isn’t cheating. It’s something more like dialectic. If B were infinite? Fantastic…B *is* infinite. So:

(F) The probability of an admissible attribute A within an admissible reference class B is *p* if and only if the limit of the relative frequency of occurrences of A within B is *p*.

Here “admissible” means something like “finitely non-indexically describable”. So something like “*this very coin* comes up heads the first time it is tossed” is inadmissible, but something like “*a coin like this one in these (finitely many non-indexically described) respects* comes up heads the first time it is tossed” is.

Okay, so now let’s look at Hajek’s arguments:

1. You abandoned empiricism, in that you’ve (a) replaced relative frequency, which can be observed, with asymptotic relative frequency, which cannot, and (b) employ a counterfactual.

Objection (b) has already been answered. We are not employing a counterfactual. Even indexically described events belong to the infinite reference class of events that are indiscernible in the appropriate non-indexical respects. In order to preserve the spirit of *objective* probability, it’s presumably not just indiscernibility that matters…one has to consider the class of events that are identical in some other–but not *all*–relevant respects that might influence, say, the outcome of a toss, whether variation in these respects be discernible or not. (This is actually a bit more murky than one might expect. If we are to speak of the objective chance, rather than the epistemic probability, of a head on this toss, then the reference class B can’t include tosses of coins that look like the current one but are differently biased, but on the other hand it can’t be constrained to tosses at identical angular velocity, etc. However, I am going to set these issues aside here, as in the obvious cases it’s clear what B should be. If there are, as seems likely, problem cases where it’s not clear, I can be accused of overlooking these cases.)

Objection (a) is incomprehensible, but if I understand it at all, a complete non-starter. One has a certain probability distribution (priors) on what the objective probability might be, and one conditions this on the finite set of observations to update probabilities. I don’t really understand the nature of the “empiricism” objection, but if this isn’t empiricism then good riddance to empiricism. Hajek writes “…finite strings of data put absolutely no constraint on the hypothetical frequentist’s probability.” That’s wrong *provided* it’s a phrase in the English language. I have two coins in my pocket. One is fair, the other biased. It’s dark, so I can’t see which one it is. I toss the coin. You have night vision goggles and tell me the toss landed *heads*. Is this not supposed to constrain my probability that the tossed coin is the biased one? Hajek can’t be this confused. And yet, what he goes on to say makes even less sense:

“The frequentist could try to relieve this problem by restricting his account, only according probabilities to attributes whose relative frequencies converge quickly (in some sense to be spelled out more precisely). For such attributes, finite strings of data could be a good guide to the corresponding limiting behavior after all. For example, there are many hypothetical sequences of coin tosses for which the relative frequency of Heads beyond the 100th toss never deviates by more than 0.1 from 1/2. In those sequences, the first 100 tosses reflect well the limiting behavior. More generally, the frequentist might choose an e > 0 and an N such that the only attributes that have probabilities are those whose hypothetical relative frequencies beyond N remain within e of their limiting values.”

This is utter nonsense. I can’t imagine anyone, frequentist or otherwise, doing any such thing. It will be very uncommon for the relative frequency after N trials to be far from the asymptotic frequency when N is large, that is true, but it will happen occasionally, and one would not say in such a case that the probability was not equal to the asymptotic frequency. It would just be that the frequentist would be off the mark for a time. I don’t see any point here.
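To put a rough number on “very uncommon”, here is a quick simulation of my own, using Hajek’s own figures (a fair coin, 100 tosses, a deviation of 0.1 from 1/2):

```python
import random

def deviation_rate(n_tosses=100, eps=0.1, n_trials=20_000, seed=0):
    """Estimate how often a fair coin's relative frequency of heads
    after n_tosses deviates from 1/2 by more than eps."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(n_trials):
        heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
        bad += abs(heads / n_tosses - 0.5) > eps
    return bad / n_trials

print(deviation_rate())  # around 0.035: uncommon, but it happens
```

A few percent of sequences stray that far at toss 100, which is exactly my point: such runs occur, and when they do the frequentist is simply off the mark for a while, not committed to denying the attribute a probability.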

2. “The counterfactuals appealed to are utterly bizarre.” Handled already. Not appealing to any counterfactuals. Nor, I would argue, is my take on the actual at all bizarre. Do you really think that eventually everything just ends? If you don’t, then you’re well over halfway to agreement with my view.

3. “There is no fact of what the hypothetical sequences look like.” Handled already. There is a fact about it, because it’s not hypothetical.

4-6. “The limiting relative frequency may not exist, or may be sensitive to ordering.” Let’s recall the ergodic theorem….the time average converges to the space average *almost surely*. So when you fix the order of observations, you only get the equivalence between time averages (asymptotic frequencies) and the “true probabilities” (space averages) with probability 1. Of course, I sense irony here where Hajek sees a wedge for skepticism. (I mean in “only…with probability 1”.) As for apocryphal limits under reordering, of course you have to specify the order in advance, just like you have to specify the attributes in advance of picking an orbit. (Otherwise the attribute *actual* would have limiting frequency 1 despite having probability zero.) Again…specify the ordering any way you like. Heck…specify countably many orderings. (It will be hard to specify more by explicit construction, considering *you can only perform countably many finite sequences of acts.*) With probability 1, the asymptotic limits will exist and be equal *for every one of them*. You want to pick the ordering after you see the sequence? That’s like saying you can win money betting on a horse race by betting after the race.

7. “Hypothetical frequentism’s order of explanation is back-to-front.” Sort of. We know that the probability of the experiment issuing in outcome A is 2/5 because the outcome is determined by certain equations (or an update rule) and initial data, and we take it that the distribution on initial data is such that, 2/5 of the time, the initial condition will lie in the basin of attraction of outcome A. (This isn’t really as circular as it may seem, as this fact will often be fairly independent of what distribution we put on the initial data choices, so we can just assume a uniform one, in which case it comes down to a space average and not a time average.) We aren’t laboring under the confusion that the asymptotic frequency “explains” why the probability is 2/5. It’s the equations or the update rule that explain that. The problem is, in practice it can be an intractable problem to infer “2/5” or whatever from the equations/update rule, but quite easy to do enough experimental runs (physical or Monte Carlo or whatever) to get very narrow high confidence intervals for the value p in question. For example, given an oddly shaped, 17 sided die, it might in theory be possible to figure out the odds of rolling an “8”, but next to impossible in practice. Trials indicate that the true value lies in [.0610, .0613] with 99.9999% confidence. What is the explanation for our knowledge? Relative frequencies, not careful analysis of the shapes of the sides and distribution of the mass of the die. If this is back-to-front so be it. It’s often all that’s available in practice. I won’t say more here because this issue comes up again below.
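For the curious, here is the kind of computation I have in mind for the 17-sided die, sketched in Python. The “true” face probability 0.06115 and the trial count are invented for illustration, and the z-value 4.89 corresponds to roughly 99.9999% two-sided coverage under the normal approximation:

```python
import math
import random

def estimate_with_ci(p_true, n_trials, z=4.89, seed=1):
    """Monte Carlo estimate of an outcome's probability, with a
    normal-approximation confidence interval.  z = 4.89 gives
    roughly 99.9999% two-sided coverage."""
    rng = random.Random(seed)
    hits = sum(rng.random() < p_true for _ in range(n_trials))
    p_hat = hits / n_trials
    half = z * math.sqrt(p_hat * (1 - p_hat) / n_trials)
    return p_hat - half, p_hat + half

# A die face whose (unknown to us) chance is 0.06115: a million
# simulated rolls pin it down far more tightly than any analysis
# of the die's geometry plausibly could.
lo, hi = estimate_with_ci(0.06115, 1_000_000)
print(round(lo, 5), round(hi, 5))
```

The interval narrows like 1/sqrt(n), so the experimenter can buy as much precision as patience allows, with no physics of the die required.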

8. “The limit might exist when it should not.” Hajek writes:

“Suppose that I repeatedly may choose to raise my left hand or right hand…. I can freely steer the limiting relative frequency anywhere I want. And yet my free choice may lack an objective probability. The frequentist sees objective probability values, when there may be none.”

This is a pretty serious mistake. The reference class of *dthat(Hajek)*‘s (I am not using this function in its usual sense, given how vast I take the actual world to be, but just to indicate that I am rigidly referring to a “local” version) arm-raise #17, say, is not the sequence of *dthat(Hajek)*‘s arm-raises:

{*dthat(Hajek)*‘s arm-raise #1, *dthat(Hajek)*‘s arm-raise #2, etc.}

If it were, then of course *dthat(Hajek)* could steer the relative frequency about as he claims. (For a little while, anyway…there are eventually going to be issues.) No….the reference class is something like “number seventeen arm raises of guys named “Hajek” who got their “Ph.D.” at “Princeton University” in “1993” who are (confusedly) trying hard to steer their arm-raise relative frequencies about “at will” and whose first sixteen arm-raises went so and so…and so on.” Obviously in that case, *dthat(Hajek)* can do little to steer the relative frequencies about, as he can’t communicate with his “counterparts” (again, non-standard usage).

9. Subsequences can be found converging to whatever you like.

Hajek’s mind is apparently more time stationary than I gave him credit for in the previous paragraph. He forgets that he already glossed reorderings in 4-6, and this is just that issue revisited.

10. “Necessarily single-case events”. Hajek writes “consider an event that is essentially single case: it cannot be repeated”. Hajek’s example of whether the universe is open or closed doesn’t work for me, as I define, or appear to define, “world” differently. (For me, the actual world is big enough to contain many closed and/or open “universes”.) But there is something to this. If I take the measure preserving transformation idea literally, I might consider the event “the actual universe is zero (information-theoretic) entropy”. (This may be something akin to “determinism is true”.) I agree that such events must be handled differently. What is being considered here is not a property of points in the space, but of spaces among collections of spaces. We have made no attempt to define objective chances for such “events”. Nor should we. There may be subjective probabilities here, or credences, but no objective chances. Actually there is much subtlety here that has led many into error over “self indication” and the so-called Doomsday paradox, etc. There are many more interesting issues that arise in this vicinity, but they are outside the scope of this post. We must trudge dutifully on.

11. “Uncountably many events”. Hajek writes: “each space-time point may have a certain property or not – say, the property of having a field strength of a certain magnitude located there. What is the probability that a given point has this property? The trouble is that there are uncountably many such points”. Hmm. Well, we can surely specify only countably many distinct points with finite descriptions. That sort of seems to be the issue. Yes, there is a construction of countably many points involving infinite descriptions, but….at any rate this example is nonsense. Even if we allow for the existence of so many points…it seems we are doing away with quantization here…surely we don’t imagine that the distribution of field strength isn’t describable in finite time, or at least in denumerable time. Hajek seems to want to imagine that we have uncountably many points and that the field strength at each point is independent of the field strength at the others. For example, Hajek might ask for the probability that a randomly chosen point is named “Hajek” (self named, or named by God, or whatnot). That’s asking *way* too much.

12. “Failure of the IID assumption.” Hajek writes: “Consider a man repeatedly throwing darts at a dartboard, who can either hit or miss the bull’s eye. As he practices, he gets better; his probability of a hit increases: “P(hit on trial n+1) > P(hit on trial n).” Amusingly, this underspecified example is specified just enough to ensure that the asymptotic limit of hits will exist. However, that accident is beside the point. What matters is that this is the second time we saw this argument. (The first was when Hajek was lifting his arms with an aim at thwarting the frequentist.) Unlike his thrower of darts, Hajek didn’t get any better at the argument the second time around, and it fails here for the same reason it failed before. In fairness, Hajek anticipates this and responds, albeit feebly:

“The hypothetical frequentist might reply that we should ‘freeze’ the dart-thrower’s skill level before a given throw, and imagine an infinite sequence of hypothetical tosses performed at exactly that skill level….”

Exactly. It is part and parcel of the frequentist’s method that the reference class B be IID.

“But this really is to give up on the idea that relative frequencies in the actual world have anything to do with probabilities.”

Again, I take the “real world” to be vast, and to contain (infinite, IID) sequences of throws at the “frozen” skill level in question. So the claim is just wrong.

“Indeed, this seems like a convoluted way of being a propensity theorist: all the work is being done by the thrower’s ‘ability’, a dispositional property of his, and the hypothetical limiting relative frequency appears to be a metaphysically profligate add-on, an idle wheel.”

Of *course* all the work is being done by the thrower’s ability. The structure of his neural net, the state of his soul, whatever psycho-physical laws there are, etc. No one denies this. The relevant information as to the probability is all there–if one can figure out how to extract it. (Good luck with that.) But wait…there’s awesome news. As it turns out, the relevant information is in the asymptotic frequency as well. *All of it*. There is no information about the probability of a bullseye to be found in the inner recesses of the dart thrower’s mind that isn’t fully and faithfully (with probability 1) recorded in the asymptotic frequency of bullseyes. And here, as always, when two properties are coextensive, one has a choice of which to use as definition and which becomes the theorem. The only issue here is whether *almost sure* coextension is good enough. I say yes. Hajek would, no doubt, say no, and he is not alone in this. Karl Popper in *The Propensity Interpretation of Probability* writes:

“the frequency theorist…will now say that an admissible sequence of events…must always be a sequence of repeated experiments…characterized by a set of generating conditions…then our problem is at once solved…the frequency interpretation is no longer in any difficulty…what I have here described…only states explicitly an assumption which most frequency theorists…have always taken for granted….Yet…we find that it amounts to a transition from the frequency interpretation to the propensity interpretation.”

I am sympathetic to this way of seeing things, as there are some special cases in which the frequency interpretation and the propensity interpretation come apart, and in these cases the propensity view seems to be correct. However, in actual practice frequencies are the final arbiter. (It’s a bad scientist who pleads coincidence in the face of an anomalous yet repeatable frequency–it’s part and parcel of the scientific method that when faced with recalcitrant data, one should seek rather to understand where one’s grasp of an underlying propensity has gone wrong.) So even though it’s possible (as I show in the next section) to construct countably many events generating a sigma-field on which the frequency notion of probability fails to be countably additive, I nevertheless think we should live as frequency theorists (more or less synonymously, as good scientists) whenever possible (more or less–essentially always–when sigma-fields generated by countable *partitions* satisfy our purposes–see below).

13. “Limiting relative frequency violates countable additivity.” This is probably my favorite argument of the fifteen. It’s not because it’s clever, or hard to answer. One can actually give cleverer examples of these “countable additivity violations”. Again, the reference class B has to be IID. Hajek writes: “start with a countably infinite event space—for definiteness, consider an infinite lottery, with tickets 1, 2, 3….” The “for definiteness” is classic because of what Hajek does *not* make definite…namely, the means by which a ticket is drawn. I believe he wants us to think that each ticket is equally likely, though he does not state as much. He also makes use of the wild hypothesis that “each ticket is drawn exactly once”, which is shocking if the process is with replacement (as he says it is) and anything like stationary. Obviously this example is so underspecified and involves such a fortuitous run as to be useless. (No IID output would ever look like that.) But no matter. I will give a better example.

At each timestep k, k=1,2,3,…, roll countably many dice. Die #n has 3^n sides, and is a fair die (each side equally likely to come up). Now it is true, with probability 1, that for each n and for each j =1,2,3,…,3^n, the asymptotic frequency of occurrence of attribute B(n,j) = (outcome j on die #n) is 3^(-n). Okay, so for each k, fix the particular j_k that comes up on the kth roll of die #k. Now look at the attributes B(k, j_k). If we sum the asymptotic frequencies of their instantiation sets, we get 1/2. On the other hand, for every timestep k at least one of them–namely B(k, j_k)–is instantiated.
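For concreteness, here is the construction truncated to the first K dice, sketched in Python (my illustration; the asymptotic statements hold only in the limit, so a finite run exhibits them approximately):

```python
import random

def run_construction(K=12, n_steps=100_000, seed=2):
    """Truncated version of the construction: at each timestep roll
    dice #1..#K, where die #n has 3**n equally likely sides.  j_k is
    whatever die #k happens to show at timestep k, so the attribute
    B(k, j_k) is instantiated at timestep k by construction, while
    its asymptotic frequency is 3**-k."""
    rng = random.Random(seed)
    rolls = [[rng.randrange(3 ** n) for n in range(1, K + 1)]
             for _ in range(n_steps)]
    # j_k: the face die #(k+1) shows at (0-indexed) timestep k
    j = [rolls[k][k] for k in range(K)]
    # empirical frequency of each B(k, j_k) over the whole run
    return [sum(r[k] == j[k] for r in rolls) / n_steps
            for k in range(K)]

freqs = run_construction()
# Each frequency is near 3**-(k+1), so their sum is near 1/2 --
# even though every timestep k <= K instantiates B(k, j_k).
print(round(sum(freqs), 3))
```

The geometric sum 1/3 + 1/9 + 1/27 + … = 1/2 is doing the work here: the frequencies add to 1/2 while the instantiation sets jointly cover every timestep.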

What happened? We have a bunch of sets whose union is everything but the sum of their asymptotic frequencies is 1/2. Worse, seemingly, is that the sets were instantiation sets of outcomes in perfectly good IID processes. Is that a problem?

I would say no. What’s true is that if you specify in advance a partition P={A_1,A_2,…}, or for that matter countably many partitions P_n = { A^n_1, A^n_2, … }, so that for each n, A^n_j, j = 1,2,… are mutually exclusive and exhaustive attributes, then with probability 1 it will turn out that for every n, the sum over j of the asymptotic frequency of the instantiation set of A^n_j will equal 1. In fact, countable additivity will be satisfied in the sigma-field G_n generated by the sets of instantiation of A^n_j, j=1,2,… (where n is fixed). As we have just seen, it need not be satisfied in the field generated by *all* of the sets A^n_j. Indeed, for the previous example it is not satisfied in the field generated by the instantiation sets B_k = B(k, j_k). What gives? If we had started by specifying the partition P = {B, B_1, B_2-B_1, B_3-B_2-B_1, … } (here B is the complement of the union of all the sets that follow) then the result I mentioned above would have failed for P. (Indeed B would have empty instantiation, although it seems clear that P(B)>1/2.)

There is good reason for this having happened. Namely, there are uncountably many (in fact continuum many) choice functions k–>j_k, each giving rise to a partition of “attribute space” along which countable additivity might fail. Since we can only guarantee countable additivity along a given partition almost surely, we can only handle countably many such partitions at a time, all of which must be specified (as usual) in advance. Specify a huge cardinality of partitions, and it is easy to arrange for a failure of countable additivity on the fields generated by some of them.

Is this fatal? Unequivocally *no*–what’s left is powerful enough for just about any purpose.

14. “The instantiation sets for which asymptotic frequency exists do not form a field.” Well, they do almost surely if they come from a discrete IID process. (See #13.) So sure, you can write down phony outputs that would never come from an IID process and say they are counterexamples, but never in the long history of even my vast world will you ever see it happen in a case where the attribute partition is specified in advance. (Almost surely, as always.)

15. “According to HF, there are no infinitesimal probabilities.”

Is that supposed to be an argument *against* HF? I’m confused.

I never read the paper of Bartha and Hitchcock on the so-called *Shooting Room Paradox*, but I find it difficult to imagine a good use for infinitesimal probabilities. As for the Shooting Room itself, the whole source of the paradox is that the expected value of the number shot is infinite, and that is a violation of a rational constraint, as I have argued elsewhere (Section 6 of my Sleeping Beauty survey.) To wit:

You’re suicidal and want desperately to be shot, but you don’t have a gun. You are given a choice between being in the pool for Shooting Room A or being in the pool for Shooting Room B. You choose A, but this choice is arbitrary. You are given the option to switch, and an enticement: if you choose Room B, they will run the experiment with double the number of people in the room at every stage. Only half of them will be shot, but if you are in the room at the time double sixes are rolled, you are guaranteed to be in that half. Obviously you switch to Room B. Now you find out exactly how many rolls it will take to get double sixes in Room B. Call this number N. So twice 10^(N-1) people will be in Room B when double sixes are rolled, and you will be shot if you are one of them. You’re pretty happy with this trade, but now you are given the option to switch back to Room A. At a cost…if you do switch, you will be forced to wear a vest that blocks bullets half the time. You compute the expected number to be in Room A when double sixes are rolled. It’s infinite. Even half of that is better than twice 10^(N-1). So you switch back. And you’re pretty happy about your trade. Until you realize that your two trades have left you worse off than you were before. You’re back in Room A, but now you have to wear this damn vest.
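In case the divergence of that expectation isn’t obvious, here is a sketch of its partial sums, assuming the standard setup (stage n holds 10^(n-1) people, and each stage ends with double sixes with probability 1/36):

```python
def room_A_partial_sums(n_terms):
    """Partial sums of E[# people in Room A when double sixes land]:
    the game ends at stage n with probability (35/36)**(n-1) * (1/36),
    at which point 10**(n-1) people are in the room."""
    partials, total = [], 0.0
    term = 1 / 36                 # the n = 1 term
    for _ in range(n_terms):
        total += term
        partials.append(total)
        term *= 10 * 35 / 36      # successive-term ratio 350/36 > 1
    return partials

# The terms themselves grow geometrically, so the partial sums
# explode: the expectation is infinite, the engine of the paradox.
print(room_A_partial_sums(30)[-1])
```

The population grows by a factor of 10 per stage while the survival probability shrinks only by 35/36, so no amount of truncation tames the sum.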

What I don’t want to hear is that it’s “logically possible” that a world could be such as to allow a faithful implementation of the Shooting Room. Of course it’s logically possible. Logical possibility counts for very little. The point is, it can’t be *metaphysically* possible–for the simple reason that the world is not that badly designed.

Not, at any rate, with those Big Crunches coming every hundred billion years or so.
