A folk theorem for minority games

We study a particular case of repeated games with public signals. In the stage game an odd number of players have to choose simultaneously one of two rooms. The players who choose the less crowded room receive a reward of one euro (whence the name “minority game”). The players in the same room do not recognize each other, and between the stages only the current majority room is publicly announced. We show that in the infinitely repeated game any feasible payoff can be achieved as a uniform equilibrium payoff, and as an almost sure equilibrium payoff. In particular we construct an inefficient equilibrium where, with probability one, all players choose the same room at almost all stages. This equilibrium is sustained by punishment phases which use, in an unusual way, the pure actions that were played before the start of the punishment.


Introduction
An odd number of players have to choose simultaneously one of two rooms. The players who choose the less crowded room receive a reward of one euro. The others receive nothing. The game is repeated over time. A version of this game was introduced by Arthur (1994) under the name El Farol's Bar problem (see also Arthur (1999)). In his paper customers have to decide every weekend whether to go to the bar or stay home. Only customers who make the minority choice are happy. Arthur's paper gave rise to a large literature on so-called minority games. The interest in this class of games came especially from theoretical physicists working in statistical mechanics (see e.g. Challet and Zhang (1997), Savit et al. (1999)). They focus on the case of many players and see "these problems as novel examples of frustrated and disordered many-body systems" (Cavagna et al. (1999)). In their models the many agents have limited memory and act according to some evolutionary paradigm without taking strategic considerations into account. The reader is referred to http://www.unifr.ch/econophysics/minority/ for an extensive list of references.
In our paper we consider a repeated minority game and analyze it with the classical rational approach of game theory. Notice that, if after each stage each player observes the players who are in the room she selected, then, by the folk theorem, any feasible payoff is an equilibrium payoff of the repeated game. We study here the following version of a repeated minority game. At each stage the players choose an action (one of two rooms). After their choice only a public signal (the majority room) is announced to all players. Therefore they observe neither the actions nor the payoffs of the other players. The game is infinitely repeated and the payoffs are not discounted. We use the standard notion of uniform equilibrium, which will turn out to be payoff-equivalent here to that of almost sure equilibrium. We characterize the set of equilibrium payoffs.
Our model is a particular case of repeated games with imperfect observation: the players repeat a known one-shot game and after each stage each player receives a signal depending on the actions played. The reader is referred to Sorin (1992) for a survey of repeated games with complete information. Renault and Tomala (2000) characterized the set of uniform communication equilibrium payoffs for any repeated game with imperfect monitoring, but no general characterization exists for (Nash) equilibrium payoffs. Fudenberg and Maskin (1986) proved a folk theorem for a certain class of repeated games with discounting. Lehrer (1989, 1992a) dealt with two-person undiscounted repeated games with imperfect observation. More recently Tomala (1998) studied the case of public signals, where all players get the same signal after each stage. In this setup he characterized the set of pure uniform equilibrium payoffs. He also provided a characterization of the set of uniform (possibly mixed) equilibrium payoffs in a certain class of games, where all payoffs can be deduced from the public signal (Tomala (1999)).
Since we are interested not only in pure equilibria but in all (possibly mixed) uniform equilibria, the solution to our problem cannot be found in the existing literature. We will prove that a folk theorem holds for our game, i.e. we will show that any feasible payoff is an equilibrium payoff. In particular, we will construct a uniform equilibrium where the payoff of each player is simply zero. This equilibrium can be considered as particularly inefficient, since all feasible payoffs are nonnegative. It contains a main path and punishment phases. A punishment phase starts when the players suspect that a deviation has occurred. The identity of the possible deviator is not known by the players and it is not possible to punish simultaneously all players suspected of deviation, as done in several recent papers (Tomala (1999); Renault and Tomala (2000)). On the other hand it is possible to punish the deviator, if any, by replicating some actions previously played in the main path before the punishment phase. To our knowledge, this kind of punishment is new in the literature. The technical parts of our proofs use statistical techniques due to Lehrer (1990, 1992b), or, more specifically, the variations used by Renault (2000). In our opinion, the construction of our inefficient equilibrium gives insights concerning the difficulty of a general characterization of equilibrium payoffs in repeated games with public signals.
For the sake of simplicity, we first deal with the case of three players. Section 2 contains the model, and the statement of our main result. In Section 3 we define a particular strategy where, with high probability, all players are in the same room at almost all stages. In Section 4 we prove that this strategy is a uniform equilibrium with payoff 0 for each player. In Section 5 we finally extend our result to the case of any odd number of players.

The model
If $E$ is an event, then $E^c$ is its complementary event. The cardinality of a finite set $A$ is denoted by $|A|$. If $C$ is a subset of a Euclidean space, $\mathrm{conv}\,C$ is the convex hull of $C$.
There are two rooms: L(eft) and R(ight). At each stage, three players have to choose simultaneously one of the two rooms. The player who finds herself in the less crowded room (if any) gains a positive payoff of 1, and the most crowded room is publicly announced before going to the next stage.

The stage game
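The set of players is $N = \{1, 2, 3\}$. For all $i \in N$, we call $A^i = \{L, R\}$ the set of actions of player $i$, and we put $A = A^1 \times A^2 \times A^3$. The payoff function $g = (g^1, g^2, g^3) : A \to \mathbb{R}^3$ is defined, for all $i \in N$ and all $a = (a^1, a^2, a^3) \in A$, by
$$g^i(a) = \begin{cases} 1 & \text{if } a^j \neq a^i \text{ for all } j \in N \setminus \{i\}, \\ 0 & \text{otherwise.} \end{cases}$$
It is easy to compute the equilibria of the one-shot game. These are the action profiles such that one player plays L with probability 1 and another player plays R with probability 1 (the third player may then play any mixed action), and the action profile where each player plays L and R with equal probability. Consequently, the set of equilibrium payoffs of the one-shot game is just
$$E_1 = \{x \in \mathbb{R}^3_+ : x^1 + x^2 + x^3 = 1,\ x^1 x^2 x^3 = 0\} \cup \{(1/4, 1/4, 1/4)\}.$$
Notice that all payoffs $x = (x^1, x^2, x^3)$ in $E_1$ satisfy $x^1 + x^2 + x^3 \geq 3/4$. In a Nash equilibrium of the one-shot game, the three players are in the same room with probability at most $1/4$. Since the stage game will be repeated, we also need notation for what the players observe. We define the set of public signals as $U = \{L, R\}$. The signalling function $\ell : A \to U$, giving the most crowded room, is formally defined by
$$\ell(a) = \begin{cases} L & \text{if } |\{i \in N : a^i = L\}| \geq 2, \\ R & \text{otherwise.} \end{cases}$$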
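The stage game is small enough to check these claims by brute force. The following Python sketch encodes $g$ and $\ell$, computes expected stage payoffs for independent mixed actions, and checks both kinds of one-shot equilibria described above, including a pure pair where the third player mixes. The function names are ours, and exact arithmetic with `fractions.Fraction` is used only for readability.

```python
from fractions import Fraction
from itertools import product

ROOMS = ("L", "R")


def payoffs(profile):
    """Stage payoffs g(a): a player gets 1 iff her room is the less crowded one."""
    n = len(profile)
    return tuple(1 if 2 * profile.count(a) < n else 0 for a in profile)


def signal(profile):
    """Public signal ell(a): the most crowded room (no ties with an odd number of players)."""
    return "R" if 2 * profile.count("R") > len(profile) else "L"


def expected_payoffs(probs_L):
    """Expected stage payoffs, and P(all players in the same room), when player i
    independently plays L with probability probs_L[i]."""
    n = len(probs_L)
    totals = [Fraction(0)] * n
    p_same_room = Fraction(0)
    for profile in product(ROOMS, repeat=n):
        weight = Fraction(1)
        for p, a in zip(probs_L, profile):
            weight *= p if a == "L" else 1 - p
        for i, g in enumerate(payoffs(profile)):
            totals[i] += weight * g
        if len(set(profile)) == 1:
            p_same_room += weight
    return tuple(totals), p_same_room


def is_nash(probs_L):
    """No player gains by switching to a pure action. This is enough, since a player's
    expected payoff is linear in her own mixed action."""
    base, _ = expected_payoffs(probs_L)
    for i in range(len(probs_L)):
        for pure in (Fraction(1), Fraction(0)):
            deviation = list(probs_L)
            deviation[i] = pure
            if expected_payoffs(deviation)[0][i] > base[i]:
                return False
    return True


half = Fraction(1, 2)
# Each player gets 1/4, and the three players share a room with probability 1/4.
pay, same = expected_payoffs((half, half, half))
```

Only pure deviations are tested, which is the usual shortcut for finite games.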

The repeated game $\Gamma_\infty$
At each stage $t \geq 1$, each player $i$ (simultaneously with the other players) selects an action $a^i_t \in A^i$. If $a_t = (a^1_t, a^2_t, a^3_t) \in A$ is chosen, the stage payoff of player $i$ is $g^i(a_t)$, and the signal $u_t = \ell(a_t)$ is publicly announced. Then the play proceeds to stage $t + 1$. All the players have perfect recall and the whole description of $\Gamma_\infty$ is common knowledge.
The game $\Gamma_\infty$ is a game with imperfect monitoring, in that the players do not observe the actions of their opponents, but only a signal (the majority room).

The equilibria of $\Gamma_\infty$
A behavioral strategy of player $i$ is an element $\sigma^i = (\sigma^i_t)_{t \geq 1}$, where for all $t$, $\sigma^i_t : (A^i \times U)^{t-1} \to \Delta(A^i)$ and $\Delta(A^i)$ denotes the set of probability distributions on $A^i$. Therefore, for each $t \geq 1$, $\sigma^i_t(a^i_1, u_1, a^i_2, u_2, \ldots, a^i_{t-1}, u_{t-1})$ is the lottery played by player $i$ at stage $t$ if she played $a^i_1$ at stage 1, ..., $a^i_{t-1}$ at stage $t-1$, and the signal was $u_1$ at stage 1, ..., $u_{t-1}$ at stage $t-1$.
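We call $\Sigma^i$ the set of behavioral strategies of player $i$, and $\Sigma = \Sigma^1 \times \Sigma^2 \times \Sigma^3$. A strategy profile $\sigma = (\sigma^1, \sigma^2, \sigma^3) \in \Sigma$ induces a probability measure $P_\sigma$ over the set of plays $\Omega = (A \times U)^\infty = \{(a_1, u_1, a_2, u_2, \ldots) : a_t \in A,\ u_t \in U \text{ for all } t \geq 1\}$. With an abuse of notation we write $a_t$ for the random variable giving the action profile in $A$ played at stage $t$. For all $i \in N$ and all $T \geq 1$, the expected average payoff of player $i$ over the first $T$ stages is
$$\gamma^i_T(\sigma) = E_{P_\sigma}\left[\frac{1}{T}\sum_{t=1}^{T} g^i(a_t)\right].$$
We write $\sigma^{-i}$ for the strategies of the players other than $i$.

Definition 1. The profile $\sigma$ is a uniform equilibrium of $\Gamma_\infty$ if
(a) for all $i \in N$, $(\gamma^i_T(\sigma))_T$ converges to some quantity $x^i$ as $T$ goes to infinity (the vector $(x^1, x^2, x^3)$ is called the payoff of $\sigma$);
(b) for all $\varepsilon > 0$ there exists $T_0$ such that for all $T \geq T_0$, $\sigma$ is an $\varepsilon$-Nash equilibrium in the finitely repeated game with $T$ stages, i.e. for all $i \in N$ and all $\tau^i \in \Sigma^i$,
$$\gamma^i_T(\tau^i, \sigma^{-i}) \leq \gamma^i_T(\sigma) + \varepsilon.$$

Definition 2. The vector $x \in \mathbb{R}^3$ is an equilibrium payoff of $\Gamma_\infty$ if there exists a uniform equilibrium with payoff $x$.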
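We denote by $E_\infty$ the set of equilibrium payoffs of $\Gamma_\infty$. Since all payoffs are nonnegative and $g^1 + g^2 + g^3 \leq 1$, it is clear that $E_\infty$ is a subset of the simplex $S$, where
$$S = \{x \in \mathbb{R}^3_+ : x^1 + x^2 + x^3 \leq 1\} = \mathrm{conv}\{(0,0,0), (1,0,0), (0,1,0), (0,0,1)\}.$$
Our main result is the following theorem.

Theorem 3. $E_\infty = S$.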
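Since repeating a Nash equilibrium of the one-shot game at each stage is a uniform equilibrium of $\Gamma_\infty$, we know that $E_1 \subset E_\infty$. Moreover $E_\infty$ is convex and $S = \mathrm{conv}(E_1 \cup \{(0,0,0)\})$, so the only thing we have to do is to prove the following theorem.

Theorem 4. $(0, 0, 0) \in E_\infty$.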
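In order to prove the above theorem we need to construct a strategy profile $\sigma \in \Sigma$ that satisfies the two properties of Definition 1 with payoff $(0,0,0)$, namely,
(a) for all $i \in N$, $\lim_{T \to \infty} \gamma^i_T(\sigma) = 0$;
(b) for all $\varepsilon > 0$ there exists $T_0$ such that for all $T \geq T_0$, all $i \in N$ and all $\tau^i \in \Sigma^i$, $\gamma^i_T(\tau^i, \sigma^{-i}) \leq \varepsilon$.
Note that since all payoffs are nonnegative, (a) is a consequence of (b) here (take $\tau^i = \sigma^i$).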
To get a payoff of 0, we need all the players to play the same action (say L) with high probability at most stages. But if all players play L with probability 1, a deviation by one player to R is profitable (she then gets a payoff of 1 at each stage) and is not detected (the signal is still L). Hence some of the players must play R with small but positive probability.
Imagine all players play R at each stage with probability $\varepsilon$, where $\varepsilon$ is small but positive. In order to detect a deviation, we need a statistical test. If the frequency of stages where R is the most crowded room is higher than it should be, all players consider that a deviation has occurred and a punishment phase starts. We then need to define an appropriate punishment phase, the difficulty being that the identity of the deviator (if any) is not known by the players. Our main idea is the following. If player $i$ is deviating, then with high probability, at most of the stages where R was the most crowded room, the situation was the following: player $i$ played R, and exactly one of the other players played R, too. So if the players different from $i$ repeat the actions they played at the stages where R was the most crowded room, at most stages one of them plays L and the other plays R. Whatever room player $i$ chooses, she then shares it with one of them, so this punishes her by giving her a payoff of zero at these stages.
We now formally construct $\sigma$. The set of stages $\{1, 2, \ldots\}$ is divided into consecutive blocks $B_1, \ldots, B_m, \ldots$ of increasing lengths, with $|B_m| = m^{10}$ for all $m \geq 1$. Increasing lengths are needed so that the statistical tests become more and more accurate. The strategy $\sigma$ consists of a main path and of punishment phases, which start from the main path.
When the play is in the main path, at some block $B_m$, all players play, at each stage $t$ of $B_m$ and independently of what happened before, the mixed action
$$\left(1 - \frac{1}{m}\right)L + \frac{1}{m}R,$$
i.e. L with probability $1 - 1/m$ and R with probability $1/m$. At the end of such a block, all players can compute the empirical frequency of "R being the most crowded room" in this block:
$$\alpha_m = \frac{1}{m^{10}}\left|\{t \in B_m : \ell(a_t) = R\}\right|.$$
Note that if no player deviates at block $B_m$, by Tchebychev's inequality $\alpha_m$ should be close to the probability that R is the most crowded room at a given stage, which is asymptotically equivalent to $3/m^2$. The statistical test is the following:
• If $\alpha_m \leq 1/(m\sqrt{m})$, the test is passed. The play stays in the main path (and block $B_{m+1}$ is played).
• If $\alpha_m > 1/(m\sqrt{m})$, the test is failed, and the players assume that a deviation has occurred. The play immediately leaves the main path and a punishment phase starts. The punishment phase lasts a large number of blocks, but is not infinite, because there is always a chance that the punishment fails. More precisely, the punishment phase lasts from the first stage of block $B_{m+1}$ to the last stage of block $B_{m^2}$. Then, whatever happens during the punishment phase, the play goes back to the main path at block $B_{m^2+1}$.
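A main-path block and its test can be simulated directly. The sketch below is illustrative only. A true block has $m^{10}$ stages, so it accepts a shorter, hypothetical `block_len`. The `deviator_always_R` flag models one particular deviation, in which player 1 is always in R. It uses $m = 20$ because the threshold $1/(m\sqrt{m})$ exceeds $3/m^2$ only once $m > 9$.

```python
import math
import random


def main_path_block(m, rng, block_len=None, deviator_always_R=False):
    """Simulate one main-path block B_m for three players (indices 0, 1, 2;
    index 0 stands for player 1).

    Under sigma each player plays R with probability 1/m at every stage.
    If deviator_always_R is True, player 0 instead plays R at every stage.
    block_len defaults to |B_m| = m**10; a smaller value may be passed for
    illustration only.
    Returns (alpha_m, test_passed, D, actions), where D lists the positions in
    the block at which R was the most crowded room.
    """
    if block_len is None:
        block_len = m ** 10
    actions, D = [], []
    for k in range(block_len):
        profile = tuple(
            "R" if (i == 0 and deviator_always_R) or rng.random() < 1 / m else "L"
            for i in range(3)
        )
        actions.append(profile)
        if profile.count("R") >= 2:          # R is the most crowded room
            D.append(k)
    alpha = len(D) / block_len
    test_passed = alpha <= 1 / (m * math.sqrt(m))
    return alpha, test_passed, D, actions


# Without deviation the test is passed with overwhelming probability;
# the "always R" deviation makes alpha close to 1 - (1 - 1/m)**2 and fails it.
alpha, passed, _, _ = main_path_block(20, random.Random(0), block_len=100_000)
alpha_dev, passed_dev, _, _ = main_path_block(
    20, random.Random(1), block_len=100_000, deviator_always_R=True)
```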
To complete the definition of σ, we have to define what is played in the punishment phases.
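Let $m$ be a positive integer, and consider a block $B_m$ where the play is in the main path, such that $\alpha_m > 1/(m\sqrt{m})$, namely, the test fails. Define
$$D = \{t \in B_m : \ell(a_t) = R\}.$$
On the set $D$ we suspect the deviator, if any, to have played R on purpose. We have $|D| = m^{10}\alpha_m$. In order to play the punishment phase at blocks $B_{m+1}, \ldots, B_{m^2}$, each player has to remember $D$ and the action she played at each stage of $D$. We order the elements of $D$ so that $D = \{t'_1, \ldots, t'_{|D|}\}$ with $t'_1 < t'_2 < \cdots < t'_{|D|}$.

We now define what $\sigma$ recommends at a block $B_{m'}$, with $m' \in \{m+1, \ldots, m^2\}$, during this punishment phase. Let $d$ be the integer such that $d \leq |B_{m'}|/|D| < d+1$, and divide $B_{m'}$ into consecutive sub-blocks $B^1_{m'}, \ldots, B^d_{m'}$ of length $|D|$, followed by a last sub-block $B^{d+1}_{m'}$ of length $|B_{m'}| - d|D| < |D|$. Then
$$\frac{|B^{d+1}_{m'}|}{|B_{m'}|} < \frac{|D|}{|B_{m'}|} \leq \frac{m^{10}\alpha_m}{(m+1)^{10}} \leq \alpha_m. \qquad (2)$$
With high probability the right-hand side of (2) will be small when $m$ is large, even in case of deviation. Consequently we can define $\sigma$ arbitrarily on such a block $B^{d+1}_{m'}$. Let $d' \in \{1, \ldots, d\}$. At $B^{d'}_{m'}$ the strategy $\sigma$ recommends the players to mimic what happened at stages in $D$. If $B^{d'}_{m'} = \{t_1, \ldots, t_{|D|}\}$ with $t_1 < t_2 < \cdots < t_{|D|}$, then $\sigma$ recommends each player $i$ at each stage $t_n \in B^{d'}_{m'}$ (with $n \in \{1, \ldots, |D|\}$) to repeat the action she played at stage $t'_n$, i.e. to play $a^i_{t'_n}$. Notice that $\sigma$ recommends exactly the same sequence of actions at each sub-block $B^1_{m'}, \ldots, B^d_{m'}$.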
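The replay rule is mechanical and can be sketched as follows. Here `recorded` holds the action profiles observed at the stages of $D$, in increasing order. In the actual strategy each player needs only her own coordinate. The leftover sub-block, on which $\sigma$ may be defined arbitrarily, is filled with L as a hypothetical choice of ours.

```python
def punishment_block(recorded, block_len):
    """Recommended action profiles for one punishment block B_{m'}.

    recorded: action profiles observed at the stages of D, in increasing order.
    block_len: |B_{m'}|, assumed to be at least len(recorded).
    The block is cut into d = block_len // |D| sub-blocks that each replay
    `recorded` in order; the leftover stages (where sigma is arbitrary) are
    filled with all-L here, a hypothetical choice.
    """
    size = len(recorded)
    if size == 0 or block_len < size:
        raise ValueError("need 0 < |D| <= |B_m'|")
    d = block_len // size
    schedule = list(recorded) * d
    n_players = len(recorded[0])
    schedule += [("L",) * n_players] * (block_len - d * size)
    return d, schedule


D_profiles = [("R", "R", "L"), ("R", "L", "R"), ("L", "R", "R")]
d, plan = punishment_block(D_profiles, 10)   # three full replays and one leftover stage
```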
1. Suppose that all players follow $\sigma$. Then at each stage of some block $B_m$ in the main path the probability of R being the most crowded room is equivalent (as $m$ goes to infinity) to $3/m^2$. Consequently, by Tchebychev's inequality, $\alpha_m$ will be close to $3/m^2$ with high probability. Since $3/m^2 < 1/(m\sqrt{m})$ for $m$ large (namely for $m > 9$), the test of block $B_m$ will be passed. It will even be possible, by the Borel-Cantelli lemma, to show that the set of blocks $m$ such that $B_m$ is not in the main path is almost surely finite. Moreover the (stage) average payoff of some player $i$ at some block $B_m$ in the main path will be close to the probability that she plays R whereas the others play L, hence will be close to $1/m$. This will ensure that the average payoff of each player goes to zero as the number of stages goes to infinity.
2. Suppose that some player (e.g. player 1) deviates from $\sigma$. In order for player 1 to have a good payoff at some block $B_m$ in the main path, she should play R a large number of times in this block. We will see that in this case the empirical frequency of "R being the most crowded room" will be greater than $1/(m\sqrt{m})$ with high probability. Hence a punishment phase will start, and the efficiency of this phase in punishing player 1 will only depend on what happened at stages in $D = \{t \in B_m : \ell(a_t) = R\}$.
The set D consists of two kinds of stages: (i) the stages where player 1 played R and exactly one of the other players played R, and (ii) the stages where both players 2 and 3 played R. We will show that, with high probability, the stages of type (ii) are negligible. Consequently, for most of the stages in D, player 2 and player 3 do not play the same action. Hence for most of the punishment stages, player 1's payoff will be zero.
Summing up, player 1 cannot have a good payoff on some block in the main path without being severely punished afterwards with high probability. This will ensure that no deviation is profitable.
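The trade-off behind points 1 and 2 can be seen at the level of expectations. Suppose player 1 plays R with some fixed probability $q$ at every stage of a main-path block while players 2 and 3 follow $\sigma$. The sketch below computes the per-stage probability that R is the most crowded room and player 1's per-stage payoff. It also finds the largest stationary $q$ that keeps the former below $1/(m\sqrt{m})$. This is a heuristic calculation and not part of the proof; `q` and the function names are ours.

```python
import math
from fractions import Fraction


def stage_stats(m, q):
    """Per-stage probabilities in a main-path block B_m when players 2 and 3 follow
    sigma (R with probability 1/m) and player 1 plays R with probability q.
    Returns (P[R is the most crowded room], P[player 1 gets payoff 1])."""
    r = Fraction(1, m)
    q = Fraction(q)
    # Player 1 in R: R is crowded iff at least one other player is in R.
    # Player 1 in L: R is crowded iff both other players are in R.
    p_r_crowded = q * (1 - (1 - r) ** 2) + (1 - q) * r ** 2
    # Player 1 wins iff both other players are in the other room.
    p_win = q * (1 - r) ** 2 + (1 - q) * r ** 2
    return p_r_crowded, p_win


def best_undetected_payoff(m):
    """Largest stationary q (and the resulting per-stage payoff of player 1) for which
    the expected frequency of 'R most crowded' stays <= 1/(m*sqrt(m)).
    A heuristic, expectation-level calculation only."""
    theta = 1 / (m * math.sqrt(m))
    r = 1 / m
    slope = (1 - (1 - r) ** 2) - r ** 2          # = 2/m - 2/m**2
    q_max = min(1.0, max(0.0, (theta - r ** 2) / slope))
    return q_max, q_max * (1 - r) ** 2 + (1 - q_max) * r ** 2


q, pay = best_undetected_payoff(10_000)   # pay is of order 1/(2*sqrt(m))
```

With $q = 1/m$ this recovers the values $3/m^2 - 2/m^3$ and $\frac{1}{m}(1 - \frac{1}{m})$ used in the proof of Proposition 5. A deviator who stays below the threshold in expectation earns only about $1/(2\sqrt{m})$ per stage.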
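To show that $\sigma$ is a uniform equilibrium, we only need to prove Proposition 6 below. However we will first briefly prove the following Proposition 5 to simplify the exposition of our proof (and because the analogue of Proposition 5 will be needed in Section 5).

Proposition 5. For all $i \in N$, $\frac{1}{T}\sum_{t=1}^{T} g^i(a_t) \to 0$ as $T \to \infty$, $P_\sigma$-almost surely, and $\lim_{T \to \infty} \gamma^i_T(\sigma) = 0$.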
Proof of Proposition 5. By symmetry, we only consider the case where $i = 1$. Assume that all players play $\sigma$. All the probabilities and expectations in the rest of the proof are computed according to $P = P_\sigma$.
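For each block $m$, we define the following events: $\mathcal{B}_m = \{$the play is in the main path at block $B_m\}$, and
$$\mathcal{A}_m = \left\{\frac{1}{m^{10}}\sum_{t \in B_m} g^1(a_t) > \frac{2}{m}\right\}.$$
Fix a block number $m$ where $\mathcal{B}_m$ holds. At each stage of $B_m$ each player plays i.i.d. the mixed action $(1 - \frac{1}{m})L + \frac{1}{m}R$. So at each stage the probability that R is the most crowded room is
$$\pi_m = 3\,\frac{1}{m^2}\left(1 - \frac{1}{m}\right) + \frac{1}{m^3} = \frac{3}{m^2} - \frac{2}{m^3},$$
and the probability that player 1 has a payoff of 1 is
$$\rho_m = \frac{1}{m}\left(1 - \frac{1}{m}\right)^2 + \left(1 - \frac{1}{m}\right)\frac{1}{m^2} = \frac{1}{m}\left(1 - \frac{1}{m}\right) \leq \frac{1}{m}.$$
We have, by Tchebychev's inequality,
$$P(\mathcal{A}_m \cap \mathcal{B}_m) \leq P\left(\left|\frac{1}{m^{10}}\sum_{t \in B_m} g^1(a_t) - \rho_m\right| \geq \frac{1}{m} \,\middle|\, \mathcal{B}_m\right) \leq \frac{\rho_m m^2}{m^{10}} \leq \frac{1}{m^9}. \qquad (3)$$
Moreover, for $m$ large enough,
$$P(\mathcal{B}_m \cap \mathcal{B}^c_{m+1}) \leq P\left(\alpha_m > \frac{1}{m\sqrt{m}} \,\middle|\, \mathcal{B}_m\right) \leq P\left(|\alpha_m - \pi_m| > \frac{1}{2m\sqrt{m}} \,\middle|\, \mathcal{B}_m\right) \leq \frac{4m^3\pi_m}{m^{10}} \leq \frac{4}{m^7}.$$
Again, the bound comes from Tchebychev's inequality. Since $\sum_{m \geq 1} 4/m^7 < \infty$, the Borel-Cantelli lemma gives
$$P\left(\limsup_{m}\,(\mathcal{B}_m \cap \mathcal{B}^c_{m+1})\right) = 0. \qquad (4)$$
Since after a punishment phase the play always comes back to the main path, (4) implies that with probability 1 there exists a block $m_1$ such that for each $m \geq m_1$, $\mathcal{B}_m$ holds.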
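By the Borel-Cantelli lemma again and (3), we now have $P(\limsup_m(\mathcal{A}_m \cap \mathcal{B}_m)) = 0$, hence $P(\limsup_m \mathcal{A}_m) = 0$. Hence, with probability 1, there exists a block $m_2$ such that for all $m \geq m_2$,
$$\frac{1}{m^{10}}\sum_{t \in B_m} g^1(a_t) \leq \frac{2}{m}.$$
Since the cardinality of the $B_m$ is polynomial in $m$ (so that the current block is negligible compared with the number of stages already played), we have
$$\lim_{T \to \infty}\frac{1}{T}\sum_{t=1}^{T} g^1(a_t) = 0 \quad P\text{-a.s.}$$
By the bounded convergence theorem we also have that $\lim_{T \to \infty}\gamma^1_T(\sigma) = 0$.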
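The role of the polynomial block lengths can be made concrete. Assume, as in the proof, that player 1's average payoff on block $B_k$ is at most $2/k$ for all $k \geq m_2$, and bound it crudely by 1 before that. The sketch below then computes the worst running average $\frac{1}{T}\sum_{t \leq T} g^1(a_t)$ over all stages $T$ of block $B_m$. The function name is ours.

```python
def average_payoff_bound(m, m2):
    """Upper bound on (1/T) * sum_{t<=T} g^1(a_t) over all stages T of block B_m,
    assuming player 1's average payoff is <= 2/k on every block B_k with k >= m2,
    and (crudely) payoff 1 at every stage of the blocks B_k with k < m2.
    Block B_k has k**10 stages."""
    if m <= m2:
        raise ValueError("take m > m2")
    stages_before = sum(k ** 10 for k in range(1, m))
    payoff_before = sum(k ** 10 if k < m2 else 2 * k ** 9 for k in range(1, m))
    # Inside B_m at most 2 * m**9 stages pay 1.  With s stages of B_m played,
    # (payoff_before + s) / (stages_before + s) increases while s <= 2 * m**9
    # and the numerator is capped afterwards, so the worst T has s = 2 * m**9.
    worst_extra = 2 * m ** 9
    return (payoff_before + worst_extra) / (stages_before + worst_extra)


bounds = [average_payoff_bound(m, 5) for m in (50, 100, 200, 400)]  # decreasing, roughly like 2.2/m
```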
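Proposition 6. For all $\varepsilon > 0$ there exists $T_0$ such that for all $T \geq T_0$, all $i \in N$ and all $\tau^i \in \Sigma^i$, $\gamma^i_T(\tau^i, \sigma^{-i}) \leq \varepsilon$.

Proof of Proposition 6. Without loss of generality, we consider only deviations by player 1. Fix $\tau^1 \in \Sigma^1$ in all the sequel, and assume that $(\tau^1, \sigma^2, \sigma^3)$ is played. All the probabilities and expectations in the sequel will be with respect to $P = P_{\tau^1, \sigma^2, \sigma^3}$. For each block $m$ we define the following random variables:
$$X_m = \frac{1}{m^{10}}\sum_{t \in B_m} g^1(a_t), \qquad U_m = \frac{1}{m^{10}}\sum_{t \in B_m}\mathbf{1}_{\{a^2_t = a^3_t = R\}}, \qquad x_m = \frac{1}{m^{10}}\sum_{t \in B_m}\mathbf{1}_{\{a^1_t = R\}}.$$
We have $X_m \leq U_m + x_m$, since player 1 can only receive a payoff of 1 at a stage where either she plays R or both other players play R. We also define the event
$$\mathcal{C}_m = \mathcal{B}^c_m \cup \left(\mathcal{B}_m \cap \mathcal{B}_{m+1} \cap \left\{X_m \leq \frac{3}{\sqrt{m}}\right\}\right) \cup \left(\mathcal{B}_m \cap \mathcal{B}^c_{m+1} \cap \left\{X_{m'} \leq \frac{3}{\sqrt{m}} \text{ for all } m' \in \{m+1, \ldots, m^2\}\right\}\right). \qquad (5)$$
Conditionally on $\mathcal{C}_m$, one of the following three possibilities is true: either the play is in a punishment phase, or it is in the main path at $B_m$, player 1's payoff is low, and it will still be in the main path at $B_{m+1}$, or an efficient punishment starts at block $B_{m+1}$.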
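Lemma 7. There exists $M_1$, independent from $\tau^1$, such that for all $m \geq M_1$, $P(\mathcal{C}_m) \geq 1 - 3/m^6$.

Proof of Lemma 7. Consider a block $B_m$, with $m$ large enough, where the play is in the main path. Players 2 and 3 follow $\sigma$, so the indicators $\mathbf{1}_{\{a^2_t = a^3_t = R\}}$, $t \in B_m$, are i.i.d. with mean $1/m^2$, and Tchebychev's inequality gives
$$P\left(U_m > \frac{2}{m^2} \,\middle|\, \mathcal{B}_m\right) \leq \frac{1/m^{12}}{1/m^4} = \frac{1}{m^8}.$$
Hence, with high probability, players 2 and 3 are rarely in room R at the same stage.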
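We now want to estimate the number of stages where R is the most crowded room and exactly two players, including player 1, are in R. Define, for all $t \in B_m$,
$$\xi_t = \mathbf{1}_{\{\text{exactly one of players 2 and 3 plays R at stage } t\}}, \qquad Q_t = \mathbf{1}_{\{a^1_t = R\}},$$
and, for $m \geq 2$, $p_m = P(\xi_t = 1 \mid \mathcal{B}_m) = \frac{2}{m}\left(1 - \frac{1}{m}\right) > 0$. The variables $(Q_t)_{t \in B_m}$ may not be independent and may not be independent of $(\xi_t)_{t \in B_m}$, since player 1 is using an arbitrary strategy $\tau^1$. Nevertheless, for each $t \in B_m$, $\xi_t$ is independent of $(\xi_{t'})_{t' \in B_m, t' < t}$ and of $(Q_{t'})_{t' \in B_m, t' \leq t}$. Hence we can apply a generalization of Tchebychev's inequality due to Lehrer (see Lehrer (1990), Lemma 5.6), and obtain
$$P\left(\left|\sum_{t \in B_m}(\xi_t - p_m)Q_t\right| > \frac{m^{10}}{m\sqrt{m}} \,\middle|\, \mathcal{B}_m\right) \leq \frac{m^{10}p_m}{(m^{10}/(m\sqrt{m}))^2} \leq \frac{2}{m^8}.$$
Assume now that there is no punishment phase at block $B_{m+1}$, i.e. that $\mathcal{B}_{m+1}$ holds. Since at each stage where $\xi_t Q_t = 1$ two players (including player 1) are in R, this implies
$$\sum_{t \in B_m}\xi_t Q_t \leq m^{10}\alpha_m \leq \frac{m^{10}}{m\sqrt{m}}.$$
Assume also that
$$\left|\sum_{t \in B_m}(\xi_t - p_m)Q_t\right| \leq \frac{m^{10}}{m\sqrt{m}}.$$
We obtain as a first result $p_m x_m \leq 2/(m\sqrt{m})$, hence $x_m \leq 2/(p_m m\sqrt{m}) \leq 2/\sqrt{m}$. Write $\xi_m = \frac{1}{m^{10}}\sum_{t \in B_m}\xi_t$, and define the event
$$\mathcal{G}_m = \mathcal{B}^c_m \cup \left(\mathcal{B}_m \cap \left\{U_m \leq \frac{2}{m^2}\right\} \cap \left\{\xi_m \leq \frac{4}{m}\right\} \cap \left\{\left|\sum_{t \in B_m}(\xi_t - p_m)Q_t\right| \leq \frac{m^{10}}{m\sqrt{m}}\right\}\right).$$
Tchebychev's inequality and the previous estimates then give, for $m$ large,
$$P(\mathcal{G}_m) \geq 1 - \frac{3}{m^6}. \qquad (6)$$
Assume that $\mathcal{G}_m$ and $\mathcal{B}_m$ hold. Then
• either $\mathcal{B}_{m+1}$ holds, and this implies that $X_m \leq U_m + x_m \leq 3/\sqrt{m}$,
• or $\mathcal{B}^c_{m+1}$ holds, and therefore a punishment phase starts at block $B_{m+1}$.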
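Lehrer's inequality lets the test control $\sum_t \xi_t Q_t$ even though the deviator chooses $Q_t$ adaptively. A quick Monte Carlo sketch follows, with a hypothetical adaptive `rule` of our own standing in for $\tau^1$. Here $\xi_t$ is drawn only after $Q_t$ has been fixed from past information. So $\sum_t (\xi_t - p)Q_t$ is a martingale whose variance is at most $Tp(1-p)$, and in simulation it stays within a few multiples of $\sqrt{Tp(1-p)}$.

```python
import math
import random


def martingale_sum(T, p, rng, rule):
    """Return sum_t (xi_t - p) * Q_t, where the xi_t are i.i.d. Bernoulli(p) and
    Q_t = rule(past_xis) is fixed before xi_t is drawn (a predictable choice,
    as player 1's action at stage t is)."""
    past = []
    total = 0.0
    for _ in range(T):
        q = rule(past)                      # uses xi_1, ..., xi_{t-1} only
        xi = 1 if rng.random() < p else 0   # drawn independently of q
        total += (xi - p) * q
        past.append(xi)
    return total


def hot_hand(past):
    """Hypothetical adaptive rule: go to R right after a stage with xi = 1,
    and otherwise at every third stage."""
    return 1 if (past and past[-1] == 1) or len(past) % 3 == 0 else 0


T, p = 200_000, 0.05
dev = martingale_sum(T, p, random.Random(2), hot_hand)
scale = math.sqrt(T * p * (1 - p))   # upper bound on the standard deviation of dev
```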
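Let $D = \{t \in B_m : \ell(a_t) = R\}$ be the set used in the punishment phase. Since $U_m \leq 2/m^2$, the number of stages in $D$ where player 2 and player 3 play the same action (at such stages both of them play R) is at most $2m^8$.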
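Consider a block $B_{m'}$ with $m' \in \{m+1, \ldots, m^2\}$. Let $d$ be the integer such that $d \leq |B_{m'}|/|D| < d+1$. At each stage where player 2 plays L and player 3 plays R, or vice versa, player 1's payoff is 0. So the total payoff of player 1 at block $B_{m'}$ is at most $2m^8 d + |D|$. Since the test failed at $B_m$, we have $|D| = m^{10}\alpha_m > m^8\sqrt{m}$, and $|D| \leq (U_m + \xi_m)m^{10} \leq 5m^9$. Hence, for $m$ large,
$$X_{m'} \leq \frac{2m^8 d + |D|}{|B_{m'}|} \leq \frac{2m^8}{|D|} + \frac{|D|}{m^{10}} \leq \frac{2}{\sqrt{m}} + \frac{5}{m} \leq \frac{3}{\sqrt{m}}.$$
This implies that $\mathcal{G}_m \subset \mathcal{C}_m$. The desired result now follows from (6).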
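The counting argument of this proof can be checked on a toy instance. In the sketch below, `recorded` lists the pairs $(a^2, a^3)$ at the stages of $D$. Player 1 can only win a replayed stage where players 2 and 3 are together, and we generously give her every leftover stage. The result is compared with the bound (number of stages of $D$ where players 2 and 3 agree)$/|D| + |D|/|B_{m'}|$ used above. The names are ours.

```python
def best_punished_payoff(recorded, block_len):
    """Largest total payoff player 1 can get on a punishment block of length block_len
    when players 2 and 3 replay their actions `recorded` (pairs (a2, a3) at the
    stages of D, in order) d = block_len // |D| times; every leftover stage is
    counted as a win for player 1 (worst case)."""
    size = len(recorded)
    if size == 0 or block_len < size:
        raise ValueError("need 0 < |D| <= block_len")
    d = block_len // size
    # On a replayed stage player 1 wins only if players 2 and 3 are in the same room
    # (she then picks the other room); otherwise she shares a room with one of them.
    together = sum(1 for a2, a3 in recorded if a2 == a3)
    return d * together + (block_len - d * size)


recorded = [("L", "R")] * 95 + [("R", "R")] * 5
total = best_punished_payoff(recorded, 1050)
bound = 5 / len(recorded) + len(recorded) / 1050
```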
Lemma 7 is the key to the proof of Proposition 6. The rest is technical, and very close to the end of the proof in Renault (2000). One may also assume that for all $m \geq M_2$, we have and Fix now $m_0 \geq M_2^2$, and put Assume that for all $m \geq M_2$, $\mathcal{C}_m$ holds. We will show that this implies By (5) and (7) we have sequences $(X_m)_{m \geq M_2}$ and $(\mathcal{B}_m)_{m \geq M_2}$ such that for all $m \geq M_2$ the following events are true Since $m_0 \geq M_2^2$, and after a punishment phase the play always comes back to the main path, there necessarily exists some block number $m_1$ in $\{M_2, \ldots, m_0\}$ such that $\mathcal{B}_{m_1}$ holds.
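Two cases are possible:
(I) For all $m \geq m_1$, $\mathcal{B}_m$ holds, i.e. the play stays in the main path from block $B_{m_1}$ on.
(II) There exists a first block number $m_2 \geq m_1$ such that $\mathcal{B}_{m_2} \cap \mathcal{B}^c_{m_2+1}$ holds. We have $X_m \leq \varepsilon$ whenever $m_1 \leq m < m_2$.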
Two sub-cases of (II) are possible:
(i) $m_2 \geq m_0$. For all $m$ such that $m_2 < m \leq m_0^2$, we have $X_m \leq \varepsilon$ (the punishment starting from $B_{m_2+1}$ will finish after $B_{m_0^2}$). So, by (8)
(ii) $m_2 < m_0$. For all $m \in \{m_0, \ldots, m_2^2\}$, $X_m \leq \varepsilon$, and $\mathcal{B}_{m_2^2+1}$ holds. We just have to repeat the argument and consider the following sub-subcases.
Proposition 6 is proved since $T_0$ does not depend on $\tau^1$.
Remark 9. (Almost sure equilibrium payoffs) As in Lehrer (1992a), we can define an almost sure equilibrium payoff as a vector $x = (x^1, x^2, x^3)$ in $\mathbb{R}^3$ such that there exists an (almost sure equilibrium) strategy profile $\sigma$ satisfying
$$\forall i \in N,\quad \frac{1}{T}\sum_{t=1}^{T} g^i(a_t) \xrightarrow[T \to \infty]{} x^i \quad P_\sigma\text{-a.s.}, \qquad (9)$$
and
$$\forall i \in N,\ \forall \tau^i \in \Sigma^i,\quad \limsup_{T \to \infty}\frac{1}{T}\sum_{t=1}^{T} g^i(a_t) \leq x^i \quad P_{\tau^i, \sigma^{-i}}\text{-a.s.} \qquad (10)$$
Take $\sigma$ to be the inefficient uniform equilibrium we have just constructed. We only need to prove (10) with $x = (0, 0, 0)$ (because (9) is proved in Proposition 5, or because (9) is a consequence of (10) here). Fix as before a strategy $\tau^1$ of player 1 and define for each $m$, $\mathcal{C}_m$ as in (5). By Lemma 7 and the Borel-Cantelli lemma, with probability one we can find an integer $M_4$, which may depend on $\tau^1$ and on the play, such that for all $m \geq M_4$, $\mathcal{C}_m$ holds. Looking at the proof of Lemma 8, this implies that for every $\varepsilon > 0$, with probability one, $\frac{1}{T}\sum_{t=1}^{T} g^1(a_t) \leq \varepsilon$ for all $T$ large enough. Therefore $\sigma$ is not only a uniform equilibrium; it is also an almost sure equilibrium, and $(0, 0, 0)$ is an almost sure equilibrium payoff. It is then easy to see that for this game, the set of almost sure equilibrium payoffs coincides with the set of uniform equilibrium payoffs.

An odd number of players
We generalize the model of Section 2 as follows. The set of players is now $N = \{1, \ldots, 2n+1\}$, where $n$ is a fixed positive integer. At each stage, each player gets a payoff of 1 if she is in the minority room, and a payoff of 0 otherwise. The signal is again the most crowded room. The previous definitions of equilibrium and equilibrium payoffs extend unambiguously to this general model.
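For each subset $S$ of $N$ such that $|S| \leq n$, define $e_S$ as the payoff vector in $\mathbb{R}^N$ where each player in $S$ gets 1, and each player not in $S$ gets 0. If $S = \emptyset$, then $e_S$ is just the null vector. The set of feasible vectors is now
$$\mathcal{S} = \mathrm{conv}\{e_S : S \subset N,\ |S| \leq n\}.$$
We show that also in this general case the set of uniform equilibrium payoffs and the set of feasible payoffs coincide.

Theorem 10. The set of uniform equilibrium payoffs of the repeated minority game with $2n+1$ players is $\mathcal{S}$.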
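For small $n$, enumeration confirms that the pure-action payoff vectors are exactly the vectors $e_S$ with $|S| \leq n$, so the feasible set is their convex hull. A brute-force sketch follows. Players are indexed from 0 here, and the function names are ours.

```python
from itertools import combinations, product


def minority_payoffs(profile):
    """Payoff vector of a pure profile: 1 for each player in the less crowded room."""
    n_players = len(profile)
    return tuple(1 if 2 * profile.count(a) < n_players else 0 for a in profile)


def pure_payoff_vectors(n):
    """All payoff vectors generated by pure action profiles of 2n+1 players."""
    return {minority_payoffs(p) for p in product("LR", repeat=2 * n + 1)}


def e_vectors(n):
    """The vectors e_S for all S with |S| <= n (players indexed 0..2n)."""
    players = range(2 * n + 1)
    return {
        tuple(1 if i in S else 0 for i in players)
        for size in range(n + 1)
        for S in combinations(players, size)
    }
```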
Proof of Theorem 10. The proof is a generalization of the proof for the three-player case. If $S$ is a subset of $N$ with exactly $n$ elements, $e_S$ is a Nash equilibrium payoff of the one-shot game (the players in $S$ choose R and the others choose L), hence $e_S$ is also a uniform equilibrium payoff. By convexity, to prove that the set of uniform equilibrium payoffs is $\mathcal{S}$ it is sufficient to show that for any $S$ with $|S| < n$, we can construct a uniform equilibrium with payoff $e_S$.
Fix a subset $S$ of players such that $|S| < n$. We need to construct a strategy profile $\sigma = (\sigma^i)_{i \in N}$ such that $\sigma$ is a uniform equilibrium with payoff $e_S$. The construction of Section 3 generalizes as follows.
If $i \in S$, $\sigma^i$ is very simple: play R at each stage in $\{1, 2, \ldots\}$, independently of what happened before.
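Divide the set of stages $\{1, 2, \ldots\}$ into consecutive blocks $B_1, \ldots, B_m, \ldots$ with $|B_m| = m^{10}$ for each $m$, exactly as in Section 3. The strategy $\sigma$ consists of a main path and of punishment phases, which start from the main path. When the play is in the main path at some block $B_m$, each player $i$ in $N \setminus S$ plays i.i.d. at each stage the mixed action $(1 - \delta_m)L + \delta_m R$, where
$$\delta_m = \left(\frac{1}{m}\right)^{2/(n+1-|S|)}.$$
Notice that $0 < \frac{2}{n+1-|S|} \leq 1$, so $\delta_m \geq 1/m$ and $\lim_{m \to \infty}\delta_m = 0$. At the end of such a block, all players compute as before the empirical frequency of "R being the most crowded room" in this block,
$$\alpha_m = \frac{1}{m^{10}}\left|\{t \in B_m : \ell(a_t) = R\}\right|.$$
The statistical test is the following, where $\theta_m = \delta_m^{\,n-|S|}\sqrt{\delta_m}$:
• If $\alpha_m \leq \theta_m$, the test is passed. The play stays in the main path (and block $B_{m+1}$ is played).
• If $\alpha_m > \theta_m$, the test is failed. The play leaves the main path, and a punishment phase defined as in Section 3 (each player repeating, on the blocks $B_{m+1}, \ldots, B_{m^2}$, the actions she played at the stages of $D = \{t \in B_m : \ell(a_t) = R\}$) starts; the play goes back to the main path at block $B_{m^2+1}$.
Notice that if $n = 1$ and $S = \emptyset$, $\sigma$ is exactly the strategy constructed in Section 3. To conclude, we have to prove that $\sigma$ is a uniform equilibrium with payoff $e_S$. In the following computations, "if $m$ is large enough" should be understood as "if $m$ is larger than some constant depending only on $n$ and $|S|$". We will use the following binomial coefficients:
$$K_1 = \binom{2n+1-|S|}{n+1-|S|}, \qquad K_2 = \binom{2n-|S|}{n+1-|S|}, \qquad K_3 = \binom{2n-|S|}{n-|S|}.$$
A) Assume that all players follow $\sigma$. Probabilities are computed according to $P = P_\sigma$.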
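Fix a block number $m$ where $\mathcal{B}_m = \{$the play is in the main path at block $B_m\}$ holds. Consider some stage in this block. For R to be the most crowded room at this stage we need at least $n+1-|S|$ players in $N \setminus S$ to play R, hence the probability that R is the most crowded room is at most
$$\binom{2n+1-|S|}{n+1-|S|}\delta_m^{\,n+1-|S|} = \frac{K_1}{m^2}.$$
Since $\theta_m = 1/(m^2\sqrt{\delta_m})$ is much larger than $K_1/m^2$ when $m$ is large, Tchebychev's inequality gives, for $m$ large,
$$P(\alpha_m > \theta_m \mid \mathcal{B}_m) \leq \frac{4K_1}{m^8}.$$
By the Borel-Cantelli lemma we obtain as in Section 3 that with probability 1 there exists $m_1$ such that for each $m \geq m_1$, $\mathcal{B}_m$ holds. Let $i$ be a player in $S$. At some stage in the main path, the probability that player $i$'s payoff is 1 is the probability that L is the most crowded room, hence it is at least $1 - K_1/m^2$. Since $|1 - 1/m - (1 - K_1/m^2)| > 1/(2m)$ for $m$ large, by Tchebychev's inequality one can prove that
$$P\left(\frac{1}{m^{10}}\sum_{t \in B_m} g^i(a_t) < 1 - \frac{1}{m} \,\middle|\, \mathcal{B}_m\right) \leq \frac{4K_1}{m^{10}}.$$
Again by the Borel-Cantelli lemma, with probability 1 there will exist a block number $m_2$ such that for each $m \geq m_2$, $\frac{1}{m^{10}}\sum_{t \in B_m} g^i(a_t) \geq 1 - \frac{1}{m}$. From this it follows that $\lim_{T \to \infty}\frac{1}{T}\sum_{t=1}^{T} g^i(a_t) = 1$ $P_\sigma$-a.s., and $\lim_{T \to \infty}\gamma^i_T(\sigma) = 1$.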
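The parameters of the general construction can be tabulated numerically. This sketch assumes $\delta_m = m^{-2/(n+1-|S|)}$ and $\theta_m = \delta_m^{\,n-|S|}\sqrt{\delta_m}$, so that $n = 1$, $S = \emptyset$ gives back $1/m$ and $1/(m\sqrt{m})$. It computes $K_1, K_2, K_3$ and compares the exact main-path probability that R is the most crowded room with the union bound $K_1/m^2$ and with $\theta_m$. The function names are ours.

```python
from math import comb, sqrt


def parameters(n, s, m):
    """(delta_m, theta_m, K1, K2, K3) for 2n+1 players and |S| = s < n, assuming
    delta_m = m**(-2/(n+1-s)) and theta_m = delta_m**(n-s) * sqrt(delta_m)."""
    if not 0 <= s < n:
        raise ValueError("need 0 <= |S| < n")
    delta = m ** (-2 / (n + 1 - s))
    theta = delta ** (n - s) * sqrt(delta)
    K1 = comb(2 * n + 1 - s, n + 1 - s)
    K2 = comb(2 * n - s, n + 1 - s)
    K3 = comb(2 * n - s, n - s)
    return delta, theta, K1, K2, K3


def prob_R_most_crowded(n, s, m):
    """Exact main-path probability that R is the most crowded room: at least n+1-s of
    the 2n+1-s players outside S play R, each independently with probability delta_m."""
    delta = parameters(n, s, m)[0]
    free = 2 * n + 1 - s
    return sum(comb(free, j) * delta ** j * (1 - delta) ** (free - j)
               for j in range(n + 1 - s, free + 1))
```

The gap between $K_1/m^2$ and $\theta_m$ is only a factor of order $1/\sqrt{\delta_m}$, so the test becomes reliable only for quite large $m$.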
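Let now $i$ be a player in $N \setminus S$. Fix $m$ where $\mathcal{B}_m$ holds. At some stage $t$ in $B_m$, if player $i$'s payoff is 1 then either she plays R or R is the most crowded room, hence
$$P(g^i(a_t) = 1 \mid \mathcal{B}_m) \leq \delta_m + \frac{K_1}{m^2} \leq 2\delta_m.$$
Tchebychev's inequality then shows that
$$P\left(\frac{1}{m^{10}}\sum_{t \in B_m} g^i(a_t) > 3\delta_m \,\middle|\, \mathcal{B}_m\right) \leq \frac{2}{m^{10}\delta_m} \leq \frac{2}{m^9}.$$
And as before, $\lim_{T \to \infty}\gamma^i_T(\sigma) = 0$.

B) It remains to prove that no player can benefit by deviating from $\sigma$. Since 1 is the largest possible payoff in the game, we do not have to care about deviations by players in $S$. We thus only consider a deviation of some player $i$ not in $S$. By symmetry, we assume that $i = 1 \notin S$ and fix in all the sequel a deviation $\tau^1$ of player 1. We use the probability $P = P_{\tau^1, \sigma^{-1}}$. For each $m$, denote as before the average payoff of player 1 at block $B_m$ as
$$X_m = \frac{1}{m^{10}}\sum_{t \in B_m} g^1(a_t).$$
The definition of $\mathcal{C}_m$ generalizes as follows:
$$\mathcal{C}_m = \mathcal{B}^c_m \cup \left(\mathcal{B}_m \cap \mathcal{B}_{m+1} \cap \{X_m \leq 3\sqrt{\delta_m}\}\right) \cup \left(\mathcal{B}_m \cap \mathcal{B}^c_{m+1} \cap \{X_{m'} \leq 3K_2\sqrt{\delta_m} \text{ for all } m' \in \{m+1, \ldots, m^2\}\}\right).$$
We are going to prove the following analogue of Lemma 7.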
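Lemma 11. There exists $M_1$, independent from $\tau^1$, such that for all $m \geq M_1$, $P(\mathcal{C}_m) \geq 1 - 3/m^6$.

Notice that $\lim_{m \to \infty} 3\sqrt{\delta_m} = \lim_{m \to \infty} 3K_2\sqrt{\delta_m} = 0$. Once Lemma 11 is proved, one can proceed exactly as in the proof of Lemma 8 and as at the end of the proof of Proposition 6, and Theorem 10 will be proved.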
Proof of Lemma 11. For each stage $t$, we define the random variables $\xi_t$ and $U_t$ with values in $\{0, 1\}$ such that
• $\xi_t = 1$ iff there are exactly $n - |S|$ players in $N \setminus (S \cup \{1\})$ that play R at stage $t$;
• $U_t = 1$ iff there are at least $n + 1 - |S|$ players in $N \setminus (S \cup \{1\})$ that play R at stage $t$.
If $U_t = 1$, the most crowded room at stage $t$ is R. If $\xi_t = 1$, player 1's payoff is 0, and her action determines the most crowded room at stage $t$.
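Both statements can be verified exhaustively for small $n$. So can the fact, used later, that every stage with $\ell(a_t) = R$ has $\xi_t + U_t = 1$. In this sketch index 0 plays the role of player 1, indices $1, \ldots, |S|$ form $S$ and always play R, and `s` stands for $|S|$.

```python
from itertools import product


def majority(profile):
    """Most crowded room of a pure profile with an odd number of players."""
    return "R" if 2 * profile.count("R") > len(profile) else "L"


def check_xi_U_claims(n, s):
    """Exhaustively check, for 2n+1 players and |S| = s, that
    (i) U_t = 1 implies R is the most crowded room,
    (ii) xi_t = 1 implies player 1 gets 0 and her action is the most crowded room,
    (iii) if R is the most crowded room then xi_t + U_t = 1.
    Index 0 is player 1, indices 1..s form S (always R), the rest are free."""
    n_players = 2 * n + 1
    n_free = n_players - 1 - s
    for free_actions in product("LR", repeat=n_free):
        r = free_actions.count("R")
        xi = r == n - s
        U = r >= n + 1 - s
        for a1 in "LR":
            profile = (a1,) + ("R",) * s + free_actions
            payoff1 = 1 if 2 * profile.count(a1) < n_players else 0
            if U and majority(profile) != "R":
                return False
            if xi and (payoff1 != 0 or majority(profile) != a1):
                return False
            if majority(profile) == "R" and int(xi) + int(U) != 1:
                return False
    return True
```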
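For each block number $m$, we also define
$$U_m = \frac{1}{m^{10}}\sum_{t \in B_m} U_t, \qquad \xi_m = \frac{1}{m^{10}}\sum_{t \in B_m}\xi_t, \qquad x_m = \frac{1}{m^{10}}\sum_{t \in B_m}\mathbf{1}_{\{a^1_t = R\}}.$$
Again we have $X_m \leq U_m + x_m$. Let $p_m = P(\xi_t = 1 \mid \mathcal{B}_m) = K_3\delta_m^{\,n-|S|}(1 - \delta_m)^n$; then
$$p_m \geq \frac{4}{5}K_3\delta_m^{\,n-|S|}$$
for $m$ large. Putting $Q_t = \mathbf{1}_{\{a^1_t = R\}}$ for each stage $t$, Lemma 5.6 of Lehrer (1990) gives
$$P\left(\left|\sum_{t \in B_m}(\xi_t - p_m)Q_t\right| > m^{10}\theta_m \,\middle|\, \mathcal{B}_m\right) \leq \frac{m^{10}p_m}{(m^{10}\theta_m)^2} \leq \frac{K_3}{m^8}.$$
For some stage $t$ in $B_m$, the conditional probability (given $\mathcal{B}_m$) that at least $n+1-|S|$ players in $N \setminus (S \cup \{1\})$ play R at stage $t$ is at most
$$\binom{2n-|S|}{n+1-|S|}\delta_m^{\,n+1-|S|} = \frac{K_2}{m^2}.$$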
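Hence we obtain
$$P\left(U_m > \frac{2K_2}{m^2} \,\middle|\, \mathcal{B}_m\right) \leq \frac{1}{m^8}.$$
Similarly, the conditional probability (given $\mathcal{B}_m$) that at least $n - |S|$ players in $N \setminus (S \cup \{1\})$ play R at stage $t$ is at most $K_3\delta_m^{\,n-|S|} \leq K_3/m$. So we obtain
$$P\left(\xi_m + U_m > \frac{2K_3}{m} \,\middle|\, \mathcal{B}_m\right) \leq \frac{1}{m^9}.$$
Assume now that $\mathcal{B}_m$ holds, that $\left|\sum_{t \in B_m}(\xi_t - p_m)Q_t\right| \leq m^{10}\theta_m$, and that there is no punishment after $B_m$, which implies $\frac{1}{m^{10}}\sum_{t \in B_m}\xi_t Q_t \leq \theta_m$.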
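Then $p_m x_m \leq 2\theta_m$, and
$$x_m \leq \frac{2\theta_m}{p_m} \leq \frac{5\sqrt{\delta_m}}{2K_3} \leq \frac{5}{2}\sqrt{\delta_m}.$$
Since $X_m \leq U_m + x_m$ and $U_m \leq 2K_2/m^2$, we obtain that $X_m \leq 3\sqrt{\delta_m}$ for $m$ large. Define the event
$$\mathcal{G}_m = \mathcal{B}^c_m \cup \left(\mathcal{B}_m \cap \left\{U_m \leq \frac{2K_2}{m^2}\right\} \cap \left\{\xi_m + U_m \leq \frac{2K_3}{m}\right\} \cap \left\{\left|\sum_{t \in B_m}(\xi_t - p_m)Q_t\right| \leq m^{10}\theta_m\right\}\right).$$
By the three estimates above, for $m$ large,
$$P(\mathcal{G}^c_m) \leq \frac{1}{m^8} + \frac{1}{m^9} + \frac{K_3}{m^8} \leq \frac{3}{m^6}.$$
So with probability at least $1 - 3/m^6$ the event $\mathcal{G}_m$ holds. Assume finally that both $\mathcal{G}_m$ and $\mathcal{B}_m$ hold. Then
• either $\mathcal{B}_{m+1}$ holds, and this implies that $X_m \leq 3\sqrt{\delta_m}$,
• or $\mathcal{B}^c_{m+1}$ holds, and therefore a punishment phase starts at block $B_{m+1}$.
In the second case, we have $U_m \leq 2K_2/m^2$ and $\xi_m + U_m \leq 2K_3/m$. Consider $D = \{t \in B_m : \ell(a_t) = R\}$. We have $|D| \geq m^{10}\theta_m$, and $|D| \leq (\xi_m + U_m)m^{10}$, so $|D| \leq 2K_3 m^9$. The number of stages in $D$ where player 1 may have a payoff of 1 is at most $U_m m^{10} \leq 2K_2 m^8$.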
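Hence we obtain, for all $m' \in \{m+1, \ldots, m^2\}$, with $d$ the integer such that $d \leq |B_{m'}|/|D| < d+1$,
$$X_{m'} \leq \frac{2K_2 m^8 d + |D|}{|B_{m'}|} \leq \frac{2K_2 m^8}{|D|} + \frac{|D|}{m^{10}} \leq \frac{2K_2}{m^2\theta_m} + \frac{2K_3}{m} = 2K_2\sqrt{\delta_m} + \frac{2K_3}{m}.$$
So for $m$ large enough, $X_{m'} \leq 3K_2\sqrt{\delta_m}$.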
Hence we obtain that $\mathcal{G}_m \subset \mathcal{C}_m$, so that $P(\mathcal{C}_m) \geq P(\mathcal{G}_m) \geq 1 - 3/m^6$. This concludes the proof of Lemma 11.
Remark 12. The arguments of Remark 9 can be used here, and one can easily show that $\sigma$ is also an almost sure equilibrium, so that $e_S$ is an almost sure equilibrium payoff. The set of almost sure equilibrium payoffs, the set of uniform equilibrium payoffs, and the set of feasible payoffs coincide.