Secret Correlation in Repeated Games with Imperfect Monitoring

We characterize the maximum payoff that a team can guarantee against another in a class of repeated games with imperfect monitoring. Our result relies on the optimal trade-off for the team between optimization of stage-payoffs and generation of signals for future correlation.

1. Introduction. In many strategic situations, a group of players may find it beneficial to coordinate their action plans in a way that is hidden from other players. The manager of a sports team devises coordinated plans for the team members, and generals of allied armies need to keep their coordinated plans secret from enemies. On the Internet, coordinated attacks on systems (e.g., by viruses) are known to be much more dangerous than uncoordinated attacks. The management of a firm coordinates the actions of the units of production in a way that is hidden from the competitors.
Coordination of a group of players needs to rely on the observation of a common signal by its members. This signal can arise from an external correlation device (Aumann [2]), or be the result of communication between the players in the group (Forges [5]). In the model of repeated games with imperfect monitoring, each player observes a signal that depends stochastically on chosen actions (deterministic signals are a particular case); the signals may be correlated. These games feature both correlated signals and communication possibilities since actions may be used as messages.
This article explores the possibilities of secret correlation between team members in a repeated game with imperfect monitoring. In our model, two teams are matched against each other. Each member $i$ of team I has an action set $A^i$. Team II is viewed as a single player with action set $B$. At each stage, team II observes a (possibly random) signal $s$ about I's action profile $a$, drawn according to some probability distribution $q(s \mid a)$. Team I's members are informed of $a$, $s$, and possibly of II's actions (our result covers the cases in which team I has perfect, imperfect, or no observation of II's choice). The payoff to team I is a function of both teams' action choices. In order to stress the value of secret correlation between team members, we assume that team II's goal is to minimize team I's payoff. Since team I has more information than team II about action choices, this extra information can be used as a correlation device for future actions. Our model allows us to study the optimal tradeoffs for team I between generation of signals for future correlation and use of correlation for present payoffs.
Our main result is a characterization of the best payoff that the team can guarantee against outside players as either the horizon of the game grows to infinity, or the discount factor goes to one. We emphasize three reasons why we believe characterizing the max min value is important: First, such characterizations are important for the general study of repeated games with imperfect monitoring because they provide the individually rational levels. Some generalizations of the Folk Theorem from the perfect monitoring case to imperfect monitoring, such as Renault and Tomala [20] and Hörner and Olszewski [12] show that the set of equilibrium payoffs of repeated games is the set of feasible and individually rational payoffs, but do not characterize the individually rational levels. Such a characterization completes these studies, thus providing full descriptions of the sets of equilibrium payoffs of the repeated games.
Second, von Stengel and Koller [22] proved that, in finite games where a team of players is matched against one outside player, the max min payoff is a Nash payoff. Furthermore, it is the most natural Nash payoff to select since team members can guarantee this value. Combined with our result, we know that the maximal Nash payoff to the team in the repeated game with imperfect monitoring is the max min we characterize.
Finally, the max min of the repeated game measures how successful team I is in correlating its actions secretly, i.e., in a way that is hidden from outside players. Indeed, when no correlation is possible, the max min of the repeated game coincides with the max min in mixed strategies of the stage game. When full correlation is achievable, this max min equals the generally higher max min in correlated strategies of the stage game. In general, only partial correlation may be achievable; the max min of the repeated game may lie between these two values. The study of the endogenous emergence of secret correlation of a group of players is interesting in itself. This article studies secret correlation as arising from monitoring structures. Gossner [8,9] and Bavly and Neyman [3] studied its emergence through limitations of computational capacities of the players.
The problem faced by the team consists in finding the optimal tradeoff between using previous signals that are unknown to team II as correlation devices, and generating such signals for future use. We measure the amount of secret information contained in past signals by the signals' entropy. Our main result characterizes the team's max min payoff as the best payoff that can be obtained by a convex combination of correlated strategies under the constraint that the average entropy spent by the correlation devices does not exceed the average entropy of secret signals generated.
We motivate the problem by discussing examples in §2, present the model and definitions in §3, and the main result in §4. We discuss examples in §5, and computational aspects in §6. The proof of the main results is given in §7. For simplicity, the model and the main result are first stated for a simple class of signalling structures; we extend our main result to more general signalling structures in §8. Finally, we show consequences for the Folk Theorem in §9.

2. Examples.
We consider a three-player game where the teams are $I = \{1, 2\}$ and $II = \{3\}$. Player 1 chooses rows, Player 2 chooses columns, and Player 3 chooses matrices. The payoffs to the team are given by

[payoff matrices]

In the repeated game with perfect monitoring, the team guarantees the max min of the one-shot game, where the max runs over the independent probability distributions on $A^1 \times A^2$. That is, the team guarantees $1/4$.

Now assume that Player 3 receives blank signals, i.e., has no information on the action profile of I, whereas Players 1 and 2 observe each other's actions. The team can then use the first move of Player 1 as a correlation device, and thus can guarantee the max min of the one-shot game in long repetitions, where the max runs over the set of all probability distributions on $A^1 \times A^2$. That is, from the second stage on, I guarantees $1/2$.

Now consider the case where team members observe each other's actions and the signal of Player 3 is given by the following matrix:

$$\begin{array}{c|cc} & a & b \\ \hline a & s & s' \\ b & s' & s \end{array}$$

Player 3 thus learns at each stage whether Players 1 and 2 played the same action. Consider the following strategy of the team: at Stage 1 each player randomizes between his two actions with equal probabilities. Let $a^1_1$ be the random move of Player 1 at Stage 1. At each stage $n > 1$, play $(a, a)$ if $a^1_1 = a$ and play $(b, b)$ if $a^1_1 = b$. The signal of Player 3 at Stage 1 is uniformly distributed and, conditional on this signal, $a^1_1$ is also uniformly distributed. Since after Stage 1 the signals will be constant, Player 3 never learns anything about the value of $a^1_1$. Actions of Players 1 and 2 are thus correlated from Stage 2 on and I guarantees $1/2$.

Finally, consider the case where team members observe each other's actions and the signal of Player 3 is given by Player 2's action, i.e., by the following matrix:

$$\begin{array}{c|cc} & a & b \\ \hline a & s & s' \\ b & s & s' \end{array}$$

[…] and thus the correlation gained at Stage 1 is lost after Stage 2.
The tradeoff between generating signals for correlation and using this correlation appears here: Stage 1 generates a correlation device, and Stage 2 uses it. Playing this two-stage strategy cyclically, the team guarantees $3/8$, and we will see that this is not optimal. This game with the latter signalling structure serves for further illustrations in the sequel of the paper. We shall therefore refer to it as the main example.
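The three benchmark values quoted in this section (1/4, 1/2 and 3/8) can be reproduced with a short computation. The sketch below is only illustrative: the payoff matrices do not appear in this text, so the function `payoff` is an assumption (the team earns 1 on (a, a) against Player 3's first matrix and on (b, b) against the second, 0 otherwise), chosen because it is consistent with the values stated above. Actions a and b are coded as 0 and 1, and all function names are hypothetical.

```python
# ASSUMPTION: the payoff matrices are missing from the text. The team is taken
# to earn 1 on (a, a) against matrix 0 and on (b, b) against matrix 1.
def payoff(a1, a2, b):
    return 1.0 if (a1 == a2 == b) else 0.0

def team_payoff(dist, b):
    """Expected payoff of a distribution over action pairs against matrix b."""
    return sum(p * payoff(a1, a2, b) for (a1, a2), p in dist.items())

def guaranteed(dist):
    """Payoff when Player 3 best-replies to the distribution of action pairs."""
    return min(team_payoff(dist, b) for b in (0, 1))

def independent(p, q):
    """Product distribution: Player 1 plays a w.p. p, Player 2 plays a w.p. q."""
    m1, m2 = {0: p, 1: 1 - p}, {0: q, 1: 1 - q}
    return {(a1, a2): m1[a1] * m2[a2] for a1 in m1 for a2 in m2}

# Independent max min, approximated on a grid of mixed actions.
grid = [k / 100 for k in range(101)]
independent_maxmin = max(guaranteed(independent(p, q)) for p in grid for q in grid)

# Correlated max min: equal weights on (a, a) and (b, b).
correlated_value = guaranteed({(0, 0): 0.5, (1, 1): 0.5})

# Two-stage cycle: one independent stage, then one fully correlated stage.
cyclic_average = (independent_maxmin + correlated_value) / 2
```

Under the assumed payoffs, the grid search recovers the independent max min, the correlated distribution recovers the correlated max min, and their average is the payoff of the two-stage cyclic strategy.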

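3. Model and definitions.
3.1. The repeated game. Let $I = \{1, \dots, |I|\}$ be a finite set of players called the team and II be another player. For each player $i \in I$, let $A^i$ be player $i$'s finite set of actions and let $B$ be player II's finite set of actions. We denote $A = \prod_{i \in I} A^i$. At each stage $t = 1, 2, \dots$, each player chooses an action in his own set of actions; if $(a, b) = ((a^i)_{i \in I}, b) \in A \times B$ is the action profile played, the payoff for each team player $i \in I$ is $g(a, b)$, where $g: A \times B \to \mathbb{R}$. The payoff for player II is $-g(a, b)$.
After each stage, if $a$ is the action profile played by players $i \in I$, a signal $s$ is drawn in a finite set $S$ with probability $q(s \mid a)$, where $q$ maps $A$ to the set of probabilities on $S$. Player II observes $(s, b)$, whereas team players observe $(a, s, b)$. Thus, in our model, all team members observe the same random signal, which reveals the signal observed by player II. Note that the model is kept simple for the sake of transparency; §8 presents extensions of the model and results to a larger class of signalling structures.
For each finite set $E$, we let $\Delta(E)$ be the set of probabilities on $E$. We write an element $x \in \Delta(E)$ as a vector $x = (x(e))_{e \in E}$ with $x(e) \ge 0$ and $\sum_e x(e) = 1$. We denote by $\otimes$ the direct product of probabilities, i.e., $(p \otimes q)(x, y) = p(x) q(y)$. A history of length $n$ for the team is an element $h_n$ of $H_n = (A \times B \times S)^n$, and a history of length $n$ for player II is an element $h^{II}_n$ of $H^{II}_n = (B \times S)^n$; by convention, $H_0$ and $H^{II}_0$ are arbitrary singletons. A behavioral strategy $\sigma^i$ for a team player $i$ is a mapping $\sigma^i: \bigcup_{n \ge 0} H_n \to \Delta(A^i)$; a behavioral strategy $\tau$ for player II is a mapping $\tau: \bigcup_{n \ge 0} H^{II}_n \to \Delta(B)$. A profile of behavioral strategies $(\sigma, \tau) = ((\sigma^i)_{i \in I}, \tau)$ induces a probability distribution $P_{\sigma,\tau}$ on the set of plays $(A \times B \times S)^\infty$ endowed with the product $\sigma$-algebra. Given a discount factor $0 < \lambda < 1$, the discounted payoff for team I induced by $(\sigma, \tau)$ is
$$\gamma_\lambda(\sigma, \tau) = E_{\sigma,\tau}\Big[\sum_{n \ge 1} (1 - \lambda)\lambda^{n-1} g(a_n, b_n)\Big],$$
where $(a_n, b_n)$ denotes the random action profile at stage $n$.
The average payoff for team I up to stage $n$ is
$$\gamma_n(\sigma, \tau) = E_{\sigma,\tau}\Big[\frac{1}{n}\sum_{m=1}^{n} g(a_m, b_m)\Big].$$
The $n$-stage max min payoff of team I, denoted $v_n$, is $v_n = \max_\sigma \min_\tau \gamma_n(\sigma, \tau)$; the $\lambda$-discounted max min payoff $v_\lambda = \max_\sigma \min_\tau \gamma_\lambda(\sigma, \tau)$ is defined analogously. The uniform max min payoff of team I, denoted $v_\infty$, is defined as follows:
(i) The team guarantees $v \in \mathbb{R}$ if for every $\varepsilon > 0$ there exist $\sigma$ and $N$ such that for every $\tau$ and every $n \ge N$, $\gamma_n(\sigma, \tau) \ge v - \varepsilon$.
(ii) Player II defends $v \in \mathbb{R}$ if for every $\varepsilon > 0$ and every $\sigma$ there exist $\tau$ and $N$ such that for every $n \ge N$, $\gamma_n(\sigma, \tau) \le v + \varepsilon$.
The uniform max min, if it exists, is $v_\infty \in \mathbb{R}$ such that I guarantees $v_\infty$ and II defends $v_\infty$.
The max min $v_1$ of the one-shot game is simply $\max_{x \in \bigotimes_{i \in I} \Delta(A^i)} \min_b g(x, b)$, where $g$ is extended to mixed actions in the usual way. We call this the independent max min. This is the best that the team can guarantee in the one-shot game with independent mixed strategies. This quantity can also be guaranteed in every version of the repeated game by playing, independently and identically distributed (i.i.d.) across stages, a mixed strategy profile $x$ that achieves the maximum in $v_1$. Therefore, $v_1 \le v_n$ and $v_1 \le v_\lambda$ for each $n$ and $\lambda$. On the other hand, in any version of the repeated game, the team cannot guarantee more than the value of the two-person zero-sum game defined by $(\{I, II\}, A, B, g)$. Let us denote this value by $\operatorname{val} g = \max_{x \in \Delta(A)} \min_b g(x, b)$, and call it the correlated max min. One has
$$v_1 \le v_n,\ v_\lambda \le \operatorname{val} g.$$

3.2. Information theory tools. The entropy of a finite random variable $\mathbf{x}$ with law $P$ is, by definition,
$$H(\mathbf{x}) = -E[\log P(\mathbf{x})] = -\sum_x P(x) \log P(x),$$
where $\log$ denotes the logarithm with base 2, and $0 \log 0 = 0$. Note that $H(\mathbf{x}) \ge 0$ and that $H(\mathbf{x})$ depends only on the law $P$ of $\mathbf{x}$. The entropy of $\mathbf{x}$ is thus the entropy $H(P)$ of its distribution $P$, with $H(P) = -\sum_x P(x) \log P(x)$.
Let $(\mathbf{x}, \mathbf{y})$ be a couple of random variables with joint law $P$ such that $\mathbf{x}$ is finite. The conditional entropy of $\mathbf{x}$ given $\{\mathbf{y} = y\}$ is the entropy of the conditional distribution $P(x \mid y)$ when this conditional distribution is well defined:
$$H(\mathbf{x} \mid y) = -\sum_x P(x \mid y) \log P(x \mid y).$$
The conditional entropy of $\mathbf{x}$ given $\mathbf{y}$ is the expected value of the previous quantity:
$$H(\mathbf{x} \mid \mathbf{y}) = \int H(\mathbf{x} \mid y)\, dP(y).$$
If $\mathbf{y}$ is also finite, one has the following relation of additivity of entropies:
$$H(\mathbf{x}, \mathbf{y}) = H(\mathbf{y}) + H(\mathbf{x} \mid \mathbf{y}).$$

4. The main result. The max min values $v_\lambda$, $v_n$ are defined in terms of the data of the repeated game. Our main result is a characterization of their asymptotic values and of $v_\infty$.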
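Since the definitions that follow rely on the entropy identities of the information theory tools above, the following sketch checks the additivity relation $H(\mathbf{x}, \mathbf{y}) = H(\mathbf{y}) + H(\mathbf{x} \mid \mathbf{y})$ numerically on a small joint law. The joint distribution `joint` and all function names are hypothetical and chosen only for illustration.

```python
from math import log2

def entropy(dist):
    """Entropy in bits of a finite distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal_y(joint):
    """Marginal law of y from a joint law {(x, y): probability}."""
    out = {}
    for (x, y), p in joint.items():
        out[y] = out.get(y, 0.0) + p
    return out

def conditional_entropy(joint):
    """H(x | y): expectation over y of the entropy of the law of x given y."""
    total = 0.0
    for y0, p_y in marginal_y(joint).items():
        if p_y > 0:
            cond = {x: p / p_y for (x, y), p in joint.items() if y == y0}
            total += p_y * entropy(cond)
    return total

# Hypothetical joint law of (x, y), for illustration only.
joint = {("a", "s"): 0.5, ("a", "t"): 0.25, ("b", "t"): 0.25}

lhs = entropy(joint)                                         # H(x, y)
rhs = entropy(marginal_y(joint)) + conditional_entropy(joint)  # H(y) + H(x | y)
```

The two sides agree up to floating-point error, which is the additivity relation used repeatedly in the sequel.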

4.1. Correlation systems.
Let $\sigma$ be a strategy. Suppose that at stage $n$, the history for player II is $h^{II}_n = (b_1, s_1, \dots, b_n, s_n)$. Let $h_n = (a_1, b_1, s_1, \dots, a_n, b_n, s_n)$ be the history for the team. The mixed action played by the team at stage $n + 1$ is $\sigma(h_n) = (\sigma^i(h_n))_{i \in I}$. Player II holds a belief on this mixed action: he believes that team I plays $\sigma(h_n)$ with probability $P_\sigma(h_n \mid h^{II}_n)$. The distribution of the action profile $a_{n+1}$ given the information $h^{II}_n$ of player II is $\sum_{h_n} P_\sigma(h_n \mid h^{II}_n)\, \sigma(h_n)$, an element of $\Delta(A)$, the set of correlated distributions on $A$.
Definition 1. Let $X = \bigotimes_{i \in I} \Delta(A^i)$ be the set of independent probability distributions on $A$. A correlation system is a probability distribution on $X$, and we let $C = \Delta(X)$ be the set of all correlation systems.
$X$ is a closed subset of $\Delta(A)$ and thus $C$ is compact with respect to the weak-* topology. Assume that at some stage $n$, after some history $h^{II}_n$, the distribution of $\sigma(h_n)$ conditional on $h^{II}_n$ is $c$. The play of the game at this stage is as if $\sigma(h_n)$ were drawn according to the probability distribution $c$ and announced to each player of the team but not to player II. Given $\sigma(h_n)$, each team player chooses a mixed action. This generates a random action profile for the team and a random signal. We study the variation of uncertainty of player II regarding the total history, measuring uncertainty by entropy.
Definition 2. Let $c$ be a correlation system and $(\mathbf{x}, \mathbf{a}, \mathbf{s})$ be a random variable in $X \times A \times S$ such that the law of $\mathbf{x}$ is $c$, the law of $\mathbf{a}$ given $\{\mathbf{x} = x\}$ is $x$, and the law of $\mathbf{s}$ given $\{\mathbf{a} = a\}$ is $q(\cdot \mid a)$. The entropy variation of $c$ is
$$\Delta H(c) = H(\mathbf{a}, \mathbf{s} \mid \mathbf{x}) - H(\mathbf{s}).$$
The entropy variation is the difference between the entropy gained and the entropy lost by the team. The entropy gain is the conditional uncertainty contained in $(\mathbf{a}, \mathbf{s})$ given $\mathbf{x}$; the entropy loss is the entropy of $\mathbf{s}$, which is observed by player II. If $\mathbf{x}$ is finite, from the additivity formula
$$H(\mathbf{x}, \mathbf{a}, \mathbf{s}) = H(\mathbf{s}) + H(\mathbf{x}, \mathbf{a} \mid \mathbf{s}) = H(\mathbf{x}) + H(\mathbf{a}, \mathbf{s} \mid \mathbf{x}),$$
and therefore
$$\Delta H(c) = H(\mathbf{x}, \mathbf{a} \mid \mathbf{s}) - H(\mathbf{x}).$$
The entropy variation is thus the new entropy of the information possessed by I and not by II minus the initial entropy. Now we define, given a correlation system $c$, the payoff obtained when player II plays a best reply to the expected distribution on $A$.
Definition 3. Given a correlation system $c$, the distribution of the action profile for the team is $x_c \in \Delta(A)$ such that for each $a \in A$, $x_c(a) = \int_X \prod_i x^i(a^i)\, dc(x)$. The optimal payoff yielded by $c$ is $\pi(c) = \min_{b \in B} g(x_c, b)$.
We consider the set of feasible vectors $(\Delta H(c), \pi(c))$ in the (entropy variation, payoff) plane:
$$V = \{(\Delta H(c), \pi(c)) \mid c \in C\}.$$
Lemma 4. $V$ is compact.
Proof. Since $\mathbf{s}$ is independent of $\mathbf{x}$ conditionally on $\mathbf{a}$, the additivity formula gives $H(\mathbf{a}, \mathbf{s} \mid \mathbf{x}) = H(\mathbf{a} \mid \mathbf{x}) + H(\mathbf{s} \mid \mathbf{a})$ and the entropy variation is
$$\Delta H(c) = H(\mathbf{a} \mid \mathbf{x}) + H(\mathbf{s} \mid \mathbf{a}) - H(\mathbf{s}).$$
From the definitions of entropy and conditional entropy, recalling that the law of $\mathbf{a}$ given $\{\mathbf{x} = x\}$ is $x$,
$$\Delta H(c) = \int_X H(x)\, dc(x) + \sum_a x_c(a) H(q(\cdot \mid a)) - H\Big(\sum_a x_c(a)\, q(\cdot \mid a)\Big),$$
which is clearly a continuous function of $c$. $\Delta H$ and $\pi$ are thus continuous on the compact set $C$, so $V$ is compact.
We introduce the following notation:
$$w = \sup\{x_2 \in \mathbb{R} \mid (x_1, x_2) \in \operatorname{co} V,\ x_1 \ge 0\}.$$
This is the highest payoff associated with a convex combination of correlation systems under the constraint that the average entropy variation is nonnegative. For every correlation system $c$ such that $\mathbf{x}$ is almost surely constant, $\Delta H(c) \ge 0$; thus $V$ intersects the half-plane $\{x_1 \ge 0\}$. Since $V$ is compact, so is its convex hull and the supremum is indeed a maximum. The set $V$ need not be convex, as shown in Goldberg [7]; the supremum in the definition of $w$ above might not be achieved by a point in $V$, but might be achieved by a convex combination involving two points of $V$ with nonzero weights. For computations, it is convenient to express the number $w$ through the boundary of $\operatorname{co} V$. Define for each real number $h$:
$$u(h) = \max\{\pi(c) \mid c \in C,\ \Delta H(c) \ge h\}.$$
Since $V$ is compact, $u(h)$ is well defined. Let $\operatorname{cav} u$ be the least concave function pointwise greater than $u$. Then
$$w = \operatorname{cav} u(0).$$
Indeed, $u$ is upper-semi-continuous, nonincreasing, and the hypograph of $u$ is the comprehensive set $V^* = V - \mathbb{R}^2_+$ associated with $V$. This implies that $\operatorname{cav} u$ is also nonincreasing, u.s.c., and its hypograph is $\operatorname{co} V^*$. Figure 1 illustrates how the map $\operatorname{cav} u$ and the value $w$ are derived from the set $V$.
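The formula for the entropy variation obtained in the proof above can be evaluated directly for finitely supported correlation systems. The sketch below does so for a team of two players with two actions each (a = 0, b = 1). Several ingredients are assumptions made for illustration: the payoff rule in `payoff` (1 on (a, a) against Player 3's first matrix and on (b, b) against the second, consistent with the values quoted for the main example but not stated in the text), the representation of a correlation system as a list of weighted pairs, and the two systems `c_plus` and `c_minus`, which are meant to mimic the systems called $c_{+1}$ and $c_{-1}$ in §5.

```python
from math import log2

def H(probs):
    """Entropy in bits of a finite probability vector."""
    return -sum(p * log2(p) for p in probs if p > 0)

# A finitely supported correlation system is a list of (weight, (p, q)):
# with that weight, Player 1 plays (p, 1-p) and Player 2 plays (q, 1-q).
def action_distribution(c):
    """x_c: distribution over action pairs induced by c."""
    xc = {(i, j): 0.0 for i in (0, 1) for j in (0, 1)}
    for w, (p, q) in c:
        m1, m2 = (p, 1 - p), (q, 1 - q)
        for (i, j) in xc:
            xc[(i, j)] += w * m1[i] * m2[j]
    return xc

def entropy_variation(c, signal):
    """int H(x) dc + sum_a x_c(a) H(q(.|a)) - H(sum_a x_c(a) q(.|a)).
    `signal(a)` returns the law of Player 3's signal as {signal: probability}."""
    xc = action_distribution(c)
    gain = sum(w * (H((p, 1 - p)) + H((q, 1 - q))) for w, (p, q) in c)
    noise = sum(xc[a] * H(signal(a).values()) for a in xc)
    law_s = {}
    for a, pa in xc.items():
        for s, ps in signal(a).items():
            law_s[s] = law_s.get(s, 0.0) + pa * ps
    return gain + noise - H(law_s.values())

def payoff(c):
    """pi(c) under the ASSUMED payoffs: Player 3 picks the worse diagonal cell."""
    xc = action_distribution(c)
    return min(xc[(0, 0)], xc[(1, 1)])

# Signalling structure of the main example: Player 3 sees Player 2's action.
observe_player2 = lambda a: {a[1]: 1.0}

c_plus = [(1.0, (0.5, 0.5))]                     # both players mix uniformly
c_minus = [(0.5, (1.0, 1.0)), (0.5, (0.0, 0.0))]  # (a, a) or (b, b), equally likely
```

With these assumptions, `c_plus` gains one bit of entropy at payoff 1/4 and `c_minus` spends one bit at payoff 1/2, matching the values reported for the main example.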
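When only finitely many points of $V$ are known, the quantity defined above can be bounded from below by optimizing over their convex hull. The sketch below is a simple illustration under that assumption: it uses the fact that maximizing the second coordinate over a convex hull subject to one linear constraint is achieved either at a single point or at a mixture of two points. The function name and the sample points are hypothetical.

```python
def w_from_points(points):
    """Highest payoff over the convex hull of `points` subject to a nonnegative
    average entropy variation. Each point is (entropy_variation, payoff)."""
    # Case 1: a single point that already satisfies the constraint.
    best = max((pay for dh, pay in points if dh >= 0), default=float("-inf"))
    # Case 2: mix a point gaining entropy with a point spending entropy so that
    # the average entropy variation is exactly zero.
    for dh1, pay1 in points:
        for dh2, pay2 in points:
            if dh1 > 0 > dh2:
                lam = -dh2 / (dh1 - dh2)  # weight on the entropy-gaining point
                best = max(best, lam * pay1 + (1 - lam) * pay2)
    return best

# Two points with entropy variations +1 and -1 and payoffs 1/4 and 1/2.
two_points = [(1.0, 0.25), (-1.0, 0.5)]
```

For the two sample points, the optimal mixture alternates them with equal frequencies; adding further points of $V$ can only raise the bound.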

4.2. A characterization of asymptotic max min values.
Theorem 5. The max min value of the $\lambda$-discounted game and of the $n$-stage game both converge to the same limit, respectively, as $\lambda$ goes to 1 and as $n$ goes to infinity. This limit coincides with the uniform max min, which is
$$\lim_{\lambda \to 1} v_\lambda = \lim_{n \to \infty} v_n = v_\infty = w.$$

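5.1. Perfect observation.
We say that the observation is perfect when the signal $s$ reveals the action profile $a$, i.e., $a \ne a' \Rightarrow \operatorname{supp} q(\cdot \mid a) \cap \operatorname{supp} q(\cdot \mid a') = \emptyset$. It is well known that, in this case, the max min of the repeated game is the independent max min; i.e., $w = v_1 = \max_{x \in X} \min_b g(x, b)$. Now we verify that our main theorem gives the same value. Under perfect observation, $\mathbf{s}$ determines $\mathbf{a}$, so $H(\mathbf{s}) = H(\mathbf{a}, \mathbf{s}) \ge H(\mathbf{a}, \mathbf{s} \mid \mathbf{x})$ and thus $\Delta H(c) \le 0$ for every $c$, with equality only if $\mathbf{a}$ is independent of $\mathbf{x}$, i.e., only if $c$ is a Dirac measure. Taking $c$ to be the Dirac measure on some $x$ that achieves the maximum in $v_1$ gives $w \ge v_1$. Now let $(x_1, x_2) \in \operatorname{co} V$ be such that $x_1 \ge 0$. We can write $(x_1, x_2)$ as a convex combination:
$$(x_1, x_2) = \sum_k \lambda_k (\Delta H(c_k), \pi(c_k)).$$
From the above discussion, for each $k$ such that $\lambda_k > 0$, $c_k$ is a Dirac measure on some $x_k \in X$; thus $\pi(c_k) = \min_b g(x_k, b) \le v_1$. Therefore, $x_2 \le v_1$ and also $w \le v_1$, hence the equality.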

5.2. Trivial observation.
We say that the observation is trivial when the signal $s$ does not depend on the action profile $a$. In this case, there is no limitation on the correlation the team may achieve by exchanging some messages; thus $w = \operatorname{val} g = \max_{x \in \Delta(A)} \min_b g(x, b)$, which is the correlated max min. Applying our main theorem, we remark that if observation is trivial, $\Delta H(c) \ge 0$ for each $c$. Let $x \in \Delta(A)$ achieve the max in $\operatorname{val} g$ and let $c$ be such that the distribution induced on actions is $x_c = x$ (e.g., decompose $x$ as a convex combination of pure action profiles). One has $\Delta H(c) \ge 0$ and $\pi(c) = \operatorname{val} g$; thus $w = \operatorname{val} g$.

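5.3. $3/8$ is not optimal in the main example. We revisit our main example, i.e., the following three-player game where Player 1 chooses rows, Player 2 chooses columns, and Player 3 chooses matrices:

[payoff matrices]

[…] We have $\pi(c_{+1}) = 1/4$, $\Delta H(c_{+1}) = +1$, $\pi(c_{-1}) = 1/2$, and $\Delta H(c_{-1}) = -1$, since the move of Player 2 at an even stage reveals the action of Player 1 at the previous stage. The so-defined strategy, playing $c_{+1}$ at odd stages and $c_{-1}$ at even stages, gives an average payoff of $3/8$ and an average entropy variation of 0. Now we deduce from Theorem 5 the existence of strategies for Players 1 and 2 that guarantee more than $3/8$. By Theorem 5, it is enough to show the existence of a convex combination of two correlation systems yielding an average payoff larger than $3/8$ and a nonnegative average entropy variation. Define the correlation system $c_\varepsilon$ which puts equal weights on $(1 - \varepsilon, \varepsilon) \otimes (1, 0)$ and $(\varepsilon, 1 - \varepsilon) \otimes (0, 1)$:
$$\pi(c_\varepsilon) = \frac{1 - \varepsilon}{2}, \qquad \Delta H(c_\varepsilon) = h(\varepsilon) - 1,$$
where $h(\varepsilon) = -\varepsilon \log \varepsilon - (1 - \varepsilon) \log(1 - \varepsilon)$. Using that $h'(0) = +\infty$, we deduce the existence of $\varepsilon > 0$ such that $(\Delta H(c_\varepsilon), \pi(c_\varepsilon))$ lies above the line through $(\Delta H(c_{-1}), \pi(c_{-1}))$ and $(\Delta H(c_{+1}), \pi(c_{+1}))$, i.e., the line $x_2 = 3/8 - x_1/8$. For this $\varepsilon$, there exists $0 \le \lambda \le 1$ such that $\lambda \Delta H(c_\varepsilon) + (1 - \lambda) \Delta H(c_{+1}) = 0$ and $\lambda \pi(c_\varepsilon) + (1 - \lambda) \pi(c_{+1}) > 3/8$, which implies that the team can guarantee more than $3/8$. Figure 2 gives a geometric illustration of the fact that playing $c_\varepsilon$ and $c_{+1}$ with frequencies $\lambda$ and $1 - \lambda$ yields a payoff above $3/8$.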
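The improvement over 3/8 can be checked numerically for a specific perturbation. The sketch below assumes that the perturbed correlation system puts equal weights on $(1-\varepsilon, \varepsilon) \otimes (1, 0)$ and $(\varepsilon, 1-\varepsilon) \otimes (0, 1)$, so that, under the signalling structure of the main example and payoffs consistent with the values quoted above, its payoff is $(1-\varepsilon)/2$ and its entropy variation is $h(\varepsilon) - 1$ with $h$ the binary entropy. These closed forms, the choice $\varepsilon = 0.05$, and the function names are assumptions made for illustration.

```python
from math import log2

def h(x):
    """Binary entropy in bits, with the convention 0 log 0 = 0."""
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

def mix_with_c_plus(eps):
    """Frequency lam of the perturbed system (entropy variation h(eps) - 1,
    payoff (1 - eps) / 2) and 1 - lam of c_{+1} (entropy variation +1,
    payoff 1/4) such that the average entropy variation is zero.
    Returns (lam, average payoff)."""
    dh_eps, pay_eps = h(eps) - 1.0, (1.0 - eps) / 2.0
    dh_plus, pay_plus = 1.0, 0.25
    lam = dh_plus / (dh_plus - dh_eps)  # solves lam*dh_eps + (1-lam)*dh_plus = 0
    return lam, lam * pay_eps + (1 - lam) * pay_plus

lam, average_payoff = mix_with_c_plus(0.05)
```

For small $\varepsilon$ the gain in entropy efficiency outweighs the payoff loss of order $\varepsilon$, which is the role played by the infinite slope of $h$ at 0 in the argument above.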
6. Computing w. In §4, the max min $w$ is characterized as $\operatorname{cav} u(0)$ with $u(h) = \max\{\pi(c) \mid c \in C,\ \Delta H(c) \ge h\}$, so the numerical computation of $w$ consists of computing the function $u(h)$, i.e., of solving the associated optimization problem. This task proves to be difficult. In Gossner et al. [10], we develop tools to solve it, starting from the following observations. In this maximization problem, the objective function $\pi(c) = \min_b g(x_c, b)$ depends on the correlation system $c$ through its barycenter $x_c$ only. Also, if we look at the constraint, in the expression
$$\Delta H(c) = H(\mathbf{a} \mid \mathbf{x}) + \sum_a x_c(a) H(q(\cdot \mid a)) - H\Big(\sum_a x_c(a)\, q(\cdot \mid a)\Big),$$
the second and third terms depend only on $x_c$. Only the first term $H(\mathbf{a} \mid \mathbf{x}) = \int_X H(x)\, dc(x)$ depends on the way the distribution $c$ averages on $x_c$. We argue that, fixing the barycenter $x_c$, we may choose any other $c'$ that also averages on $x_c$ provided that $\int_X H(x)\, dc'(x) \ge \int_X H(x)\, dc(x)$. In Gossner et al. [10], we study the problem of how to generate a correlated distribution of actions $x^*$ through a correlation system $c$, while maximizing the expected entropy:
$$\max_{c:\ x_c = x^*} \int_X H(x)\, dc(x).$$
Note that this latter problem is independent both of the game and of the signalling structure; thus its solution is helpful in solving all the instances covered by Theorems 5 and 14. Gossner et al. [10] solved this auxiliary problem in the case where the team consists of two players, each of them having two actions. The solution and its proof are rather involved, and the reader is referred to Gossner et al. [10] for the statement of the solution. Building on this result, two examples of games and signalling structures have been completely resolved so far: one in Gossner et al. [10] and one in Goldberg [7].
Note that for each $h \in \mathbb{R}$, either $\operatorname{cav} u(h) = u(h)$ or $\operatorname{cav} u$ is linear on some interval containing $h$. Thus, either $\operatorname{cav} u(0) = \pi(c)$ for some $c$ s.t. $\Delta H(c) \ge 0$, or there exist $c_1$, $c_2$, and $\lambda \in (0, 1)$ s.t. $\operatorname{cav} u(0) = \lambda \pi(c_1) + (1 - \lambda) \pi(c_2)$ and $\lambda \Delta H(c_1) + (1 - \lambda) \Delta H(c_2) \ge 0$. In the first case, the optimal strategy can be thought of as stationary (in the space of correlation systems), since only one correlation system is used at almost all stages. In the second case, the strategy repeatedly plays two phases. Assume without loss of generality that $\Delta H(c_1) > 0$. In a first phase, the optimal strategy plays $c_1$ to accumulate entropy; in a second phase, the optimal strategy plays $c_2$, spending entropy to yield a good payoff. The relative lengths of these phases are $(\lambda, 1 - \lambda)$. Gossner et al. [10] showed that our main example is of the first kind and Goldberg [7] exhibited an example of the second.
We consider once more the main example. In this case, Gossner et al. [10] proved that the only points $v$ in the set $V$ that are undominated (i.e., $(v + \mathbb{R}^2_+) \cap V = \{v\}$) are generated by correlation systems $c_x$ that put equal weights on $(x, 1 - x) \otimes (x, 1 - x)$ and $(1 - x, x) \otimes (1 - x, x)$ for some $x \in [0, 1]$. Such a correlation system has the following properties: for each $x$, the marginal distribution of actions under $c_x$ is $(1/2, 1/2)$ for each player, and the respective probabilities of $(a, a)$ and $(b, b)$ are equal. It follows that the associated payoff is $\pi(c_x) = \frac{1}{2} x^2 + \frac{1}{2} (1 - x)^2$; the entropy variation is $\Delta H(c_x) = 2H(x, 1 - x) - 1$. The graph of $h \mapsto u(h)$ is then the parametric curve:
$$\Big\{\Big(2H(x, 1 - x) - 1,\ \tfrac{1}{2} x^2 + \tfrac{1}{2} (1 - x)^2\Big) : x \in [0, 1]\Big\}.$$
This curve is concave (this is easily checked by computing the slope of this curve at $x$); thus $w = \operatorname{cav} u(0) = u(0)$; i.e., $w = \frac{1}{2} x^2 + \frac{1}{2} (1 - x)^2$, where $x$ is such that $H(x, 1 - x) = \frac{1}{2}$. The graph of $u$ can be seen in Figure 3.

7. Proof of the main results.
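The first kind of solution can be illustrated on the main example. The sketch below assumes the parametrization of undominated points attributed to Gossner et al. [10] in this section, namely payoff $\frac{1}{2}x^2 + \frac{1}{2}(1-x)^2$ and entropy variation $2H(x, 1-x) - 1$, and finds by bisection the parameter at which the entropy variation vanishes. The function names and the tolerance are hypothetical choices.

```python
from math import log2

def h(x):
    """Binary entropy H(x, 1 - x) in bits, with the convention 0 log 0 = 0."""
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

def solve_w(tol=1e-12):
    """Bisection on [0, 1/2] for the root of 2 h(x) - 1, which is increasing
    there; returns the root x and the associated payoff."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 2 * h(mid) - 1 < 0:
            lo = mid
        else:
            hi = mid
    x = (lo + hi) / 2
    return x, 0.5 * x ** 2 + 0.5 * (1 - x) ** 2

x_star, w_value = solve_w()
```

Under these assumptions the stationary system sits at a parameter close to 0.11, and the resulting payoff is roughly 0.40, strictly between 3/8 and the correlated max min 1/2.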

7.1. Player II defends w.
Here we prove that for every strategy of the team, if player II plays stage-best replies, the average vector of (entropy variation, payoff) generated belongs to $\operatorname{co} V$. This implies that no strategy for the team can guarantee a better payoff than $w$. The proof follows the same lines as some previous papers using entropy methods (see, e.g., Neyman and Okada [18,19] and Gossner and Vieille [?]).
Definition 6. Let $\sigma$ be a strategy for the team. Define inductively $\tau_\sigma$ as the strategy of player II that plays stage-best replies to $\sigma$: At Stage 1, $\tau_\sigma(\emptyset) \in \arg\min_b g(\sigma(\emptyset), b)$, where $\emptyset$ is the null history that starts the game. Assume that $\tau_\sigma$ is defined on histories of length less than $n + 1$. For every history $h^{II}_n$ of player II, let $x_{n+1}(h^{II}_n) \in \Delta(A)$ be the distribution of the action profile of the team at stage $n + 1$ given $h^{II}_n$, and let $\tau_\sigma(h^{II}_n)$ be in $\arg\min_b g(x_{n+1}(h^{II}_n), b)$.

[…] where the second equality holds since $a_m$ and $b_m$ are independent conditional on $h^{II}_m$, the third equality holds since $b_m$ is $h^{II}_m$-measurable, and the fourth equality holds since $(a_m, s_m)$ depends on $h_m$ only through $x_m$. We deduce
Corollary 8. Player II defends $w$ in every $\lambda$-discounted game; i.e., for each $\lambda \in (0, 1)$ and strategy profile $\sigma$ for the team: $\gamma_\lambda(\sigma, \tau_\sigma) \le w$. Therefore, for each $\lambda$, $v_\lambda \le w$.
Proof. The discounted payoff is a convex combination of the average payoffs (see, e.g., Lehrer and Sorin [17]):
$$\gamma_\lambda = \sum_{n \ge 1} (1 - \lambda)^2\, n\, \lambda^{n-1} \gamma_n.$$
From Lemma 7, we get $\gamma_\lambda(\sigma, \tau_\sigma) \le w$ and thus $v_\lambda \le w$.
7.2. $v_n$ converges to $w$. We introduce a class of strategies for the team against which the myopic best reply is a best reply in the repeated game. Call a strategy of a team player autonomous if it does not depend on player II's past moves; that is, for $i \in I$, $\sigma^i: \bigcup_n (A \times S)^n \to \Delta(A^i)$. Against a profile of autonomous strategies, the myopic best reply is a true best reply.
Lemma 9. Let $\sigma$ be a profile of autonomous strategies. For each stage $n$ and strategy $\tau$ for player II, $E_{\sigma, \tau_\sigma}[g(a_n, b_n)] \le E_{\sigma, \tau}[g(a_n, b_n)]$. Thus $\tau_\sigma$ is player II's best reply in any version of the repeated game.
Proof. Consider the optimization problem of player II, $\min_\tau E_{\sigma,\tau}\big[\sum_{n \ge 1} (1 - \lambda) \lambda^{n-1} g(a_n, b_n)\big]$. Since player II's moves do not influence the play of the team, this optimization problem is equivalent to solving $\min_b E_\sigma[g(a_n, b) \mid h^{II}_n]$ for each $n$ and each history $h^{II}_n$. The same argument applies in the $n$-stage game.
Now we associate autonomous strategies to distributions on strings of actions and signals. Note that for every autonomous strategy $\sigma$, the induced distribution $P_\sigma$ on $(A \times S)^\infty$ is such that for every history $h^I_n = (a_1, s_1, \dots, a_n, s_n)$, $P_\sigma(a_{n+1}, s_{n+1} \mid h^I_n) = \prod_i \sigma^i(h^I_n)(a^i) \cdot q(s \mid a)$. We let $Y$ be the set of probability distributions $y$ on $A \times S$ for which there exists $x \in X$ such that for each $(a, s)$, $y(a, s) = \prod_i x^i(a^i) \cdot q(s \mid a)$.
We call a distribution $P$ on $(A \times S)^\infty$ a $Y$-distribution if at each stage $n$, after $P$-almost every history $h^I_n = (a_1, s_1, \dots, a_n, s_n) \in H^I_n$, the distribution of $(a_{n+1}, s_{n+1})$ conditional on $h^I_n$, $P(a_{n+1}, s_{n+1} \mid h^I_n)$, belongs to $Y$. Every autonomous strategy profile induces a $Y$-distribution; conversely, a $Y$-distribution defines an autonomous strategy profile (up to histories with probability 0: for these histories, the strategy is arbitrarily defined).
Given an autonomous strategy profile $\sigma$ or, equivalently, a $Y$-distribution, consider the random correlation system at stage $n$: given $h^{II}_n$, $c_n$ is the distribution of $\sigma(h^I_n)$. The random variable $c_n$ is $h^{II}_n$-measurable with values in $C = \Delta(X)$. We consider the empirical distribution of correlation systems up to stage $n$, i.e., the time frequencies of correlation systems appearing along the history $h^{II}_n$. We define the random variable
$$d_n = \frac{1}{n} \sum_{m=1}^{n} \epsilon_{c_m},$$
where $\epsilon_c$ denotes the Dirac mass at $c$.
Lemma 10. For every $\delta \in \Delta(C)$ such that $E_\delta \Delta H \ge 0$, there exists a $Y$-distribution $P$ on $(A \times S)^\infty$ such that $E_P[d_n]$ weak-* converges to $\delta$.
Since a $Y$-distribution $P$ corresponds to an autonomous strategy, there exists an autonomous $\sigma$ such that $E_\sigma[d_n]$ weak-* converges to $\delta$.
Note that Theorem 2.2 (Gossner and Tomala [11]) applies to an observer (here player II) who gets deterministic signals on a stochastic process. The process may be constrained in such a way that transitions belong to a fixed closed subset of probability distributions. When applying Theorem 2.2 to prove Lemma 10, we assume that the team chooses the pair $(a, s)$ at each stage and is constrained in that the law of $(a, s)$ conditional on the past history belongs to $Y$. Since the transition $q$ is fixed, choosing $y = x \otimes q \in Y$ is equivalent to choosing $x \in X$, so the construction is legitimate.
Lemma 11. $\liminf_n v_n \ge \sup\{E_\delta \pi \mid \delta \in \Delta(C),\ E_\delta \Delta H \ge 0\}$.
Proof. For each $\delta$ such that $E_\delta \Delta H \ge 0$, the previous lemma yields the existence of an autonomous strategy $\sigma$ such that $\lim_n \gamma_n(\sigma, \tau_\sigma) = E_\delta \pi$. From Lemma 9, this gives $\liminf_n v_n \ge E_\delta \pi$.
We may now conclude the proof. The set of vectors $(E_\delta \Delta H, E_\delta \pi)$ as $\delta$ varies in $\Delta(C)$ is $\operatorname{co} V$; thus $\sup\{E_\delta \pi \mid \delta \in \Delta(C),\ E_\delta \Delta H \ge 0\} = w$. From Lemmas 7 and 11 we get $\lim_n v_n = w$.
7.3. $v_\lambda$ converges to $w$. Since $v_\lambda \le w$, it is enough to prove the following lemma:
Lemma 12. $\liminf_{\lambda \to 1} v_\lambda \ge w$.
Proof. For $\varepsilon > 0$, choose $n$ and an autonomous $\sigma$ such that $\gamma_n(\sigma, \tau_\sigma) \ge w - \varepsilon/2$. Define a cyclic strategy $\sigma^*$ as follows: play $\sigma$ until stage $n$ and restart this strategy every $n$ stages. Set $y_m$ as the expected payoff under $(\sigma^*, \tau_{\sigma^*})$ at stage $m$. Since $\sigma^*$ is cyclic, $\tau_{\sigma^*}$ is also cyclic and
$$\gamma_\lambda(\sigma^*, \tau_{\sigma^*}) = \sum_{m \ge 1} (1 - \lambda) \lambda^{m-1} y_m, \qquad y_{m+n} = y_m,$$
which converges to $\frac{1}{n} \sum_{m=1}^{n} y_m = \gamma_n(\sigma, \tau_\sigma) \ge w - \varepsilon/2$ as $\lambda$ goes to 1. Since $\tau_{\sigma^*}$ is a best reply to $\sigma^*$ by Lemma 9, $v_\lambda \ge w - \varepsilon$ for $\lambda$ close enough to 1.
7.4. Proof of the existence and value of $v_\infty$. From Lemma 7, by playing stage-best replies player II defends $w$. On the other hand, team I guarantees $v_n$ by playing cyclically an optimal strategy in the $n$-stage game; thus I guarantees $\lim v_n = w$.
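The representation of the discounted payoff as a convex combination of average payoffs, with weights $(1-\lambda)^2 n \lambda^{n-1}$ on the $n$-stage average, can be checked numerically on any bounded payoff stream. The sketch below uses a hypothetical deterministic stream of stage payoffs and truncates both series at a large horizon, where the neglected tails are negligible for the chosen discount factor; the names and numbers are illustrative only.

```python
def discounted(payoffs, lam):
    """Sum over n of (1 - lam) * lam^(n-1) * g_n, truncated at len(payoffs)."""
    return sum((1 - lam) * lam ** (n - 1) * g
               for n, g in enumerate(payoffs, start=1))

def via_averages(payoffs, lam):
    """Sum over n of (1 - lam)^2 * n * lam^(n-1) * (average of g_1..g_n),
    truncated at len(payoffs)."""
    total, running = 0.0, 0.0
    for n, g in enumerate(payoffs, start=1):
        running += g
        total += (1 - lam) ** 2 * n * lam ** (n - 1) * (running / n)
    return total

# Hypothetical stream of stage payoffs in [0, 1], long enough that the
# truncation error is far below the tolerance used for the comparison.
stream = [((n * n) % 7) / 6 for n in range(1, 2001)]
direct = discounted(stream, 0.9)
averaged = via_averages(stream, 0.9)
```

The two evaluations coincide up to floating-point and truncation error, so any bound that holds for every $n$-stage average payoff carries over to the discounted payoff.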
8. More general signalling structures. The method developed in this paper, and thus Theorem 5, extends to a larger class of signals than those presented in §3. Note, indeed, that our proof relies on only the following three conditions: (i) the signal of player II does not depend on his own action; (ii) the information regarding actions and signals that are unobserved by player II is symmetric within the team; and (iii) each team player knows the information of player II regarding those actions and signals. Condition (i) means that the entropy variation is not controlled by player II, which ensures that player II best-responds in the repeated game by optimizing myopically. Otherwise, player II faces a tradeoff between best-responding (potentially allowing team players to get a large amount of entropy) and minimizing the entropy produced by his choice. The case where the entropy variation depends on the action of player II is under investigation. Conditions (ii) and (iii) mean that each team player is able to compute the entropy variation. They thus agree on how to use the available entropy for correlation.
Consider the following signalling structure: if the team plays the action profile $a$ and player II plays action $b$, then
• a pair of signals $(s, t) \in S \times T$ is drawn from a pair of finite sets $S$, $T$ according to $q(\cdot \mid a)$ with $q: A \to \Delta(S \times T)$. The tuple $(a, s, t)$ is observed by each team player. Player II observes $(b, s)$;
• each player $i \in I$ observes a private signal $r^i = f^i(a, b)$, which is a deterministic function of the action profile.
These signalling structures generalize those of §3 in two respects. First, team players do not fully observe the move of player II. Second, they get to observe a random signal t that depends on the action profile. For instance, the generalization includes the case where all actions are perfectly observed and the team gets to privately observe at each stage the realization of a random variable. Note that the more-general signaling structure satisfies the requirements (i), (ii), and (iii) above: the only information asymmetry within the team is about the move of player II, which cannot be used for correlation. Theorem 5 extends naturally to these signalling structures.
The definition of the optimal payoff $\pi(c)$ associated with a correlation system $c$ is unchanged. The definition of the entropy variation generalizes as follows:
Definition 13. Let $c$ be a correlation system and $(\mathbf{x}, \mathbf{a}, \mathbf{s}, \mathbf{t})$ be a random variable in $X \times A \times S \times T$ such that the law of $\mathbf{x}$ is $c$, the law of $\mathbf{a}$ given $\{\mathbf{x} = x\}$ is $x$, and the law of $(\mathbf{s}, \mathbf{t})$ given $\{\mathbf{a} = a\}$ is $q(\cdot \mid a)$. The entropy variation of $c$ is
$$\Delta H(c) = H(\mathbf{a}, \mathbf{s}, \mathbf{t} \mid \mathbf{x}) - H(\mathbf{s}).$$
With this adaptation, we still consider the set of feasible vectors $(\Delta H(c), \pi(c))$ in the (entropy variation, payoff) plane, $V = \{(\Delta H(c), \pi(c)) \mid c \in C\}$, and we derive the quantity
$$w = \sup\{x_2 \in \mathbb{R} \mid (x_1, x_2) \in \operatorname{co} V,\ x_1 \ge 0\}.$$
Theorem 14. Under the generalized signalling structure, the max min value of the $\lambda$-discounted game and of the $n$-stage game both converge to the same limit, respectively, as $\lambda$ goes to 1 and $n$ goes to infinity. This limit is
$$\lim_{\lambda \to 1} v_\lambda = \lim_{n \to \infty} v_n = w.$$
Furthermore, the uniform max min exists and takes the value $w$.
Proof. We begin by extending the proof of Lemma 7. We first modify the signalling structure by assuming that the actions of player II are publicly observable. In this modified game, since the signals $r^i$ are deterministic functions of $b$, the set of strategies of the team is larger, so if player II defends $w$ in the modified game, it is also true in the original game.
The crux is to prove that […] Using the additivity of entropy, […]
Second, to prove that the team guarantees $w$, define an autonomous strategy as a strategy that does not depend on the signals $r^i$; i.e., it depends solely on the action profiles $a$ and on the signals $(s, t)$. We let $Y \subset \Delta(A \times S \times T)$ be the set of probability distributions $y$ such that for all $(a, s, t)$, $y(a, s, t) = x(a)\, q(s, t \mid a)$ for some $x \in X$. This is the set of distributions on $A \times S \times T$ that can be obtained by a profile of mixed strategies $x$ of the team and the transition $q$. An autonomous strategy can be identified with a probability distribution $P$ on $(A \times S \times T)^\infty$ such that at each stage $n$, after $P$-almost every history $h^I_n = (a_1, s_1, t_1, \dots, a_n, s_n, t_n) \in H^I_n$, the distribution $P(a_{n+1}, s_{n+1}, t_{n+1} \mid h^I_n)$ belongs to $Y$. We may thus apply Lemma 10 to $Y$-distributions and conclude as in Theorem 5.
9. Consequences for the Folk Theorem. In repeated games with imperfect monitoring, information asymmetries raise a number of difficulties that cause the set of equilibrium payoffs to be hard to characterize in general. For this reason, the central results consider public equilibria (Abreu et al. [1], Lehrer [14], Fudenberg et al. [6]), equilibria in which a communication mechanism serves to resolve information asymmetries (see Compte [4], Kandori and Matsushima [13], Renault and Tomala [21]), or two-player games (Lehrer [15,16]). In our approach, we tackle information asymmetries by measuring them with the entropy function.
The previous examples show three-player games in which our main theorem allows us to characterize the individually rational payoff of one player in the repeated game. Now we present a signalling structure for which our theorem allows for a characterization of all individually rational payoffs.
Consider a game in which the set of players is $\{1, \dots, n\}$, $n \ge 4$, and in which $i$'s finite action set is $A^i$. Players $i = 2, \dots, n - 1$ have perfect observation: they observe $s^i = (a^1, \dots, a^n)$. Player 1 observes every player but player $n$: his signal is $s^1 = (a^1, \dots, a^{n-1})$. Player $n$ observes every player but Player 1: his signal is $s^n = (a^2, \dots, a^n)$. This structure of signals is represented in Renault and Tomala [20] by a graph whose nodes are the players and where there is an edge between $i$ and $j$ whenever $i$ and $j$ monitor each other. The graph described here is two-connected: there are at least two distinct paths from $i$ to $j$ for each pair $(i, j)$.
Let $\operatorname{co} g(A)$ be the set of feasible payoffs. To define the individually rational level of player $i$ in the repeated game, we consider the game played by the team $-i$ (i.e., all players but $i$) against player $i$ (thus, with payoff $-g^i$), and we let $v^i_\infty$ be the associated uniform value. We then set $IR = \{x \in \mathbb{R}^n \mid x^i \ge v^i_\infty\}$, the set of individually rational payoffs with respect to the min max values of the repeated game. Renault and Tomala [20] proved that in a repeated game where each player monitors the actions of his neighbors in a fixed graph, the set of uniform equilibrium payoffs equals $\operatorname{co} g(A) \cap IR$ when the graph is two-connected. However, Renault and Tomala [20] left open the characterization of min max values of the repeated game.
Since each Player $1 < i < n$ has perfect observation, his individually rational level $v^i_\infty$ in the repeated game equals his independent min max $v^i_1$. Regarding player $n$ (respectively, Player 1), we may apply Theorem 14. Signals are deterministic: team players in $\{1, \dots, n - 1\}$ fully observe the team action profile, and each of them gets to observe a signal on the move of player $n$. Player 1 observes a constant signal and the other players observe this move. We thus get a complete characterization of the set of uniform equilibrium payoffs.
Lehrer [14] characterized Nash equilibrium payoffs for all repeated games having a semistandard signalling structure. Our example constitutes, as far as we know, the only other $n$-player signalling structure for which a characterization of Nash equilibrium payoffs is known for all payoff functions.