EMPIRICAL DISTRIBUTIONS OF BELIEFS UNDER IMPERFECT OBSERVATION

Let (x_n)_n be a process with values in a finite set X and law P, and let y_n = f(x_n) be a function of the process. At stage n, p_n = P(x_n | x_1, …, x_{n−1}), an element of Π = ∆(X), is the belief of a perfect observer of the process on its realization at stage n. A statistician observing y_1, …, y_{n−1} holds a belief e_n = P(p_n | y_1, …, y_{n−1}) ∈ ∆(Π) on the possible predictions of the perfect observer. Given X and f, we characterize the set of limiting expected empirical distributions of the process (e_n) when P ranges over all possible laws of (x_n)_n.

Olivier Gossner and Tristan Tomala. Date: January 2004.


Introduction
We study the gap in predictions made by agents that observe different signals about some process (x_n)_n with values in a finite set X and law P.

Assume that a perfect observer observes (x_n)_n, and a statistician observes a function y_n = f(x_n). At stage n, p_n = P(x_n | x_1, …, x_{n−1}), an element of Π = ∆(X), is the best prediction that a perfect observer of the process can make on its next realization. To a sequence of signals y_1, …, y_{n−1} corresponds a belief e_n = P(p_n | y_1, …, y_{n−1}) that the statistician holds on the possible predictions of the perfect observer. The information gap about the future realization of the process at stage n between the perfect observer and the statistician is seen in the fact that the perfect observer knows p_n, whereas the statistician knows only the law e_n of p_n conditional on y_1, …, y_{n−1}.

We study the possible limiting expected empirical distributions of the process (e_n) when P ranges over all possible laws of (x_n)_n.
Call experiments the elements of E = ∆(Π), and experiment distributions the elements of ∆(E). We say that an experiment distribution δ is achievable if there is a law P of the process for which δ is the limiting expected empirical distribution of (e_n). Represent an experiment e by a random variable p with finite support and values in Π. Let x be a random variable with values in X such that, conditional on the realization p of p, x has law p. Let then y = f(x). We define the entropy variation associated to e as:

∆H(e) = H(p, x|y) − H(p) = H(x|p) − H(y)
This operator measures the evolution of the uncertainty for the statistician on the predictions of the perfect observer.
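For a finite-support experiment, the two expressions of ∆H can be computed directly. The following is a minimal sketch (the dictionary encoding of e and the list encoding of f are our own conventions, not the paper's):

```python
import math

def H(probs):
    """Shannon entropy (base 2) of a probability vector."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

def delta_H(e, f):
    """Entropy variation Delta H(e) = H(x|p) - H(y) of an experiment e.

    e : dict mapping p (a tuple of probabilities over X = {0, ..., |X|-1})
        to its weight e(p)
    f : list, f[x] is the signal produced by x
    """
    # H(x|p): conditionally on p = p, x has law p, so average the H(p)'s.
    h_x_given_p = sum(w * H(p) for p, w in e.items())
    # Law of y: draw p according to e, then x according to p, announce y = f(x).
    y_law = {}
    for p, w in e.items():
        for x, px in enumerate(p):
            y_law[f[x]] = y_law.get(f[x], 0.0) + w * px
    return h_x_given_p - H(y_law.values())
```

For instance, the experiment (1/2)ǫ_{ǫ_j} + (1/2)ǫ_{ǫ_k} with f(j) ≠ f(k) from the example of Section 2.6 gives ∆H = −1, while a constant f gives ∆H = 0.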
Our main result is that an experiment distribution δ is achievable if and only if E_δ(∆H) ≥ 0. This result has applications both to statistical problems and to game-theoretic ones.
Assume that at each stage, both the perfect observer and the statistician take a decision, and the payoff to each decision maker is a function of his decision and the realization of the process. Then, given that both agents maximize expected utilities, their expected payoffs at stage n write as a function of e_n. Consequently, their long-run expected payoffs are a function of the long-run expected empirical distribution of the process (e_n). One application of our result (in progress) is a characterization of the bounds on the value of information in repeated decision problems.
Information asymmetries in repeated interactions are also a recurrent phenomenon in game theory, and arise in particular when agents observe private signals, or have limited information-processing abilities.
In a repeated game with private signals, each player observes at each stage of the game a signal that depends on the action profile of all the players. While public equilibria of these games (see e.g. Abreu, Pearce and Stacchetti [APS90] and Fudenberg, Levine and Maskin [FLM94]), or equilibria in which a communication mechanism serves to resolve information asymmetries (see e.g. Compte [Com02] and Renault and Tomala [RT00]), are well characterized, endogenous correlation and endogenous communication give rise to difficult questions that have only been tackled for particular classes of signalling structures (see Lehrer [Leh90] [Leh91], Renault and Tomala [RT98], and Gossner and Vieille [GV01]).
When agents have different information-processing abilities, some players may be able to predict the future process of actions more accurately than others. These phenomena have been studied in the framework of finite automata (see Ben Porath [BP93], Neyman [Ney97] [Ney98], Gossner and Hernández [GH03], Bavly and Neyman [BN03], and Lacôte and Thurin). Our result has already found applications to the characterization of the minmax values in classes of repeated games with imperfect monitoring (see Gossner and Tomala [GT03], Gossner, Laraki and Tomala [GLT03], and Goldberg [Gol03]).
The next section presents the model and main results, while the remainder of the paper is devoted to the proof of our theorem.

Definitions and main results
2.1. Notations. For a finite set S, |S| denotes its cardinality.
For S compact, ∆(S) denotes the set of regular probability measures on S endowed with the weak-* topology (thus ∆(S) is compact).
If (x, y) is a pair of random variables defined on a probability space (Ω, F, P) such that x is finite, P(x | y) denotes the conditional distribution of x given {y = y}, and P(x | y) denotes the random variable with value P(x | y) when y = y. ǫ_x denotes the Dirac measure on x, i.e. the probability measure with support {x}.
If x is a random variable with values in a compact subset of a topological vector space V, E(x) denotes the barycenter of x, i.e. the element of V such that for each continuous linear form ϕ, E(ϕ(x)) = ϕ(E(x)).
If p and q are probability measures on two probability spaces, p⊗q denotes the direct product of p and q, i.e. (p ⊗ q)(A × B) = p(A) × q(B).

Definitions.
2.2.1. Processes and Distributions. Let (x_n)_n be a process with values in a finite set X such that |X| ≥ 2, and let P be its law. A statistician gets to observe the value of y_n = f(x_n) at each stage n, where f : X → Y is a fixed mapping. Before stage n, the history of the process is x_1, …, x_{n−1}, and the history available to the statistician is y_1, …, y_{n−1}. The law of x_n given the history of the process is:

p_n = P(x_n | x_1, …, x_{n−1})

This defines a (x_1, …, x_{n−1})-measurable random variable p_n with values in Π = ∆(X). The statistician holds a belief on the value of p_n. For each history y_1, …, y_{n−1}, we let:

e_n(y_1, …, y_{n−1}) = P(p_n | y_1, …, y_{n−1})

This defines a (y_1, …, y_{n−1})-measurable random variable e_n with values in E = ∆(Π). Following Blackwell [Bla51] [Bla53], we call experiments the elements of E.
The empirical distribution of experiments up to stage n is:

d_n = (1/n) Σ_{m=1}^{n} ǫ_{e_m}

The (y_1, …, y_{n−1})-measurable random variable d_n has values in D = ∆(E). We call D the set of experiment distributions.
Definition 1. We say that the law P of the process n-achieves the experiment distribution δ if E_P(d_n) = δ, and that δ is n-achievable if there exists P that n-achieves δ. D_n denotes the set of n-achievable experiment distributions.
We say that the law P of the process achieves the experiment distribution δ if E_P(d_n) converges to δ in the weak-* topology, and that δ is achievable if there exists P that achieves δ. D_∞ denotes the set of achievable experiment distributions.
Proof. To prove (1) and (2), let P_n and P′_m be the laws of processes x_1, …, x_n and x′_1, …, x′_m such that P_n n-achieves δ_n ∈ D_n and P′_m m-achieves δ′_m ∈ D_m.
where 0 log 0 = 0 by convention. Note that H(x) is non-negative and depends only on the law P of x. The entropy of a random variable x is thus the entropy H(P) of its distribution P, with H(P) = −Σ_x P(x) log P(x).
Let (x, y) be a couple of random variables with joint law P such that x is finite. The conditional entropy of x given {y = y} is the entropy of the conditional distribution P(x | y):

H(x | y = y) = H(P(x | y))

The conditional entropy of x given y is the expected value of the previous:

H(x | y) = E_y [H(x | y = y)]

If y is also finite, one has the following relation of additivity of entropies:

H(x, y) = H(y) + H(x | y)

Given an experiment e, let p be a random variable in Π with distribution e, let x be a random variable in X such that the distribution of x conditional on {p = p} is p, and let y = f(x).

Definition 3. The entropy variation associated to e is:

∆H(e) = H(p, x | y) − H(p)

Assume that e has finite support. From the additivity formula:

∆H(e) = H(x | p) − H(y)

The interpretation is the following. The operator ∆H measures the evolution of the uncertainty of the statistician at a given stage. Assume e is the experiment representing the information gap between the perfect observer and the statistician at stage n. The evolution of information can be seen through the following procedure:
• Draw p according to e;
• If p = p, draw x according to p;
• Announce y = f(x) to the statistician.
The uncertainty for the statistician at the beginning of the procedure is H(p). At the end of the procedure, the statistician knows the value of y while p and x remain unknown to him; the new uncertainty is thus H(p, x | y). ∆H(e) is therefore the variation of entropy across this procedure. Note that it can also be written as the difference between the entropy added to p by the procedure, H(x | p), and the entropy of the information gained by the statistician, H(y).
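The additivity relation H(x, y) = H(y) + H(x | y) used above can be verified numerically on a small joint law (the law below is an arbitrary illustration of ours):

```python
import math

def H(probs):
    """Shannon entropy (base 2) of a probability vector."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

# An arbitrary joint law of (x, y) on {0, 1, 2} x {0, 1}.
joint = {(0, 0): 0.2, (1, 0): 0.3, (2, 1): 0.5}

# Marginal law of y.
py = {}
for (x, y), q in joint.items():
    py[y] = py.get(y, 0.0) + q

# H(x|y) = sum over y of P(y = y) * H(P(x | y = y)).
h_x_given_y = 0.0
for y0, qy in py.items():
    cond = [q / qy for (x, y), q in joint.items() if y == y0]
    h_x_given_y += qy * H(cond)

lhs = H(joint.values())                # H(x, y)
rhs = H(py.values()) + h_x_given_y     # H(y) + H(x|y)
assert abs(lhs - rhs) < 1e-12          # additivity of entropies
```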
Lemma 5. The operator ∆H : E → R is continuous.
Proof. H(x | p) = ∫ H(p) de(p) is linear-continuous in e, since H is continuous on Π. The mapping that associates to e the probability distribution of y is also linear-continuous.

2.3. Main results. We characterize achievable distributions.

Theorem 6. An experiment distribution δ is achievable if and only if E_δ(∆H) ≥ 0.

We also prove a stronger version of the previous theorem in which the transitions of the process are restricted to belong to an arbitrary subset of Π.

Remark 10. If C is closed, the set of experiment distributions that are achievable by laws of C-processes is convex and closed. The proof is identical to that for D_∞, so we omit it.
2.4. Trivial observation. We say that the observation is trivial when f is constant.
Lemma 11.If the observation is trivial, any δ is achievable.
This fact can easily be deduced from theorem 6: since f is constant, H(y) = 0 and thus ∆H(e) ≥ 0 for each e ∈ E. However, a simple construction provides a direct proof in this case.
Proof. By closedness and convexity, it is enough to prove that any δ = ǫ_e with e of finite support is achievable. Let thus e = Σ_k λ_k ǫ_{p_k}. Again by closedness, assume that the λ_k's are rational with common denominator 2^n for some n. Let x ≠ x′ be two distinct points in X and let x_1, …, x_n be i.i.d. with law (1/2)ǫ_x + (1/2)ǫ_{x′}, so that (x_1, …, x_n) is uniform on a set with 2^n elements. Map (x_1, …, x_n) to some random variable k such that P(k = k) = λ_k. Construct then the law P of the process such that, conditional on k = k, x_{t+n} has law p_k for each t. P achieves δ.
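The dyadic randomization step of this proof can be sketched as follows (the function names and encoding of the weights are ours):

```python
import random

def index_from_uniform(u, lambdas_num):
    """Turn u, uniform on {0, ..., 2^n - 1}, into an index k with
    P(k = k) = lambdas_num[k] / 2^n (the dyadic weights lambda_k)."""
    acc = 0
    for k, num in enumerate(lambdas_num):
        acc += num
        if u < acc:
            return k
    raise ValueError("weights do not cover u")

def draw_k(lambdas_num, n, rng=random):
    """Draw n i.i.d. fair bits x_1, ..., x_n (uniform on 2^n points, as in
    the proof) and map them to the random variable k."""
    assert sum(lambdas_num) == 2 ** n
    bits = [rng.randrange(2) for _ in range(n)]
    u = sum(b << i for i, b in enumerate(bits))
    return index_from_uniform(u, lambdas_num)
```

With weights λ = (1/8, 3/8, 4/8), enumerating all 2^3 values of u yields k = 0 once, k = 1 three times, and k = 2 four times, matching P(k = k) = λ_k exactly.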
2.5. Perfect observation. We say that observation is perfect when f is one-to-one. Let E_d denote the set of Dirac experiments, i.e. measures on Π whose support is a singleton. This set is a weak-*-closed subset of E.
We derive this result from thm. 6.
On the other hand, if e = (1/2)ǫ_{ǫ_j} + (1/2)ǫ_{ǫ_k}, under e the law of x conditional on p is a Dirac measure and thus H(x | p) = 0, whereas the law of y is that of a fair coin and H(y) = 1. Thus, E_δ(∆H) = ∆H(e) = −1 < 0, and from thm. 6, δ is not achievable.
The intuition is as follows: if δ were achievable by P, only j and k would appear with positive density P-a.s. Since f(j) ≠ f(k), the statistician could reconstruct the history of the process given his signals, and therefore correctly guess P(x_n | x_1, …, x_{n−1}). This contradicts e = (1/2)ǫ_{ǫ_j} + (1/2)ǫ_{ǫ_k}, which means that at almost each stage, the statistician is uncertain about P(x_n | x_1, …, x_{n−1}) and attributes probability 1/2 to ǫ_j and probability 1/2 to ǫ_k.

Reduction of the problem
The core of our proof is to establish the next proposition.

Proposition 13. Any δ = λǫ_e + (1 − λ)ǫ_{e′} with λ rational, e and e′ of finite support, and λ∆H(e) + (1 − λ)∆H(e′) > 0 is achievable by the law of a C-process with C = supp e ∪ supp e′.

Sections 4, 5, 6 and 7 are devoted to the proof of this proposition. We now prove theorems 6 and 9 from proposition 13.

3.1.
The condition E_δ ∆H ≥ 0 is necessary. We prove that any achievable δ must verify E_δ ∆H ≥ 0. Let δ be achieved by P. Recall that e_n is a (y_1, …, y_{n−1})-measurable random variable in E; ∆H(e_n) is thus a (y_1, …, y_{n−1})-measurable real-valued random variable and:

E(∆H(e_m)) = H(x_m | p_m, y_1, …, y_{m−1}) − H(y_m | y_1, …, y_{m−1})

From the definitions:

E_{E_P(d_n)}(∆H) = (1/n) Σ_{m=1}^{n} E(∆H(e_m))

We derive:

Σ_{m=1}^{n} E(∆H(e_m)) = Σ_m H(x_m | p_m, y_1, …, y_{m−1}) − Σ_m H(y_m | y_1, …, y_{m−1})
= Σ_m H(x_m | x_1, …, x_{m−1}) − Σ_m H(y_m | y_1, …, y_{m−1})
= H(x_1, …, x_n) − H(y_1, …, y_n)

The first equality is a reordering of the definition. The second uses that p_m is (x_1, …, x_{m−1})-measurable and that, conditionally on p_m, x_m is independent of (x_1, …, x_{m−1}). The third uses the additivity of entropies. It follows that:

E_{E_P(d_n)}(∆H) = (1/n) (H(x_1, …, x_n) − H(y_1, …, y_n)) ≥ 0

since (y_1, …, y_n) is a function of (x_1, …, x_n); letting n → ∞ yields E_δ(∆H) ≥ 0.

Assume now that δ = Σ_j λ_j ǫ_{e_j} with e_j ∈ E_C for each j. Let S be the finite set of distributions {ǫ_{e_j} : j}. We claim that δ can be written as a convex combination of distributions δ_k such that:
• For each k, δ_k is the convex combination of two points in S;
• For each k, E_{δ_k}(∆H) = E_δ(∆H).
This follows from the following lemma of convex analysis:

Lemma 16. Let S be a finite set in a vector space and let f be a real-valued affine mapping on co S, the convex hull of S. For each x ∈ co S, there exist an integer K, non-negative numbers λ_1, …, λ_K summing to one, coefficients t_1, …, t_K in [0, 1], and points (x_k, x′_k) in S such that:

x = Σ_k λ_k (t_k x_k + (1 − t_k) x′_k)  and  f(t_k x_k + (1 − t_k) x′_k) = f(x) for each k.

Proof. Let a = f(x). The set S_a = {y ∈ co S : f(y) = a} is the intersection of a polytope with a hyperplane. It is thus convex and compact, so by the Krein-Milman theorem (see e.g. [Roc70]) it is the convex hull of its extreme points. An extreme point y of S_a (i.e. a face of dimension 0 of S_a) must lie on a face of co S of dimension at most 1, and therefore is the convex combination of two points of S.
We apply lemma 16 to S = {ǫ_{e_j} : j} and to the affine mapping δ → E_δ(∆H). Since the set of achievable distributions is convex, it is enough to prove that for each k, δ_k is achievable. The problem is thus reduced to δ = λǫ_e + (1 − λ)ǫ_{e′} such that λ∆H(e) + (1 − λ)∆H(e′) > 0. We approximate λ by a rational number, and since C is closed, we may assume that the supports of e and e′ are finite subsets of C. Proposition 13 now applies.
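The necessity computation of Section 3.1 rests on the identity Σ_m E(∆H(e_m)) = H(x_1, …, x_n) − H(y_1, …, y_n). It can be sanity-checked numerically on a two-stage example; the specific law, transition rows, and signal map below are arbitrary choices of ours:

```python
import math

def H(probs):
    """Shannon entropy (base 2)."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

X, f = [0, 1, 2], [0, 0, 1]            # signal map y = f(x), not one-to-one
p1 = (0.5, 0.25, 0.25)                 # law of x1
T = {0: (0.2, 0.3, 0.5),               # p2 = P(x2 | x1), one row per x1
     1: (0.6, 0.2, 0.2),
     2: (1/3, 1/3, 1/3)}

def dH(e):
    """Delta H(e) = H(x|p) - H(y) for a finite-support experiment e."""
    y_law = {}
    for p, w in e.items():
        for x, px in enumerate(p):
            y_law[f[x]] = y_law.get(f[x], 0.0) + w * px
    return sum(w * H(p) for p, w in e.items()) - H(y_law.values())

# Stage 1: e1 is the Dirac mass on p1.
lhs = dH({p1: 1.0})
# Stage 2: e2(y1) is the law of p2 = T[x1] conditional on y1 = f(x1).
for y1 in set(f):
    qy = sum(p1[x] for x in X if f[x] == y1)
    e2 = {}
    for x in X:
        if f[x] == y1:
            e2[T[x]] = e2.get(T[x], 0.0) + p1[x] / qy
    lhs += qy * dH(e2)

# Right-hand side: H(x1, x2) - H(y1, y2).
joint_x = {(a, b): p1[a] * T[a][b] for a in X for b in X}
joint_y = {}
for (a, b), q in joint_x.items():
    joint_y[(f[a], f[b])] = joint_y.get((f[a], f[b]), 0.0) + q
rhs = H(joint_x.values()) - H(joint_y.values())

assert abs(lhs - rhs) < 1e-9           # the telescoping identity holds
```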

Presentation of the proof of proposition 13
We want to prove that any δ = λǫ_e + (1 − λ)ǫ_{e′}, with λ∆H(e) + (1 − λ)∆H(e′) > 0 and λ rational, is achievable. We construct a process P such that the induced experiment is close to e and to e′ in proportions λ and 1 − λ of the time. How can we design a process such that between periods T and T + n, the experiments are close to e?
A first idea is to define x_{T+1} up to x_{T+n} as follows: draw p_{T+1} up to p_{T+n} i.i.d. according to e, independently of the past, and then draw independently each x_t according to p_t for T + 1 ≤ t ≤ T + n. This simple construction is not adequate, since the induced experiment in stages T + 1 ≤ t ≤ T + n is the unit mass on E_e(p), which is different from e as soon as e is not a Dirac measure. We need to construct the process in such a way that a perfect observer knows which is the distribution p_t of x_t conditional on the past, but the statistician only knows that it may be p with probability e(p).
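A two-line computation makes the failure of this first idea concrete: if p_t is drawn independently of the past, the perfect observer's prediction P(x_t | x_1, …, x_{t−1}) collapses to the barycenter E_e(p), whatever the history. The two-point experiment below is an arbitrary illustration:

```python
# Under the naive construction, p_t is independent of the process history, so
# the perfect observer's next-stage prediction is the mean of e, and the
# induced experiment is the unit mass on E_e(p).
e = {(0.9, 0.1): 0.5, (0.1, 0.9): 0.5}   # an illustrative two-point experiment
barycenter = tuple(sum(w * p[x] for p, w in e.items()) for x in range(2))
# barycenter is approximately (0.5, 0.5): all information about the draw of p
# is lost in the prediction, however far from Dirac the experiment e is.
```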
We thus amend the previous construction in order to take into account the information gap between a perfect observer of the process and the statistician before stage T. When the realized sequence of signals to the statistician up to stage T is ỹ_T = (y_1, …, y_T), this information gap can be measured by the conditional probability µ(ỹ_T) = P(x_1, …, x_T | ỹ_T).
Assume that the distribution µ(ỹ_T) is close to that of n i.i.d. random variables and has entropy nh with h > H(e). We make explicit in this case a mapping ϕ from X^T to Π^n such that the image distribution of µ(ỹ_T) by ϕ is close to e^{⊗n}.
We then construct the process at stages T + 1 up to T + n as follows: let (p_{T+1}, …, p_{T+n}) be the image of (x_1, …, x_T) by ϕ, and draw x_t for T + 1 ≤ t ≤ T + n according to the realization of p_t, independently of the rest.
The realized sequence of experiments e_{T+1}, …, e_{T+n} is then close to e repeated n times, since the statistician does not know the realized value of p_t, whereas the perfect observer does.
Our construction of the process P mostly relies on the above idea. In order to formalize it, we need to define the notions of closeness that are useful for our purposes (closeness between µ(ỹ_T) and the uniform distribution, closeness between the realized sequence of experiments and e). Once we have defined the conditions on µ(ỹ_T) that allow us to construct the process for stages T + 1 ≤ t ≤ T + n with n = λN, we need to check that, with large enough probability, the construction can be applied once more for the following block. To do this, we prove that with high enough probability, the new conditional distribution is close to that of m i.i.d. random variables and has total entropy n(h + ∆H(e)).
Section 5 presents the construction of the process for one block of stages and establishes the necessary bounds on closeness of probabilities. In section 6, we iterate the construction and show the full construction of the process P, including after sequences µ(ỹ_T) for which the construction of section 5 fails. We terminate the proof in section 7 by proving the weak-* convergence of the sequence of experiments to λe + (1 − λ)e′. In this last part, we first control the Kullback distance between the law of the process of experiments under P and an ideal law Q = e^{⊗n} ⊗ e′^{⊗m} ⊗ e^{⊗n} ⊗ e′^{⊗m} ⊗ …, and finally relate the Kullback distance to weak-* convergence.

The one block construction
5.1. Kullback and absolute Kullback distance. For two probability measures P and Q with finite support, we write P ≪ Q when P is absolutely continuous with respect to Q, i.e. (Q(x) = 0 ⇒ P(x) = 0).
Definition 17. Let K be a finite set and P, Q in ∆(K) such that P ≪ Q; the Kullback distance between P and Q is

d(P‖Q) = Σ_{k∈K} P(k) log (P(k)/Q(k))

We recall the absolute Kullback distance and its comparison with the Kullback distance from [GV02] for later use.
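Definition 17 transcribes directly into code (the dictionary encoding is our own convention); note the absolute-continuity check, which matches the convention P ≪ Q above:

```python
import math

def kullback(P, Q):
    """Kullback distance d(P || Q) = sum_k P(k) log2(P(k)/Q(k)).

    P, Q: dicts mapping points of a finite set K to probabilities.
    Defined only when P << Q, i.e. Q(k) = 0 implies P(k) = 0.
    """
    for k, p in P.items():
        if p > 0 and Q.get(k, 0.0) == 0:
            raise ValueError("P is not absolutely continuous w.r.t. Q")
    return sum(p * math.log2(p / Q[k]) for k, p in P.items() if p > 0)
```

As expected, d(P‖P) = 0, and concentrating P on a point of Q-probability 1/2 gives distance 1 bit.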
Definition 18. Let K be a finite set and P, Q in ∆(K) such that P ≪ Q; the absolute Kullback distance between P and Q is

|d|(P‖Q) = Σ_{k∈K} P(k) |log (P(k)/Q(k))|

5.2. Equipartition properties. We say that a probability P with finite support verifies an Equipartition Property (EP for short) when all points in the support of P have close probabilities.
Definition 20. Let P ∈ ∆(K), n ∈ N, h ∈ R_+, η > 0. P verifies an EP(n, h, η) when, for every k in the support of P:

2^{−n(h+η)} ≤ P(k) ≤ 2^{−n(h−η)}

We say that a probability P with finite support verifies an Asymptotic Equipartition Property (AEP for short) when all points in a set of large P-measure have close probabilities.
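Reading EP(n, h, η) as the two-sided bound 2^{−n(h+η)} ≤ P(k) ≤ 2^{−n(h−η)} on support points (this reading of Definition 20 is an assumption on our part), a checker can be sketched as:

```python
import math

def verifies_EP(P, n, h, eta):
    """Check the equipartition property EP(n, h, eta): every point k in the
    support of P satisfies |-(1/n) log2 P(k) - h| <= eta, equivalently
    2^{-n(h+eta)} <= P(k) <= 2^{-n(h-eta)}. (Assumed reading of Definition 20.)"""
    return all(abs(-math.log2(p) / n - h) <= eta
               for p in P.values() if p > 0)

# The uniform law on 2^n points verifies EP(n, 1, 0): here n = 3.
uniform = {i: 1 / 8 for i in range(8)}
```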

Distributions induced by experiments and by codifications.
Let e ∈ ∆(Π) be an experiment with finite support and n be an integer.
Notation 23. Let ρ(e) be the probability on Π × X induced by the following procedure: first draw p according to e, then draw x according to the realization of p. Let Q(n, e) = ρ(e)^{⊗n}.
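The two-step procedure of Notation 23 can be sampled as follows (a sketch; the dict encoding of e is our convention):

```python
import random

def sample_rho(e, rng=random):
    """One draw from rho(e): first p according to e, then x according to p.

    e: dict mapping p (tuple of probabilities over X = {0, ..., |X|-1})
       to its weight e(p).
    """
    ps, ws = zip(*e.items())
    p = rng.choices(ps, weights=ws)[0]
    x = rng.choices(range(len(p)), weights=p)[0]
    return p, x

def sample_Q(n, e, rng=random):
    """One draw from Q(n, e) = rho(e)^{tensor n}: n independent copies."""
    return [sample_rho(e, rng) for _ in range(n)]
```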
We need to approximate Q(n, e) in a construction where (p_1, …, p_n) is measurable with respect to some random variable l of law P_L in an arbitrary set L.
Notation 24. Let (L, P_L) be a finite probability space and ϕ : L → Π^n. We denote by P̄ = P̄(n, L, P_L, ϕ) the probability on L × (Π × X)^n induced by the following procedure: draw l according to P_L, set (p_1, …, p_n) = ϕ(l), and then draw each x_t according to the realization of p_t.

We let P = P(n, L, P_L, ϕ) be the marginal of P̄(n, L, P_L, ϕ) on (Π × X)^n.
Another point we need to take care of is that such a construction can be iterated, by relating properties of the "input" probability measure P_L with those of the "output" probability measure P̄(l, p_1, …, p_n, x_1, …, x_n | y_1, …, y_n).
In proposition 25, the condition on P_L is an EP property, thus a stronger input property than the output property, which is stated as an AEP. We strengthen this result by assuming that P_L verifies an AEP property in proposition 26.

5.5. EP to AEP codification result. We now state and prove our coding proposition when the input probability measure P_L verifies an EP.

Proposition 25. For each experiment e, there exists a constant U(e) such that for every integer n with e ∈ T_n(Π) and for every finite probability space (L, P_L) that verifies an EP(n, h, η) with n(h − H(e) − η) ≥ 1, there exists a mapping ϕ : L → Π^n such that, letting P̄ = P̄(n, L, P_L, ϕ) and P = P(n, L, P_L, ϕ):
(1) d(P‖Q(n, e)) ≤ 2nη + |supp e| log(n + 1) + 1
(2) For every ε > 0, there exists a subset Y_ε of Y^n such that:

Proof of prop. 25.
Construction of ϕ: Since P_L verifies an EP(n, h, η), from the previous and equation (1), hence for α_1 > 0, and from lemma 19.

The statistics of (k, s) under P̄: We write that the type ρ_{p,x} ∈ ∆(Π × X) of (p, x) ∈ (Π × X)^n is close to ρ, with large P̄-probability. First, note that since ϕ takes its values in T_n(e), the marginal of ρ_{p,x} on Π is e with P̄-probability one. For (p, x) ∈ Π × X, the distribution under P̄ of nρ_{p,x}(p, x) is that of a sum of ne(p) independent Bernoulli variables with parameter p(x). For α_2 > 0, the Bienaymé-Chebyshev inequality gives the bound, hence the result.

The set of ỹ ∈ Y^n s.t. Q_ỹ verifies an AEP has large P-probability: denoting f(ρ) the image of ρ on Y, and letting M_0 = −2|(supp e) × X| log(min_{p,x} ρ(p, x)), this implies the stated bound. Equations (4) and (5) yield the definition of the relevant set.

Definition of Y_ε and verification of (2a): (3) and (6) imply the claim. We first prove that P_ỹ verifies an AEP for ỹ ∈ Y_ε.
5.6. AEP to AEP codification result. Building on proposition 25, we can now state and prove the version of our coding result in which the input is an AEP.
Lemma 27. Let K be a finite set. Suppose that P ∈ ∆(K) verifies an AEP(n, h, η, ξ). Let the typical set of P be:

C = {k ∈ K : 2^{−n(h+η)} ≤ P(k) ≤ 2^{−n(h−η)}}

Let P_C ∈ ∆(K) be the conditional probability given C: P_C(k) = P(k | C).
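A sketch of the typical set and the conditional probability of Lemma 27 (the two-sided probability bounds are our assumed reading of the typical set, consistent with the EP bounds of Definition 20):

```python
def typical_set(P, n, h, eta):
    """C = {k : 2^{-n(h+eta)} <= P(k) <= 2^{-n(h-eta)}} (assumed reading
    of the typical set of Lemma 27)."""
    lo, hi = 2 ** (-n * (h + eta)), 2 ** (-n * (h - eta))
    return {k for k, p in P.items() if lo <= p <= hi}

def conditional_on(P, C):
    """P_C(k) = P(k | C): renormalize P on the subset C."""
    mass = sum(P[k] for k in C)
    return {k: P[k] / mass for k in C}
```

For P = (1/2, 1/4, 1/8, 1/8) on four points, with n = 2, h = 1 and η = 1/4, only the point of probability 1/4 is typical, since −(1/2) log₂ P(k) must lie in [3/4, 5/4].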

Construction of the process
Taking up the proof of proposition 13, let λ be rational and let e, e′ have finite support, with λ∆H(e) + (1 − λ)∆H(e′) > 0, and let C = supp e ∪ supp e′.

We wish to construct a law P of a C-process that achieves δ = λǫ_e + (1 − λ)ǫ_{e′}.

Again by closedness of the set of achievable distributions, we assume w.l.o.g. that ∆H(e) > 0. Therefore, there exists p_0 ∈ supp e such that ∆H(ǫ_{p_0}) > 0, and we assume w.l.o.g. that supp e′ ∋ p_0. Hence max{d(ǫ_{p_0}‖e), d(ǫ_{p_0}‖e′)} is well defined and finite.
We construct the process by blocks. For a block lasting from stage T + 1 up to stage T + M (resp. T + N), we construct (x_1, …, x_T)-measurable random variables p_{T+1}, …, p_{T+M} such that their distribution conditional on y_1, …, y_T is close to that of M (resp. N) i.i.d. random variables of law e (resp. e′). We then take x_{T+1}, …, x_{T+M} of law p_{T+1}, …, p_{T+M}, independent of the past of the process conditional on p_{T+1}, …, p_{T+M}.
We define the process (x_t)_t and its law P over N̄ = N_0 + L(M + N) stages, where (M, N) are multiples of (m, n), inductively over blocks of stages.

Definition of the blocks
The first block, labelled 0, is an initialization phase that lasts from stage 1 to N_0. During the initialization phase, x_1, x_2, …, x_{N_0} are i.i.d. with law p_0, inducing a law P_0 of the process during this block.
First block. Let S_0 be the set of ỹ_0 ∈ Y^{N_0} such that P_0(· | ỹ_0) verifies an AEP(M, h_0, η_0, ξ_0). After histories in S_0, and for suitable values of the parameters h_0, η_0, ξ_0, proposition 26 allows us to define random variables p_{N_0+1}, …, p_{N_0+M} such that their distribution conditional on y_1, …, y_{N_0} is close to that of M i.i.d. random variables of law e. We then take x_{N_0+1}, …, x_{N_0+M} of law p_{N_0+1}, …, p_{N_0+M}, independent of the past of the process conditional on p_{N_0+1}, …, p_{N_0+M}. We let x_t be i.i.d. with law p_0 after histories not in S_0. This defines the law of the process up to the first block.
Second block. Let ỹ_1 be a history of signals to the statistician during the initialization block and the first block. Proposition 26 ensures that, given ỹ_0 ∈ S_0, P_1(· | ỹ_1) verifies an AEP(N, h_1, η_1, ξ_1) with probability no less than 1 − ε, where h_1 is given by (2). From the definition of the sequence (η_k), for each k: using that 4√ξ_max < 20, the expression of η_max follows.
The starting entropy h 0 comes from the initialization block.
We now give sufficient conditions for the construction of the process to be valid. Summing up, we get:

Lemma 31. Under conditions (11), (12), (13), and (14), the process is well-defined.

Bound on Kullback distance
Let P be the law of the process (x_t) defined above. We estimate on each block the distance between the sequence of experiments induced by P and e^{⊗M} [resp. e′^{⊗N}]. Then, we show that these distances can be made small by an adequate choice of the parameters. Finally, we prove the weak-* convergence of the distribution of experiments under P to λe + (1 − λ)e′.
By uniform continuity of g, for every ε > 0, there exists ᾱ > 0 such that ‖e_1 − e_2‖_1 ≤ ᾱ ⇒ |g(e_1) − g(e_2)| ≤ ε. We let e_k = e for k odd and e_k = e′ for k even, and ḡ = max_{e′′} |g(e′′)|. For each block, the bound follows, where the first inequality comes from the convexity of the Kullback distance.
Reporting in the previous inequality and averaging over blocks yields the claim. Thus, |E_{δ′}(g) − E_{δ_γ}(g)| goes to 0 as γ goes to 0.
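The convexity of the Kullback distance invoked above (joint convexity in the pair of arguments) can be checked numerically; the distributions and mixing weight below are arbitrary:

```python
import math

def kl(P, Q):
    """Kullback distance between two probability vectors (base 2)."""
    return sum(p * math.log2(p / q) for p, q in zip(P, Q) if p > 0)

def mix(A, B, lam):
    """Convex combination lam * A + (1 - lam) * B of probability vectors."""
    return [lam * a + (1 - lam) * b for a, b in zip(A, B)]

# Joint convexity: d(lam P1 + (1-lam) P2 || lam Q1 + (1-lam) Q2)
#               <= lam d(P1 || Q1) + (1-lam) d(P2 || Q2)
P1, Q1 = [0.7, 0.3], [0.5, 0.5]
P2, Q2 = [0.2, 0.8], [0.4, 0.6]
lam = 0.3
lhs = kl(mix(P1, P2, lam), mix(Q1, Q2, lam))
rhs = lam * kl(P1, Q1) + (1 - lam) * kl(P2, Q2)
assert lhs <= rhs + 1e-12
```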

Definition 7. The distribution δ ∈ D has support in C ⊂ Π if for each e in the support of δ, the support of e is a subset of C.

Definition 8. Given C ⊂ Π, a process (x_n)_n with law P is a C-process if for each n, P(x_n | x_1, …, x_{n−1}) ∈ C, P-almost surely.

Theorem 9. Let C be a closed subset of Π. If δ has support in C and E_δ(∆H) ≥ 0, then δ is achievable by the law of a C-process.

Proof.
If e ∈ E_d, the random variable p associated to e is constant a.s.; therefore H(x | p) = H(x) = H(y), since observation is perfect. Thus ∆H(e) = 0, and E_δ(∆H) = 0 if supp δ ⊂ E_d. Conversely, assume E_δ(∆H) = 0. Since the observation is perfect, H(y) = H(x) ≥ H(x | p), and thus ∆H(e) ≤ 0 for all e. So ∆H(e) = 0 δ-almost surely, i.e. H(x | p) = H(x) for each e in a set of δ-probability one. For each such e, x and p are independent, i.e. the law of x given p = p does not depend on p. Hence e is a Dirac measure.

2.6. Example of a non-achievable experiment distribution. Let X = {i, j, k} and let f be such that f(j) ≠ f(k).
If e ∈ E_{C,d}, the random variable p associated to e is constant a.s.; therefore H(x | p) = H(x) = H(y), since f is C-perfect. Thus ∆H(e) = 0, and E_δ(∆H) = 0 if supp δ ⊂ E_{C,d}.

3.3. The condition E_δ ∆H ≥ 0 is sufficient. According to proposition 13, any δ = λǫ_e + (1 − λ)ǫ_{e′} with λ rational, e, e′ of finite support, and such that λ∆H(e) + (1 − λ)∆H(e′) > 0 is achievable by the law of a C-process with C = supp e ∪ supp e′. We apply this result to prove theorem 9; theorem 6 then follows using C = Π.

Proof of thm. 9 from prop. 13. Let C ⊂ Π be closed, let E_C ⊂ E be the set of experiments with support in C, and let D_C ⊂ D be the set of distributions with support in E_C. Take δ ∈ D_C such that E_δ(∆H) ≥ 0. Assume first that E_δ(∆H) = 0 and that there exists a weak-* neighborhood V of δ in D_C such that for any µ ∈ V, E_µ(∆H) ≤ 0. For p ∈ C, let ν = ǫ_{ǫ_p}. There exists 0 < t < 1 such that (1 − t)δ + tν ∈ V, and therefore E_ν(∆H) ≤ 0. Taking x of law p and y = f(x), E_ν(∆H) = ∆H(ǫ_p) = H(x) − H(y) ≤ 0. Since H(x) ≥ H(f(x)), we obtain H(x) = H(f(x)) for each x of law p ∈ C. This implies that f is C-perfect, and the theorem holds by lemma 15. Otherwise, there is a sequence δ_n in D_C weak-* converging to δ such that E_{δ_n}(∆H) > 0. Since the set of achievable distributions is closed, we assume E_δ(∆H) > 0 from now on. The set of distributions with finite support being dense in D_C (see e.g. [Par67], thm. 6.3, p. 44), again by closedness we assume:
2.2.2. Measures of uncertainty. Let x be a finite random variable with values in X and with law P. Throughout the paper, log denotes the logarithm with base 2. By definition, the entropy of x is:

H(x) = −Σ_{x∈X} P(x = x) log P(x = x)

Then any process of law P_n ⊗ P′_m (n + m)-achieves (n/(n+m)) δ_n + (m/(n+m)) δ′_m ∈ D_{m+n}, and any process of law P_n ⊗ P_n ⊗ P_n ⊗ … achieves δ_n ∈ D_∞. Point (3) is a direct consequence of the definitions and of (2).
3.2. C-perfect observation. In order to prove that E_δ ∆H ≥ 0 is a sufficient condition for δ to be achievable, we first need to study the case of perfect observation in detail.

Definition 14. Let C be a closed subset of Π. The mapping f is C-perfect if for each p in C, f is one-to-one on supp p.

We let E_{C,d} = {ǫ_p : p ∈ C} be the set of Dirac experiments with support in C. E_{C,d} is a weak-* closed subset of E, and {δ ∈ D : supp δ ⊂ E_{C,d}} is a weak-* closed and convex subset of D.

Lemma 15. If f is C-perfect then: (1) The experiment distribution δ is achievable by the law of a C-process if and only if supp δ ⊂ E_{C,d}. (2) For each δ such that supp δ ⊂ E_{C,d}, E_δ(∆H) = 0.

Proof. Point (1). Let (x_n) be a C-process with law P, let δ be achieved by P, and let p_1 be the law of x_1. Since f is one-to-one on supp p_1, the experiment e_2(y_1) is the Dirac measure on p_2 = P(x_2 | x_1). By induction, assume that the experiment e_n(y_1, …, y_{n−1}) is the Dirac measure on p_n = P(x_n | x_1, …, x_{n−1}). Since f is one-to-one on supp p_n, y_n reveals the value of x_n, and e_{n+1}(y_1, …, y_n) is the Dirac measure on P(x_{n+1} | x_1, …, x_n). We get that under P, at each stage the experiment belongs to E_{C,d} P-a.s., and thus supp δ ⊂ E_{C,d}. Conversely, let δ be such that supp δ ⊂ E_{C,d}. Since the set of achievable distributions is closed, it is sufficient to prove that for any p_1, …, p_k in C and integers n_1, …, n_k with n = Σ_j n_j, δ = Σ_j (n_j/n) ǫ_{e_j} is feasible, where e_j = ǫ_{p_j}.

In the k-th block: E‖e_t − e_k‖_1 ≤ √(2 ln 2 · E d(e_t ‖ e_k)), since ‖p − q‖_1 ≤ √(2 ln 2 · d(p‖q)) ([CT91], lemma 12.6.1, p. 300) and from Jensen's inequality. Applying Jensen's inequality again:

E d(P(p_t | ỹ_{k−1}, y_{t_k+1}, …, y_{t−1}) ‖ e_k) ≤ E d(P(p_t | ỹ_{k−1}, p_{t_k+1}, …, p_{t−1}) ‖ e_k)

and

Σ_t E d(e_t ‖ e_k) ≤ E_{ỹ_{k−1}} d(P(p_{t_k+1}, …, p_{t_{k+1}} | ỹ_{k−1}) ‖ e_k^{⊗N_k}) ≤ N_k γ