Variational Mean Field Games

This paper is a brief presentation of those Mean Field Games with congestion penalization which have a variational structure, starting from the deterministic dynamical framework. The stochastic framework (i.e. with diffusion) is also presented, both in the stationary and in the dynamic case. The variational problems relevant for MFG are described in both Eulerian and Lagrangian languages, and the connection with equilibria is explained by means of convex duality and of optimality conditions. The convex structure of the problem also allows for efficient numerical treatment, based on Augmented Lagrangian Algorithms, and some new simulations are shown at the end of the paper.


Introduction and modeling
The theory of Mean Field Games was introduced some years ago by Lasry and Lions (in [22,23,24]) to describe the evolution of a population, where each agent has to choose a strategy, in the form of a trajectory in a state space, which best fits his preferences, but is affected by the other agents through a global mean field effect (with a terminology borrowed from physics).
Mean Field Games (MFG for short) are differential games with a continuum of players, usually considered all indistinguishable and all negligible. We typically consider congestion games (i.e. agents try to avoid regions with high concentration), where we look for a Nash equilibrium, to be translated into a system of PDEs.
MFG theory is now a very lively topic, and the literature is rapidly growing. Among the references for a general overview of the original developments of this theory, we recommend the videotaped lectures of the 6-year course given by P.-L. Lions at Collège de France [25] and the lecture notes by P. Cardaliaguet [11], directly inspired by these courses.
The initial goal behind the theory is to study the limit as N → ∞ of N-player games, each player choosing a trajectory x_i(t) and optimizing a quantity of the form
$$\int_0^T \left( \frac{|x_i'(t)|^2}{2} + g_i(x_1(t), \dots, x_N(t)) \right) dt + \Psi_i(x_i(T)).$$
In particular, we are interested in the case where g_i penalizes points close to too many other players x_j, j ≠ i. The indistinguishability assumption translates into the fact that all the functions Ψ_i are equal and the cost g_i takes the form
$$g_i(x_1, \dots, x_N) = g\left( x_i, \frac{1}{N-1} \sum_{j \neq i} \delta_{x_j} \right),$$
which means that the congestion cost felt by each agent only depends on his position compared to the distribution of the other players, i.e. the probability measure $\frac{1}{N-1} \sum_{j \neq i} \delta_{x_j}$. In the limit as N → ∞, this measure is essentially the same as $\rho = \frac{1}{N} \sum_{j=1}^N \delta_{x_j}$, which gives a cost of the form g(x, ρ). Many possible dependences can be considered, but the main one that we will consider in this paper is a local congestion cost which takes the form, in the continuous limit, of g(x, ρ(x)), for a function g : Ω × R_+ → R, increasing in its second variable. Note that from the mathematical point of view this is the most intriguing choice, as non-local congestion costs (of the form g(x, (K ∗ ρ)(x)) for an interaction kernel K, so that the effective density perceived by the agents is of the form $\int K(x-y)\rho(y)\,dy$) automatically provide more compactness and regularity, which are not available for local costs. We refer to [12] for rigorous definitions and results in the local case.
Rigorous convergence results starting from N players and letting N → ∞ in the previous differential game are a delicate issue, beyond the scope of the present paper, the short presentation above only aimed at introducing the continuous version that we will detail in the sequel of the paper.

A coupled system of PDEs
We will now describe in a more precise way the continuous equilibrium problem resulting from the previous considerations. We consider a population of agents moving inside Ω (which can be a domain in R^d, the flat torus T^d := R^d/Z^d, . . .), and we suppose that every agent chooses his own trajectory by solving
$$\min \int_0^T \left( \frac{|x'(t)|^2}{2} + g(x(t), \rho_t(x(t))) \right) dt + \Psi(x(T)),$$
with given initial point x(0); here g is a given increasing function of the density ρ_t at time t. The agent hence tries to avoid overcrowded regions.
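This individual optimal-control problem can be discretized and solved directly. The sketch below is ours, not from the paper: the frozen density `rho`, the final cost `Psi`, and the choice g(x, r) = r are purely illustrative, and the action is minimized by a simple finite-difference gradient descent.

```python
import numpy as np

# Discretized single-agent problem: minimize over trajectories (x_0 fixed)
#   sum_k ( |x_{k+1}-x_k|^2/(2*dt) + g(x_k, rho(x_k))*dt ) + Psi(x_N),
# where rho is a frozen crowd density the agent wants to avoid.
# All modeling choices here are illustrative, not taken from the paper.

T, N = 1.0, 50
dt = T / N

def rho(x):                     # a bump of crowd density around x = 0.5
    return np.exp(-10 * (x - 0.5) ** 2)

def Psi(x):                     # final cost attracting the agent to x = 1
    return (x - 1.0) ** 2

def cost(x):                    # x has shape (N+1,); x[0] is the start
    kinetic = np.sum((x[1:] - x[:-1]) ** 2) / (2 * dt)
    congestion = np.sum(rho(x[:-1])) * dt
    return kinetic + congestion + Psi(x[-1])

x = np.zeros(N + 1)             # initial guess: stand still at x = 0
eps, lr = 1e-6, 0.003
for _ in range(800):            # gradient descent via finite differences
    grad = np.zeros(N + 1)
    for k in range(1, N + 1):   # x[0] stays fixed
        e = np.zeros(N + 1); e[k] = eps
        grad[k] = (cost(x + e) - cost(x - e)) / (2 * eps)
    x -= lr * grad

print(cost(x))                  # strictly below the cost of standing still
```

The optimized path moves toward the attractive final region while bending around the congested bump, exactly the trade-off described in the text.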
For the moment, we consider the evolution of the density ρ_t as an input, i.e. we suppose that agents know it. Supposing the function h(t, x) = g(x, ρ_t(x)) to be given, a crucial tool to study the above individual optimization problem is the value function. The value function φ for this problem is defined as
$$\varphi(t, x) := \min \left\{ \int_t^T \left( \frac{|x'(s)|^2}{2} + h(s, x(s)) \right) ds + \Psi(x(T)) \;:\; x(t) = x \right\},$$
where, again, h(t, x) = g(x, ρ_t(x)). Dynamic programming arguments from optimal control theory provide useful information on the role of the value function. First, we know that it solves a Hamilton–Jacobi equation
$$\text{(HJ)} \qquad -\partial_t \varphi + \frac{|\nabla \varphi|^2}{2} = h, \qquad \varphi(T, x) = \Psi(x).$$
This equation is to be interpreted in the viscosity sense, but the presentation in this section will be quite formal. The important point for the moment is that the value function φ depends, through a Hamilton–Jacobi equation, on Ψ and h.
Moreover, the optimal trajectories x(t) in the above control problem can be computed using the value function. Indeed, the optimal trajectories are the solutions of $x'(t) = -\nabla \varphi(t, x(t))$ (we do not discuss here whether these solutions are unique or not, as this depends on regularity issues on ∇φ). Now, given the initial density of the population ρ_0, if we know that the agents move along solutions of an equation $x'(t) = v_t(x(t))$, an easy computation which is standard in fluid mechanics gives the PDE solved by the density as a function of (t, x). This PDE is the so-called continuity equation:
$$\text{(CE)} \qquad \partial_t \rho + \nabla \cdot (\rho v) = 0.$$
This equation has to be interpreted in a weak sense (see Equation (2)), and it is completed by the no-flux boundary condition ρv · n = 0, which models the fact that no mass enters or exits Ω.
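As a quick numerical illustration (ours, not from the paper), the continuity equation with no-flux boundary conditions can be discretized by a finite-volume upwind scheme; the constant velocity field and the initial density below are arbitrary illustrative choices.

```python
import numpy as np

# Finite-volume upwind discretization of d_t rho + d_x(rho*v) = 0 on [0,1]
# with no-flux boundary conditions (rho*v = 0 at both ends). The scheme
# transports the density to the right while conserving total mass.

nx, nt = 200, 400
dx, dt = 1.0 / nx, 1.0 / nt
x = (np.arange(nx) + 0.5) * dx            # cell centers

rho = np.exp(-100 * (x - 0.3) ** 2)
rho /= rho.sum() * dx                     # normalize to a probability density

def step(rho, v):
    # flux at interior cell interfaces, upwind in the sign of v
    flux = np.where(v[1:-1] > 0, rho[:-1], rho[1:]) * v[1:-1]
    flux = np.concatenate(([0.0], flux, [0.0]))   # no-flux boundaries
    return rho - dt / dx * (flux[1:] - flux[:-1])

v = 0.3 * np.ones(nx + 1)                 # velocity at interfaces (CFL = 0.15)
for _ in range(nt):
    rho = step(rho, v)

print(rho.sum() * dx)                     # total mass is preserved
```

The telescoping fluxes make mass conservation exact (up to rounding), which is the discrete analogue of the no-flux condition ρv · n = 0.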
In MFG, as standard in non-cooperative games, we look for a stable configuration, which is an equilibrium in the sense of Nash equilibria: a configuration where, taking into account the choices of the others, no player would spontaneously decide to change his own choice.
This means that we can consider the densities ρ_t as an input, compute the optimal trajectories, which depend on h = g(x, ρ_t) through the (HJ) equation, then compute the solution of (CE) and get new densities as an output: the configuration is an equilibrium if and only if the output densities coincide with the input. Alternatively, we can consider the trajectories of the players as an input, compute the densities using (CE), then compute the optimal trajectories as an output via (HJ): again, the configuration is an equilibrium if and only if input = output.
All in all, an equilibrium is characterized by the following coupled system (HJ)+(CE): the function φ solves (HJ) with a right-hand side depending on ρ, which in turn evolves according to (CE) with a velocity field depending on ∇φ(t, x):
$$\begin{cases} -\partial_t \varphi + \dfrac{|\nabla \varphi|^2}{2} = g(x, \rho), & \varphi(T, \cdot) = \Psi,\\[2mm] \partial_t \rho - \nabla \cdot (\rho \nabla \varphi) = 0, & \rho(0, \cdot) = \rho_0. \end{cases} \qquad (1)$$
Later (Section 5) we will see how to define a similar approach in the stochastic case, which means that agents follow controlled stochastic differential equations of the form $dX_t = \alpha_t\,dt + \sqrt{2\nu}\,dB_t$. In this case, a Laplacian appears both in the (HJ) and in the (CE) equations.
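The "input = output" characterization of an equilibrium can be illustrated by a toy fixed-point computation. The sketch below is our simplification, not the paper's model: trajectories are replaced by a one-shot choice among n discrete cells, with movement cost `c` and local congestion cost g(ρ) = ρ, and a damped, softmin-smoothed best response is iterated until the output distribution coincides with the input; `beta` and the damping factor are arbitrary.

```python
import numpy as np

# Toy "input = output" equilibrium: agents pick one of n cells, paying a
# movement cost c[i] plus a congestion cost g(rho_i) = rho_i at the chosen
# cell. Take the input distribution rho, compute a smoothed best response
# (the output), and iterate with damping until the two coincide.

n = 10
c = np.linspace(0.0, 1.0, n)            # farther cells cost more to reach

def best_response(rho, beta=20.0):
    cost = c + rho                       # individual cost given the crowd
    w = np.exp(-beta * (cost - cost.min()))
    return w / w.sum()                   # softmin-smoothed best response

rho = np.full(n, 1.0 / n)                # initial guess: uniform crowd
for _ in range(2000):
    rho = 0.95 * rho + 0.05 * best_response(rho)   # damped fixed point

out = best_response(rho)
print(np.abs(out - rho).max())           # tiny: input and output coincide
```

At the fixed point the occupied cells roughly equalize their total cost c_i + ρ_i, the discrete analogue of the indifference encoded by the (HJ)+(CE) system.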

Variational principle
It turns out that a solution to the equilibrium system (1) can be found via an overall minimization problem, as first outlined in the seminal work of Lasry and Lions [23]. We consider all the possible population evolutions, i.e. pairs (ρ, v) satisfying ∂_t ρ + ∇ · (ρv) = 0 (note that this is the Eulerian way of describing such a movement; in Section 3 we will see how to express it in a Lagrangian language), and we minimize the following energy
$$A(\rho, v) := \int_0^T\!\!\int_\Omega \left( \frac{1}{2} \rho |v|^2 + G(x, \rho) \right) dx\, dt + \int_\Omega \Psi\, d\rho_T,$$
where G is the anti-derivative of g with respect to its second variable, i.e. ∂_s G(x, s) = g(x, s) for s ∈ R_+ with G(x, 0) = 0. We fix by convention G(x, s) = +∞ for s < 0. Note in particular that G is convex in its second variable, as its derivative is the increasing function g.
The above minimization problem recalls the Benamou-Brenier dynamic formulation for optimal transport (see [6]).The main difference with the Benamou-Brenier problem is that here we add to the kinetic energy a congestion cost G; also note that usually in optimal transport the target measure ρ T is fixed, and here it is part of the optimization (but this is not a crucial difference).Finally, note that the minimization of a Benamou-Brenier energy with a congestion cost was already present in [10] where the congestion term was used to model the motion of a crowd with panic.
As is often the case in congestion games, the quantity A(ρ, v) is not the total cost for all the agents. Indeed, the term $\int\!\!\int \frac{1}{2}\rho|v|^2$ is exactly the total kinetic energy, and the last term $\int \Psi\, d\rho_T$ is the total final cost, but the term $\int\!\!\int G(x, \rho)$ is not the total congestion cost, which should instead be $\int\!\!\int \rho\, g(x, \rho)$. This means that the equilibrium minimizes an overall energy (we have what is called a potential game), but not the total cost, which gives rise to the so-called price of anarchy.
Another important point is the fact that the above minimization problem is convex, which was by the way the key idea of [6]. Indeed, the problem is not convex in the variables (ρ, v), because of the product term ρ|v|² in the functional and of the product ρv in the differential constraint. But if one changes variables, defining w = ρv and using the variables (ρ, w), then the constraint becomes linear and the functional convex. We will write Ā(ρ, w) for the functional A(ρ, v) written in these variables. The important point for convexity is that the function
$$(\rho, w) \mapsto \begin{cases} \dfrac{|w|^2}{2\rho} & \text{if } \rho > 0,\\ 0 & \text{if } (\rho, w) = (0, 0),\\ +\infty & \text{otherwise} \end{cases}$$
is convex (and it is actually obtained as $\sup\{a\rho + b \cdot w \;:\; a + \frac{1}{2}|b|^2 \le 0\}$). This will be the basis for the numerical method of Section 6. In order to convince the reader of the connection between the minimization of A(ρ, v) (or of Ā(ρ, w)) and the equilibrium system (1), we will use some formal arguments from convex duality. We will see in Section 2 how to rigorously justify this equivalence.
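The representation of |w|²/(2ρ) as a supremum of the linear functions aρ + b·w over a + ½|b|² ≤ 0 can be checked numerically in one dimension. This is a sanity check of ours, not part of the paper: for fixed (ρ, w) the sup is attained at b = w/ρ with a = −b²/2, so a fine grid over b recovers the function.

```python
import numpy as np

# f(rho, w) = |w|^2/(2 rho) is convex in (rho, w) because it is a supremum
# of linear functions: f(rho, w) = sup { a*rho + b*w : a + b^2/2 <= 0 }.
# Take the extreme choice a = -b^2/2 and brute-force the sup over b.

def f(rho, w):
    return w ** 2 / (2 * rho)

def f_as_sup(rho, w, b_grid):
    return np.max(-0.5 * b_grid ** 2 * rho + b_grid * w)

b_grid = np.linspace(-10, 10, 20001)
for rho, w in [(1.0, 0.5), (2.0, -3.0), (0.3, 1.2)]:
    print(f(rho, w), f_as_sup(rho, w, b_grid))   # the two values agree
```

Being a sup of linear functions of (ρ, w), the integrand is jointly convex, which is exactly what makes the (ρ, w) formulation amenable to the convex-duality and numerical arguments that follow.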
In order to formally produce a dual problem to min A, we will use a min–max exchange procedure. First, we write the constraint ∂_t ρ + ∇ · (ρv) = 0 in weak form, i.e.
$$\int_0^T\!\!\int_\Omega \left( \partial_t \phi\, \rho + \nabla \phi \cdot \rho v \right) dx\, dt + \int_\Omega \phi(0, \cdot)\, d\rho_0 - \int_\Omega \phi(T, \cdot)\, d\rho_T = 0 \qquad (2)$$
for every function φ ∈ C¹([0, T] × Ω) (note that we do not impose conditions on the values of φ on ∂Ω, hence this is equivalent to completing (CE) with a no-flux boundary condition ρv · n = 0). Also note that, if ρ_0 is a datum of our problem, ρ_T is not, and equation (2) does not make sense unless we give a meaning to ρ_t for every instant of time t. This will be done in Section 3 (see Proposition 3.1), where we will interpret ρ_t as a(n absolutely) continuous curve in the space of measures. However, we do not insist on this now, as the presentation stays quite formal. Using (2), we can re-write our problem as
$$\min_{(\rho, v)} \; \sup_{\phi} \; A(\rho, v) + \int_0^T\!\!\int_\Omega \left( \partial_t \phi\, \rho + \nabla \phi \cdot \rho v \right) dx\, dt + \int_\Omega \phi(0, \cdot)\, d\rho_0 - \int_\Omega \phi(T, \cdot)\, d\rho_T,$$
since the sup in φ takes the value 0 if the constraint is satisfied and +∞ if not. We now switch the inf and the sup. First, we minimize w.r.t. v, thus obtaining v = −∇φ, and we replace ½ρ|v|² + ∇φ · ρv with −½ρ|∇φ|². Then we get, in the double integral,
$$\inf_{\rho \ge 0} \int_0^T\!\!\int_\Omega \left( G(x, \rho) - p\rho \right) dx\, dt = -\int_0^T\!\!\int_\Omega G^*(x, p)\, dx\, dt,$$
where we set p := −∂_t φ + ½|∇φ|², and G* is defined as a Legendre transform with respect to its second variable only. Then, we observe that the minimization in the final cost simply gives as a result 0 if Ψ ≥ φ_T (since the minimization is only performed among positive ρ_T) and −∞ otherwise. Hence we obtain a dual problem of the form
$$\sup \left\{ \int_\Omega \phi(0, \cdot)\, d\rho_0 - \int_0^T\!\!\int_\Omega G^*(x, p)\, dx\, dt \;:\; -\partial_t \phi + \tfrac{1}{2}|\nabla \phi|^2 \le p, \;\; \phi(T, \cdot) \le \Psi \right\}.$$
Note that the condition G(x, ρ) = +∞ for ρ < 0 implies G*(x, p) = 0 for p ≤ 0. This in particular means that in the above maximization problem one can suppose p ≥ 0 (indeed, replacing p with p_+ does not change the G* part, but improves the value of φ_0, considered as a function depending on p). The choice of using two variables (φ, p) connected by a PDE constraint, instead of only φ, is purely conventional, and it allows for a dual problem which has a particular symmetry w.r.t.
the primal one. Also the choice of the sign is conventional and due to the computations that we will perform later (in particular in Section 4). Now, standard arguments in convex duality, which will be made precise in the next section, allow us to say that optimal pairs (ρ, v) are obtained by looking at saddle points ((ρ, v), (φ, p)). This means that, whenever (ρ, v) minimizes A, then there exists a pair (φ, p), solution of the dual problem, such that
• v minimizes ½ρ|v|² + ∇φ · ρv, i.e. v = −∇φ ρ-a.e. This gives (CE);
• ρ minimizes $\int\!\!\int (G(x, \rho) - p\rho)$, i.e. p = g(x, ρ) wherever ρ > 0. This gives (HJ): −∂_t φ + ½|∇φ|² = g(x, ρ) on {ρ > 0} (as the reader can see, there are some subtleties where the mass ρ vanishes; this will be discussed later).
• ρ_T minimizes $\int (\Psi - \phi_T)\, d\rho_T$ among ρ_T ≥ 0. But this is not a condition on ρ_T, but rather on φ_T: we must have φ_T = Ψ on {ρ_T > 0}, otherwise there is no minimizer. This gives the final condition in (HJ).
This provides an informal justification for the equivalence between the equilibrium and the global optimization. What we lack for the moment is the fact that there is no duality gap between min A and max −B, and that minimizers exist (in particular in the dual problem, and in which spaces). Also, even once these issues are clarified, what we get will only be a very weak solution to the coupled system (CE)+(HJ). Nothing guarantees that this solution actually encodes the individual minimization problem of each agent. This will be clarified in Section 3, where a Lagrangian point of view will be presented.
We finish this section with a last variant, inspired by the crowd motion model of [26]. We would like to consider a variant where, instead of adding a penalization g(x, ρ), we impose a capacity constraint ρ ≤ 1. How should one define an equilibrium? A first, naive idea would be the following: when (ρ_t)_t is given, every agent minimizes his own cost paying attention to the constraint ρ_t(x(t)) ≤ 1. But if ρ already satisfies ρ ≤ 1, then the choice of only one extra agent will not violate the constraint (since we have a non-atomic game), and the constraint becomes empty. As already pointed out in [29], this cannot be the correct definition.
In [29] an alternative model is formally proposed, but no solution has been given to it so far, and it is likely to be non-variational. Instead, we can look at the variational problem
$$\min \left\{ \int_0^T\!\!\int_\Omega \frac{1}{2} \rho |v|^2\, dx\, dt + \int_\Omega \Psi\, d\rho_T \;:\; \partial_t \rho + \nabla \cdot (\rho v) = 0, \; \rho \le 1 \right\}.$$
This means using G(ρ) = 0 for ρ ∈ [0, 1] and +∞ otherwise. The dual problem can be computed, and we obtain
$$\sup \left\{ \int_\Omega \phi(0, \cdot)\, d\rho_0 - \int_0^T\!\!\int_\Omega p_+\, dx\, dt \;:\; -\partial_t \phi + \tfrac{1}{2}|\nabla \phi|^2 \le p, \;\; \phi(T, \cdot) \le \Psi \right\}$$
(note that this problem is also obtained as the limit m → ∞ of g(ρ) = ρ^m, since in this case the functional G* converges to p ↦ p_+). By looking at the primal–dual optimality conditions, we get again v = −∇φ and φ_T = Ψ, but the optimality of ρ now means
$$0 \le \rho \le 1, \qquad p \ge 0, \qquad p(1 - \rho) = 0.$$
This gives the following MFG system:
$$\begin{cases} -\partial_t \varphi + \dfrac{|\nabla \varphi|^2}{2} = p, & p \ge 0, \; p(1 - \rho) = 0,\\[2mm] \partial_t \rho - \nabla \cdot (\rho \nabla \varphi) = 0, & \rho \le 1, \end{cases}$$
with φ(T, ·) = Ψ and ρ(0, ·) = ρ_0. Formally, by looking back at the relation between (HJ) and optimal trajectories, we can guess that each agent solves
$$\min \int_0^T \left( \frac{|x'(t)|^2}{2} + p(t, x(t)) \right) dt + \Psi(x(T)). \qquad (3)$$
Here p is a pressure arising from the incompressibility constraint ρ ≤ 1, only present in the saturated zone {ρ = 1}, and it finally acts as a price paid by the agents to travel through saturated regions. From the economic point of view this is meaningful: due to a capacity constraint, the most attractive regions develop a positive price to be paid to pass through them, and this price is such that, if the agents take it into account in their choices, then their mass distribution will indeed satisfy the capacity constraint. This problem has been studied in [16], where suitable regularity results (see also Section 4) allow one to give a meaning to what we said above. This is necessary because, due to the linear growth in the dual problem, we should not a priori expect p to be better than a measure, and this makes it difficult to define the integral of p along the trajectory in (3). The only way to handle this difficulty is to prove extra summability for p.

Existence of minimizers and convex duality
The aim of this section is to explain how to rigorously relate the two variational problems formally introduced in Section 1.2 with a suitable weak notion of solution for the MFG system, following the approach developed by Cardaliaguet [12] (see also [13]) and further refined by Cardaliaguet and Graber [14]. In what follows, the spatial domain Ω will denote either a smooth bounded domain of R^d or the flat torus Ω := T^d = R^d/Z^d (periodic case). To fix ideas, we take a quadratic Hamiltonian, as in Section 1, and assume that the congestion term G is convex and superlinear in its second variable:
$$\lim_{\rho \to +\infty} \frac{G(x, \rho)}{\rho} = +\infty \quad \text{uniformly in } x, \qquad (4)$$
which in particular includes the case of a hard congestion constraint, i.e. G(x, ρ) = 0 if ρ ∈ [0, 1] and +∞ otherwise. Due to the dependence on x (which is non-essential in most of the paper, but without which some problems could become trivial, in particular in Section 5), we also add the very mild assumption
$$x \mapsto G(x, \rho) \text{ is l.s.c. for every } \rho. \qquad (5)$$
Recall that the initial distribution of players is given and denoted by ρ_0 ∈ P(Ω), as well as the terminal cost Ψ ∈ C(Ω). Let us then consider the variational problem
$$\inf_{(\phi, p) \in F} \; B(\phi, p) := \int_0^T\!\!\int_\Omega G^*(x, p)\, dx\, dt - \int_\Omega \phi(0, x)\, d\rho_0, \qquad (6)$$
where F consists of all pairs (φ, p), with φ of class C¹, such that φ(T, ·) ≤ Ψ and −∂_t φ + ½|∇φ|² ≤ p (note that imposing equality or inequality in this Hamilton–Jacobi constraint, numbered (13) below, is the same in the above minimization problem, since G* is non-decreasing). Since G*(x, p) is non-decreasing in p and identically 0 for p ∈ (−∞, 0], this is a convex minimization problem. The dual of this problem (see [19]) is
$$\inf_{(\rho, w) \in K} \; \bar{A}(\rho, w) := \int_0^T\!\!\int_\Omega \left( \frac{|w|^2}{2\rho} + G(x, \rho) \right) dx\, dt + \int_\Omega \Psi\, d\rho_T, \qquad (8)$$
where K consists of the pairs (ρ, w) with ρ ≥ 0, ρ(0, ·) = ρ_0 and ∂_t ρ + ∇ · w = 0 in the sense of distributions (with respect to the formulation given in terms of the continuity equation (CE) in Section 1, the formulation above corresponds to the change of variables (ρ, v) ↦ (ρ, w) = (ρ, ρv) that we already mentioned, in the spirit of the Benamou–Brenier formulation of optimal transport, and which makes the minimization problem (8) convex). More precisely, the Fenchel–Rockafellar duality theorem (see [19]) gives:

Theorem 2.1. Suppose (4) and (5); then
$$\inf\, (6) = -\min\, (8).$$

The proof of this theorem
can be obtained following the same arguments as in [12].
In particular, a minimizer to (8) exists. One further has uniqueness of such a minimizer (ρ, w) if G is strictly convex. Since one cannot expect that a smooth minimizer to (6) exists, one has to suitably relax (6). To do so, following [14], we shall further assume that for some exponent q > 1 and some constant C > 0 one has
$$\frac{1}{C}\, \rho^q - C \;\le\; G(x, \rho) \;\le\; C (\rho^q + 1) \quad \text{for every } \rho \ge 0, \qquad (11)$$
so that G* satisfies a similar power growth condition with the dual exponent q′ = q/(q − 1). The relaxation of (6) is then as follows:
$$\inf_{(\phi, p) \in \tilde{F}} \; \int_0^T\!\!\int_\Omega G^*(x, p)\, dx\, dt - \int_\Omega \phi(0^+, x)\, d\rho_0, \qquad (12)$$
where F̃ consists of all pairs (φ, p) ∈ BV((0, T) × Ω) × L^{q′}((0, T) × Ω) such that φ(T, ·) ≤ Ψ in the sense of traces and
$$-\partial_t \phi + \frac{1}{2} |\nabla \phi|^2 \le p \qquad (13)$$
in the sense of distributions. As shown in [14], problem (12) is really a relaxation of (6), in the sense that the values of both problems coincide. The existence of a minimizer to the relaxed problem (12) is however more involved and requires more assumptions; a key point is to understand how an L^{q′} (with q′ = q/(q − 1)) bound on p gives pointwise bounds on a function φ which is a subsolution of the HJ equation (13). Such bounds can be obtained (see Lemma 2.7 in [14]), and subsequently the existence of a minimizer for (12) can be proved as soon as a further assumption, labeled (14) below, on the data holds; we refer to [14] for its precise form. We mention also the work in [16], corresponding to the constrained case ρ ≤ 1 (hence, in some sense, to q = ∞), where a similar result is proven under the assumption ‖ρ_0‖_{L^∞} < 1.
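The power-growth duality between the exponents q and q′ can be checked numerically: a brute-force Legendre transform of G(ρ) = ρ^q/q over a grid reproduces the closed form G*(p) = p_+^{q′}/q′. This is a sanity check of ours, not part of the paper.

```python
import numpy as np

# For G(rho) = rho^q/q on rho >= 0 (q > 1), the Legendre transform
# G*(p) = sup_{rho >= 0} (p*rho - G(rho)) equals p_+^{q'}/q' with the
# dual exponent q' = q/(q-1); a brute-force sup over a grid recovers it.

q = 3.0
qp = q / (q - 1.0)                      # dual exponent q' = 1.5

rho_grid = np.linspace(0.0, 50.0, 200001)

def G_star_numeric(p):
    return np.max(p * rho_grid - rho_grid ** q / q)

for p in [-1.0, 0.0, 0.5, 2.0, 4.0]:
    closed_form = max(p, 0.0) ** qp / qp
    print(p, G_star_numeric(p), closed_form)   # the two columns agree
```

Note in particular that G*(p) = 0 for p ≤ 0, consistent with the convention G(x, ρ) = +∞ for ρ < 0 discussed in Section 1.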
To sum up, a rigorous existence result established by Cardaliaguet and Graber [14] (see also [13] for a slightly different but related problem) can be summarized as:

Theorem 2.2. Assume (11)–(14); then the infimum in (12) is achieved and coincides with −min (8).

This said, it remains to understand in which sense the duality relation 0 = B(φ, p) + Ā(ρ, w), relating an optimal (ρ, w) ∈ K for (8) to an optimal (φ, p) ∈ F̃ for the relaxed problem, gives rise to a Mean Field Game-like system. This is the object of the main result, for which we refer again to [12] and [14]:

Theorem 2.3. Assume (11)–(14), and let (ρ, w) ∈ K solve (8) and (φ, p) ∈ F̃ solve (12). Then
$$w = -\rho \nabla \phi \quad \text{a.e.}, \qquad (15)$$
$$G(x, \rho) + G^*(x, p) = \rho p \quad \text{a.e., i.e. } p = g(x, \rho) \text{ on } \{\rho > 0\}, \qquad (16)$$
$$\phi(T, \cdot) = \Psi \quad \rho_T\text{-a.e.} \qquad (17)$$

It is worth pointing out that (15)–(17) imply that the Hamilton–Jacobi equation is satisfied in the following weak sense: ρ-a.e. one has
$$-(\partial_t \phi)^{ac} + \frac{1}{2} |\nabla \phi|^2 = g(x, \rho),$$
where (∂_t φ)^{ac} denotes the absolutely continuous part of the measure ∂_t φ. In other words, if ρ vanishes nowhere and ∂_t φ has no singular part, then (ρ, φ) solves the MFG system in some appropriate weak sense, and the MFG system is actually a necessary and sufficient optimality condition for the variational problems in duality (8)–(12).

The Lagrangian framework
In this section, we present an alternative point of view on the overall minimization problem presented in the previous sections. So far, we only looked at an Eulerian point of view, where the motion of the population is described by means of its density ρ and of its velocity field v. The Lagrangian point of view would be, instead, to describe the motion by describing the trajectory of each agent. Since the agents are supposed to be indistinguishable, we just need to determine, for each possible trajectory, the number of agents following it (and not their names. . .); this means looking at a measure on the set of possible paths. Set C = H¹([0, T]; Ω); this will be the space of possible paths that we use. In general, absolutely continuous paths would be the good choice, but we can restrict our attention to H¹ paths because of the kinetic energy term that we have in our minimization. We define the evaluation maps e_t : C → Ω, given for every t ∈ [0, T] by e_t(ω) = ω(t). Also, we define the kinetic energy functional K : C → R given by
$$K(\omega) := \int_0^T \frac{|\omega'(t)|^2}{2}\, dt.$$
We endow the space C with the topology of uniform convergence (and not the strong H¹ convergence, so that we have compactness of the sublevel sets of K).
To pass from the Eulerian to the Lagrangian framework, we will need some easy tools from optimal transport.First, we give some definitions.We refer to [30] (Chapters 1, 5 and 7) and to [5,32] for more details and complete proofs.
Given two probability measures µ, ν ∈ P(Ω), we consider the set of transport plans
$$\Pi(\mu, \nu) := \left\{ \gamma \in P(\Omega \times \Omega) \;:\; (\pi_x)_\# \gamma = \mu, \; (\pi_y)_\# \gamma = \nu \right\},$$
i.e. those probability measures on the product space having µ and ν as marginal measures.
We consider the minimization problem
$$\min \left\{ \int_{\Omega \times \Omega} |x - y|^2\, d\gamma \;:\; \gamma \in \Pi(\mu, \nu) \right\},$$
which is called the Kantorovich optimal transport problem for the cost c(x, y) = |x − y|² from µ to ν. The value of this minimization problem with the quadratic cost may also be used to define a quantity, called the Wasserstein distance:
$$W_2(\mu, \nu) := \sqrt{\; \min \left\{ \int_{\Omega \times \Omega} |x - y|^2\, d\gamma \;:\; \gamma \in \Pi(\mu, \nu) \right\}\;}.$$
If we suppose that Ω is compact, this quantity can be proven to be a distance over P(Ω), and it metrizes the weak-* convergence of probability measures. The space P(Ω) endowed with the distance W_2 is called the Wasserstein space of order 2 and is denoted in this paper by W_2(Ω).
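As a minimal numerical sketch (ours, not from the paper): in one space dimension the Kantorovich problem with quadratic cost admits a monotone optimal plan, so W_2 between two empirical measures with equally weighted atoms reduces to matching sorted samples.

```python
import numpy as np

# In 1-D the quadratic optimal transport plan is monotone, so for two
# empirical measures with n equally weighted atoms W_2 is computed by
# matching sorted samples: W_2^2 = (1/n) * sum_i |x_(i) - y_(i)|^2.

rng = np.random.default_rng(0)

def w2_empirical_1d(x, y):
    x, y = np.sort(x), np.sort(y)
    return np.sqrt(np.mean((x - y) ** 2))

x = rng.normal(0.0, 1.0, 5000)
y = x + 3.0                      # nu is mu translated by 3

print(w2_empirical_1d(x, y))     # a translation by 3 has W_2 = 3
print(w2_empirical_1d(x, x))     # distance from a measure to itself is 0
```

The translation example also illustrates why W_2 metrizes weak-* convergence: small displacements of the mass produce small distances, regardless of pointwise differences between the densities.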
We summarize here below how the theory of optimal transport helps in studying the relation between curves of measures and measures of curves, which is the main point in passing from the Eulerian to the Lagrangian formalism.
We recall the definition of metric derivative in metric spaces, applied to the case of W_2(Ω): for a curve t ↦ ρ_t ∈ W_2(Ω), we define
$$|\rho'|(t) := \lim_{s \to t} \frac{W_2(\rho_s, \rho_t)}{|s - t|},$$
whenever this limit exists. If the curve t ↦ ρ_t is absolutely continuous for the W_2 distance, then this limit exists for a.e. t. The important fact, coming from the Benamou–Brenier formula and explained for the first time in [5] (see also Chapter 5 in [30]), is that the absolutely continuous curves in W_2(Ω) are exactly those curves which admit the existence of a velocity field v_t solving (CE) together with ρ, and that the metric derivative |ρ'|(t) can be computed as the minimal norm ‖v_t‖_{L²(ρ_t)} among those vector fields. This is part of the following statement.
In Lagrangian terms, the overall optimization problem can be rewritten as
$$\min \left\{ J(Q) := \int_C K\, dQ + \int_0^T\!\!\int_\Omega G(x, \rho_t(x))\, dx\, dt + \int_\Omega \Psi\, d\rho_T \;:\; Q \in P(C), \; (e_0)_\# Q = \rho_0 \right\}, \qquad (18)$$
where ρ_t := (e_t)_# Q. Under these assumptions, it is easy to prove, by standard semicontinuity arguments in the space P(C), that a minimizer of (18) exists. We summarize this fact, together with the corresponding optimality conditions, in the next proposition.

Proposition 3.2. Suppose that Ω is compact and that G is a convex function satisfying (4) and (5). Then problem (18) admits a solution Q.
Moreover, Q is a solution if and only if J_h(Q) ≤ J_h(Q̃) for any other competitor Q̃ ∈ P(C) with J(Q̃) < +∞, where J_h is the linear functional
$$J_h(Q) := \int_C K\, dQ + \int_0^T\!\!\int_\Omega h(t, x)\, d\big((e_t)_\# Q\big)\, dt + \int_\Omega \Psi\, d\big((e_T)_\# Q\big),$$
the function h being defined through ρ_t = (e_t)_# Q and h(t, x) = g(x, ρ_t(x)).
Remark 1. The above optimality condition, and the interpretation in terms of equilibria that we will give below, are very close to what has been studied in the framework of continuous Wardrop equilibria in [17] (see also [18] for a survey of the theory). Indeed, in such a framework, we associate with each measure Q on C a traffic intensity i_Q (which is a measure on Ω), and we define a weighted length on curves ω using i_Q as a weighting factor. We then prove that the measure Q which minimizes a suitable functional (also constructed via the anti-derivative of a congestion function g) minimizes its linearization, which in turn implies that the same Q is concentrated on curves which are geodesics for this weighted length, which depends on Q itself! The analogy is very strict, with the only difference that the framework of Wardrop equilibria (which are traditionally studied in a discrete framework on networks, see [33]) is a static one. The use of time to parametrize curves in Wardrop models is fictitious, and one has to think of a continuous traffic flow, where mass is constantly injected in some parts of Ω and absorbed in other parts (see Chapter 4 of [30] for a general picture of this kind of models).
We now consider the functional J_h. Note that the function h is obtained from the densities ρ_t, which means that it is well-defined only a.e. But the integral $\int_0^T\!\!\int_\Omega h(t, x)\, d\big((e_t)_\# Q\big)\, dt$ is well-defined and does not depend on the representative of h, since J(Q) < +∞ implies that all the measures (e_t)_# Q are absolutely continuous. Hence, this functional is well-defined for h ≥ 0 measurable.
Yet, if we suppose for a while that h is a continuous function, we can also write
$$J_h(Q) = \int_C A_h(\omega)\, dQ(\omega), \qquad \text{where } A_h(\omega) := K(\omega) + \int_0^T h(t, \omega(t))\, dt + \Psi(\omega(T)),$$
and hence we get that Q satisfies the optimality conditions of Proposition 3.2 if and only if Q-a.e. curve ω minimizes A_h among curves with the same initial point. This is exactly the equilibrium condition in the MFG! Indeed, the MFG equilibrium condition can be expressed in Lagrangian language in the following way: find Q such that, if we define ρ_t = (e_t)_# Q and h(t, x) = g(x, ρ_t(x)), then Q is concentrated on minimizers of A_h for fixed initial point (let us also define, for 0 ≤ t_0 < t_1 ≤ T, the localized action $A_h(\omega, [t_0, t_1]) := \int_{t_0}^{t_1} \big( \frac{|\omega'(t)|^2}{2} + h(t, \omega(t)) \big)\, dt$). Here, we just found out that Q satisfies this equilibrium condition if and only if it minimizes J. The question which thus arises is how to give a rigorous meaning to this equilibrium condition when h ∉ C⁰. We will not enter into details here, but we want to stress that there is a solution, which passes through the choice of a precise representative of h. Indeed, following what Ambrosio and Figalli did in [4], we can define
$$h_r(t, x) := \frac{1}{|B(x, r)|} \int_{B(x, r)} h(t, y)\, dy, \qquad \hat{h}(t, x) := \limsup_{r \to 0} h_r(t, x).$$
The technique developed in [4] (and later used in [16] for MFG with density constraints) allows one to prove that if Q minimizes J_h, then it is concentrated on curves minimizing A_{ĥ}(ω), by using h_r and passing to the limit as r → 0, provided some upper bounds on h_r are satisfied. More precisely, if one defines the maximal function of h, given by Mh(t, x) := sup_r h_r(t, x), it is possible to prove the following.
Proposition 3.3. Given a positive and measurable function h, suppose that Q minimizes J_h. Then Q is concentrated on curves ω such that, for all t_0, t_1 with 0 ≤ t_0 < t_1 ≤ T, we have A_{ĥ}(ω, [t_0, t_1]) ≤ A_{ĥ}(ω̃, [t_0, t_1]) for every ω̃ such that ω̃(t_0) = ω(t_0), ω̃(t_1) = ω(t_1) and $\int_{t_0}^{t_1} Mh(t, \tilde{\omega}(t))\, dt < +\infty$.

In particular, this applies to h(t, x) = g(x, ρ_t(x)) whenever Q is a solution of (18). This condition is useful only if there are many curves ω̃ satisfying $\int_{t_0}^{t_1} Mh(t, \tilde{\omega}(t))\, dt < +\infty$ (note that the use of t_0 and t_1 is due to the fact that there could be only few curves such that Mh is integrable on [0, T], but more such that Mh is integrable on [t_0, t_1]). What one can do is to take an arbitrary Q̃ such that J(Q̃) < +∞ and compute
$$\int_C \int_0^T Mh(t, \omega(t))\, dt\, d\tilde{Q}(\omega) = \int_0^T\!\!\int_\Omega Mh\, d\big((e_t)_\# \tilde{Q}\big)\, dt.$$
We would like to guarantee that this quantity is finite for every Q̃ with J(Q̃) < +∞. By Young's inequality, $Mh \cdot \rho \le G(x, \rho) + G^*(x, Mh)$; since we know that G(x, (e_t)_# Q̃) is integrable, it is enough to guarantee G*(x, Mh) ∈ L¹. In the case where g(x, s) = s^{q−1} we need Mh ∈ L^{q′}. Since in this case we know ρ ∈ L^q, then h = g(x, ρ) ∈ L^{q′}, and this implies Mh ∈ L^{q′} by standard theorems in harmonic analysis, as soon as q′ > 1.
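The maximal function Mh(t, x) = sup_r h_r(t, x) appearing above can easily be computed on a grid. The sketch below is ours (a periodic 1-D toy, with an arbitrary profile h featuring an integrable spike); it checks the two facts used in the text: Mh dominates |h| pointwise, and for h in L^p with p > 1 the L^p norm of Mh remains comparable to that of h.

```python
import numpy as np

# Discrete Hardy-Littlewood maximal function on a periodic 1-D grid:
# (Mh)(x) = sup_r (average of |h| over the interval of radius r around x).
# The profile h below is an arbitrary test function with a spike.

n = 1024
x = np.arange(n) / n
h = np.abs(np.sin(2 * np.pi * x)) ** 0.5 + (np.abs(x - 0.5) < 0.01) * 5.0

def maximal(h):
    m = len(h)
    Mh = np.abs(h).copy()                # the r = 0 window is |h| itself
    csum = np.concatenate(([0.0], np.cumsum(np.tile(np.abs(h), 3))))
    idx = np.arange(m)
    for r in range(1, m // 2):
        # average over the 2r+1 cells around each point, periodic via tiling
        avg = (csum[m + idx + r + 1] - csum[m + idx - r]) / (2 * r + 1)
        Mh = np.maximum(Mh, avg)
    return Mh

Mh = maximal(h)
print(np.all(Mh >= np.abs(h)))           # True: Mh dominates |h| pointwise
print(np.mean(Mh ** 2) ** 0.5 / np.mean(h ** 2) ** 0.5)  # L^2 norms comparable
```

The spike gets smeared out by the averaging but still raises Mh on a neighbourhood, which is the phenomenon making the L¹ (q′ = 1) case delicate.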
However, the analysis of this equilibrium condition motivates a deeper study of regularity issues, for several reasons. First, in the cases where L^∞ constraints are considered (as happened for incompressible fluid mechanics in [4], but also in the density-constrained model of [16]) we find q′ = 1, and we cannot get the integrability of Mh unless we first prove some extra summability. Other non-power cases (such as g(ρ) = log ρ) also prevent the use of the L^p theory of the maximal function and require extra regularity, or at least extra summability. Moreover, it cannot be denied that getting h ∈ L^∞ (which implies Mh ∈ L^∞), or even h ∈ C⁰, would be much more convenient, and would allow one to avoid the condition $\int_0^T Mh(t, \omega(t))\, dt < \infty$, or even the use of this special representative. More generally, better regularity of ρ (or of the dual variable φ) could give "better" solutions to the (HJ) equation (instead of just a.e. solutions). This is why in the next section we will see a technique to prove some mild regularity results on the optimal density ρ (which are far from being complete).

A bit of regularity
We present here a technique to prove Sobolev regularity results on the solutions of (8). This technique, based on duality, is inspired by the work [9], and has been applied to MFG in [16]. It is actually very general, and [31] shows how it can be used to prove (or re-prove) many regularity results for elliptic equations coming from convex variational problems.
We start from a lemma related to the duality results of Section 2.
Lemma 4.1. For any (φ, p) ∈ F (with φ of class C¹) and (ρ, ρv) ∈ K we have
$$A(\rho, v) + B(\phi, p) = \int_0^T\!\!\int_\Omega \frac{1}{2} \rho |v + \nabla \phi|^2 + \int_0^T\!\!\int_\Omega \left( G(x, \rho) + G^*(x, p) - \rho p \right) + \int_0^T\!\!\int_\Omega \rho \left( p + \partial_t \phi - \frac{1}{2} |\nabla \phi|^2 \right) + \int_\Omega (\Psi - \phi_T)\, d\rho_T,$$
where all four terms on the right-hand side are non-negative.

Proof. We start from the weak formulation (2) of the continuity equation, which gives
$$\int_\Omega \phi_T\, d\rho_T - \int_\Omega \phi_0\, d\rho_0 = \int_0^T\!\!\int_\Omega \left( \partial_t \phi\, \rho + \nabla \phi \cdot \rho v \right). \qquad (19)$$
Then we use the pointwise identity
$$\frac{1}{2} |v|^2 + \nabla \phi \cdot v = \frac{1}{2} |v + \nabla \phi|^2 - \frac{1}{2} |\nabla \phi|^2.$$
If we insert this into (19), we get the desired result.
It is important to stress that we used the fact that φ is C¹, since (ρ, v) only satisfies (CE) in a weak sense, i.e. tested against C¹ functions. The same computations would not be possible for a general (φ, p) ∈ F̃.
The regularity proof will come from the previous computations applied to suitable translations in space and/or time.
In order to simplify the exposition, we will choose a spatially homogeneous setting. In particular, we will suppose that Ω = T^d is the d-dimensional flat torus, which avoids boundary issues. Also, we will suppose that g, G and G* do not explicitly depend on the variable x (but we will explain how to adapt to the space-dependent case). To handle the case of a domain Ω with boundary, we refer to the computations in [31], which suggest how to adapt the method below. Finally, for simplicity, we will only prove in this paper local results in (0, T), so that the time boundary does not create difficulties either.
Here is the intuition behind the proof in this spatially homogeneous case. First, we use Lemma 4.1 to deduce
$$A(\rho, v) + B(\phi, p) \ge \int_0^T\!\!\int_\Omega \left( G(\rho) + G^*(p) - \rho p \right)$$
(since the other terms appearing in Lemma 4.1 are positive). Then, let us suppose that there exist two functions J, J* : R → R and a positive constant c_0 > 0 such that for all a, b ∈ R we have
$$G(a) + G^*(b) - ab \ge c_0 \left| J(a) - J^*(b) \right|^2. \qquad (20)$$

Remark 2. Of course, this is always satisfied by taking J = J* = 0, but there are less trivial cases. For instance, if G(ρ) = (1/q) ρ^q for q > 1, then G*(p) = (1/q′) p_+^{q′}, with q′ = q/(q − 1), and
$$\frac{1}{q} a^q + \frac{1}{q'} b^{q'} - ab \ge c_0 \left| a^{q/2} - b^{q'/2} \right|^2 \quad \text{for } a, b \ge 0,$$
i.e. we can use J(a) = a^{q/2} and J*(b) = b^{q′/2}.

We wish to show that if (ρ, v) is a minimiser of A then J(ρ) ∈ H¹_loc((0, T) × Ω). The idea is that, should B admit a C¹ minimiser φ (more precisely, a pair (φ, p)), then by the Duality Theorem 2.1 we have B(φ, p) + A(ρ, v) = 0. From our assumption and Lemma 4.1, we get J(ρ) = J*(p). If we manage to show that ρ̃(t, x) := ρ(t + η, x + δ), with a corresponding velocity field ṽ, is close to minimising A, and more precisely
$$A(\tilde{\rho}, \tilde{v}) \le A(\rho, v) + C(|\eta|^2 + |\delta|^2) \qquad (21)$$
for small η ∈ R, δ ∈ R^d, then we would have
$$c_0 \| J(\tilde{\rho}) - J^*(p) \|_{L^2}^2 \le A(\tilde{\rho}, \tilde{v}) + B(\phi, p) \le C(|\eta|^2 + |\delta|^2).$$
However we already know that J*(p) = J(ρ), and so $\| J(\tilde{\rho}) - J(\rho) \|_{L^2} \le C(|\eta| + |\delta|)$, which would mean that J(ρ) is H¹, as we have estimated the squared L² norm of the difference between J(ρ) and its translation by the squared length of the translation. Of course, there are some technical issues that need to be taken care of: for instance, ρ̃ is not even well-defined (as we could need the value of ρ outside [0, T] × Ω), it does not satisfy the initial condition ρ̃(0) = ρ_0, we do not know if B admits a minimiser, and we do not know whether (21) holds.
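In the quadratic case q = q′ = 2, inequality (20) is in fact an identity with J(a) = a, J*(b) = b and c_0 = 1/2, since a²/2 + b²/2 − ab = (a − b)²/2. A quick numerical check (ours, not from the paper):

```python
import numpy as np

# Quadratic case check: G(a) = a^2/2, G*(b) = b^2/2 (a, b >= 0), and
#   G(a) + G*(b) - a*b = (a - b)^2 / 2,
# i.e. (20) holds with J(a) = a, J*(b) = b, c0 = 1/2, with equality.

rng = np.random.default_rng(1)
a = rng.uniform(0, 10, 100000)
b = rng.uniform(0, 10, 100000)

gap = a ** 2 / 2 + b ** 2 / 2 - a * b
print(np.max(np.abs(gap - 0.5 * (a - b) ** 2)))   # zero up to rounding
```

This makes the heuristic above transparent in the quadratic case: the "duality gap" integrand G(ρ) + G*(p) − ρp directly controls the squared L² distance between ρ and p.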
To perform our analysis, let us fix t_0 < t_1 and a cut-off function ζ ∈ C_c^∞((0, T)) with ζ ≡ 1 on [t_0, t_1], and define
$$\rho^{\eta,\delta}(t, x) := \rho\big(t + \eta\zeta(t),\, x + \delta\zeta(t)\big), \qquad v^{\eta,\delta}(t, x) := \big(1 + \eta\zeta'(t)\big)\, v\big(t + \eta\zeta(t),\, x + \delta\zeta(t)\big) - \delta\zeta'(t).$$
It is easy to check that the pair (ρ^{η,δ}, v^{η,δ}) satisfies the continuity equation together with the initial condition ρ^{η,δ}(0) = ρ_0. Therefore it is an admissible competitor in A for any choice of (η, δ). We may then consider the function
$$M(\eta, \delta) := A(\rho^{\eta,\delta}, v^{\eta,\delta}).$$
The key point here is to show that M is smooth (actually, it would be enough to have M ∈ C^{1,1}).
Proof. We have an expression for M as a sum of the kinetic term, the congestion term and the final cost. Since ρ_{η,δ}(T, x) = ρ(T, x), the last term does not depend on (η, δ). For the other terms, we use the change of variable (t, x) → (s, y) = (t + ηζ(t), x + δζ(t)), which is a C^∞ diffeomorphism for small η. Then we can write these terms as integrals in the variables (s, y), where K(η, δ, s) is a smooth Jacobian factor (which does not depend on y, since the change of variable is only a translation in space). Hence, this term depends smoothly on (η, δ).
We also have a similar expression for the congestion term, where K(η, δ, s) is the same Jacobian factor as before, and t(η, s) is obtained by inverting, for fixed η > 0, the relation s = t + ηζ(t), and is also a smooth map. Hence, this term is also smooth.
We can now apply the previous lemma to the estimate we need.
With this result in mind, we may easily prove the following.

Theorem 4.4. If (ρ, v) is a solution to the primal problem min A, if Ω = T^d and if J satisfies (20), then J(ρ) satisfies, for every t_0 < t_1,

‖J(ρ(· + η, · + δ)) − J(ρ)‖²_{L²([t_0,t_1]×T^d)} ≤ C(η² + |δ|²)

(where the constant C depends on t_0, t_1 and on the data), and hence J(ρ) is of class H^1_loc((0, T) × T^d).

Proof. Let us take a minimizing sequence (φ_n, p_n) for the dual problem, i.e. such that B(φ_n, p_n) + A(ρ, v) → 0.
We use ρ̃ = ρ_{η,δ} and ṽ = v_{η,δ} as in the previous discussion. Using first the triangle inequality and then Lemma 4.1, we have (where the L² norm denotes the norm in L²((0, T) × T^d))

‖J(ρ̃) − J(ρ)‖_{L²} ≤ ‖J(ρ̃) − J*(p_n)‖_{L²} + ‖J*(p_n) − J(ρ)‖_{L²},

and both terms on the right-hand side are controlled thanks to Lemma 4.1. Letting n go to infinity and restricting the L² norm to [t_0, t_1] × T^d, we get the claim.
Remark 3. If one restricts to the case η = 0, then it is also possible to use a cut-off function ζ ∈ C^∞_c((0, T]) with ζ(T) = 1, as we only perform space translations. In this case, however, the final cost (the term involving Ψ at time T) depends on δ, and one needs to assume Ψ ∈ C^{1,1} to prove M ∈ C^{1,1}. This allows us to deduce H^1 regularity in space, locally in time far from t = 0, i.e. J(ρ) ∈ L²_loc((0, T]; H^1(T^d)).

Remark 4. From J(ρ) = J*(p), the above regularity result on ρ can be translated into a corresponding regularity result on p.

Remark 5. How does one handle the case of explicit dependence on x? In this case one should assume the existence of functions J, J* : Ω × R → R such that (20) holds, for a uniform constant c_0, for every x, in terms of J(x, a) and J*(x, b). Then, Lemma 4.2 is no longer evident, and requires regularity of x → G(x, ρ). Finally, the regularity which can be obtained is J(x, ρ) ∈ H^1, which can usually be turned into regularity of ρ if J depends smoothly on x.
We stress that a finer analysis of the behavior at t = T also allows one to extend the above H^1 regularity result in space-time up to t = T, but this needs extra tools (in particular, defining a suitable extension of ρ for t > T). This is developed in [28].
Finally, we close this section by underlining the regularity results in the density-constrained case ([16]): the same kind of strategy applies, but with many more technical issues, which follow the same scheme as in [9] and [3], and the result is much weaker. Indeed, it is only possible to prove in this case p ∈ L²_loc((0, T); BV(T^d)) (exactly as in [3]). Even if very weak, this result is very important in that it gives higher integrability on p, which was a priori only supposed to be a measure, and this allows one to obtain the necessary summability of the maximal function that we mentioned in Section 3.

Stochastic control variants
We now consider the case where the dynamics of the state of each player is governed by the controlled SDE dX_t = α_t dt + √2 dW_t, where the drift α_t is the agent's control and W_t is a standard Brownian motion. The goal of this section is to present (rather informally) some examples of MFG systems and their variational counterparts in various dynamic or static situations.
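As an aside, the controlled dynamics can be sampled directly. The following minimal Euler-Maruyama sketch is illustrative only (the feedback α and all parameters are placeholders of ours, not part of the paper); it shows the role of the √2 diffusion coefficient:

```python
import numpy as np

def simulate(alpha, x0=0.0, T=1.0, n_steps=200, n_paths=20000, seed=0):
    """Euler-Maruyama discretization of dX_t = alpha(t, X_t) dt + sqrt(2) dW_t."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.full(n_paths, x0)
    for k in range(n_steps):
        # Brownian increment over one time step
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        X = X + alpha(k * dt, X) * dt + np.sqrt(2.0) * dW
    return X

# With zero control, X_T is Gaussian with mean x0 and variance 2T
X = simulate(lambda t, x: 0.0 * x)
print(X.mean(), X.var())  # mean near 0, variance near 2 for T = 1
```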

Dynamic MFG with diffusion
Let us start with the finite-horizon case, where the goal of each agent starting from a position x at time 0, given the density of the other players ρ_t, is to solve the following stochastic control problem on the period [0, T]:

inf_α E[ ∫_0^T ( ½|α_t|² + g(ρ_t(X_t)) ) dt + Ψ(X_T) ],  X_0 = x,

so that the value function is governed by the following backward HJB equation:

−∂_t φ − ∆φ + ½|∇φ|² = g(ρ),  φ(T, ·) = Ψ,   (23)

and, provided φ is smooth, the optimal control in feedback form is given by α_t(x) = −∇φ(t, x). Once we know this optimal drift and the initial distribution of players, the evolution of ρ_t is governed by the Fokker-Planck (or forward Kolmogorov) equation:

∂_t ρ − ∆ρ − ∇·(ρ∇φ) = 0,  ρ(0) = ρ_0.   (24)

The MFG system consists of the two equations (23)-(24) and corresponds to an equilibrium condition: the optimization of each player, given their anticipation of the density of the other players, has to be consistent with the evolution of the players' density resulting from their optimizing behavior. The fact that the MFG system (23)-(24) is related to an optimization problem (which is convex when g is nondecreasing, i.e. in the congested case) was first emphasized in the seminal works of Lasry and Lions [23,24] and further analyzed by Cardaliaguet et al. in [15]. More precisely, defining G as before as the antiderivative of g (extended by +∞ on (−∞, 0)) with respect to the second variable, and G* as its Legendre transform, the MFG system appears, at least formally (we refer to Cardaliaguet et al. [15] for rigorous and detailed statements), as the optimality conditions for two convex minimization problems in duality (for simplicity we consider again the periodic case Ω = T^d): the primal problem

inf { Ā(ρ, w) : ∂_t ρ − ∆ρ + ∇·w = 0, ρ(0) = ρ_0 },

where Ā(ρ, w) is defined as in (8), and the dual problem

inf { ∫_0^T ∫_Ω G*(−∂_t φ − ∆φ + ½|∇φ|²) dx dt − ∫_Ω φ(0, x) dρ_0(x) : φ(T, ·) = Ψ }.

Static MFGs with noise
We now consider static situations with diffusion corresponding to different individual stochastic control problems for the players.It is now important to allow g to depend also on x (to avoid obvious situations such as a constant density being an equilibrium).

The ergodic problem
The ergodic MFG, first introduced in [22], corresponds to the case where each player aims at minimizing

lim inf_{T→+∞} (1/T) E[ ∫_0^T ( ½|α_t|² + g(X_t, ρ(X_t)) ) dt ].

Given the measure ρ giving the (stationary) density of players, the value function satisfies the following HJB equation (again in the periodic setting for the sake of simplicity):

−∆φ + ½|∇φ|² + λ = g(x, ρ),   (27)

where λ ∈ R is the ergodic constant. At equilibrium, ρ should be the invariant measure of the process corresponding to the optimal feedback α = −∇φ, i.e.:

−∆ρ − ∇·(ρ∇φ) = 0,  ρ ≥ 0,  ∫_Ω ρ = 1.   (28)

Again, setting G(x, ρ) = ∫_0^ρ g(x, s) ds, the MFG system (27)-(28) is formally the primal-dual system of optimality conditions for

inf_{(ρ,w)} ∫_Ω ( |w|²/(2ρ) + G(x, ρ) ) dx  subject to  −∆ρ + ∇·w = 0, ∫_Ω ρ = 1.

We wish to mention in this framework the study carried out in [27] about the same problem, with the constraint ρ ≤ 1, where the HJB equation lets a pressure term appear on the saturated region {ρ = 1}.

The exit problem
Instead of the ergodic problem, it is also natural to look at the case where we are given a domain Ω ⊂ R^d and, starting from x ∈ Ω, the player seeks to minimize

E[ ∫_0^{τ_x} ( ½|α_t|² + g(X_t, ρ(X_t)) ) dt + Ψ(X_{τ_x}) ],

where τ_x is the exit time from Ω, i.e. the first time at which x + ∫_0^t α_s ds + √2 W_t hits ∂Ω, and Ψ is a given exit cost. This control problem corresponds of course to the Dirichlet problem for the stationary HJB equation:

−∆φ + ½|∇φ|² = g(x, ρ) in Ω,  φ = Ψ on ∂Ω.   (29)

The MFG system then corresponds to the system formed by (29) coupled with

−∆ρ − ∇·(ρ∇φ) = 0 in Ω   (30)

(note that the total mass is not fixed here). The optimality conditions for the convex minimization problem

inf { ∫_Ω G*(x, −∆φ + ½|∇φ|²) dx : φ = Ψ on ∂Ω }   (31)

can then be (formally) written as follows: set ρ := (G*)'(x, −∆φ + ½|∇φ|²) (the derivative is in the second variable, of course); the Euler-Lagrange equation of (31) then reads −∆ρ − ∇·(ρ∇φ) = 0, so we recover (30). As for the HJB equation, since ρ := (G*)'(x, −∆φ + ½|∇φ|²), we have g(x, ρ) = −∆φ + ½|∇φ|², i.e. φ satisfies (29). Finally, it can be easily checked that the equilibrium measure ρ can be obtained by solving the dual problem (32), posed over pairs (ρ, w) and involving, as before, the kinetic term |w|²/(2ρ), the congestion cost G(x, ρ) and the exit cost Ψ. It is worth noting here that if g(x, 0) ≥ 0 the problem is totally degenerate (and thus not really interesting); indeed, in this case G* is minimal on R_−, so every subsolution of the HJB equation, i.e. every φ with −∆φ + ½|∇φ|² ≤ 0 coinciding with Ψ on ∂Ω, solves (31), and the very degenerate density ρ = 0 is an equilibrium. This should come as no surprise since, as the mass is not fixed and the cost G is always minimal at ρ = 0, no player a priori wishes to enter this game! If on the contrary g is negative close to 0, the previous trivial situation (ρ, w) = (0, 0) is in general not optimal for (32). This is the case, for instance, for logarithmic congestion functions.

The discounted infinite horizon problem
The last stationary situation we wish to discuss corresponds to the infinite-horizon discounted criterion for the players:

inf_α E[ ∫_0^{+∞} e^{−λt} ( ½|α_t|² + g(X_t, ρ(X_t)) ) dt ],

for a certain discount rate λ > 0. Such cases are particularly important for applications to macroeconomic dynamic models (see [1]) but, as we shall see, they cannot be treated by a variational approach like the examples recalled above. Indeed, the HJB equation for the value function reads

λφ − ∆φ + ½|∇φ|² = g(x, ρ),

which is coupled with the same elliptic equation for the measure ρ as before, i.e. −∆ρ − ∇·(ρ∇φ) = 0. Now it is quite clear that the corresponding system does not have a variational structure, because the linear parts in the two equations are not adjoint: there is no λρ term in the second equation! One could artificially add this term by considering a growth rate of the population and assuming that, by chance, this growth rate coincides with the discount rate λ, but this would lead to a different (and quite questionable) model. This example shows that there are, of course, limitations to the variational approach.

Numerical simulations

Solving the MFG system by an augmented Lagrangian method
Our aim now is to explain how the variational problems for MFG systems recalled previously can be solved numerically by augmented Lagrangian methods, and in particular by the algorithm ALG2 of Fortin and Glowinski [20]. Such methods were used for the dynamical formulation of mass transport problems in the work of Benamou and Brenier [6]. Let us consider the deterministic evolutionary case as in Section 2 and rewrite the variational problem (6) (or some finite-element discretization of it) as an infimum over (a, b, c, φ), where the relaxed variables (a, b, c) stand for (−∂_t φ, −∇φ, φ(T, ·)). We may rewrite this as an inf-sup (inf in (φ, a, b, c) and sup in (ρ, w, µ)) problem for the Lagrangian or equivalently (see [20]) for the augmented Lagrangian L_r, where r > 0 (in practice in our simulations we will take r = 1). The augmented Lagrangian algorithm then consists, starting from an initial guess, in building inductively a sequence as follows: given (φ_k, a_k, b_k, c_k, ρ_k, w_k, µ_k),

Step 1: find φ_{k+1} by minimizing L_r(·, a_k, b_k, c_k, ρ_k, w_k, µ_k); since this is a quadratic problem in D_{t,x}φ = (∂_t φ, ∇φ), it amounts to solving a Laplace equation (in the t and x variables) with suitable boundary conditions;

Step 2: find (a_{k+1}, b_{k+1}, c_{k+1}) by minimizing L_r(φ_{k+1}, ·, ·, ·, ρ_k, w_k, µ_k); this consists in two pointwise proximal subproblems (one in (a, b) and one in c) which are in practice easy and quick to solve (see [7] for some details);

Step 3: update the dual variables by the gradient-ascent formula (ρ_{k+1}, w_{k+1}, µ_{k+1}) = (ρ_k − r(∂_t φ_{k+1} + a_{k+1}), w_k − r(∇φ_{k+1} + b_{k+1}), µ_k + r(φ_{k+1}(T, ·) − c_{k+1})).
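The three-step structure above can be sketched on a toy problem. The following is a minimal, hypothetical ADMM/ALG2 loop in Python (NumPy only), applied to a one-dimensional total-variation model rather than to the MFG discretization itself: the difference operator D plays the role of D_{t,x}, the linear solve corresponds to Step 1, the pointwise soft-thresholding prox to Step 2 and the dual update to Step 3.

```python
import numpy as np

def alg2_tv(y, lam=0.5, r=1.0, iters=500):
    """Toy augmented Lagrangian (ADMM) loop:
    min_phi 0.5*||phi - y||^2 + lam*||D phi||_1, with splitting q ~ D phi."""
    n = len(y)
    D = np.diff(np.eye(n), axis=0)        # forward-difference operator
    q = np.zeros(n - 1)                   # relaxed variable (like (a, b) above)
    sig = np.zeros(n - 1)                 # dual multiplier (like (rho, w) above)
    A = np.eye(n) + r * D.T @ D           # Step-1 operator (Laplace-like)
    for _ in range(iters):
        # Step 1: quadratic problem in phi -> one linear (Laplace-type) solve
        phi = np.linalg.solve(A, y + D.T @ (r * q - sig))
        # Step 2: pointwise proximal subproblem -> soft-thresholding
        t = D @ phi + sig / r
        q = np.sign(t) * np.maximum(np.abs(t) - lam / r, 0.0)
        # Step 3: gradient-ascent update of the dual variable
        sig = sig + r * (D @ phi - q)
    return phi, q, sig

y = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0])
phi, q, sig = alg2_tv(y)
print(np.max(np.abs(np.diff(phi) - q)))   # primal residual, small at convergence
```

As in the MFG case, the splitting constraint (here D phi = q) is only enforced asymptotically, which is exactly what the residual printed above measures.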
Note that this algorithm ensures that, along the iterations, the dual variables ρ_k, w_k remain such that ρ_k ≥ 0 and w_k = 0 whenever ρ_k = 0, so that one can define a velocity through w_k = ρ_k v_k, and the continuity equation is satisfied at each step. The algorithm can be adapted to diffusive cases as well. For evolutionary (respectively, stationary) diffusive cases, the relaxed variables become a = −∂_t φ − ∆φ (resp. a = −∆φ) and b = −∇φ; the only significant modification then is that Step 1 now involves an elliptic problem with the fourth-order operator −∂_{tt} + ∆² (resp. the bi-Laplacian ∆²) for φ. See [2] for a recent ALG2 implementation in this case. We present numerical results obtained with a FreeFem++ implementation adapted from [7].

Hard and soft congestion
We present simulations corresponding to the congestion models discussed at the end of Section 1.2. The macroscopic measure of the crowd density is ρ. Figure 1 shows the domain Ω, made of two communicating rooms. The potential Ψ represented on the left is a penalization which encourages agents to move from the first room to the second, and gives target preferences within the rooms. On the right we see the initial density (at time t = 0). The congestion is taken into account by the "cost" function G either in a "hard" way, G(ρ) = 0 if ρ ∈ [0, 1] and +∞ otherwise, or in a "soft" way, G(ρ) = ρ^m/m for m > 1. In the simulations below we used m = 6.
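To make the "hard" versus "soft" distinction concrete, here is a small illustrative Python sketch (our own, not part of the FreeFem++ simulations) of the scalar proximal maps of the two congestion costs; this is exactly the kind of pointwise subproblem that Step 2 of ALG2 has to solve:

```python
import numpy as np

def prox_hard(t, r=1.0):
    # argmin_rho G(rho) + (rho - t)^2/(2r), with G the indicator of [0, 1]:
    # the proximal map is simply the projection onto [0, 1]
    return float(np.clip(t, 0.0, 1.0))

def prox_soft(t, m=6, r=1.0, tol=1e-12):
    # argmin_rho rho^m/m + (rho - t)^2/(2r) over rho >= 0;
    # the optimality condition rho^(m-1) + (rho - t)/r = 0 is monotone in rho,
    # so it can be solved by bisection
    lo, hi = 0.0, max(t, 0.0)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid ** (m - 1) + (mid - t) / r > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

print(prox_hard(1.7))   # 1.0: hard congestion caps the density at 1
print(prox_soft(1.7))   # below 1.7: soft congestion only pushes it down
```

The hard prox explains why the constrained density can stay exactly in [0, 1], while the soft prox (here with m = 6) merely penalizes large values.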
Figure 2 shows the decrease of the L² residual of the Hamilton-Jacobi equation (the other equation in the coupled system, which is in this case a discretized continuity equation, is automatically satisfied after each ALG2 step). Figures 3 and 4 show time snapshots of the density in the soft and hard cases, respectively. Agents move as expected to their optimal final positions. As predicted by the analysis of ALG2, the density stays perfectly in [0, 1].

Stationary problem
Here we show ALG2 simulations applied to the ergodic stationary models of Section 5.2 in a periodic domain. We use a linear form for the congestion, g(x, ρ) = ψ(x)ρ, with a potential ψ depending on x.
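With this linear congestion the convex functions involved in the variational formulation are explicit; as a sanity check (our computation, assuming ψ > 0), one gets:

```latex
\[
g(x,\rho) = \psi(x)\,\rho
\quad\Longrightarrow\quad
G(x,\rho) = \tfrac{1}{2}\,\psi(x)\,\rho^2,
\qquad
G^*(x,p) = \frac{(p_+)^2}{2\,\psi(x)},
\]
```

so the congestion term in the minimization is a simple weighted quadratic one.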
Step 1 of ALG2 now involves a bi-Laplacian operator. This is taken care of in FreeFem++ thanks to the recent addition of C¹-conforming (HCT) finite elements.
The periodic square domain is discretized with a 50 × 50 grid, and this is the only parameter of the method! We show solutions for two different potentials ψ in Figures 5 and 6.
Figure 7 shows the decrease of the L² residual of the Hamilton-Jacobi equation (again, the other equation in the coupled system, −∆ρ + ∇ · w = 0, is automatically satisfied after each ALG2 step).

Figure 1 :
Figure 1: On the left the final potential Ψ - on the right the initial crowd density

Figure 2 :
Figure 2: Convergence of the ALG2 algorithm

Figure 3 :
Figure 3: Time evolution of the density. Soft congestion case, m = 6. The color scales are the same as in Figure 1, right.

Figure 4 :
Figure 4: Time evolution of the density. Hard congestion case. The color scales are the same as in Figure 1, right.

Figure 5 :
Figure 5: On the left the potential ψ - on the right the invariant density

Figure 6 :
Figure 6: On the left the potential ψ - on the right the invariant density

Figure 7 :
Figure 7: Convergence of the ALG2 algorithm