Preconditioning the augmented Lagrangian method for instationary mean field games with diffusion

. We discuss the application of the augmented Lagrangian method to the convex optimization problem of instationary variational mean ﬁeld games with diffusion. The problem is ﬁrst discretized with space-time tensor product piecewise polynomial bases. This leads to a sequence of linear problems posed on the space-time cylinder that are second order in the temporal variable and fourth order in the spatial variable. To solve these large linear problems with the preconditioned conjugate gradients method we propose a preconditioner that is based on a temporal transformation coupled with a spatial multigrid. This preconditioner is thus based on standard components and is particularly suitable for parallel computation. It is conditionally parameter-robust in the sense that the condition number of the preconditioned system is low for sufﬁciently ﬁne temporal discretizations. Numerical examples illustrate the method.

numerical experiments in Sections 5.2-5.4illustrate these concepts for 1d and 2d examples.The final Section 5.5 extends the model to include intermediate instantaneous costs.

Mean field games and ALG2 with diffusion.
2.1.Mean field games.The spatial domain D ⊂ d is assumed to be a cuboid but the main ideas are generic.Let T > 0, set J := (0, T ).Consider the system of partial differential equations KFP[ρ, φ] := ∂ t ρ − ν 2 ∆ρ + div(ρ∇H(t, x, ∇φ)) = 0, (1a) HJB[ρ, φ] := ∂ t φ + ν 2 ∆φ + H(t, x, ∇φ) = A (t, x, ρ), (1b) for (t, x) ∈ J × D. We will refer to this system as the mean field games (MFGs) equations.See Section 2.2 for an interpretation of the equations.We omit the dependence on (t, x) in the notation where convenient (in particular, the right-hand-side of (1b) actually means A (t, x, ρ(t, x)), etc.).Here, A and Γ are convex real valued functions of the third variable ρ ≥ 0 that evaluate to +∞ for negative ρ, and the indicated derivatives are with respect to this variable.The essential assumption on the Hamiltonian H is convexity with respect to the third variable.The unknowns are the density ρ and the cost φ, both space-time dependent real valued functions.Periodic boundary conditions are often assumed in the literature but here we will be interested in no-flow boundary conditions on the density ρ.Based on (1a), these are implemented by requiring ∇H(t, x, ∇φ) • n = 0 and ∇ρ • n = 0 on ∂ D (2) where n is the outward normal to the spatial boundary.In particular, the total mass D ρ is conserved in time, and we suppose that ρ 0 ≥ 0 non-trivially.We consider only radially symmetric coercive Hamiltonians (in the third variable); hence the first condition of (2) amounts to Let L(t, x, v) be the Lagrangian obtained from the Hamiltonian H as the dual conjugate with respect to the third variable.By convexity of H in that variable, H is also the dual conjugate of L (and the formal optimality condition for the supremum is p = ∇L(v)): The principal feature of the MFGs equations is that the Kolmogorov-Fokker-Planck (KFP) equation evolves forward in time with an explicit initial condition at t = 0 and the Hamilton-Jacobi-Bellman (HJB) equation evolves backward in time with a possibly implicit initial condition at t = T .The mathematical interpretation of the KFP is in the weak sense and that of the HJB is in the viscosity sense, but we will mostly proceed in a formally (in particular assuming the regularity ( 11)-( 12) below).
The main innovation of this work with respect to the numerical method based on finite elements proposed in [9] is the presence of the positive diffusion coefficient ν > 0. (5) We will therefore pay particular attention to robustness with respect to ν.The first consequence of ( 5) is a minor adaptation in the formulation due to the ν term in (14).The second consequence is the necessity of choosing the finite element spaces appropriately, e.g. in our case we require H 2 -conformity of the spatial discretization of φ.The third consequence is the appearance of a linear partial differential operator (defining successive approximations of φ in the ALG2 iteration) that is second order in time and fourth order in space.This means that a large linear algebraic system of equations, typically highly ill-conditioned, has to be solved in each outer iteration.The main contribution of this paper is a preconditioner that breaks that linear problem down into a sequence of independent spatial problems, each amenable to multigrid.The preconditioner is based on the principle of operator preconditioning [22,25], i.e. it is obtained as the discretization of an operator that is similar to on the continuous level.We first motivate the functional framework in Sections 2.2-2.4,and then define the operator in Section 2.5.1.

Variational formulation.
As already observed in [23], the MFGs equations (1) arise as the first order optimality conditions of a constrained optimization problem which we recall here.This perspective provides some useful intuition for the quantities appearing in (1), see also the example in Section 5.2.
Consider the transport equation (slightly overloading the notation from (1a)) One may think of the space-time dependent function ρ as the state variable and of the space-time dependent vector field v as the control variable.To fix ρ and v we introduce the functional (in Section 5.5 we also consider a variant with multiple Γ -like terms) and the constrained optimization problem inf For ρ ≤ 0, by convention, L(v)ρ = 0 if (ρ, v) = 0 and L(v)ρ = +∞ else.To formally characterize the minimizers we look for the stationary points of the space-time Lagrangian J 1 (ρ, v) + 〈φ, KFP[ρ, v]〉, where the space-time scalar function φ is the Lagrange multiplier for the KFP constraint.The derivative of the Lagrangian with respect to v gives the relation ∇L(v) = ∇φ, at least where ρ = 0. Using the optimality condition for the supremum in (4), this implies the representation H(∇φ) = v • ∇φ − L(v) and hence the feedback strategy v = ∇H(∇φ).Employing the latter in the derivative of the space-time Lagrangian with respect to ρ gives the HJB equation (1b) with its terminal condition from (1c).

The predual problem. We will work with the function spaces
We write homogeneous Neumann boundary conditions, we use the norm given by Introducing the Bochner-Sobolev space we suppose that the Lagrange multiplier φ from the previous subsection satisfies φ ∈ X. ( We abbreviate L 2 := L 2 ((0, T ) × D).Recall the isometry L 2 ∼ = L 2 ((0, T ); L 2 (D)).We identify H ∼ = H via the Riesz isomorphism, obtaining the Gelfand triple V → H ∼ = H → V , and in particular the embedding V → V .Set This choice of spaces and norms will be important in Section 2.5.1 in the derivation of the preconditioner .We identify again the L 2 spaces with their dual.Elements of Y will be denoted by σ = (a, b, c) or λ = (ρ, m, e), and those of Y by λ = (ρ, m, e ).We define the linear operator This definition follows [9] except that here we admit ν > 0.
By assumptions on A and Γ , the functions A and Γ are "flat" on negative arguments.For convenience of the reader, and to highlight some technical points, we explain how (15) and (8) are connected in the remainder of this subsection.First, by the general theory of duality [18], the (typical) convex dual problem of (15) where Λ : Y → X is the adjoint (see Theorems 1-2 below for the relation between the minimizers of ( 15) and ( 18)).We write conditions [18,Prop. 4.1], the function [L + A] coincides with its second dual.The same holds for Γ .This leads to (cf. ( 7)) Concerning , we first observe (using integration by parts in time) where the spatial boundary terms are collected in Invoking the spatial homogeneous Neumann boundary conditions (3) we find that if and only if the equations are satisfied in the weak sense.In view of ( 19) and (22), writing v := m/ρ we find that ( 18) is just the constrained optimization problem (8).We note here two consequences of (23) for the numerical method in Section 3 below: • The spatial Neumann boundary conditions (2) imply m • n = 0 on ∂ D, which we also use for the discrete flux m h .• From the discrete triple λ h = (ρ h , m h , e h ), an approximation of the velocity field in (8) may be obtained as v h := m h /ρ h .
The built-in regularity assumption φ ∈ X in (15) and c ∈ V, and certain growth conditions on A , Γ and H ensure that G(σ) is well-defined.Let us focus on d = 2 spatial dimensions.Then we may assume quadratic growth for A and Γ and arbitrary polynomial growth for H in the last variable.For an arbitrary σ ∈ Y not in the range of Λ we may extend the definition of G by +∞ but this would preclude the usage of Theorem 1.
this road in this paper in order to simplify the numerics in Step B of ALG2 below.Another possibility is to assume convexity and at most quadratic growth for A and A • H ensuring continuity of G on Y so that Theorem 1 could be used but this would rule out the desirable situation of the standard quadratic Hamiltonian H together with a quadratic cost A in (7).In any case Theorem 1 does not address existence for the primal problem (15), and seeing ( 18)-( 15) as the primal-dual pair is problematic because F is discontinuous wherever it is finite (unless again, we modify Y suitably).
Existence and stability in space-time Lebesgue spaces were obtained using Theorem 1 in [14], and by operating directly on the MFGs equations for example in [24, Theorem 2.7] and [3,Theorem 3.1].The methods of [3] do not seem to apply in our case due to the lack of a discrete maximum principle.
Another possible approach to existence of minimizers in ( 15) is to verify coercivity of the functional under suitable assumptions on the data (which would also play a role in establishing Γ -convergence of numerical solutions [9, Section 3.1]).This, however, does not seem possible because the term D Γ (−φ(T )) does not provide control on the spatial derivatives of φ(T ), which would be necessary to control the norm of φ in X (see e.g.[6, Theorem 4.1] for that kind of statement).We believe the mesh-dependent convergence rate of ALG2, see Section 5.4, is a manifestation of this fact.

ALG2 formulation.
In order to solve the optimization problem (15), in the augmented Lagrangian method one looks for saddle points where r ≥ 0; these are characterized by It is elementary to check that the Lagrangians L r and L 0 have the same saddle points.Moreover, any saddle point furnishes a solution to (15) and (18).With the notation of the previous subsection it would have been natural to have the duality pairing 〈Λφ − σ, λ 〉 with λ ∈ Y instead of the Y scalar product but below it is more convenient to work with the Riesz representative λ ∈ Y given by (λ, •) Y = λ .With the L 2 identification already made in Section 2.3, the first two components of λ and λ coincide.Let r > 0. Consider the function h r (λ) := inf φ,σ L r (φ, σ, λ).Then λ in the saddle point characterization (25) maximizes this function.At any point λ, its gradient direction is (Λ φ − σ) where ( φ, σ) := argmin L r (•, •, λ).This suggests a steepest ascent algorithm for finding the optimal λ.This is the algorithm "ALG1" in [19, Section 3.1].The subsequent algorithm "ALG2" [19, Section 3.2] is a modification where the minimization of φ and σ is decoupled.It proceeds by iterating the following three steps.
A. Minimize L r with respect to the first component by solving the elliptic problem B. Proximal step: C. Multiplier update: The algorithm is also known as ADMM or Douglas-Rachford, and it was shown in [17] to be a proximal point algorithm.A recent overview of convergence results may be found in [20].We now comment on each of those steps separately, and use this occasion to introduce the operator that will be the basis for the preconditioner below.

2.5.1.
Step A. Since is linear, (26) amounts to the linear variational problem The only term involving the unknown φ (k+1) is (we suppress the iteration superscript) Expanding the first term on the right-hand side we obtain having used integration by parts in time and space on the term 〈∂ t φ, ∆ φ〉.The boundary term ν 2 J×∂ D (φ∇ φ − φ∇φ) • n disappears for any combination of homogeneous/periodic Dirichlet/Neumann spatial boundary conditions.The negative ν 2 term cancels by the definition of the norm on V, so we are left with Consider the operator := Λ Λ : X → X , where the adjoint is with respect to the Y scalar product; hence 〈 φ, φ〉 = (32).Below we will iteratively invert a discretized version of , and will therefore require a good preconditioner.To that end we define the symmetric operator : X → X by omitting the last term in (32), cf.Section 4.1, i.e.

〈 φ, φ〉
The operator is equivalent to by an argument similar to that of [4, Section 2.5].Specifically, take the eigenfunctions ϕ n normalized to ϕ n D = 1 and the corresponding eigenvalues 0 = µ 0 < µ 1 ≤ . . . of the operator −∆ with the Neumann boundary conditions (3).Consider the expansion φ = n θ n ⊗ ϕ n .The first inequality in the equivalence (33) (32) (33) is obvious.For the second, we need to estimate ν 2 ∇φ(0 , where α and f are placeholders for ν 2 µ n and θ n .We consider two cases: . This suggests K := tanh T .In both cases we obtain a constant K that depends only on the time horizon T .Consequently, the equivalence (32) ∼ (33) is robust in the diffusion coefficient ν.
The variational problem (29) is convenient for finite element discretization as is.In strong form (29)-(31) amounts to a PDE of the form shows that the problem is of second order in time and of fourth order in space.Step B consists of two decoupled proximal subproblems, the first one being with the "priors" ā(k+1) := (∂ t + ν 2 ∆)φ (k+1) + 1 r ρ (k) and b(k+1) := ∇φ (k+1) + 1 r m (k) , and the second one being with the "prior" c(k+1) := −φ (k+1) (T ) + 1 r e (k) .The first subproblem is as in [9, Section 4.2] but the second is not due to the non-L 2 norm, which is a consequence of the choice (13) of Y.

Step C.
Step C is a straightforward update.

Discretization.
3.1.Discrete spaces.We discretize the quantities (φ, σ, λ) in ( 24) using tensor products of piecewise polynomial functions on an interval.We use the following notation, leaving the underlying mesh to be specified separately: P1. Continuous piecewise affine functions (hat functions); P2.Continuous piecewise quadratic functions; D0.Piecewise constant functions (top functions); D1.Piecewise polynomials of degree one with no inter-element continuity; B2.Piecewise polynomials of degree two with C 1 continuity.For simplicity, we assume here that the spatial dimension is d = 2.We write P2 ⊗ B2 2 for the function space spanned by the products of P2 functions in time with B2 functions in each spatial dimension, etc.We will look for a discrete saddle point (φ h , σ h , λ h ) as follows: Furthermore, we set Y h := A h × B h × C h with the norm Y.The motivation for taking B2 in (36a) is mainly H 2 (D)-conformity required for the regularity (12).It also interacts well with the choice D0 in (36b) because D0 simultaneously approximates functions in B2 and their second derivatives well, which plays a role in (42) and leads to the dimension formula (38) below.More generally, instead of the P2-D1 combination in (36a)-(36b) one can take P(p + 1)-D(p) for any degree p ≥ 0, so that (42) below corresponds to a so-called continuous Galerkin time-stepping scheme.The choice of the spatial component in (36c) allows integration by parts in space in an expression like 〈m h , −∇φ h 〉 and approximates ∇φ h well.Finally, the space in (36d) is simply the trace of (36a) at t = T , and is sufficiently regular for the proximal step (49) to be well-defined.
We impose the no-flux boundary conditions for the density ρ through (cf.( 2) and ( 23)): homogeneous Neumann spatial boundary conditions on X h ; (37a) homogeneous Dirichlet boundary conditions on the P1 components of B h . (37b) the discrete problem (41) admits a minimizer (under reasonable conditions on A, Γ , and L, e.g. as in the numerical examples below).
An alternative to ALG2 is to first discretize the MFGs equations ( 1), thus obtaining a nonlinear system of evolution equations, and then apply a root-finding method such as the Newton iteration.This route was investigated in [2].In this case, a linearized discrete version of (1) is to be solved in each iteration.Some drawbacks associated with this approach are: the saddle point structure of the linear problem as opposed to the symmetric positive definite problem (45); the required degree of differentiability of the data (e.g.C 2 for H(x, •) in [2]); the iterates may not respect constraints such as non-negativity of the density, unless they are already sufficiently close to the solution [2, p. 202]; it is difficult to specify the required accuracy for the iterates.It seems therefore meaningful to switch to a Newton iteration once a good guess has been constructed with the ALG2 iteration.
It is tempting to try to solve the MFGs system (1) by iterating the KFP-HJB equations.This works in practice [16], but averaging as in [15] may furnish a provably convergent iteration.

Discretized ALG2.
We run ALG2 from Section 2.5 on the discrete augmented Lagrangian obtained from (40): Suppose h and h are closed proper convex functions into (−∞, ∞], the operator Λ h is injective, and fix r > 0 (we will use r = 1).Assume that there exists a Kuhn-Tucker pair

Discrete
Step A. The iterate φ (k+1) h ∈ X h is defined by the linear variational problem With h := Λ h Λ h , where the adjoint is w.r.t. the Y scalar product, this can be written as

Discrete Step B. The prior is σ(k+1
The minimization problem (34) is a pointwise minimization in space-time.Even if the data are discrete functions, the minimizer need not lie in the discrete spaces.We first perform the minimization on collocation nodes on each space-time element that are together unisolvent for a piecewise polynomial space.Specifically, we use the 2-node Gauss-Legendre quadrature points on each one-dimensional element (hence 4 nodes per spatial element and 8 nodes per space-time element) to characterize a function in the discrete space Z h := D1 ⊗ D1 2 .We write (Z h ) ⊂ J × D for those collocation nodes.Then we project the result onto the original discrete spaces.The procedure is thus as follows.
1.For each collocation node n ∈ h from the values on the collocation nodes.

Project (orthogonally in
the new iterates (a (k+1) h , b (k+1) h ).Let I 1 denote the operator that constructs a D1 2 function from its values on the spatial collocation nodes (D1 2 ) ⊂ D. The minimization problem (35) is approximated by (49)

Preconditioning the
Step A of ALG2.
4.1.Basic preconditioner.In Section 2.5.1 we introduced the symmetric operator : and argued that it is equivalent to the operator := Λ Λ uniformly in the diffusion coefficient ν.We discretize this operator to obtain h , defined by h φ h := ( φ h )| X h , and use it as a preconditioner for the discrete operator h = Λ h Λ h in ( 46)-( 53).Recall from (39) that Λ h includes the projections Q 1 and Q 2 onto discrete spaces.This begs the question whether the equivalence h ∼ h is still true on X h uniformly in the relevant parameters (as it would be without the projections).The answer is a conditional yes in the sense of [5], as reported in Figure 1.We give only a sketch of the underlying mechanism here.Note that it suffices to check the first equivalence in J×D .Let T 1 and T 2 be those quantities without the projections, so that 〈 φ, φ〉 = T 1 + T 2 .Clearly, T i,h ≤ T i .Define the hyperbolic and parabolic CFL numbers in terms of the resolution of the space-time mesh as CFL hyp := (time scale)/(length scale) CFL par := ν 2 (time scale)/(length scale) 2 .Consider two cases: • Case ν 1.In this case T 2 T 1 holds, because the third term T 2 of (32) is controlled by the second.Moreover, provided CFL par 1, the arguments of [5, Sec.3.2.3]give T 1 T 1,h and therefore (50).Compare Figure 1 (left) with the implicit midpoint rule in [5, Fig. 1.1].• Case ν 1.In this case T 1 ∼ T 1,h and the term T 2 is the problematic one.But T 2 T 1,h + T 2,h holds provided CFL hyp 1 by an argument similar to [5, (3.20)].See Figure 1 (right).We believe that the CFL conditions are fundamental to the good performance of the overall method, because the "continuous Galerkin" P(p + 1)-D(p) temporal discretization is used in (36a)-(36b) to solve the parabolic evolution equation (1b) for φ h with test functions like ρ h .Perhaps adapting the unconditionally stable discretization variant from [5,Sec. 3.4] could alleviate at least the parabolic CFL restriction.
The inversion of this preconditioner becomes more tractable by passing to a temporal basis of P2 in (36a) that is orthogonal with respect to both scalar products approximately diagonal (cf.[4]).For example, in a standard B-spline wavelet basis they will be approximately diagonal up to a block whose size is logarithmic in the number of temporal degrees of freedom due to the boundary term in (51).

Discussion. It was noted in Section 2.5.1 that in
Step A, essentially a PDE of the form −∂ 2 t φ + ν 4 ∆ 2 φ − ∆φ = RHS has to be solved on the space-time cylinder.Similar problems appear in the literature on numerical methods for optimal control of parabolic PDEs, even though the cost functional is usually simpler than (7).Take for example the work [11], where the quadratic tracking functional y − z 2 J×D + α u 2 J×D is minimized subject to the heat equation ∂ t y − ∆ y = −u, y| ∂ D = 0, for given initial value y(0) and desired state z.The first order optimality system includes the equations (∂ t +∆)p = z − y, p(T ) = 0 and u = p.In this example, y, p correspond to ρ, φ.Testing the equation for p by (∂ t + ∆)p and integrating in space-time leads to 〈 Λp, Λp〉 := 〈(∂ t + ∆)p, (∂ t + ∆)p〉 + α −1 〈p, p〉 = RHS.This is the analog of Equation ( 29)-( 30).With this definition, Λp Thus, a preconditioner along the lines of §2.5.1 and §4.1 can be constructed (cf.also [4,Sec. 4.1.3]).Vice versa, preconditioners developed in that context might apply here with some adaptations.
A number of preconditioners for the Newton iteration (see Section 3.1) were proposed in [2].At this point it is appropriate to recall that of [2, Algorithm C].It consists in a) eliminating ρ from the linear saddle point problem thus obtaining a linear PDE for φ; b) applying a linear solver such as BiCGstab; and c) using multigrid with Gauss-Seidel smoothing and spatial semicoarsening only.The principal part of the PDE in a) is again −∂ 2 t φ + ν 4 ∆ 2 φ, but it also includes a (symmetric nonnegative) term of the form − div(ρ (k) ∂ pp H(∇φ (k) )∇φ).The authors report favorable results, but the interpretation and a meaningful comparison is difficult because the number of BiCGstab iterations is used as an indirect measure of the condition number (with possibly mesh-dependent norms in the stopping criterion), and the effect of temporal vs. spatial refinement is not investigated separately.The quality of that preconditioner does seem to improve with increasing diffusion ν, at least for moderate values of ν.By contrast, • for the basic preconditioner from Section 4.1, the interplay of the equivalences (50) and the discrete spaces (36) requires increasing the temporal resolution with increasing diffusion until CFL par 1 (and CFL hyp 1) to guarantee good preconditioning.• for the modified preconditioner from Section 4.3, the quality is robust in ν, only requiring CFL hyp 1.However, we do not have a solid theoretical justification.

Implementation.
We give here some details on the implementation.
As the starting values for the ALG2 iteration we use the zero vector.The solution to (46) is approximated by the preconditioned conjugate gradient method, where −1 h is the approximation of the inverse of the preconditioner as described in Section 4.1 (basic preconditioner) or Section 4.2 (multigrid preconditioner).The PCG iteration is initialized with the zero vector.We use the relative residual tolerance ε pcg = 10 −1 , meaning that the iteration terminates once the residual in the −1 h norm (defined by r 2 := 〈 −1 h r, r〉) is reduced by the factor ε pcg .The maximal iteration number maxit pcg = 100 is never attained in our computations.
The discrete spatial biharmonic problems in the basic preconditioner ( §4.1) and on the coarsest level of the multigrid preconditioner ( §4.2) are solved in Matlab with "backslash".Temporal mesh refinements Temporal mesh refinements Temporal mesh refinements Temporal mesh refinements   computed on the finer mesh with 2 6 × (2 7 × 2 6 ) space-time elements, 5 000 ALG2 iterations and the multigrid preconditioner ( §4.2). Figure 6 shows the error of the discrete density ρ (k)  h and the discrete cost φ (k)  h as the ALG2 iteration progresses, varying the spatial resolution (keeping the temporal resolution at 2 5 elements), the PCG tolerance ε pcg in (53), and the diffusion coefficient ν.The preconditioner used is the basic preconditioner ( §4.1) but the multigrid preconditioner produces ( §4.2) essentially the same results.The convergence rate is clearly mesh dependent, indicating nonuniform convexity properties (with respect to the discretization) of the functional to be minimized, see the discussion in Section 2.4.Moreover, for the larger diffusion coefficient ν = 1 we observe a somewhat non-monotonic convergence and 1 000 ALG2 iterations seem not enough; this indicates that even 5 000 ALG2 iterations for the reference solution might be insufficient.The average number of PCG iterations is reported in Table 1.
The application of the preconditioner consumes the bulk of the time.The basic preconditioner (with backslash for the solution of the bi-Laplace equations) takes around 7.5s on the 2 5 × (2 6 × 2 5 ) mesh and around 375s on the 2 5 × (2 7 × 2 6 ) mesh; the multigrid preconditioner 0.9s and 8s, respectively.
subject to the same transport equation KFP[ρ, v] = 0. Here, {0 =: τ 0 < τ 1 < . . .< τ N = T } are temporal nodes where some information on the density ρ is available.Take, say, Γ i (x, ρ(τ i )) := 1 2 (ρ τ i (x) − ρ(τ i , x)) 2 for some given spatial densities ρ τ i .Then the functional (54) selects an evolution of ρ which is propelled by a "lenient" optimal transport starting from ρ 0 along the ρ τ i and subject to congestion and diffusion effects.The resulting density ρ is not the same as would be obtained by optimizing (7) on the successive temporal intervals [τ i−1 , τ i ] because more concession in meeting ρ τ i is potentially made with (54).
To construct a numerical method, the functional framework is modified as follows.We The discretization is a simple adaptation of (36a)-(36d), allowing temporal discontinuities in X h at each τ i and setting Y h := A h ×B h ×C N h .The discrete operator Λ h includes the projections as in (39).Motivated by (55), the preconditioner for Λ h Λ h is constructed as in Section 4.1 with the additional terms in the definition of (θ , θ ) 1 in (51).As in the example from Section 5.2 we take D = (−2, 2) and the initial density ρ 0 = 1 (−2,−1) .We set T = 2, ν = 1, A(ρ) = 1 2 ρ 2 and H(t, x, p) = 1 2 |p| 2 .We define the costs Γ i (ρ) = 10 × 1 2 (ρ − ρ i ) 2 with ρ 1 = 1 (1,2) and ρ 2 = ρ 0 .Thus we expect the density to accumulate in the interval (1, 2) at time t = 1 and then go back to (−2, −1) at time T = 2.We compute on a 2 8 × 2 8 space-time mesh with the additional temporal tenfold geometric refinement around t = 1.We use the basic preconditioner from Section 4.1 with the given modification and perform 1 000 ALG2 iterations.The algorithm appears to be very sensitive to the error incurred by the PCG iteration (53).We start therefore with the initial tolerance ε pcg = 10 −10 and increase it by a factor of ten every 10 iterations, leaving it at ε pcg = 10 −2 after 80 iterations.The computed density and cost function are shown in Figure 7.The discontinuity of the cost function across t = 1 is clearly visible.
Then the algorithm converges to a solution of the discrete optimization problem (40) and the convergence is robust under perturbations in Step A and Step B if those perturbations decay sufficiently fast with the iteration [17, Theorem 8].

FIG. 1 .
FIG. 1.The condition number of the discrete operator h preconditioned with the basic preconditioner h from Section 4.1 as a function of the temporal resolution under uniform refinement.The spatial domain is D = (0, 1) 2 , the time horizon is T = 1.The lines (bottom-to-top) correspond to 1, . . ., 5 uniform spatial refinements.Left: ν = 1.Right: ν = 0.

TABLE 1
Average number of PCG iterations over the first 500 ALG2 steps in the convergence study in Section 5.