A numerical solution to Monge’s problem with a Finsler distance as cost

Monge’s problem with a Finsler cost is intimately related to an optimal ﬂow problem. Discretization of this problem and its dual leads to a well-posed ﬁnite-dimensional saddle-point problem which can be solved numerically relatively easily by an augmented Lagrangian approach in the same spirit as the Benamou-Brenier method for the optimal transport problem with quadratic cost. Numerical results validate the method. We also emphasize that the algorithm only requires elementary operations and in particular never involves evaluation of the Finsler distance or of geodesics.

where Π(f − , f + ) is the set of transport plans between f − and f + i.e. the set of probability measures having f − and f + as marginals and d L is a Finsler distance.More precisely, d L is given by d L (x, y) := inf 1 0 L(γ(t), γ(t)) dt : γ ∈ W 1,1 ([0, 1], Ω), γ(0) = x, γ(1) = y (1.2) where the Lagrangian L: Ω × R d → R + is a continuous function of Finsler type i.e. for every x ∈ Ω, v → L(x, v) is a norm 1 and there is a constant C > 0 such that the following nondegeneracy condition holds: Of course, one difficulty is the evaluation of the cost, and we shall see how to avoid computing it.This will be done by considering suitable dual, minimal flow and saddle-point formulations for which one can easily use an augmented Lagrangian method.The use of augmented Lagrangian methods in optimal transport was pioneered in the seminal work of Benamou and Brenier [4] on the dynamic formulation of the quadratic optimal transport case.For a distance cost (Monge case), in fact there is no need to introduce an additional time-variable and the analogue of the Benamou-Brenier dynamic problem is the minimal flow problem introduced by Beckmann [3].We refer to the recent work [5] of the first two authors for applications of these augmented Lagrangian methods to Mean-Field-Games and optimal transport and to the work of the third author [14] for applications to anisotropic congested optimal transport.To the best of our knowledge, the relevance of augmented Lagrangian methods for a general Finsler metric has remained unnoticed in the literature.It is worth mentioning here that for two relevant particular classes of metrics namely Riemannian ones and crystalline norms (i.e.norms whose unit ball is the intersection of finitely many half spaces), there are alternative methods.The Riemannian case leads to a second-order cone program (SOCP) for which there are mature interior points solvers.The crystalline case can in principle be solved by linear programming.The advantage of the augmented Lagrangian approach is that it is flexible and can handle arbitrary Finsler metrics, only a pointwise projection step has to take into account the particular geometry of the metric.For other methods to solve optimal problems with the euclidean distance as transport cost, we refer for instance to [2] where a certain regularization is considered.Our goal is to show that Monge's problem with a Finsler metric is in fact quite easy to solve directly numerically by using an augmented Lagrangian approach.Let us finally mention the work of Rubinstein and Wolansky [16] which leads to problems similar to (1.1)-(1.2) having their roots in the study of semiclassical limits for some classes of dispersive wave equations.The paper is organized as follows.In section 2, we recall several reformulations of the Monge problem with Finsler cost (1.1): the Kantorovich dual, the minimal flow reformulation and finally a (formal) saddle-point problem for finding at the same time the Kantorovich potential and the optimal flow field.Section 3 describes the discretization of the saddle-point problem, discusses the convergence and details the steps of the augmented algorithm ALG2 of Glowinski and Fortin in this context.Section 4 gives numerical results.

Dual and minimal flow formulations
The standard Kantorovich duality formula (see [18]) says that the infimum in Monge's problem (1.1) coincides with the value of the dual: (2.1) We first observe if u is 1-Lipschitz for d L on Ω then, thanks to (1.3), u is also Lipschitz for the geodesic distance on Ω which is the euclidean distance since we have assumed Ω to be convex.Thus, u is differentiable a.e., moreover the constraint u(x) − u(y) ≤ d L (x, y) can be expessed in differential form as follows.Defining the dual norm L * (x, .) of L(x, .): one can express the fact that u is 1-Lipschitz for d L by the following pointwise constraint on ∇u Thus (2.1) can be rewritten in sup-inf form as sup Switching the infimum and the supremum above, we obtain another dual formulation of (2.1): inf observing that the supremum with respect to and +∞ otherwise, we obtain the following minimal flow problem dual formulation of (2.1): inf Minimal flow formulations for transport problems were first introduced in the 1950's by Beckmann in an economic context [3], the connection with Monge's problem was realized much later (see in particular [10] and [11]).
It is obvious that (2.1) possesses solutions.Standard convex duality also implies that there is no duality gap and that sup(2.1)= inf(1.1)= inf(2.5). (2.6) It is however not clear in general that (2.5) possesses L 1 solutions.In the spatially homogeneous case where L(x, v) is the euclidean norm (or more generally some smooth and uniformly convex norm), |σ| is called the transport density and there are important and involved L 1 regularity results for the transport density under suitable assumptions on f ± due to Feldman and McCann [11], De Pascale, Evans and Pratelli [7], De Pascale and Pratelli [8] and Santambrogio [17].We are not aware of extensions to the Finsler case yet.Since the cost in (2.5) is convex and homogeneous of degree one, (2.5) can be relaxed to vector-valued measures which amounts to replace (2.5) by: inf where |σ| is the total variation measure of the vector-valued measure σ and dσ d|σ| is the density of σ with respect to |σ|.It is then obvious by (1.3) and Banach-Alaoglu Theorem that the relaxed problem (2.7) admits solutions (note that the divergence condition −div(σ) = f is understood together with the boudary condition σ • n = 0 on ∂Ω in the weak sense of (2.4) which is obviously a closed condition for the weak star convergence of measures).To sum up, we have the following duality and attainment relations MK(L, f ) := min(1.1)= max(2.1)= inf(2.5)= min(2.7) (2.8) where we have denoted MK(L, f ) the common value of (1.1), (2.1) and (2.5).

Relations between the three problems
We now discuss in a slightly formal way the relationships between the three problems (1.1), (2.1) and (2.5).For further use, let us denote by B(x) and B * (x) respectively the unit ball for L(x, .) and L * (x, .): and recall Recalling that if C is a closed convex subset of R d and z ∈ C the normal cone of C at z, N C (z) is by definition: it is easy to see that, if non zero vectors σ and q satisfy L(x, σ)L * (x, q) = q • σ this means that In the case where B(x) or B * (x) is smooth the normal cones at a point of ∂B(x) or ∂B * (x) are simply the half line generated by the normal vectors (Gauss maps) and thus the previous relations give an unambiguous information on the relation between the direction of q and σ.Any optimal plan π for (1.1) is related to any optimal potential u for (2.1) by the complementary slackness condition: (2.12) Let then (x, y) ∈ spt(π) and let t ∈ [0, 1] → γ x,y (t) be a constant speed geodesic between x and y, it is easy to deduce from (2.12) and the fact that u is 1-d L Lipschitz that one also has for every (s, t) such that 0 ≤ s ≤ t ≤ 1: In other words, u grows at the maximal rate allowed by the Lipschitz constraint on the geodesic γ x,y .If u was smooth, we could further write: and then which also gives This expresses in a local way the fact that the Lipschitz constraint on u is binding on geodesics (this is again formal).Note that (2.15) gives a precise relation between ∇u(x) and the direction of geodesics passing through x: they are tangent to a vector in the normal cone N B * (x) (∇u(x)).Now if σ solves (2.5) and u is a solution of (2.1), then complementary slackness takes the form which again expresses that the Lipschitz constraint is binding on the support of the transport density.The direction of optimal flows and gradients of Kantorovich potentials are therefore related by the duality relations It remains to investigate the relations between optimal plans and optimal flow fields.The following (heuristic) construction is well-known (see for instance [1]) in the euclidean setting: let π be an optimal plan i.e. a solution for (1.1).For every (x, y) ∈ spt(π) let t ∈ [0, 1] → γ x,y (t) be a geodesic between x and y and define the vector-valued measure σ π by i.e. −div(σ π ) = f so that σ π is admissible for the minimal flow problem (2.5).
To see that σ π is actually optimal, we consider a Kantorovich potential i.e. a solution u of (2.1).Thanks to (2.8), it is enough to show that On the one hand, observing that: On the other hand, thanks to the complementary slackness condition (2.14) and −div(σ π ) = f , we have This shows, at least formally, the optimality of σ π .

Lagrangian and saddle-point
where G(q) := Ω G(x, q(x))dx and and then rewrite (2.1)-(2.5)as the saddle point problem inf where the Lagrangian L is defined by For r > 0, let us also introduce the augmented Lagrangian .
Recall that L and L r have the same saddle-points ( [12], [13]).Note that in both L and L r we multiply the L ∞ vector field ∇u by σ, which a priori only makes sense only if σ is L 1 .Existence of saddle-points is therefore not guaranteed unless there is an L 1 solution to (2.5).However at the level of the discretized problems (see next section), there is no such regularity issue, there exists saddle-points for the discretized Lagrangian and finding such saddle-points is equivalent to solving (2.1) and (2.5) simultaneously.
3 Discretization and algorithm

Discretization
We now consider suitable approximations of our problems by finite-dimensional (convex) ones using finite elements.In these finite-dimensional approximations, existence of saddle-points is not an issue anymore.More precisely, consider a family of regular triangulations T h of the domain Ω (which we will be a two-dimensional polyehdron in our simulations) indexed by the typical meshsize h (i.e. the diameter of each T ∈ T h is less than Ch for some positive constant C), let E h ⊂ W 1,∞ (Ω) be the corresponding finite-dimensional space of Lagrange P 1 (piecewise linear) finite elements of order 1 (a similar analysis can be done for higher order finite-elements) whose generic elements are denoted u h .Slightly abusing notations, we shall consider u h both as a finite-dimensional vector and a Lipschitz, piecewise linear function defined on the whole domain, the gradient of u h has piecewise constant components, it is still denoted ∇u h .We further assume that the mesh is regular in the sense that the Lagrange interpolate map We also approximate the linear form f by f h ∈ (E h ) * ≃ E h (again with f h , 1 = 0) in such a way that f h weakly converges to f in the sense of measures as h → 0.
We then consider the approximation of (2.1): sup where K h is the convex subset of E h consisting of all u h 's in E h such that for every T ∈ T h one has where x T is a given point in T (for instance its center of mass or one of its vertices).To prove that this is a consistent approximation of Kantorovich problem (2.1), it is useful to observe first that smooth functions are dense in the admissible set for (2.1): Lemma 3.1.Let u be a 1-Lipschitz for d L function on Ω, then there exists a sequence u n of C ∞ (Ω), 1-Lipschitz for d L functions converging uniformly on Ω to u.In particular this implies that 2 by well-know results (see [6], chapter 4), assumption (3.1) holds as soon as T h is nondegenerate in the sense that there is some ρ > 0 such that for every h and every T ∈ T h , there is a ball B T included in T such that diam(B T ) ≥ ρ diam(T ).More precisely, in this case, ∇v − ∇( Proof.Recall that u is Lipschitz on Ω with Lipschitz constant ∇u L ∞ , for θ > 0 define: and observe that ω(θ) tends to 0 as θ → 0 + .Without loss of generality we may assume that 0 ∈ Ω, then for λ > 0 define u λ on (1 + λ)Ω by For fixed (small) λ > 0, chose ε = ε λ > 0 small enough so that (1 + λ)Ω contains the set of points whose distance to Ω is less than ε.Since ∇u λ (x) = (1 + λ) −1 ∇u((1 + λ) −1 x), L * (y, ∇u(y)) ≤ 1 a.e. and using the homogeneity of L * (x, .),we have For λ > 0, define the smooth function v λ := ρ ε λ ⋆ u λ .We then have for every x ∈ Ω, using the convexity of L * (x, .),Jensen's inequality, and again the fact that L * (y, ∇u(y)) ≤ 1 a.e.
One easily deduces the following convergence result: Proposition 3.2.Let u h be a solution of (3.2) normalized so as to have zero mean (which costs no generality since f h , 1 = 0), then for some vanishing sequence of meshsizes h n → 0 as n → ∞, u hn converges uniformly to some Kantorovich potential u i.e. some solution of (2.1).
Proof.Thanks to (1.3), u h is uniformly Lipschitz.Since it has zero mean, thanks to Ascoli's theorem, it converges in C(Ω) (up to a subsequence) to some Lipschitz function u.Thanks to Banach-Alaoglu's Theorem, we may also assume that ∇u h converges weakly * in L ∞ to ∇u.To check that u is 1-d L Lipschitz, it is enough to show that for every σ ∈ C(Ω, R 2 ) one has multiplying by the measure of T , summing over all triangles of T h and letting h → 0 gives (3.6).It remains to prove that u solves (2.1), which thanks to Lemma 3.1 amounts to show that f, u ≥ f, v for every smooth and 1d L -Lipschitz function v. Let then v be such a smooth and 1-d L -Lipschitz function, for every T ∈ T h , we have where tends to 0 as h → 0 thanks to (3.1).Then defining v h := (1 + ω h ) −1 I h (v), we have v h ∈ K h and v h converges uniformly to v as h → 0, passing to the limit in f h , u h ≥ f h , v h , we can conclude that u is a Kantorovich potential.

Augmented Lagrangian algorithm
From now on, we drop the dependence in h in the approximation parameter and slightly abusing notations, we use the same notations as in the continuous framework, eventhough in what follows we actually consider the discretization of the augmented Lagrangian L r .Existence of a saddle-point is not an issue at the level of the finite-dimensional approximation and convergence of the augmented Lagrangian algorithm recalled below is well-known (see Eckstein and Bertsekas [9]).The augmented Lagrangian algorithm ALG2 splitting scheme, consists, starting from (u 0 , q 0 , σ 0 ) ∈ R n × R m × R m to generate inductively a sequence (u k , q k , σ k ) as follows (abusing notations we still denote by ∇ the discretization of the gradient): • Step 1: minimization with respect to u: • Step 2: minimization with respect to q: • Step 3: update the multiplier by the gradient ascent formula Using finite elements, for instance P 1 for u and P 0 for σ, ∇u and σ are not exactly in the same finite dimensional space.In this formula, the gradient has in fact to be replaced its P 0 projection.
Step 1 consists in solving a Laplace equation : together with the Neumann boundary condition Step 2 is a pointwise projection problem where p B * (x) is the projection onto B * (x) := {q ∈ R d : L * (x, q) ≤ 1} the unit ball for L * (x, .).

Examples
We now give some details on how to perform the projection step 2 in practice.
For the sake of simplicity we drop the dependence of L and L * in x.

The Riemannian case
In the Riemannian case for some symmetric positive definite matrix A. Up to diagonalizing A, there is no loss of generality in assuming that 2 with λ i > 0 the eigenvalues of A. The dual norm L * is then given by L * (q) = ( with α the positive root of (3.12) otherwise where the nonlinear equation 3 to be solved by α reads (3.12) This single equation is monotone in α and can be efficiently solved by Newton's method.
The case where L(x, .) is defined by finitely many directions The second case we have in mind is the polyhedral case where L is defined by finitely many directions.More precisely (and again this is for a fixed x), we are given a collection of unit vectors v 1 , • • • , v k which we complete by and such that 0 belongs to the interior of the (symmetric) convex polytope co({v j , j = 1, • • • , 2k}).We are also given positive reals (ξ j ) j=1,••• ,2k with ξ j+k = ξ j for j ∈ {1, • • • , k} and then consider the crystalline norm It is immediate to see that L is the gauge of the symmetric convex polytope ) which is then its unit ball.The dual norm L * is then explicitly given by its dual unit ball B * is then defined by the inequalities |q • v j | ≤ ξ j for j = 1, • • • , k.In dimension two, the projection onto B * can be easily performed 3 α is a Lagrange multiplier associated to the (binding) constraint , it is obtained by plugging the optimality condition q i − q i + α λ i q i = 0 in the constraint.
as follows.First, compute the vertices and sides of B * (note that the latter have one of the vectors v j as normal, so these computations can be done in an automatic way) so as to be able to represent where S 1 , • • • , S l are the successive vertices of B * and denote by ν i the unit exterior normal to the side [S i , S i+1 ].Now if q is a generic vector of the plane belonging to the complement of B * (otherwise its projection is q), then q belongs either to one half strip [S i , S i+1 ] + R + ν i and in this case its projection on B * coincides with its projection on the line S i +ν ⊥ i or it belongs to one of the sectors S i + R + ν i−1 + R + ν i and in this case the projection of q is the vertex S i .We illustrate these considerations in Figure 1, by the following example with k = 4, In fact, the vector v 4 is useless since ξ 4 is very large with respect to the other ones.So the ball B * is only defined by the inequalities |q • v j | ≤ ξ j for j = 1, . . ., 3. The point q 1 is in the half strip [S 6 , S 1 ] + R + v 1 so that its projection is on the segment [S 6 , S 1 ].The point q 2 belongs to the sector S 6 + R + v 6 + R + v 1 so that its projection is S 6 .
In the following two subsections, in each figure, there are two images.The top one represents σ and the bottom one corresponds to the level lines of u.

Polyhedral case
We tested the following polyhedral examples.In Figure 6 we take k = 2 and v 1 perpendicular to v 2 , the dual unit ball B * then is a rectangle.In all other examples, k = 15 and the angle between two consecutive directions is π/k.The form of B * then depends on the chosen ξ j 's.If the ξ j 's are (almost) equal, B * is a polyhedron with thirty edges.It is in particular the case for Figure 5, Figure 8, Figure 9

Error Criteria
To analyze the convergence of our simulations, we have considered three criteria corresponding to the optimality conditions: as well as the duality relation which can be equivalently rewritten in a dual way as We use a triangulation of the unit square with n = 1/h element on each side and the following convergence criteria: is the L 2 error on the divergence constraint.

BND.Error = ∂Ω
is the L 2 (∂Ω h ) error on the Neumann boundary condition.The first two criteria represent the optimality conditions for the minimization of the Lagrangian with respect to u and the third one is for the maximization with respect to σ.We do not take exactly the same criteria for both examples.Indeed, in the Riemannian case, L(•, σ) is simple to compute whereas in the polyhedral case, it is tedious in general.On the other hand, L * (•, ∇u(•)) has an explicit form given by (3.13).We show below the results of our numerical simulations after 400 iterations for each case.These results suggest that smoothness of the metric affects the convergence speed.

A non symmetric case
So far, we have only considered symmetric cases corresponding to distances (L(x, v) = L(x, −v)) but this symmetry assumption is not essential for our approach.In fact, non symmetric situations might even be relevant in some models of road or pedestrian traffic.Only step 2 of the ALG2 algorithm (projection on the convex set B * (x) := {q : L * (x, q) ≤ 1}) is affected by this modification.We present here two simulations corresponding to gaussian densities (f = f 1 ) and B * (x) is the euclidean unit ball centered at 0 (symmetric case, as represented in Figure 11) and centered at 0.95(cos(πx 1 ), sin(πx 1 )) (non symmetric case, represented in Figure 12).

and Figure 10 .
In the last examples, we have ξ j = cos 2(j−1)π k + 1.5 and the ball B * then only has 12 edges.

Figure 11 :
Figure 11: Gaussian to gaussian transport, euclidean distance cost.Top: vector fied σ, bottom: level sets of the Kantorovich potential u.

Figure 12 :
Figure 12: Gaussian to gaussian transport, a non symmetric case where the unit ball of L * is not centered at 0. Top: vector fied σ, bottom: level sets of the Kantorovich potential u.

Table 1 :
Test case DIV.Error BND.Error DUAL.Error Time execution (seconds) Convergence of the finite element discretization for all test cases.