Minimal convex extensions and finite difference discretization of the quadratic Monge-Kantorovich problem

We present an adaptation of the MA-LBR scheme to the Monge-Amp{\`e}re equation with second boundary value condition, provided the target is a convex set. This yields a fast adaptive method to numerically solve the Optimal Transport problem between two absolutely continuous measures, the second of which has convex support. The proposed numerical method actually captures a specific Brenier solution which is minimal in some sense. We prove the convergence of the method as the grid stepsize vanishes and we show with numerical experiments that it is able to reproduce subtle properties of the Optimal Transport problem.


Introduction
Given two bounded open domains X and Y in R 2 and (strictly positive) probability densities f and g, defined respectively on X and Y , our goal is to numerically solve the quadratic Monge-Kantorovich problem where T : X → Y is a Borel map and T f = g is a mass conservation property, that is (see also (8))ˆT This problem has been extensively studied, we refer the reader to the classical monograph of Villani [Vil09] and also the more recent book by Santambrogio [San15] for a comprehensive review of its mathematical theory and applications.
Numerical approaches to optimal transport. From the numerical point of view, the oldest approach is the linear programming formulation of Kantorovich [Kan42] which relaxes the problem in the product space X × Y . This approach however does not scale with the size of the discretization. The so called "Benamou-Brenier" approach [BB00] is based on a different convex relaxation in a time extended space. It is difficult to assess exactly its efficiency but its many numerical implementations suggest it does no better than O(N 3 ).
Significant progress has been achieved this last decade and new algorithms are now available which can reach almost linear complexity. They can be classified in two groups : first, alternate projection methods (a.k.a. Split Bregman, Sinkhorn, IPFP) [Cut13, Gal16, BCC + 15] which are based on the entropic regularization of the Kantorovich problem. These methods are extremely flexible, they apply to a much wider range of Optimal Transport problems (OT) and are easy to parallelize. The entropic regularization however blurs the transport map. Decreasing this effect requires more sophisticated tools, see [CPSV16] and references therein.
The second class relies on the Monge-Ampère (MA) interpretation of problem (1). Because the optimal solution satisfies T = ∇u, u convex (see theorem 2.1) below), one seeks to solve ∇u(X) ⊆ Y , u convex. (MA-BV2) Classical solutions of this problem have been studied in [Urb90]. They are unique up to a constant which can be fixed, for example by adding´X u dx to the r.h.s. of the first equation in (MA-BV2), this additional term must then vanish because of the densities balance. Weak solutions can also be considered either in the Aleksandrov sense or in the regularity framework developed after Cafarelli in the 90s using Brenier weak solutions, that is solutions of (1). Section 2 briefly reviews these notions as we will work with C 1 solutions.
Numerical resolution of the Monge-Ampère equation. Numerical methods based on Monge-Ampère subdivide again in two branches, the semi-discrete approach where g is an empirical measure with a finite number N of Dirac masses [OP88,CP84,Mér11,Lév15a], and finitedifference methods (FD) where f is discretized on a grid of size N [LR05, SAK15,BFO14]. Efficient semi-discrete algorithms rely on the fast computation of the measure of Laguerre cells which correspond to the subgradient of the dual OT map at the Dirac locations. In this paper we focus on the second approach, but we will also use that finite differences solutions yield an approximation of this subgradient at grid points. The second boundary value condition (BV2), is a non local condition and a difficulty for the Monge-Ampère finite differences approach. Under a convexity hypothesis for Y , an equivalent local non-linear boundary condition is given in [BFO14] which preserves the monotonicity (aka "degenerate ellipticity" after Oberman [Obe06]) of the scheme. In particular a Newton method can be applied for the solution of the discretized system for periodic [LR05,SAK15] or Dirichlet [FO13,Mir15] Boundary Conditions (BC). We provide in Section 3 a new interpretation of these BC in terms of an infinite domain "minimal" convex extension. It can be used to build the same boundary conditions as in [BFO14], it shows how to extend the FD scheme outside of possiby non convex supports of f and also provides a suitable continuous interpolation tool for the convergence proof in Section 5.
For the sake of completeness, we mention that Optimal transport problems can be attacked trough non-specific methods based on the Kantorovich linear programming relaxation and its Entropic regularization [BCC + 15, OR15, Sch16] amongst many ...
Convergence of the discretizations. Existing convergence proofs for FD methods in [FO13,Mir15] rely on the viscosity solutions of Crandall and Lions [CIL92] and the abstract convergence theory of Barles and Souganidis [BS91]. This is a powerful framework which can be applied, in particular, to general degenerate elliptic non-linear second order equations. Our problem however has two specificities. First, the Monge-Ampère operator is degenerate elliptic only on the cone of convex function and this constraint must somehow be satisfied by the discretization and preserved in the convergence process. Second, the theory requires a uniqueness principle stating that viscosity sub-solutions are below super-solutions. The first problem or more generally the approximation of convex functions on a grid, has attracted a lot of attention, see [Mir16a] and the references therein. The Lattice Basis Reduction (LBR) technique in particular was applied to the MA Dirichlet problem in [BCM16]. The second issue (uniquess principle) is more delicate and, even though the BV2 reformulation in [BFO14] clearly belongs to the family of oblique boundary conditions (see [CIL92] for references) there is, to the best of our knowledge, no treatment of the specific (MA-BV2) and convexity constraint in the viscosity theory literature.
We follow a different path in this paper. We build a "minimal" convex extension interpolation of the discrete solutions and interpret it as an Aleksandrov solution of an adapted semi-discrete problem. We can then use classical optimal transport theory to prove convergence of our finite difference discretization. Instead of monotonicity and consistency of the scheme, the proof relies on three ingredients : specific properties of the LBR discretization of the Monge-Ampère operator (Section 4.5), the volume conservation enforced by the BV2 boundary condition and the uniqueness and the C 1 regularity of the limit problem. This last condition also requires the convexity of the target Y . We borrow here some of the techniques used in [CMOB15] to prove convergence of semi-discrete approximation of JKO gradient steps problems. In the case where g the target density is not constant, we show that simple centered FD is sufficient for the discrete gradient. In summary, we provide a convergent finite difference method for the optimal transport problem which applies to Lebesgue integrable source densities f and Lipschitz target densities g with convex support.
Newton solvers. A common feature of the semi-discrete and FD approaches is the successful use of a damped Newton method to solve the discrete set of equations. It results in a numerically observed linear complexity of these methods. Mérigot, Kitagawa and Thibert [KMT16] have proven in the semi-discrete case that the Jacobian of the discrete non-linear system is strictly positive definite, they show convergence of the Newton method and provide speed convergence. For finite difference similar results are available for the periodic and Dirichlet problems [LR05,FO13,Mir15]. The convergence of the Newton method relies on the invertibility of the Jacobian of the non-linear scheme. It remains open in the case of BV2 boundary conditions. We provide a numerical study in Section 6 indicating convergence and that the method has linear complexity. As the optimal transport problem is our main motivation for solving the Monge-Ampère problem (MA-BV2) with BV2 conditions, it also guides the notion of generalized solution we are interested in. Let µ, ν be Borel probability measures on R n . The Monge problem consists in finding a Borel map T : R n → R n , which solves where T µ = ν means that µ(T −1 (A)) = ν(A) for all Borel subset A of R n . Its relaxation, the Monge-Kantorovich problem, consists in finding a transport plan γ solution to min γ∈Π(µ,ν)ˆRn ×R n |x − y| 2 dγ(x, y), where Π(µ, ν) is the set of Borel probability measures on R n × R n with respective marginals µ and ν, i.e. γ(A × R n ) = µ(A) and γ(R n × B) = ν(B) for all Borel subsets A, B ⊆ R n .
The following theorem is a simplified version of [Vil03, Th. 2.12] which summarizes results of Knott-Smith (for the first point) and Brenier [Bre91] (for the remaining points).
2. If µ is absolutely continuous with respect to the Lebesgue measure, there is a unique optimal transport plan γ for (5); it has the form γ = (I × ∇u) μ, where u : = ∇u is the unique transport map solution to (4), and supp(ν) = ∇u(supp(u)).
3. If both µ and ν are absolutely continuous w.r.t. the Lebesgue measure, then ∇u • ∇u * (y) = y ν − a.e. y ∈ R n , and ∇u * is the unique optimal transport map from ν to µ.
Remark 2.2. In Theorem 2.1, second item, T = ∇u may be regarded as a map T : R n → R n defined almost everywhere. It is actually the uniqueness of the restriction of T (or ∇u) to supp(µ) which is asserted. It is obviously possible to modify T (or u) on R n \ supp(µ) without changing the optimality of T . In Section 3, we use this fact to choose a solution with a "good" behavior outside X = supp(µ).
Remark 2.3. It is convenient to consider (6) in terms of the subdifferential ∂u of u. We say that The subdifferential is a multivalued function which extends the gradient in the sense that u is differentiable at x iff ∂u(x) is single-valued, in which case ∂u(x) = {∇u(x)} (see [Roc83] for more detail). We deduce from (6) that the closure of ∂u(R n ) contains supp(ν). In general, ∂u(R n ) is not closed, but one may show that supp(ν) ⊆ ∂u(R n ). Indeed, given y ∈ supp(ν), it suffices to consider x n ∈ supp(µ) such that ∇u(x n ) → y; by the compactness of supp(µ), there is a subsequence of (x n ) n∈N which converges to some x ∈ supp(µ), passing to the limit in inequality (7) we get y ∈ ∂u(x).

Brenier Solutions
The connection between the Monge-Ampere equation and the optimal transport problem was pointed out formally in [Bre91], and stated more precisely in [McC97]. The convex function u of Theorem 2.1 has an (Aleksandrov) second derivative at almost every point of the interior of its domain, and if both µ and ν are absolutely continuous (with respective densities f and g), Together with (6), that property tells that u is in some generalized sense a solution to (MA-BV2). This motivates the following definition.
Definition 2.4. Let µ and ν be two probability measures on R n with compact support, such that µ is absolutely continuous w.r.t. the Lebesgue measure. We say that u : R n → R ∪ {+∞} is a Brenier solution to (MA-BV2) if it is convex lower semi-continuous and (∇u) µ = ν.

Aleksandrov Solutions and Semi-Discrete OT
Aleksandrov solutions are useful to tackle the Semi-Discrete Optimal Transport problem, which corresponds to the situation when one of the two measures is an empirical measure, the other being absolutely continuous.
Definition 2.5 (Aleksandrov solution). Let µ and ν be two compactly supported probability measures on R n . A convex l.s.c. function u is an Aleksandrov solution of (MA-BV2) if for every measurable set E ∈ R 2 , ν(∂u(E)) = µ(E).
When µ is absolutely continuous w.r.t. the Lebesgue measure and supp ν is convex, Aleksandrov and Brenier solutions coincide (see [FL09]).
If µ is absolutely continuous and the target measure ν has only atoms, for instance ν = Conversely, if the source measure is an empirical measure µ = N i=1 f i δ x i then it is not possible to define a Brenier map, but the Aleksandrov solution u still makes sense and satisfieŝ or equivalently for all iˆ∂ The mass concentrated at the Dirac locations is mapped to cells corresponding to the subgradients. The Semi-Discrete numerical approach is based on solving system (10) using a Newton method and fast computations of the subgradients ∂u(x i ). These subgradients are known as Laguerre cells tesselation in computational geometry [Mér11,Lév15a]. It should be noted that u * , its Legendre Fenchel transform is a Brenier solution for the reverse mapping ν to µ. For more on the duality properties of Semi-Discrete OT see [BFO14].

Regularity
Based on Brenier solutions Caffarelli has developed a regularity theory for MA. In particular, when Y is convex, the following result holds 1 : Theorem 2.6 ( [Caf92,Caf96]). Let X, Y ⊆ R n be two bounded open sets, f : X → R + , g : Y → R + be two probability densities, bounded away from zero and infinity respectively on X and Y , i.e.
and let T = ∇u : X → Y be the optimal transport maps sending f to g. If Y is convex, then (i) T ∈ C 0,α loc (X) for some α > 0.
(iii) If f ∈ C k,β loc (X), g ∈ C k,β loc (Y ) and both X and Y are smooth and uniformly convex, then T : X → Y is a global diffeomorphism of class C k+1,β .
In the following, we shall assume that f and g are bounded away from zero and infinity. The first conclusion of the theorem thus implies that u ∈ C 1,α loc (X).
We finally recall a slightly more general regularity result due to Figalli and Loeper wich does not require lower bounds on the source density f and which holds in the plane (n = 2) : Theorem 2.7 ( [FL09, Theorem 2.1]). Let X, Y ⊆ R 2 be two bounded open sets, Y convex, f : X → R + , g : Y → R + be two probability densities, such that there exist λ > 0 with f ≤ 1 λ in X and λ ≤ g in Y . Let u : R 2 → R be a Brenier solution to (MA-BV2) such that ∂u(R 2 ) = Y .

Definition
From now on, we consider X ⊆ R 2 , Y ⊆ R 2 two bounded open sets, and we assume that Y is convex. We assume that the probability densities f , g are bounded away from zero and infinity respectively on X and Y (see (11)). We note in the following Proposition that the property ∂u(R 2 ) = Y assumed by Theorem 2.7 allows to single out a particular Brenier solution to the Monge-Ampère problem.
Remark 3.2 (Uniqueness). As the Brenier map is unique, note that uniqueness of the potential u up to a constant carries over to the other notions of solution recalled in Section 2.
Proof. We first prove uniqueness. Let u 1 , u 2 be two such functions. Then the gradients of their Legendre-Fenchel conjugates u * 1 and u * 2 solve the optimal transport problem from ν to µ, and by Theorem 2.1, ∇u * 1 (y) = ∇u * 2 (y) for a.e. y ∈ Y . As a result, there is some C ∈ R such that those two convex functions satisfy u * 1 (y) = u * 2 (y) + C for all y ∈ Y . Let us prove that the equality also holds on ∂Y . Let y 0 ∈ ∂Y , y 1 ∈ Y . By lower semi-continuity and convexity The converse inequality is obtained by swapping the role of u * 1 and u * 2 , and the equality is proved. Now, since u 1 , u 2 are proper convex lower semi-continuous, we know from [Roc83] that where ri(dom u * i ) refers to the relative interior of dom u * i , the effective domain of u * i (see [Roc83]).
The double conjugate reconstruction formula then yields To prove the existence ofũ, let u 0 be a Brenier solution to the Monge-Ampère problem (MA-BV2). We define where u * 0 is the Legendre-Fenchel conjugate of u 0 .

= sup
It is immediate thatũ is convex lower semi-continuous. From Theorem 2.6, we know that u 0 is a convex function which is C 1,α loc in X. Moreover, u 0 is continuous up to ∂X since, for all x ∈ X, ∇u 0 (x) ∈ Y which is a bounded set. As the supremum of a (finite) upper semi-continuous function on the compact set Y ,ũ is finite on R 2 . Now, we prove thatũ(x) = u 0 (x) for all x ∈ X. Since u 0 is proper convex l.s.c., for a.e. x ∈ X, But since ∇u 0 (x) ∈ Y , x, ∇u 0 (x) − u * 0 (∇u 0 (x)) ≤ sup y∈Y ( x, y − u * 0 (y)), and the above inequality is in fact an equality. That equality holds almost everywhere in the open set X hence in fact everywhere.
Remark 3.3. From (12), one may observe thatũ(x) ≤ u 0 (x) for all x ∈ R 2 . It is the minimal convex extension of u 0 outside X in the sense that it is the smallest convex function defined on R 2 which coincides with u 0 on X. Additionally, it is minimal among all Brenier solutions in the sense that its subdifferential is the smallest possible. Indeed, by Remark 2.3, any Brenier solution satisfies ∂u(R 2 ) ⊇ Y .

The affine ray property
The aim of this section is to give some insight on the behavior ofũ outside X, which helps motivate the discrete scheme of Section 4. In the following, co(X) denotes the closed convex hull of X.
Proposition 3.4. The functionũ has the following properties.
(i) For all x ∈ R 2 \ co(X), there exists x 0 ∈ ∂ co(X) such thatũ is affine on the half-line Remark 3.5. The condition ∇ũ(x) ∈ argmax y∈Y y, x − x 0 actually means that x − x 0 is in N Y (y), the normal cone to Y at y = ∇ũ(x). In fact, all the points x such that x − x 0 ∈ N Y (y) are mapped to the same gradient value y = ∇ũ(x), andũ is affine on that set. Figure 1 provides an illustration of Proposition 3.4 and Remark 3.5. The connected blue regions in the source space represent points which are mapped to the same gradient value. Such regions are invariant by translation by N Y (y). In co(X) \ X (here X is not convex) points are mapped into the interior of Y , onto a line segment (black dashed line) which is a discontinuity set for the gradient of the inverse optimal transport map. Gradients are also constant on some convex sets corresponding to the subgradients of the inverse optimal map on this line segment. These regions are naturally connected to X. The behavior of optimal transport maps in the case of non-convex supports is analysed in depth in [Fig09].
Proof. In view of the uniqueness (up to a constant) stated in Proposition 3.1, we may assume without loss of generality that the function u 0 used in the construction ofũ (see (12)) is convex lower semi-continuous and satisfies as this does not change its being a Brenier solution. Let y 0 ∈ Y be a slope, and x ∈ R 2 . Then y 0 is optimal for x in (12) iff as there is a point y ∈ Y where χ Y is continuous and u * 0 is finite. In the above equation, χ Y and N Y respectively stand for the characteristic function and the normal cone of Y at y 0 , Equation (15) is equivalent to the existence of some Moreover, sets of the form (15) cover the whole space R 2 , since by compactness and semicontinuity, there always exists an optimal y 0 for (15). Incidentally that slope is in fact ∇ũ(x) since, provided y 0 is optimal for x, To summarize, we have proved Remark 3.6. Another point of view, using standard tools of convex analysis (see e.g. [Roc83, Th.

16.4]) is to interpretũ as an infimal convolution
where σ Y : x → sup y∈Y y, x is the support function of Y .

Discussion
From Proposition 3.4, we see that the functionũ defined in Proposition 3.1 formally satisfies the equations Indeed, (18) is the Monge-Ampère equation, (19) and (20) follow from the affine property in Proposition 3.4. Ifũ is smooth and the minimum eigenvalue of D 2 u is null, this also enforces convexity (see [Obe08] for the connection with the convex enveloppe problem). As for (21), The discretization strategy is presented in Section 4 and then the convergence proof in Section 5. The convergence will hold in the Aleksandrov/Brenier setting but the limit solution regularity itself will depend on the the regularity of f and g.
Remark 3.7 (Uniqueness). It is not difficult to show thatũ is a viscosity solutions of equations (18-21) but it is is much harder to prove uniqueness for this class of equations, see [CIL92] and its references to oblique boundary conditions . However,ũ coincide with the unique Brenier solution on X and the R 2 extension is also unique. More precisely (see remark 3.2)ũ is unique up to a constant.

Finite Difference Discretization
This section explains how our scheme is built from the set of the equations of Section 3.3 and discuss the properties of the resulting discrete system.
We will consider a sequence of discretization steps (h n ) n∈N , h n > 0, h n 0 + , and we define an infinite lattice of points G n def.
= h n Z 2 . We work in a compact square domain D ⊆ R 2 (say D = [−1, 1] 2 ) which contains X in its interior. We assume without loss of generality that 0 ∈ X. A discrete solution U n ∈ R card(Gn∩D) is defined on that grid : if u is a continuous solution of our problem, its discrete interpolant on the grid is We will use the following finite differences formulae in each grid direction e, . We say that a vector e ∈ Z 2 is irreducible if it has coprime coordinates.

Discretization of the target Y
As it is defined on a finite grid, our discrete scheme is only able to estimate the directional gradient in a finite number of directions. Hence, the constraint we can impose in practice when discretizing (21) is that the gradient belongs to a polygonal approximation Y n to Y . More precisely given a finite family of irreducible vectors, V n ⊆ Z 2 \ {0}, we consider the corresponding set which is nonempty, closed and convex. We assume that 4. The following inclusion holds for n large enough The third point holds as soon as Y is defined by a finite number of inequalities of the form (22) (in which case the constraint ∇u(x) ∈ Y is exactly imposed) or provided that e/ |e| ; e ∈ n∈N V n is dense in S 1 . The fourth point will be useful in Proposition 4.3 to ensure that the variations of U n inside X are bounded by σ Yn . A general strategy to ensure the above four assumptions is to choose where ρ n → +∞ and ρ n h n → 0 as n → +∞ (for instance ρ n = 1/ √ h n ). The result of this approximation strategy is illustrated in Figure 2 for three values ρ n ∈ {1, 2, 3}. It appears that ρ n = 3 already yields a sharp polygonal approximation of the considered ellipse.
In any case, the first three points above ensure that Y n converges towards Y in the sense of the Hausdorff topology [Sch93, Lemma 1.8.1], that is True boundary  (22) for the values ρ n ∈ {1, 2, 3} (the true boundary is shown in dashed red line). The ellipse is obtained by the rotation with angle π/3 of (y 1 , y 2 ) ∈ R 2 ; 1 a 2 y 2 1 + 1 b 2 y 2 2 = 1 with a = 2, b = 1. Right: the corresponding irreducible vectors used in V n .

Discretization of the Monge-Ampère operator in X
We use the (MA-LBR) scheme for the Monge-Ampère operator det(D 2 u) discretization [BCM16]. It relies on the notion of superbase : Definition 4.1. A basis of Z 2 is a pair (e , e ) ∈ (Z 2 ) 2 such that | det(e , e )| = 1. A superbase of Z 2 is a triplet (e, e , e ) ∈ (Z 2 ) 3 such that e + e + e = 0, and (e , e ) is a basis of Z 2 .
The finite difference MA-LBR operator, is a consistent and monotone approximation of the Monge-Ampère operator given by Note that the minimum in (26) is in fact restricted to superbases (e, e , e ) such that (that is, both x + h n e and x − h n e belong to D and similarly for e and e ). The number N of such superbases is a priori very large, but the adaptive algorithm proposed in [BCM16] yields a dramatic speed-up as it is asymptotically sufficient to test log(N ) superbases out of N . We use this refinement in our simulations and we refer to [BCM16] for the detail of the adaptive algorithm. Also, it is shown in [BCM16] that the largest width of the vectors in the optimal superbase grows with the condition number of the Hessian of u. In practice, it is therefore possible to limit the minimization to a stencil defined by its width, i.e. vectors on the grid Z × Z with a given maximum norm. The scheme consistency is remarkable. Given a quadratic form u(x) = 1 2 M x, x , where M a strictly positive definite matrix with condition number κ, its grid interpolation U n satisfies provided that {x, x ± h n e, x ± h n e , x ± h n e } ⊆ D for all superbases as in Definition 4.1 such that e 2 ≤ κ √ 2 (see [BCM16]). The MA-LBR operator also provides interesting "discrete" convexity properties (see [BCM16] and proposition A.3 in [Mir16a]). That notion is used in Appendix A to study the finite difference approximation of the gradient.
Finally the following property states that the MA-LBR operator overestimates the subgradient of the convex envelope of U n at grid points. The proof was communicated to us by J.-M Mirebeau, and we reproduce it in Appendix B with his kind permission. This result will be useful in the convergence proof.
Lemma 4.2. Let u : R 2 → R∪{+∞} be the largest convex lower semi-continuous function which minorizes U n [x] at all points x ∈ G n ∩ D. Then, where ω n = h 2 n is the area of one cell of the grid.

Discretization of the Monge-Ampère operator and the BV2 conditions in D \ X
The MA-LBR scheme is only suitable to discretize strictly convex functions. When at least one eigenvalue decreases to 0, the stencil width (see above) becomes infinite. To capture the flat behavior of solutions in R 2 \ X, we need to add more and more directions to the minimization. Instead, we apply the Wide-Stencil (WS) formulation proposed by Oberman in [Obe08] to discretize (20), that is, the minimum eigenvalue of the Hessian should be 0. This simply yields the scheme where V (x) denotes the set of irreducible vectors e ∈ Z 2 \ {0} such that {x − e, x, x + e} ⊆ D Additionally, we take advantage of the points in D \ X to impose the (BV2) boundary condition, by modifying the scheme (30) as follows : where for all x ∈ G n ∩ D, The rationale of that scheme comes from (20) and (21): for fixed e ∈ R 2 , imposing∆ hn e U n [x] = 0 is consistent with ∇u(x), e = σ Y (e) if x ∈ ∂D and e points outwards D.
The formal consistency with with (20) and (21) is straightforward. In pratice, the same stencil can be used for V (x), i.e. discretization of the degenerate Monge-Ampère operator and V n , i.e. the discretization of the target geometry.

Gradient Approximation
Except when g is the constant density on Y , one needs to discretize the gradient ∇u in order to discretize the Monge-Ampère equation (18).
In Section 5, we prove the convergence of the scheme as n → +∞. The main assumption to obtain this convergence is that the discrete gradient D 1 n U n satisfies the following uniform convergence whereũ n is the continuous interpolation of U n defined in the next Section (Eq. (44)) and U n are solutions of our discrete scheme summarized with his properties in section 4.5. Provided a solution U n to the scheme exists, Theorem 5.5 then ensures that its interpolationũ n converges towards the minimal Brenier solutionũ such thatũ(0) = 0.
In particular, we prove in Appendix A that (32) holds for • the centered finite difference on the cartesian grid, • the forward and backward finite differences on the cartesian grid, .
Please note that the proof strongly depends on the specific properties of our construction (discrete-convexity, boundedness of the gradient, convergence to a C 1 function. . . ), and that (32) should not hold in general for U n an arbitrary sequence of discrete functions. Another approach in the framework of Hamilton-Jacobi equations and conservation laws [LeV92] the simplest and classic correction is to use a Lax-Friedrich style regularisation. Setting F (x, q 1 , q 2 ) = f (x) g(q 1 ,q 2 ) the first order term in the Monge-Ampère equation is discretized as The construction ensures that the derivatives of the scheme with respect to the δ hn e remain positive and hence preserve the degenerate ellipticity. This formulation allows to prove the convergence of the Non-linear Newton solver the price to pay is the introduction of artificial diffusion wich has no meaning in our Monge-Ampère equation.

Summary of the scheme and property of the discrete system
Finally, for fixed n ∈ N, we plan on computing U n solution of ∀x ∈ G n ∩ D,  3. For all x ∈ G n ∩ D, e ∈ V n and (k, ) ∈ N 2 , such that k ≤ , whenever x + ih n e ∈ D for i ∈ {k, k + 1, , + 1}.
4. If x ∈ D and e ∈ V n irreducible are such that U n [x + h n e] − U n [x] = σ Y (h n e), then U n [x + kh n e] = U n [x] + kσ Y (h n e) for all integer k ≥ 0 such that x + kh n e ∈ D.
5. There exists C > 0 (independent of n) such that for all x, x ∈ G n ∩ D, Proof. The first point follows from MA n (U n )[x] =f n(x) g((D 1 n Un)[x]) > 0 and the inequality h(a, b, c) ≤ min{ab, bc, ca} (see [BCM16]). As for the second, point it follows immediately from the definition of∆ hn e U n and the scheme in D \ X.
= max(k, 0). Hence (38) holds with C = max{σ Y (e 1 ), σ Y (−e 1 ), σ Y (e 2 ), σ Y (−e 2 )}. This condition is in particular necessary to satisfy the general convergence result of Barles and Souganidis [BS91] towards viscosity solutions as discussed in the introduction. To the best of our knowledge, there are only two instances of DE scheme for the Monge-Ampère operator : the Wide-Stencil and MA-LBR schemes. The BV2 discretization also satisfies this conditions.
Another useful result in [Obe06] (Th. 8) is : A DE Lipschitz and proper scheme is well defined and has a unique solution. These properties can be checked on (36) when g is the uniform density. Otherwise and even when g is uniformly Lipschitz, the inner gradient approximation destroys the DE of the scheme. A fix proposed by Froese and Oberman [FO13] is to remark that a uniform positive lower bound on the Monge-Ampère operator is sufficient to dominate the DE defect potentially induced by the gradient approximation. Unfortunately this bound is not straightforward to establish (Froese and Oberman impose it by truncating their scheme) and well-posedness for our scheme remains open in the general case.
Newton solver. As in [BCM16,BFO14,Mir15,LR05,Mér11,Lév15b] we use a damped Newton algorithm to solve the system (36). For the description of the algorithm and proof of its convergence we refer to the papers above. The crucial ingredient is to prove global convergence of Newton iterates is the invertibility of the Jacobian matrix of the system. See for example [Mir15] where the case of Dirichlet boundary conditions is treated.
In our case the invertibility of the Jacobian remains an open problem, regarding : • The proposed discretization of the BV2 condition.
• The non constant g density case In practice and in the implementation, the system (36) is unchanged but for the residual inversion in the Newton procedure we use an inexact Jacobian which preserves diagonal dominance as follows.
For each line x of the Jacobian : set dF ) and add coefficients of an upwind type discretization on the classical five point stencil : At column x + (0, 1) h : On the diagonal : Gn + Gs + Gw + Ge Finally, g needs to be defined globally in case the iterate generate gradients outside of the target Y .

Convergence
We now assume that for all discretization steps (h n ) n∈N , h n > 0, h n 0 + , there exists a solution (U n [x]) x∈Gn∩D , and we proceed to show that it converges to the Brenier solution of the problem.
The proof is articulated along the following steps.
1. We build functionsũ n which "interpolate" the values (U n [x]) x∈Gn and which converge (along a subsequence) towards some funtion v as n → +∞ (Prop. 5.1 and Lemma 5.2).
2. The functionũ n is an Aleksandrov solution for a semi-discrete OT problem between some measure discrete measure µ n supported on the grid G n and some absolutely continuous measure ν n (Lemma 5.3).
3. The measures µ n and ν n respectively converge to absolutely continuous measuresμ andν, and v is a minimal Brenier solution for the transport betweenμ andν (Lemma 5.3).

Convex extension as an interpolation and its properties
Let us recall that we assume that g is continuous on Y , whereas f is only Lebesgue integrable. We will assume that there exists (α f , β f , α g , β g ) ∈ (0, +∞) 4 , such that ∀y ∈ Y , α g ≤ g(y) ≤ β g .
In fact, possibly changing g in R 2 \ Y , it is not restrictive to assume that g is continuous on a neighborhood of Y 0 ⊇ Y and that (42) holds for all y in that neighborhood.
From the values (U n [x]) x∈Gn∩D of the discrete problem (see Section 4.5), we build the following functionũ n : R 2 → R, with The following proposition gathers some properties which typically hold with such constructions (see also [CMOB15]).
Proposition 5.1. The following properties hold.
1.ũ n is convex, finite-valued, and by constructionũ n (x) ≤ U n [x] for all x ∈ G n ∩ D.
Proof. The first point is left to the reader. To prove the second one, consider any slope y ∈ Y n . There is some x ∈ G n ∩ D which minimizes x → U n [x ] − y, x over G n ∩ D (let α denote the corresponding minimum). Then the affine function L : for all x ∈ G n ∩ D, and L(x) = U n (x). Hence, by construction ofũ n , L(x ) ≤ũ n (x ) for all x ∈ R 2 , and L(x) =ũ n (x), which means that y ∈ ∂ũ n (x). As a result Y n ⊆ ∂ũ n (G n ∩ D). The fact that ∂ũ n (R 2 ) ⊆ Y n follows from ∂ũ n (R 2 ) ⊆ dom(ũ * n ) = dom(U n + χ Yn ). Now, let us prove the third point. Assume by contradiction that |∂ũ n (x)| > 0. Since ∂ũ n (x) is convex, it must have nonempty interior, hence there is some y 0 ∈ Y n , r > 0 such that Hence, for α ∈ (0, 1] small enough, the affine function L : x →ũ n (x) + y 0 , x − x + αrh satisfies L(x ) ≤ U n [x ] for all x ∈ G n ∩ D, andũ n (x) < L(x) which contradicts the definition ofũ n .
The next Lemma describes more specific properties ofũ n which follow from the construction of U n .
Lemma 5.2. The family {ũ n } n∈N ⊆ C (R 2 ) is relatively compact for the topology of uniform convergence on compact sets. Moreover, with ω n def.
Proof. We observe that (ũ n ) n∈N is uniformly equicontinuous by Proposition 5.1 and the fact that Y n ⊆ Y 0 . Moreover, from Proposition 4.3 and U n (0) = 0, we deduce that As a result,ũ n (0) ∈ [−C sup x ∈D x 1 , 0]. We deduce the claimed compactness by applying the Ascoli-Arzelà theorem. The inequality (46) follows from Lemma 4.2. Indeed, let u : R 2 → R ∪ {+∞} be the largest convex l.s.c. function which minorizes U n [x] at all points x ∈ G n ∩ D. Ifũ n (x) < U n [x], then |∂ũ n (x)| = 0 and there is nothing to prove. Otherwise,ũ n (x) = u(x) = U n [x] andũ n ≤ u imply that ∂ũ n (x) ⊆ ∂u(x). As a result, , then |∂ũ n (x)| = 0, so we assume thatũ n (x) = U n [x]. Assuming by contradiction that |∂ũ n (x)| > 0 there must exist again which is impossible. On the other hand, if x + h n e / ∈ D (the case x − h n e / ∈ D is similar), then x − h n e ∈ D and U n [x − h n e] − U n [x] = −h n σ Y (e). The slope ofũ n is monotone in the direction e, and it cannot exceed σ Y (e/ |e|), henceũ n (x + te) =ũ n (x) + tσ Y (e) for t ≥ −h n . But this contradicts the inequalityũ n (x + te) ≥ũ n (x) + t y 0 , e + r |t| e .

A semi-discrete optimal transport problem
Let us define the following measures f n is the grid discretization of f defined in section 4.5 and ω n = h 2 n is the area of one cell of the grid.
Proof. Let us consider the transport plan γ n ∈ M(R 2 × R 2 ) We note that γ n is an optimal transport plan between µ n and ν n since its support is contained in the graph of ∂ũ n (see Theorem 2.1). Moreover, we observe that where the right hand-side converges towards µ in the weak-* topology (i.e. induced by compactly supported continuous test functions). Additionally, since we deduce that the supports of µ n , ν n , γ n are respectively contained in the compact sets X, Y n , X × Y n , and their masses are uniformly bounded. Hence, there exist Radon measuresμ, ν ∈ M(R 2 ),γ ∈ M(R 2 × R 2 ) such that, up to the extraction of a (not relabeled) subsequence, µ n , ν n and γ n respectively converge toμ,ν,γ in the weak-* topology. We note thatγ has respective marginalsμ andν, andμ,ν have densities with respect to the Lebesgue measuref , g which satisfyf ≤ f andĝ ≤ β g 1 Y .
As forũ n , we already know from Lemma 5.2 that we may extract an additional subsequence so thatũ n converges uniformly on compact sets to some (convex function) v ∈ C (R 2 ). Since any element of suppγ is the limit of (a subsequence of) elements (x n , y n ) of supp γ n ⊆ ∂ũ n , passing to the limit in the subdifferential inequality ∀x ∈ R 2 ,ũ n (x ) ≥ũ n (x n ) + y n , x − x n , we obtain suppγ ⊆ graph ∂v.
As a result, and sinceμ is absolutely continuous with respect to the Lebesgue measure L 2 , γ = (I × ∇v) μ and ∇v is the optimal transport map betweenμ andν.
Proof. For all y ∈ Y ⊆ Y n , there exists x ∈ G n ∩ D such that y ∈ ∂ũ n (x). Except for a set of y with zero Lebesgue measure, that x is unique [Gut01, Lemma 1.1.12], and x ∈ X (see Lemma 5.2). Then, denoting by ω g the modulus of continuity of g over some neighborhood of Y 0 , we get as n → +∞, where the convergence of the right-hand side follows from (32). As a result the following convergence holds We deduce thatĝ ≡ g on Y and from (53) and (25) we obtain Since,f ≤ f , we get f =f .
To sum up, we are now in position to prove the following result.
Proof. Up to the extraction of a sequence, we obtain from Lemma 5.3 and 5.4 thatũ n converges uniformly on compact sets towards some convex function v ∈ C 1 (R 2 ) which satisfies in the Brenier sense det(D 2 v) = f (x)/g(∇v(x)), and ∂v(R 2 ) = Y .
However, the function v may depend on the choice of the subsequence. By Proposition 3.1 which states the uniqueness of the minimal Brenier solution up to an additive constant, we can prove that v =ũ by proving that v(0) = 0. Then we obtain that the full family (ũ n ) n∈N converges towardsũ by uniqueness of the cluster point. We know that v is a solution to the Monge-Ampère equation, hence (in our setting) also in the Aleksandrov sense. As a result, for all r > 0,´B (0,r) f dL 2 =´∂ v(B(0,r)) gdL 2 , and 0 < α f |B(0, r)| ≤ β g |∂v(B(0, r))| .  (B(0, r) As a result, for all n large enough, there exists x n,r ∈ B(0, r) ∩ G n such that |∂ũ n (x n,r )| > 0, and thus, by Proposition 5.1,ũ n (x n,r ) = U n [x n,r ]. By a diagonal argument we construct a sequence x * n such that x * n → 0 andũ n (x * n ) = U n [x * n ]. By Proposition 4.3 Passing to the limit n → +∞, we get v(0) = 0. As a result v =ũ and the full family converges towardsũ.

Numerical study
The numerical method, as we implemented it, depends on two main parameters : • h the step size of the cartesian grid discretizing the square D wich contains the support of f .
• The stencil width, i.e. the maximum norm of the vectors on the grid Z 2 which will be used in the variational schemes. More precisely we use superbases with vector with maximum norm given by the stencil width in (26), we use the same set of vectors for V (x) in (26) and finally we also use the same vectors to define V n in (22) for the discretization of the target.
There are also parameters linked to the precision of the damped Newton algorithm, but they have a very limited impact on the efficiency of the method.
We run the Newton method until the residual is stationary in ∞ norm which usually takes a few dozen interations. The value of the residual depends on the discretization parameter but in all of our experiments was always between 1e-10 and 1e-15.
The target density g is defined in all R 2 using a constant extension. In practice we initialized with a potential u which gradient maps inside the target Y but it cannot be excluded to hit out along the iterations.
6.1 Imposing the constraint U n [0] = 0 A straightforward implementation of (36) yields a non-square nonlinear system. When performing the Newton iterations with a Moore-Penrose pseudo-inverse of the Jacobian, we have observed convergence towards a function which approximately solves the Monge-Ampère equation in X, but yields a stronger error in D \ X. In other words, is small in X, but may be large in some parts of D \ X (see Figure 3, left). An alternative way to impose the condition U n [0] = 0 is to rely on the mass conservation property. Instead of trying to cancel (E[x]) x∈Gn∩D and U n [0] separately, one tries to cancel (E[x] + U n [0]) x∈Gn∩D . In principle, the mass conservation then implies that U n [0] = 0. However, at the discrete level, the mass conservation is only approximate, since the densities and the target Y are discretized. As a result U n [x 0 ] is not exactly 0 in practice.
Numerically, we have noticed that this alternative approach has better convergence properties, and yields an error which is smaller and uniformly spread (see Figure 3, right). In fact, up to some small residual, the error E[X] is uniformly equal to U n [0] which is slightly different from zero because of the inexact mass conservation. Although the straightforward approach has similar consistancy properties as the alternative one when h → 0, in the following we describe the numerical results of the alternative approach, as it shows better numerical properties. We refer to the quantity (E[x] + U n [0]) x∈Gn∩D as the residual error Res.

Experiment with h
A standard test is to optimally map a uniform density square to a uniform density circle. We use here a stencil width of 5 (i.e. we use vectors on the grid Z 2 of maximum ∞ norm 5).    On a standard Laptop a non optimized Julia implementation needs less than 5 minutes to solve the 512 × 512 case and 1.5 hours for the 1024 × 1024 case.

Experiments with stencil width
In all this section N = 128, and we vary the stencil width, meaning the number of vector on the grid used in scheme. The target is an heptagon whose normals directions are not necessarily vectors in the stencil. Table 2 shows, for increasing values of the stencil width , the number of iterations the norms of the reached residuals and the value of the U n [0], which decreases as the accuracy of the discretization of Y improves. Figure 5 to 7 shows the deformation of the computational grid under the gradient map and zooms. They show how the geometry of the computed target improves with the domain discretization which also depends on the stencil width. Notice again that grid volume is well preserved and that all grid points in D \ X are collapsed onto the boudary of Y .
In Section 6.6 below, we provide another experiment which shows that the MA-LBR schemes uses very small stencil widths.

Experiments with inhomogeneous source density f
We start with a test case taken from [BFO14] and for which the anaytical solutions is given. Set q(z) = − 1 8π z 2 + 1 256π 3 + 1 32π cos(8πz) + 1 32π 2 z sin(8πz) The map from the density defined on the square (−0.5, 0.5) 2 onto a uniform density in the same square has the exact solution In table 3   In the next example, the heterogeneous densities f mapped to a constant density ball Y , N = 128 and stencil width is 5 . Figures 8 and 9 show heterogeneous sources and the corresponding deformation of the cartesian grid.
In Figure 9 the source density is random on the square. This test case demonstrates the applicability of the method to Lebesgue integrable densities.

Experiments with inhomogeneous target densities g
In this section we show the results obtained with a constant density square X (not represented in the figures) mapped to heterogeneous densities g defined on a ball Y , N = 128 and stencil width is 5. Figure 10 shows the case of one Gaussian and a mixture of Gaussians. The boundary of the target is the red circle. The density g and its gradient need to be defined in the numerical code as functions which can be evaluated anywhere in the ambient space of Y as the map can hit anywhere including in R 2 \ Y . If g is given analytically (our case) this is easy we just extend it by a constant out of Y , otherwise one has to resort to interpolation.

Experiments with non convex and non connected sources
In this section we map constant source densities f with non convex and non connected supports to a constant density ball Y . The solution is still C 1 but is is know that the inverse mapping, i.e. the Legendre-Fenchel transform of the potential, has gradient singularities. This has been analyzed by Figalli in [Fig09]. Figures 11 to 14 show different densities an the associated map deformations. The zero densities inclusions created singular structures in the target which correspond to gradient singularities of the dual map. These structures are consistent with those predicted by Figalli. We use N = 128 and the stencil width is 5 . Eventually, in Figure 15, we display an experiment which shows the ∞ -norm of e for the optimal superbase chosen by MA n or the optimal vector chosen by MA 0 n . This illustrates the fact that the MA-LBR operator uses very compact stencils (in that case the superbase {(1, 1), (0, 1), (1, 0)} or its rotation is used everywhere in X), whereas the extension outside X uses more elongated vectors depending on where the points are mapped in Y .

Conclusion
In this work, we have proposed an novel way to impose the BV2 boundary condition for the Monge-Ampère equation in schemes such as MA-LBR. The idea consists in slightly extending the domain of the solution so as to capture the behavior of "minimal" Brenier solutions to the optimal transport problem. Our proof of convergence as the grid stepsize goes to zero does not appeal to the theory of viscosity solutions, but rather to simple arguments combined with standard optimal transport results. The numerical experiments faithfully reproduce the typical behavior of optimal transport solutions. Although numerically we have not encountered any particular difficulty with the resolution of the scheme, the existence of a solution to the discrete problem remains an open problem that we leave for future work. We have defined in Section 4, the values (U n [x]) x∈D∩Gn , and in Section 5 their "interpolation" u n using (44) as a convex function. In Lemma 5.3, we have shown that this interpolation converges (along some sequence) towards some function v ∈ C 1 (R 2 ). We wish to prove that the discrete gradients D 1,C n U n , D 1,F n U n and D 1,B n U n converge in some sense to the gradient ∇v.

A.1 Convergence of the gradient of convex functions
We begin with some general results on convex functions. The following lemma is standard, and mainly follows from the results in [HUL96, Section VI.6.2]. However, we provide a proof below, as this result is not explicitly stated there. Proof. Let us recall that the convexity of v n implies that v is convex, and v n converges uniformly on the compact sets of Ω towards v. Let ρ > 0 such that ρ ≤ 1 4 dist(K, R 2 \ Ω), and let K def.
Observe that ∂v n (K ) is bounded independently of n. Indeed, for any x ∈ K , any y n ∈ ∂v n (x ), if y = 0, v n x + ρ y n |y n | ≥ v n (x ) + ρ |y n | , and the right-hand side is uniformly bounded by uniform convergence of v n on {x ∈ Ω ; dist(x, K) ≤ 2ρ}. Now, assume by contradiction that there is some ε > 0, some (not relabeled) subsequence x n ∈ K, x n ∈ B(x n , Ch n ), and y n ∈ ∂v n (x n ) with |y n − ∇v(x n )| ≥ ε.
By compactness, there exists y ∈ R 2 , x ∈ K, such that (up to an additional extraction) y n → y and x n → x. Passing to the limit in the subgradient inequality Hence y = ∇v(x) which yields a contradiction with |y n − ∇v(x n )| ≥ ε for n large enough. This yields the claimed result. Now, we consider the finite difference scheme applied to a convex function.
Lemma A.2. Let Ω ⊆ R 2 be a nonempty open convex set, and {v n } n∈N be a family of finitevalued convex functions on Ω which converges pointwise on Ω towards some function v ∈ C 1 (Ω). def.
For all x ∈ G n ∩ D, there exists x ∈ G (I) n ∩ D such that x − x ∞ ≤ h n . Let e def.
As a result, the affine mapping L : at each x ∈ G n ∩ D, and ∇L = y ∈ Y n . We deduce thatũ n ≥ L. In particular,ũ n (x) ≥ L(x) = v There existsv ∈ C (K) such that for n large enough v (I) n ∈ C (K) and up to a subsequence, Proof. The first point follows from Proposition 4.3 which ensures that (U n [x]) n∈N is bounded uniformly in x ∈ G n ∩ D. As a result, the convex function v (I) n is uniformly bounded (hence uniformly Lipschitz) in a neighborhood of K and Ascoli-Arzelà's theorem ensures the convergence of v (I) n towards somev up to a subsequence. By Proposition A.3, we see thatũ hn must converge (pointwise) towardsv on a dense subset of K. Since K is compact and the functions are uniformly Lipschitz, the convergence is uniform on K.
Since Proposition A.3 also holds for v (II) , v (IV ) n , we deduce that those convex functions also converge uniformly (along the same subsequence) towardsv.
In Lemma 5.2, we show that {ũ n } n∈N ⊆ C (R 2 ) is precompact for the topology of uniform convergence on compact subsets of R 2 . As a consequence of Proposition A.3, ifũ hn converges along some subsequence towards some function v, then v (I) n , . . . v (IV ) n converge uniformly towards v on compact subsets of int(D), along the same subsequence. Now, in Lemma 5.3, we show that v ∈ C 1 (R 2 ). We may now state the main result of this section.
Proposition A.5. Assume that (up to a subsequence)ũ hn converges uniformly on the compacts subsets of R 2 towards some function v ∈ C 1 (R 2 ). Then, for all K ⊆ int(D) compact, We begin with the forward and backward differences defined in (34). Let e = (1, 0) or e = (0, 1). By monotonicity of the slopes (see Proposition 4.3), To fix ideas, let us assume that x ∈ G Lemma B.1. Let Ω be a convex neighborhood of a point x ∈ R d , and let u, v : Ω → R d be convex. If u ≤ v, and u(x) = v(x) then ∂ x u ⊆ ∂ x v. In particular |∂ x u| ≤ |∂ x v|.
The following Lemma is a consequence of the Brunn-Minkowski inequality (see [Sch93]).

Lemma B.2.
Let Ω be a convex neighborhood of a point x ∈ R d , and let u, v : Ω → R d be convex. Let w = (1 − t)u + tv with 0 ≤ t ≤ 1. Then (1 − t)∂ x u + t∂ x v ⊆ ∂ x w, where the sign plus denotes a Minkowski sum. In particular (1 − t)|∂ x u| We denote by u the largest convex function bounded above by u. Proposition B.3. Let X ⊆ R d be a finite point set, and let x ∈ X. Let Y ⊆ X be symmetric w.r.t. the point x, i.e. ∀y ∈ Y one has 2x − y ∈ Y . Let u : X → R, and let v : Y → R be defined by v(y) := 1 2 (u(y) + u(2x − y)). Then Proof. If u(x) > u(x), then |∂ x u(x)| = 0 and there is nothing to prove. We thus assume u(x) = u(x). Introduce the restriction u Y := u |Y , and its symmetry u −Y := u Y (2x − ·), and note that 1 The announced result follows since |∂ x u Y | = |∂ x u −Y |.
The final step to prove Lemma 4.2 is given by Remark 1.10 in [BCM16] which shows that MA n (U n )[x]ω n = |∂ x v(x)|.