A Planning Problem Combining Calculus of Variations and Optimal Transport

We consider some variants of the classical optimal transport problem in which one optimizes not only over couplings between some variables $x$ and $y$ but also over control variables governing the evolution of these variables over time. Such a situation is motivated by the problem of assigning tasks to workers whose characteristics can evolve with time (and be controlled). We distinguish between the coupled and the decoupled case. The coupled case is a standard optimal transport problem whose cost is the value of some optimal control problem. The decoupled case is more involved since it is nonlinear in the transport plan.


Introduction
In the classical optimal transport problem, a planner has to find a coupling, or transport plan, between two nonnegative measures of equal total mass (say 1) so as to minimize the average cost. The Monge-Kantorovich problem is then the linear program that consists in finding a cost-minimizing measure with prescribed marginals on the variables, say $x$ and $y$. Such problems have been studied intensively in recent years, in particular for their applications in PDEs, geometry and probability theory (see the recent books of Villani [15], [16]). Interestingly, transport/assignment problems have their modern roots in planning problems (optimally transporting coal from mines to steel factories, for instance), and the pioneering works of Kantorovich in the USSR and Koopmans in the West were awarded the Nobel prize in economics in 1975. Typical applications are optimal allocation of resources or assignment problems; for other economic applications, we refer to the recent work of Ekeland [5], [6] and Chiappori, McCann and Nesheim [4] relating matching problems and hedonic equilibria to optimal transport theory.
If, to fix ideas, we think of a firm having to assign $N$ workers of different skills $x$ to $N$ tasks of different difficulties $y$, the assignment problem simply consists in assigning a task to each worker (for instance, assigning the most difficult tasks to the most skilled workers). It is however reasonable to think that, in addition to designing such an assignment, the firm can act directly on the workers' types using some (costly) control variable such as internal training. The same of course applies to the task type $y$, which may change with time if, for example, some investments are made to make the task easier or quicker to execute. This is where optimal control comes into play. The problems we consider in the present paper are precisely intended to deal with such situations, where a planner or firm has at its disposal as decision variables not only the initial assignment but also control variables that affect the (possibly stochastic) dynamics of the state variables $x$ and $y$. More precisely, we consider the case where the planner chooses the initial coupling between $x$ and $y$, initially distributed according to respective probabilities $\mu_0$ and $\nu_0$, and a control variable governing the evolution of the characteristics $x$ and $y$. It is assumed that a pair (worker/task) that is initially formed at time $0$ with initial characteristics $(x, y)$ will remain paired, or married, during a fixed period although its characteristics may evolve with time. This captures the idea that, for instance, workers are assigned a certain task during some minimal period.
We will consider two different cases, which we call the coupled and the decoupled case respectively, and we will compare them. In the coupled case, one optimizes over initial transport plans and over a control for each initial pair $(x, y)$. The coupled problem thus amounts to solving a standard optimal transport problem whose cost function is simply the value function of some optimal control problem. In the decoupled case, we will assume that $y$ is constant in time and that the control variable for $x$ has to be chosen independently of $y$ (for instance, the planner offers the same training to all the workers of the same skill, independently of the task they will be assigned). This case is much more involved since it is not linear (and not even lower semicontinuous) in the transport plan. Under quite general assumptions, it turns out that the coupled and decoupled problems have the same value but that the latter may fail to have solutions (a counterexample will be given in section 4).
The paper is organized as follows. In section 2, the coupled problem is introduced, an existence result is proved and some cases of uniqueness are given. In section 3, we study the decoupled problem which, contrary to the coupled case, gives rise to a nonlinear optimization problem over transport plans. Finally, section 4 is devoted to a linear-quadratic deterministic case that we treat in detail and for which we compare the coupled and the decoupled case; we also give some counterexamples.
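In the coupled problem, the state $Z_t = (X_t, Y_t) \in \mathbb{R}^{2d}$ evolves according to the controlled stochastic differential equation
$$ dZ_t = b(Z_t, u_t)\,dt + \sigma(Z_t, u_t)\,dW_t, \tag{2.1} $$
where $W_t$ is a standard $2d$-dimensional Brownian motion on some filtered probability space and $(u_t)_{t\in[0,T]}$ is an admissible control process, i.e. a progressively measurable process with values in a control space $U$, assumed for simplicity to be a compact metric space. We will denote by $\mathcal{U}$ the set of admissible controls. We assume that $b \in C(\mathbb{R}^{2d}\times U, \mathbb{R}^{2d})$, $\sigma \in C(\mathbb{R}^{2d}\times U, M_{2d})$ ($M_{2d}$ denoting the space of $2d \times 2d$ matrices with real entries) and that there is some constant $K_1$ such that for every $(z_1, z_2, u) \in \mathbb{R}^{2d}\times\mathbb{R}^{2d}\times U$ one has
$$ |b(z_1,u) - b(z_2,u)| + |\sigma(z_1,u) - \sigma(z_2,u)| \le K_1 |z_1 - z_2|. \tag{2.2} $$
The Lipschitz assumption (2.2) guarantees the existence and uniqueness of a strong solution to (2.1) for every initial condition $Z_0 = (x, y)$ and every admissible control process $u \in \mathcal{U}$. We will denote this solution by $Z^{x,y,u}_t$ in the sequel.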
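To each initial condition $(x, y)$ and each admissible control process $u \in \mathcal{U}$ is associated a cost, namely the expected running cost $c$ accumulated over $[0,T]$ plus the expected terminal cost $\varphi$:
$$ \mathbb{E}\Big[\int_0^T c(Z^{x,y,u}_t, u_t)\,dt + \varphi(Z^{x,y,u}_T)\Big]. $$
We assume that both $c$ and $\varphi$ are nonnegative and continuous and that there exist a constant $K_2$ and an exponent $k \ge 1$ for which the estimate (2.3) holds for every $(z_1, z_2, u) \in \mathbb{R}^{2d}\times\mathbb{R}^{2d}\times U$.

The last ingredient in the problem is a pair of Borel probability measures $\mu_0$ and $\nu_0$ on $\mathbb{R}^d$ giving respectively the distributions of the initial state for $x$ and for $y$. The initial distribution $\gamma$ of the joint process $Z$ is chosen by the controller: it is a coupling between $\mu_0$ and $\nu_0$ and thus belongs to the set of transport plans $\Pi(\mu_0, \nu_0)$ consisting of all Borel probability measures $\gamma$ on $\mathbb{R}^d \times \mathbb{R}^d$ having $\mu_0$ and $\nu_0$ as marginals. We shall finally assume that $\mu_0$ and $\nu_0$ satisfy the integrability condition (2.4).

The controller's optimization problem (2.5) then consists in minimizing the average cost over couplings $\gamma \in \Pi(\mu_0,\nu_0)$ and admissible controls. It can simply be rewritten as the optimal transport problem
$$ \inf_{\gamma \in \Pi(\mu_0,\nu_0)} \int_{\mathbb{R}^d\times\mathbb{R}^d} v(0,x,y)\,d\gamma(x,y), \tag{2.6} $$
where $v(0,x,y)$ denotes the value of the optimal control problem started from $(x,y)$ at time $0$, that is, the infimum over $u \in \mathcal{U}$ of the cost above. We will not discuss whether this optimal control problem admits a solution (for each fixed $z$). This issue depends on whether the value function is regular enough for an optimal feedback control to be defined, which is of course closely related to the regularity theory for the corresponding Hamilton-Jacobi-Bellman equation. Instead, we will focus on the optimal transport formulation (2.6).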

Existence
It is now easy to check that the infimum in (2.6) is attained. Although the arguments are very standard in optimal transport (see [15]), we give a proof for the sake of completeness.
Theorem 2.1. Under the assumptions of paragraph 2.1, the value of (2.6) is finite and attained by some optimal transport plan.
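Proof. First, it follows from assumption (2.3) and standard arguments in control theory (see for instance [8], section IV.6) that $v(0,\cdot)$ has polynomial growth and is continuous. More precisely, there is a constant $K_3$ for which the growth bound (2.7) holds, together with a continuity estimate valid for all $(z_1, z_2) \in \mathbb{R}^{2d}\times\mathbb{R}^{2d}$. It immediately follows from (2.7) and (2.4) that the value of (2.6) is finite. Attainment then follows from the usual compactness argument: $\Pi(\mu_0,\nu_0)$ is tight, hence weakly compact, and since $v(0,\cdot)$ is continuous and nonnegative, $\gamma \mapsto \int v(0,\cdot)\,d\gamma$ is weakly lower semicontinuous.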

Some cases of uniqueness
Our aim now is to exhibit some special cases where there is a unique optimal transport plan for (2.6) and this plan is given by a transport map, that is, it is of the form $\gamma = \delta_{S(x)} \otimes \mu_0$ for some measure-preserving map $S$ between $\mu_0$ and $\nu_0$ (which corresponds to a deterministic optimal coupling). It is indeed well-known (see [10], [12], [3]) that if the transportation cost $v(0,\cdot,\cdot)$ satisfies a certain structural condition called the generalized Spence-Mirrlees or twist condition and if $\mu_0$ is regular enough (absolutely continuous for instance), then there is a unique optimal transportation plan, which is in fact given by a transport map. We will not dwell here on the twist condition and the regularity required on $\mu_0$ in general. Let us indicate some particular cases instead:
• $d = 1$, $v(0,\cdot,\cdot)$ of class $C^2$ (say) and the cross-derivative $\partial_{xy} v(0, x, y)$ is everywhere strictly positive; in this case, as soon as $\mu_0$ is atomless, there is a unique transport plan that is given by the unique nondecreasing map (or monotone rearrangement) pushing $\mu_0$ forward to $\nu_0$,
• $v(0, x, y) = c_0(x - y)$ with $c_0$ strictly convex and differentiable (see [11]) and $\mu_0$ absolutely continuous and compactly supported; then there is also a unique optimal transport plan and the latter is given by a transport map,
• of particular interest is the quadratic case $c_0(z) = |z|^2$, which was first solved by Brenier [2]; in this case, uniqueness of an optimal plan, once again given by a transport map, holds as soon as $\mu_0$ and $\nu_0$ have second moments and $\mu_0$ does not give mass to sets of Hausdorff dimension less than $d - 1$ (see McCann [13]).
Uncontrolled case. Let us first consider the uncontrolled case where $b = b(z)$ is a Lipschitz vector field, $\sigma = \sqrt{2}\, I_{2d}$, and the running cost $c = c(z)$ and the terminal cost $\varphi = \varphi(z)$ satisfy the same assumptions as in section 2.1. In this case, it is well-known that the function $v$, defined as the expected cost along the uncontrolled dynamics, is the unique solution with polynomial growth of the backward linear parabolic equation (2.10). As seen in the previous paragraph, in the optimal transportation problem (2.6), the cost to take into account is $v(0,\cdot)$, the value at time $0$ of the solution of (2.10). Let us treat in detail the case $b = 0$ and $\varphi = 0$. In this case, (2.10) is simply the heat equation, and the optimal transportation problem (2.6) reads as (2.11), in which the cost is expressed through the heat kernel $G$. In the particular case of the quadratic transportation cost $c(x, y) = \frac{|x-y|^2}{2}$, a direct computation leads to:
$$ v(0, x, y) = T\Big(\frac{|x - y|^2}{2} + 1\Big). $$
Consequently, the infimum of (2.6) is $T\,(W_2(\mu_0, \nu_0) + 1)$, where $W_2$ stands for the squared 2-Wasserstein distance, normalized here as
$$ W_2(\mu_0,\nu_0) := \inf_{\gamma \in \Pi(\mu_0,\nu_0)} \int_{\mathbb{R}^d\times\mathbb{R}^d} \frac{|x-y|^2}{2}\,d\gamma(x,y). $$
It is well-known that for the quadratic cost, if $\mu_0$ does not give mass to small sets (i.e. Borel sets of Hausdorff dimension less than $d - 1$), then there is a unique optimal transport plan, which is in fact induced by a transport map $S$. Moreover, $S$ is characterized by $S = \nabla u$ with $u$ convex (see [2], [9], [13], [11]).
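In dimension one, the gradient of a convex function is simply a nondecreasing map, so for the quadratic cost the discrete analogue of $S = \nabla u$ is the sorted matching. The following minimal sketch checks this against brute-force enumeration for a small assignment problem; the numerical values of the skills and difficulties are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def quadratic_cost(xs, ys):
    """Total cost sum_i |x_i - y_i|^2 / 2 when x_i is matched with y_i."""
    return sum(0.5 * (x - y) ** 2 for x, y in zip(xs, ys))


def monotone_matching(xs, ys):
    """Match the i-th smallest x with the i-th smallest y."""
    return list(zip(sorted(xs), sorted(ys)))


def brute_force_cost(xs, ys):
    """Minimal quadratic cost over all N! assignments (small N only)."""
    return min(quadratic_cost(xs, perm) for perm in permutations(ys))


# Hypothetical skills and difficulties, for illustration only.
skills = [0.3, 1.7, 0.9, 2.4]
difficulties = [2.0, 0.1, 1.2, 0.8]

pairs = monotone_matching(skills, difficulties)
monotone_cost = sum(0.5 * (x - y) ** 2 for x, y in pairs)
best_cost = brute_force_cost(skills, difficulties)
```

Here `monotone_cost` and `best_cost` agree up to rounding, which is the discrete counterpart of the optimality of the monotone rearrangement for the quadratic cost.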
Linear control and zero transportation cost. Let us now consider the linear drift $b(Z_t, u_t) = u_t$ with constant diffusion $\sigma = \sqrt{2}\, I_{2d}$ and the quadratic running cost $c(z, u) = c(u) = \frac{|u|^2}{2}$. We also assume that $\varphi \in C^2(\mathbb{R}^{2d})$ satisfies the same assumptions as before. The value function $v$ is then the viscosity solution of the Hamilton-Jacobi-Bellman equation (2.12). Performing the usual Hopf-Cole transformation $V(T - t, z) = e^{v(t,z)/2}$, $\Phi := e^{\varphi/2}$, transforms (2.12) into the heat equation for $V$, with initial condition $\Phi$. As soon as $\Phi$ has polynomial growth, this gives an explicit expression for $V$ as the convolution of $\Phi$ with $G$, where $G$ is again the heat kernel as in (2.11). In the unidimensional case, one may thus check whether $v(0,\cdot,\cdot)$ satisfies the cross-derivative criterion recalled above. Indeed, by a direct computation, $\partial_{xy} v(0,\cdot,\cdot) > 0$ can be rewritten as an explicit inequality on $V(T,\cdot,\cdot)$ and its derivatives.

Calculus of variations. Let us now consider the simple deterministic case where the variable $y$ is constant with respect to time, where the state equation for $X$ is simply $\dot X = u$ and where the running and terminal costs respectively take the form $c(u) + F(x - y)$ and $\varphi(x - y)$, where $c$, $\varphi$ and $F$ are bounded from below, smooth and strictly convex functions, with $\Lambda\,\mathrm{id} \ge D^2 F \ge \lambda\,\mathrm{id}$ for some $\Lambda > 0$, $\lambda > 0$. Let us further assume for simplicity that $\mu_0$ and $\nu_0$ are compactly supported. In this case the value function is given by
$$ v(0,x,y) = \inf\Big\{ \int_0^T \big( c(\dot X(t)) + F(X(t) - y) \big)\,dt + \varphi(X(T) - y) \; : \; X(0) = x \Big\}. $$
We then rewrite the value as $v(0, x, y) = w(x - y)$, where
$$ w(z) := \inf\Big\{ \int_0^T \big( c(\dot X(t)) + F(X(t)) \big)\,dt + \varphi(X(T)) \; : \; X(0) = z \Big\}. $$
It is now easy to check that $w$ is a $C^1$ and strictly convex function. Then, it is well-known (see [11], [9], [3]) that the corresponding optimal transport problem
$$ \inf_{\gamma \in \Pi(\mu_0,\nu_0)} \int w(x - y)\,d\gamma(x, y) $$
possesses a unique solution, given by an optimal transport map $S$, as soon as $\mu_0$ is absolutely continuous with respect to the Lebesgue measure. In this simple case again, we then have existence and uniqueness of an optimal plan, which is moreover given by a transport map. We will use this example again in the decoupled variant of section 3.

Alternative formulations
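We now describe formally an alternative formulation of (2.5). Given $\gamma_0 \in \Pi(\mu_0, \nu_0)$ as initial coupling of $X$ and $Y$ and an admissible control that we assume to be of Markovian type, i.e. of the form $u_t = u_t(X_t, Y_t)$, let $t \in [0, T]$ and let $\gamma^u_t$ be the joint probability law of $(X_t, Y_t)$ (this is purely formal since the dependence of the Markovian control $u$ on $x, y$ may not be regular enough to define a flow for the SDE (2.1)). Then $\gamma^u_t$ (again formally) solves the Fokker-Planck equation (controlled by $u$):
$$ \partial_t \gamma^u + \operatorname{div}\big(b(\cdot,u)\,\gamma^u\big) - \frac12 \sum_{i,j} \partial^2_{ij}\big((\sigma\sigma^{T})_{ij}(\cdot,u)\,\gamma^u\big) = 0, \qquad \gamma^u_{|t=0} = \gamma_0. \tag{2.14} $$
The total cost becomes linear with respect to $\gamma^u$ and simply reads as:
$$ \int_0^T \int_{\mathbb{R}^{2d}} c(z, u_t(z))\,d\gamma^u_t(z)\,dt + \int_{\mathbb{R}^{2d}} \varphi(z)\,d\gamma^u_T(z), \tag{2.15} $$
which suggests, as an alternative formulation of (2.5), minimizing (2.15) over initial conditions $\gamma_0 \in \Pi(\mu_0, \nu_0)$ and controls $u$, the dynamics of $\gamma^u$ being governed by the PDE (2.14).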
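Let us also mention that it is well-known that (2.6) admits the dual (Kantorovich) formulation:
$$ \sup\Big\{ \int_{\mathbb{R}^d} \psi\,d\mu_0 + \int_{\mathbb{R}^d} \zeta\,d\nu_0 \; : \; \psi(x) + \zeta(y) \le v(0,x,y) \ \text{for all } (x,y) \Big\}. \tag{2.16} $$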

The decoupled case
In the coupled problem considered previously, the situation is rather simple since the problem amounts first to computing the value function and then to solving the optimal transport problem whose cost is precisely this value function. However, this requires that the optimizer be able to design a control that depends on both state variables $x$ and $y$. If one thinks of $x$ as the workers' skills and of $y$ as the difficulty of some task (constant in time, to fix ideas), it may be reasonable to assume in certain cases that the control governing the evolution of $x$ (training, education...) is the same for every worker of type $x$, independently of the task $y$ he will be assigned. In this case, one faces a somewhat decoupled variant of the previous problem. For the sake of simplicity, we will only consider in this section the deterministic case where in addition $y$ is constant in time and $\dot X = u$ (calculus of variations). The running cost is given by some continuous Lagrangian $(x, u, y) \in \mathbb{R}^{3d} \mapsto L(x, u, y)$ that is convex with $q$-growth in the variable $u$ for some $q > 1$, and the terminal cost is $(x, y) \in \mathbb{R}^{2d} \mapsto \varphi(x, y)$. We also assume for simplicity that $\mu_0$ and $\nu_0$ are compactly supported. In this framework, the coupled problem considered previously reads as
$$ \inf_{\gamma \in \Pi(\mu_0,\nu_0)} \int_{\mathbb{R}^d\times\mathbb{R}^d} v(x,y)\,d\gamma(x,y), \tag{P} $$
where
$$ v(x, y) := \inf\Big\{ \int_0^T L(X(t), \dot X(t), y)\,dt + \varphi(X(T), y) \; : \; X(0) = x \Big\}. $$
Of course, (P) possesses solutions as soon as $v$ is continuous (or more generally lower semicontinuous).

In the decoupled case, $X$ is not allowed to depend on $y$ but has instead to minimize the conditional average of the cost. More precisely, for $\gamma \in \Pi(\mu_0, \nu_0)$, let us denote the disintegration of $\gamma$ with respect to its first marginal as $\gamma = \gamma_x \otimes \mu_0$ and then define for every $x$:
$$ v_\gamma(x) := \inf\Big\{ \int_{\mathbb{R}^d} \Big( \int_0^T L(X(t), \dot X(t), y)\,dt + \varphi(X(T), y) \Big)\,d\gamma_x(y) \; : \; X(0) = x \Big\}. $$
Then, the decoupled problem is:
$$ \inf_{\gamma \in \Pi(\mu_0,\nu_0)} \int_{\mathbb{R}^d} v_\gamma(x)\,d\mu_0(x). \tag{Q} $$
The novel feature of this decoupled problem is that, contrary to the optimal transport problem (P), it is nonlinear. In fact, since $v_\gamma(x)$ is concave in $\gamma_x$ (as an infimum of linear functionals of $\gamma_x$), (Q) is a concave minimization problem over $\Pi(\mu_0, \nu_0)$.
Existence of a minimizer is a real issue here since the criterion in (Q) may not be weakly lower semicontinuous (in fact, even the measurability of $x \mapsto v_\gamma(x)$ is not obvious, and a counterexample to lower semicontinuity will be given in the next section). Our aim now is not to study existence or nonexistence issues in general but simply to remark that (P) turns out to be a kind of relaxation of (Q) under quite weak assumptions:

Proposition 3.1. If the value function $v$ is continuous and $\mu_0$ is atomless, then one has $\inf(Q) = \min(P)$. Moreover, $\gamma$ solves (Q) if and only if it solves (P) and
$$ v_\gamma(x) = \int_{\mathbb{R}^d} v(x, y)\,d\gamma_x(y) \quad \text{for } \mu_0\text{-a.e. } x. $$
In particular, if $S$ is an optimal transport for (P), i.e. $\delta_{S(x)} \otimes \mu_0$ solves (P), then it also solves (Q).

Proof.
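Since $v_\gamma(x) \ge \int_{\mathbb{R}^d} v(x, y)\,d\gamma_x(y)$, one immediately deduces that $\inf(Q) \ge \min(P)$. Now, let us remark that if $\gamma$ is given by a transport map, that is $\gamma = \delta_{S(x)} \otimes \mu_0$, then $v_\gamma(x) = v(x, S(x))$ for $\mu_0$-a.e. $x$, and in this case
$$ \int_{\mathbb{R}^d} v_\gamma\,d\mu_0 = \int_{\mathbb{R}^d} v(x, S(x))\,d\mu_0(x) = \int_{\mathbb{R}^d\times\mathbb{R}^d} v\,d\gamma. $$
Since $\mu_0$ is atomless, the set of transport plans induced by transport maps is weakly-$*$ dense in $\Pi(\mu_0, \nu_0)$ (see [1]). In particular, since $v$ is continuous, there is a minimizing sequence for (P) of the form $\gamma_n = \delta_{S_n} \otimes \mu_0$ where $S_n \sharp \mu_0 = \nu_0$ for every $n$. One then has
$$ \inf(Q) \le \int_{\mathbb{R}^d} v_{\gamma_n}\,d\mu_0 = \int_{\mathbb{R}^d} v(x, S_n(x))\,d\mu_0(x), $$
and passing to the limit one gets $\inf(Q) \le \min(P)$. Let $\gamma \in \Pi(\mu_0, \nu_0)$; then $\gamma$ solves (Q) if and only if
$$ \int_{\mathbb{R}^d} v_\gamma\,d\mu_0 = \min(P), $$
that is, since $\int v_\gamma\,d\mu_0 \ge \int v\,d\gamma \ge \min(P)$, if and only if $\gamma$ solves (P) and $v_\gamma(x) = \int_{\mathbb{R}^d} v(x,y)\,d\gamma_x(y)$ for $\mu_0$-a.e. $x$.

If we consider again the case $L(x, u, y) = c(u) + F(x - y)$ with terminal cost $\varphi(x - y)$, under the same convexity, growth and regularity assumptions as in the example of subsection 2.3, then we immediately deduce that if $\mu_0$ is absolutely continuous with respect to the Lebesgue measure, then the corresponding decoupled problem admits a unique solution, which is given by a transport map.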
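The inequality $v_\gamma(x) \ge \int v(x,y)\,d\gamma_x(y)$ at the start of the proof is just "the infimum of an average dominates the average of the infima", with equality when $\gamma_x$ is a Dirac mass. The following toy sketch illustrates this on a finite model, under the simplifying assumption (not made in the text) that only finitely many candidate controls and tasks are available; the cost table is hypothetical.

```python
def decoupled_value(cost, weights):
    """Infimum over controls of the gamma_x-average cost (one control for all y)."""
    return min(
        sum(w * row[j] for j, w in enumerate(weights)) for row in cost
    )


def coupled_average(cost, weights):
    """gamma_x-average of the pointwise infimum (the control may depend on y)."""
    n_tasks = len(weights)
    return sum(
        weights[j] * min(row[j] for row in cost) for j in range(n_tasks)
    )


# Hypothetical table: cost[a][j] is the cost of control a for task y_j.
cost = [
    [1.0, 4.0],
    [3.0, 0.5],
    [2.0, 2.0],
]

spread = [0.5, 0.5]  # gamma_x charges both tasks
dirac = [0.0, 1.0]   # gamma_x is a Dirac mass, as for a transport map

v_gamma_spread = decoupled_value(cost, spread)
v_avg_spread = coupled_average(cost, spread)
v_gamma_dirac = decoupled_value(cost, dirac)
v_avg_dirac = coupled_average(cost, dirac)
```

With the spread conditional law the decoupled value is strictly larger than the averaged coupled value, whereas the two coincide for the Dirac mass, which is why plans induced by transport maps play a special role in the proof.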

The linear-quadratic deterministic case
In this paragraph, we treat a special case where the state variable $Y$ remains constant and the control on $X$ is $u = \dot X$. We will consider the case of a quadratic running cost and zero terminal cost:
$$ c(x, y, u) = \frac12 |x - y|^2 + \frac12 |u|^2, \qquad \varphi(x, y) = 0, $$
and we will distinguish the coupled case (where $u$ is allowed to depend on both $x$ and $y$) and the decoupled case (where $u$ depends on the initial condition $x$ but has to be the same for every $y$). Of course, in this quadratic framework, we will also assume that $\mu_0$ and $\nu_0$ have finite second-order moments.
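In the coupled case, one first solves, for each fixed pair $(x, y)$, the problem of minimizing $\int_0^T \big(\frac12 |X(t) - y|^2 + \frac12 |\dot X(t)|^2\big)\,dt$ among curves with $X(0) = x$. By standard arguments, this problem has a unique solution, characterized by the Euler-Lagrange equation $\ddot X = X - y$ with $X(0) = x$ and $\dot X(T) = 0$, and given by:
$$ X(t) = y + (x - y)\,\frac{\cosh(T - t)}{\cosh(T)}. $$
Replacing in $v$ then leads to the optimal transportation problem:
$$ \inf_{\gamma \in \Pi(\mu_0,\nu_0)} \tanh(T) \int_{\mathbb{R}^d\times\mathbb{R}^d} \frac{|x - y|^2}{2}\,d\gamma(x, y), \tag{4.2} $$
which is simply the optimal transportation problem with quadratic cost. Denoting by $W_2(\mu_0, \nu_0) := \inf_{\gamma \in \Pi(\mu_0,\nu_0)} \int \frac{|x-y|^2}{2}\,d\gamma$ the squared 2-Wasserstein distance between $\mu_0$ and $\nu_0$, we then have the following expression for the value of (4.2):
$$ \tanh(T)\, W_2(\mu_0, \nu_0). $$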
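As a sanity check of the coupled linear-quadratic computation, the sketch below works in dimension one and evaluates the cost $\int_0^T \big(\frac12|X - y|^2 + \frac12|\dot X|^2\big)\,dt$ numerically. It assumes the closed form $X(t) = y + (x - y)\cosh(T - t)/\cosh(T)$ obtained from the Euler-Lagrange equation $\ddot X = X - y$, $X(0) = x$, $\dot X(T) = 0$, and the corresponding value $\frac12 \tanh(T)\,|x - y|^2$; the numerical values of $x$, $y$, $T$ and the helper names are hypothetical.

```python
import math


def optimal_path(x, y, T, t):
    """Assumed closed-form minimizer X(t) = y + (x - y) cosh(T - t) / cosh(T)."""
    return y + (x - y) * math.cosh(T - t) / math.cosh(T)


def path_cost(path, y, T, n=20000):
    """Midpoint-rule approximation of int_0^T (|X - y|^2 + |X'|^2) / 2 dt."""
    h = T / n
    total = 0.0
    for i in range(n):
        a, b = path(i * h), path((i + 1) * h)
        mid = 0.5 * (a + b)      # state at the middle of the time step
        speed = (b - a) / h      # finite-difference velocity
        total += 0.5 * ((mid - y) ** 2 + speed ** 2) * h
    return total


x, y, T = 2.0, -1.0, 1.5  # hypothetical values, for illustration only

closed_form = 0.5 * math.tanh(T) * (x - y) ** 2
numeric = path_cost(lambda t: optimal_path(x, y, T, t), y, T)

# A competitor with the same starting point: stay at x for all times.
constant_cost = path_cost(lambda t: x, y, T)
```

The numerical cost of the closed-form curve matches `closed_form` up to discretization error, while the constant competitor is strictly more expensive, as expected for a minimizer.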

The decoupled case
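In the decoupled case, one has to minimize with respect to $\gamma \in \Pi(\mu_0, \nu_0)$ and $(t, x) \mapsto X(t, x)$ (independent of $y$) such that $X(0, x) = x$, the quantity
$$ \int_{\mathbb{R}^d\times\mathbb{R}^d} \int_0^T \Big( \frac12 |X(t, x) - y|^2 + \frac12 |\dot X(t, x)|^2 \Big)\,dt\,d\gamma(x, y). \tag{4.4} $$
By the disintegration theorem, every $\gamma \in \Pi(\mu_0, \nu_0)$ admits a disintegration with respect to its first marginal $\mu_0$, i.e. can be written as $\gamma = \gamma_x \otimes \mu_0$ where $x \mapsto \gamma_x$ is a Borel family of probability measures (naturally interpreted as the conditional probability of $y$ given $x$). We will denote by $g_\gamma(x)$ the average of $\gamma_x$:
$$ g_\gamma(x) := \int_{\mathbb{R}^d} y\,d\gamma_x(y). $$
Using the disintegration of $\gamma$ with respect to $\mu_0$ and the conditional moment $g_\gamma$, it is convenient to rewrite the cost (4.4) as:
$$ \int_{\mathbb{R}^d} \int_0^T \Big( \frac12 |X(t, x)|^2 - X(t, x) \cdot g_\gamma(x) + \frac12 |\dot X(t, x)|^2 \Big)\,dt\,d\mu_0(x) + \frac{T}{2} \int_{\mathbb{R}^d} |y|^2\,d\nu_0(y). $$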
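The rewriting of the cost only uses the expansion of the square and the fact that $X(t, x)$ does not depend on $y$. For fixed $t$ and $x$, the intermediate step can be written as follows (a worked computation, using only the notation introduced above):

```latex
\begin{aligned}
\int_{\mathbb{R}^d} \tfrac12 |X(t,x) - y|^2 \, d\gamma_x(y)
  &= \tfrac12 |X(t,x)|^2
     - X(t,x) \cdot \int_{\mathbb{R}^d} y \, d\gamma_x(y)
     + \tfrac12 \int_{\mathbb{R}^d} |y|^2 \, d\gamma_x(y) \\
  &= \tfrac12 |X(t,x)|^2 - X(t,x) \cdot g_\gamma(x)
     + \tfrac12 \int_{\mathbb{R}^d} |y|^2 \, d\gamma_x(y).
\end{aligned}
```

Integrating over $t \in [0, T]$ and then against $\mu_0$, the last term becomes $\frac{T}{2} \int |y|^2\,d\nu_0(y)$ because the second marginal of $\gamma$ is $\nu_0$. This term does not depend on $\gamma$ or on $X$, so only the conditional mean $g_\gamma$ matters for the minimization.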