A DECENTRALIZED ALGORITHM FOR A MEAN FIELD CONTROL PROBLEM OF PIECEWISE DETERMINISTIC MARKOV PROCESSES

Abstract. This paper provides a decentralized approach for the control of a population of N agents to minimize an aggregate cost. Each agent evolves independently according to Piecewise Deterministic Markov dynamics controlled via unbounded jump intensities. The N-agent high-dimensional stochastic control problem is approximated by the limiting mean field control problem. A Lagrangian approach is proposed. Although the mean field control problem is not convex, it is proved to achieve zero duality gap. A stochastic version of the Uzawa algorithm is shown to converge to the primal solution. At each dual iteration of the algorithm, each agent solves its own small-dimensional subproblem by means of the Dynamic Programming Principle, while the dual multiplier is updated according to the aggregate response of the agents. Finally, this algorithm is used in a numerical simulation to coordinate the charging of a large fleet of electric vehicles in order to track a target consumption profile.


Introduction
This paper focuses on a specific family of stochastic control problems called "large and aggregative". The problem consists in optimizing the strategies of a large number of agents, in a random environment, while the decisions of all the agents interact via the objective function, because the latter depends on an "aggregative" term such as the sum of the decisions of all the agents. This type of problem occurs naturally in several fields of application, such as energy management [19], telecommunications [23], portfolio management [21] and robot coordination [6], among others.
To deal with the curse of dimensionality in such high-dimensional stochastic control problems, methods based on the value function have been developed in the convex setting. For instance, Stochastic Dual Dynamic Programming [24] relies on polyhedral approximations of the value function, providing good performance for medium-sized problems with a number of agents N ≤ 30. In our case, where the number of agents is very large (N ≥ 1000), we look more specifically for decentralized solutions, allowing both to respect the privacy of each agent and to reduce the complexity of the problem. In the particular setting where controls are bounded and strong convexity conditions are satisfied, [28] proposed an iterative stochastic algorithm providing a decentralized solution to such problems. In the same spirit as Dual Approximate Dynamic Programming [7], this approach relies on a Lagrangian decomposition technique to obtain a decentralized solution, but it takes advantage of a mean field approximation to ensure the validity of the approach when the number of agents is sufficiently large.
This paper aims at extending the decentralized approach developed in [28] to a non-convex framework, with unbounded controls, allowing for more realistic dynamics ruling the evolution of each agent's state. In particular, we are interested in the aggregative control of Piecewise Deterministic Markov Processes (PDMPs for short), where the state of each agent is modelled by a PDMP controlled through an unbounded jump intensity. PDMPs were introduced in [10] as a class of non-diffusion stochastic models, mixing random jumps and deterministic dynamics between jumps. This kind of process is used to model a broad range of phenomena and situations, such as system reliability and maintenance [12, 32], oil production [33], biology [22, 26], etc. The approach proposed in this paper results in a decentralized algorithm where, at each iteration, each agent solves its own small-dimensional optimal control problem of a PDMP. This type of problem is well known and has been studied through dynamic programming and related equations [2, 9, 11, 18, 31]. Due to the large number of agents, the problem is formulated as a Mean Field Control (MFC for short) problem. Relying on the law of large numbers, the MFC theory consists in controlling a representative agent of the population, where the interaction between the agents is approximated by the interaction of this representative agent with the probability distribution of its state or control [5].
This work is particularly motivated by demand-side management in power systems to help balance energy production and consumption. This problem is indeed critical today [30], due to the increasing share of uncontrollable energies (wind and solar) in the electricity generation mix, which requires compensating the uncontrollable character of the production by controlling the demand. Controlling the sum of the consumption of each flexible consumer in order to balance the electrical system has already been investigated successfully in the specific framework of quadratic Kullback-Leibler control problems [4]. In [29] and [20], a mean field assumption is also considered to control the charging of a large fleet of electric vehicles (EVs for short), leading to optimal control problems of partial differential equations.
The novelty of the present paper is to extend the Lagrangian approach proposed in [28] to a particular non-convex setting with unbounded controls, by first establishing the existence of a saddle point and then ensuring the convergence of the decentralized algorithm. The first contribution lies in the originality of the proof of the existence of a saddle point, which follows a completely different path than the one developed in [28]. In particular, we make use of the existence and regularity results on the solutions of an optimal control problem of a continuity equation developed in [27] to show the existence of a saddle point of our Lagrangian problem. We highlight that, while boundedness assumptions on the jump intensity [2, 9] or compactness assumptions to analyse the Hamilton-Jacobi-Bellman equation [18] are common in the literature on optimal control of PDMPs, we obtain the existence of a solution without assuming any a priori bounds on the controlled jump intensity. The second contribution consists in proving that the Stochastic Uzawa Algorithm proposed in [28] still provides a convergent sequence of controls in this new setting involving the PDMP dynamics, which violates the convexity conditions originally exploited. Finally, an application to the smart charging of an electric vehicle fleet by an aggregator illustrates the performance and the interest of the approach to coordinate the charge of a large number of electric vehicles in order to track a given target power consumption profile for the whole population of electric vehicles.
The outline of this paper is as follows. In Section 2, we formulate the optimization problem, the assumptions and the main results. Section 3 presents a dual approach to this problem. The Stochastic Uzawa Algorithm is presented in Section 4 and is proved to converge in this setting. Section 5 presents simulations of the coordination of the power consumption of a large fleet of electric vehicles.
In the rest of this section we list some frequently used notations.

Notation. Let X be the state space defined by X := I × [0, 1], where I is a finite set of cardinality d ∈ N*. Let [0, T] be a time interval with T > 0. The space D([0, T], I) is the space of càdlàg functions from [0, T] to I. The space C¹(X) (resp. C¹([0, T] × X)) denotes the set of real-valued continuously differentiable functions defined on X (resp. on [0, T] × X). The space C⁰([0, T] × X, R^d_+) denotes the set of continuous functions from [0, T] × X to R^d_+ and is endowed with the norm ∥f∥_∞ := sup_{(t,x) ∈ [0,T]×X} max_{j∈I} |f_j(t, x)|. We consider the control spaces A and B defined in (1.1), where ∂_z denotes the derivative w.r.t. the space variable z ∈ [0, 1]. The space of Borel probability measures on X is denoted by P(X). We recall the definition of the Wasserstein distance, denoted by W, on the space P(X)²: for any (m, m′) ∈ P(X)²,

W(m, m′) := inf_{π ∈ Π(m, m′)} ∫_{X²} d(x, x′) dπ(x, x′),

where Π(m, m′) is the set of couplings of m and m′, d((i, z), (i′, z′)) := 1_{i≠i′} + |z − z′|, and the discrete set I is endowed with the usual discrete distance (equal to 1 for distinct elements). The space C⁰([0, T], P(X)) is the set of continuous functions from [0, T] to P(X), endowed with the distance W_{∥·∥_∞} defined by W_{∥·∥_∞}(m, m′) := sup_{t ∈ [0,T]} W(m_t, m′_t). Let H be a Hilbert space and F a real-valued function defined on H. The convex conjugate of F is denoted by F* and is defined, for any x ∈ H, by F*(x) := sup_{y∈H} ⟨x, y⟩_H − F(y).

Motivation
We consider a population of N independent and identically distributed processes, controlled by a central planner via a common feedback control α. The space of feedbacks, denoted by A, is defined in (1.1) and is endowed with the norm ∥·∥_∞. The state of an agent, controlled by the jump intensity α, is given at time t by X^α_t = (Y^α_t, Z^α_t). In the smart charging application in Section 5, Y^α represents the charging mode of a vehicle, specifying its discrete charging rate (e.g. i = 0 for not charging, i = 1 for charging and i = −1 for discharging). The continuous variable Z^α represents the state of charge (SoC for short) of an EV, that is, the ratio of the energy charged into the battery to the total capacity of the battery. We say that the process X^α is a PDMP(b, α) with jump intensity α if X^α = (Y^α, Z^α), where Y^α is a jump process with values in I switching spontaneously, at jump times {T^α_k}_{k∈N}, with intensity α_i to switch to mode i ∈ I. The constraint α_i(·, i, ·) = 0 on the control space A in (1.1) is imposed to avoid jumps of Y^α from i to i. The process Z^α follows a deterministic dynamics between two consecutive jumps: Z^α_t = ϕ(T^α_k, t, Y^α_{T^α_k}, Z^α_{T^α_k}) for t ∈ [T^α_k, T^α_{k+1}). For any (τ, t, j, z) ∈ [0, T] × [τ, T] × X, the flow ϕ is the unique solution of the ordinary differential equation

∂_t ϕ(τ, t, j, z) = b(j, ϕ(τ, t, j, z)),   ϕ(τ, τ, j, z) = z,   (2.1)

where b is an exogenous function depending on the characteristics of the battery and of the charging station. To be more specific, following the definition of a PDMP given in [13], knowing T^α_k and X^α_{T^α_k}, the next jump time T^α_{k+1} and the post-jump mode are generated by competing exponential clocks: for each j ∈ I, mode j fires once the integral of α_j along the deterministic flow started from X^α_{T^α_k} exceeds E_{k,j}, and the first clock to fire determines both T^α_{k+1} and Y^α_{T^α_{k+1}} (2.2), where {E_{k,j}}_{k,j} are independent random variables following an exponential distribution of parameter 1, independent of X^α_0. Indeed, the control process α = (α_j)_{j∈I} is such that α_j(t, x) represents the intensity rate, at time t ∈ [0, T], of jumping into mode j ∈ I when coming from state x ∈ X. We initialize T_0 = 0 and, at time t = 0, the law of the couple of random variables (Y^α_0, Z^α_0) is given by m_0 ∈ P(X). Although the dynamics of X^α might seem specific due to the vector-valued jump intensity α, the dynamics (2.1) and (2.2) define a PDMP, and the link with the definition of a PDMP introduced in [10] is described in Section A.1 of the Appendix. Since the control space A consists of continuous functions and since we assume below, in Assumption 2.1, that b is in C¹(X), the PDMP(b, α) is well defined for any α ∈ A. The cost function J_N for the N-agent problem is defined for any α ∈ A by

J_N(α) := E[ ∫_0^T f(t, (1/N) Σ_{n=1}^N p(t, X^{n,α}_t)) dt + (1/N) Σ_{n=1}^N G(α, X^{n,α}) ],   (2.3)

where {X^{n,α}}_{n∈{1,...,N}} are independent PDMP(b, α) driven by independent exponential variables. Indeed, for any n ∈ {1, . . ., N}, X^{n,α} is controlled by the common feedback function α ∈ A, depending only on the agent state X^n. Besides, the function G is common to every agent and is defined as the sum of a running and a terminal cost. From a practical point of view, the function f represents a coupling cost depending on the aggregate quantity (1/N) Σ_{n=1}^N p(t, X^{n,α}_t), and G the individual cost.
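The competing-clocks construction of the jump times can be sketched numerically. The following is a minimal, hypothetical simulator of one PDMP(b, α) trajectory, integrating the flow with a simple Euler step; the mode set, drift b and intensity α used below are illustrative stand-ins, not the paper's calibrated ones:

```python
import random

def simulate_pdmp(b, alpha, y0, z0, T, dt=1e-3, rng=random):
    """Simulate one PDMP(b, alpha) path on [0, T].

    b(i, z): deterministic drift of the continuous state in mode i.
    alpha(t, j, i, z): intensity of jumping into mode j from state (i, z).
    Each candidate target mode j carries an exponential clock E_j of
    parameter 1; mode j fires once the integral of alpha_j along the
    path exceeds E_j, and the first clock to fire triggers the jump.
    """
    modes = [-1, 0, 1]                                  # illustrative mode set I
    y, z, t = y0, z0, 0.0
    clocks = {j: rng.expovariate(1.0) for j in modes if j != y}
    path = [(t, y, z)]
    while t < T:
        z = min(1.0, max(0.0, z + dt * b(y, z)))        # Euler step for the flow
        t += dt
        for j in list(clocks):
            clocks[j] -= dt * alpha(t, j, y, z)         # accumulate intensity
            if clocks[j] <= 0.0:                        # clock j fires: jump to j
                y = j
                clocks = {k: rng.expovariate(1.0) for k in modes if k != y}
                break
        path.append((t, y, z))
    return path
```

Excluding the current mode from the clock dictionary mirrors the constraint α_i(·, i, ·) = 0, which forbids jumps from a mode to itself.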

Problem formulation
In this paper, we focus on the mean field limit control problem, corresponding to the problem with an infinite population. Thus, we introduce the function J defined, for any α ∈ A, by

J(α) := ∫_0^T f(t, E[p(t, X^α_t)]) dt + E[G(α, X^α)],   (2.4)

where X^α is a PDMP(b, α). This paper is dedicated to the following problem:

inf_{α∈A} J(α).   (2.5)

As the cost function (2.4) is nonlinear w.r.t. the expectation term E[p(t, X^α_t)], via the coupling cost f, Problem (2.5) goes beyond the scope of optimal control of PDMPs. We propose to numerically solve Problem (2.5) by applying the Stochastic Uzawa Algorithm (Algorithm 1), which is detailed in Section 4. This algorithm was introduced in [28] and relies on the same arguments as those underlying the stochastic gradient descent algorithm [16]. The main result of this paper, stated in Theorem 2.7, ensures the convergence of the Stochastic Uzawa Algorithm to the solution of Problem (2.5).
Throughout the paper, we make the following general assumptions.

Assumption 2.1. The vector field b ∈ C¹(X) is assumed to vanish at the boundary: b(j, 0) = b(j, 1) = 0 for any j ∈ I.
Assumption 2.2. The function p is in C¹([0, T] × X) and f : [0, T] × R → R is a Carathéodory function that is lower semi-continuous (l.s.c. for short), strictly convex and twice differentiable w.r.t. the second variable, with a gradient that is Lipschitz uniformly in time. More precisely, for a.e. t ∈ [0, T], we have f(t, ·) ∈ C²(R) and there exists a constant C > 0, independent of t, such that ∥∂²_x f(t, ·)∥_∞ ≤ C. In addition, there exists a constant C_f > 0 such that, for any (t, x) ∈ [0, T] × R, f has at least quadratic growth (2.6).

Assumption 2.4. The function L : R → R is defined from a function l ∈ C¹(R_+, R_+) that is increasing, strongly convex and bounded from above by a quadratic function. More explicitly, there exists C > 0 such that for any x ∈ R_+,

C^{-1} x² − C ≤ l(x) ≤ C(1 + x²),

where the first inequality is due to the strong convexity of l. We assume that l satisfies lim_{x→0} l(x) = 0. Under these assumptions, the cost J(α) is well defined for any α ∈ A. Also, the regularity assumption w.r.t. the space variable z on the function p in Assumption 2.2 is crucial to show the existence of a fixed point of a function involving p in Lemma A.2. The continuity of the map α ↦ E[G(α, X^α)] is an automatic consequence of the properties of c, g and L given in Assumptions 2.3 and 2.4. By Assumption 2.4, H is non-decreasing, non-negative, and H′ is Lipschitz continuous on R. Theorem 2.7 cannot be obtained by a direct application of [28], which relies on additional assumptions that are not verified in the present framework, involving unbounded controls and PDMP dynamics. The proof of Theorem 2.7 is given in Section 4. Beforehand, we introduce and analyze in the next section a dual problem proved to be equivalent, in some sense, to Problem (2.5).

Dual approach
In the same vein as [28], a Lagrangian decomposition approach is adopted to obtain a decentralized algorithm. Contrary to [28], the specific convexity assumptions that ensure the absence of a duality gap [14, Appendix I] and the convergence of the Stochastic Uzawa Algorithm are not satisfied here. In Sections 3 and 4, we propose new theoretical arguments demonstrating the validity of the approach proposed in [28] in this specific framework of control of PDMPs.
Let F be defined for any v ∈ L²(0, T) by F(v) := ∫_0^T f(t, v(t)) dt. Owing to the properties of f in Assumption 2.2, one deduces that the function F is strictly convex and has at least quadratic growth. Indeed, for any v, u ∈ L²(0, T) with v ≠ u and any γ ∈ (0, 1), the strict convexity of f(t, ·) yields F(γv + (1 − γ)u) < γF(v) + (1 − γ)F(u), and the quadratic lower bound (2.6) on f gives the coercivity of F. The equivalence between the two problems then follows from the definitions of J and J̃.
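The two properties of F can be written out explicitly; the display below is a reconstruction, assuming F(v) := ∫_0^T f(t, v(t)) dt and a generic quadratic lower bound for (2.6), whose exact constants are an assumption here:

```latex
% Strict convexity: for u \neq v in L^2(0,T) and \gamma \in (0,1),
F\bigl(\gamma v + (1-\gamma)u\bigr)
  = \int_0^T f\bigl(t,\,\gamma v(t) + (1-\gamma)u(t)\bigr)\,dt
  < \gamma F(v) + (1-\gamma)F(u).
% Quadratic growth: if (2.6) reads f(t,x) \ge C_f\, x^2 - C, then
F(v) \;\ge\; C_f \,\|v\|_{L^2(0,T)}^2 - C\,T .
```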

Let us introduce the Lagrangian L, defined for any (α, v, λ) ∈ A × L²(0, T) × L²(0, T) by

L(α, v, λ) := L₁(α, λ) + L₂(v, λ),

where

L₁(α, λ) := E[G(α, X^α)] + ∫_0^T λ(t) E[p(t, X^α_t)] dt   (3.3)   and   L₂(v, λ) := F(v) − ∫_0^T λ(t) v(t) dt,

and the associated dual function W : L²(0, T) → R:

W(λ) := inf_{α∈A} L₁(α, λ) + inf_{v∈L²(0,T)} L₂(v, λ),

where inf_{α∈A} L₁(α, λ) and inf_{v∈L²(0,T)} L₂(v, λ) are the two decoupled subproblems solved at each iteration of the algorithm. The dual problem is

sup_{λ∈L²(0,T)} W(λ).   (3.6)

The following lemma states the existence of a unique solution of the dual problem (3.6). It is derived from the assumptions on f in Assumption 2.2.

Lemma 3.2. The dual problem (3.6) has a unique solution λ̄ ∈ L²(0, T).
The proposition below is a key result that enables one to show the convergence of a sequence {α_k}_k, generated by Algorithm 1 defined in the next section, to a solution of Problem (2.5) that belongs to the set B defined in (1.1).
Theorem 3.3. For any λ ∈ L²(0, T), argmin_{α∈A} L₁(α, λ) is not empty. In addition, there exists a selection λ ↦ α[λ] ∈ argmin_{α∈A} L₁(α, λ), with values in B, that is Lipschitz continuous on bounded subsets of L²(0, T).

The proof of Theorem 3.4 is given in Section 3.2. The main argument for the proof of Theorem 3.4 is that the map λ ↦ W(λ) is Gâteaux differentiable in L²(0, T). We show that λ ↦ inf_{v∈L²(0,T)} L₂(v, λ) and λ ↦ inf_{α∈A} L₁(α, λ) are both differentiable. The second result is more difficult to prove, and different intermediary results are needed. First, properties of the subproblem inf_{v∈L²(0,T)} L₂(v, λ) are given in Lemma 3.5. Then, an optimal control problem of a continuity equation is introduced and proved to be equivalent to the subproblem inf_{α∈A} L₁(α, λ) in Lemma 3.8. This optimal control problem is solved using results obtained in [27] and provides a solution of the subproblem inf_{α∈A} L₁(α, λ) in Lemma 3.9. Theorem 3.3 is a consequence of Lemma 3.9. Finally, Lemma 3.12 shows that the map λ ↦ inf_{α∈A} L₁(α, λ) is Gâteaux differentiable in L²(0, T). These lemmas are stated and proved below.
The next lemma shows that, for any λ ∈ L²(0, T), there exists a unique solution v[λ] of the subproblem inf_{v∈L²(0,T)} L₂(v, λ), and that the map λ ↦ v[λ] is continuous: for any µ ∈ L²(0, T), v[µ] is given by ∇F*(µ).

Proof. (i) By Assumption 2.2, the map v ↦ F(v) is strictly convex, continuous and proper on L²(0, T), and by inequality (2.6) this map is also coercive. Thus, one deduces the existence and uniqueness of v[λ] := argmin_{v∈L²(0,T)} L₂(v, λ). (ii) Since the function F is proper and strictly convex, classical results from convex analysis give that its convex conjugate F* is differentiable [17, Chapter E, Theorem 4.1.1]. Therefore, the map λ ↦ v[λ] = ∇F*(λ) inherits this regularity.

Proof of Theorem 3.3
This section is devoted to the proof of Theorem 3.3. The following optimization problem is studied for any λ ∈ L²(0, T):

inf_{α∈A} L₁(α, λ).   (P_λ)

A characterization of a solution of this optimization problem is provided in Lemma 3.9. This lemma relies on the results obtained in [27] about the characterization of the solutions of a mean field control problem arising in smart charging. To exploit these results, we first introduce an optimization problem, for a given λ ∈ L²(0, T), that is equivalent to Problem (P_λ) and that fits the framework of [27]. To do so, for a given function α ∈ A, we introduce the continuity equation (CEQ_α) on [0, T] × X:

∂_t m_t(i, z) + ∂_z (b(i, z) m_t(i, z)) = Σ_{j∈I} [ α_i(t, j, z) m_t(j, z) − α_j(t, i, z) m_t(i, z) ],

where m_0 ∈ P(X) is given. The next definition gives the characterization of weak solutions of (CEQ_α).

Definition 3.6. For a given function α ∈ A, a distribution m satisfies (CEQ_α) in the weak sense if m is in C⁰([0, T], P(X)) and, for any test function ϕ ∈ C^∞_c([0, T] × X), we have:

∫_X ϕ(T, x) dm_T(x) − ∫_X ϕ(0, x) dm_0(x) = ∫_0^T ∫_X ( ∂_t ϕ(t, i, z) + b(i, z) ∂_z ϕ(t, i, z) + Σ_{j∈I} α_j(t, i, z) [ϕ(t, j, z) − ϕ(t, i, z)] ) dm_t(i, z) dt.

The following proposition is a classical result and is crucial to introduce a problem that is equivalent to Problem (P_λ).
Proposition 3.7. Let α ∈ A and X^α be a PDMP(b, α). Then the law m^α of the process X^α is the unique weak solution of (CEQ_α) in the sense of Definition 3.6.
Proof. This result is proved in [8] for controls α that are continuous in space and independent of time. The extension of this result to bounded controls that are measurable in time is straightforward.
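To fix ideas, (CEQ_α) can be discretized directly. The sketch below is an illustrative explicit scheme (an assumption-laden stand-in, not the paper's method), with a conservative upwind flux for the transport term ∂_z(b m) and explicit exchange terms for the jumps; since b vanishes at the boundary, the boundary fluxes are zero and total mass is conserved:

```python
def step_ceq(m, b_field, alpha, t, dz, dt, modes):
    """One explicit time step for a discretized continuity equation of this type.

    m[i] is the list of cell masses over a regular z-grid for mode i.
    b_field(i, z) is the drift; alpha(t, j, i, z) is the intensity of
    jumping into mode j from (i, z). Transport uses a conservative
    upwind flux; jumps exchange mass between modes cell by cell.
    """
    n = len(next(iter(m.values())))
    new = {i: m[i][:] for i in modes}
    for i in modes:
        # Upwind fluxes at cell interfaces; boundary fluxes are zero,
        # consistent with b(i, 0) = b(i, 1) = 0 (Assumption 2.1).
        flux = [0.0] * (n + 1)
        for k in range(1, n):
            bk = b_field(i, k * dz)
            flux[k] = bk * (m[i][k - 1] if bk > 0 else m[i][k]) / dz
        for k in range(n):
            new[i][k] -= dt * (flux[k + 1] - flux[k])
    for i in modes:
        for k in range(n):
            z = (k + 0.5) * dz
            for j in modes:
                if j != i:
                    moved = dt * alpha(t, j, i, z) * m[i][k]  # mass i -> j
                    new[i][k] -= moved
                    new[j][k] += moved
    return new
```

The jump terms are antisymmetric in (i, j), so mass is exchanged but never created, mirroring the right-hand side of (CEQ_α).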
Lemma 3.8. For any λ ∈ L²(0, T), the problems (P_λ) and (P̃_λ) are equivalent.

Proof. Let α ∈ A, X^α be a PDMP(b, α) and m^α be the law of X^α. By the definitions of L₁ in (3.3) and L̃₁ in (3.9), one has L₁(α, λ) = L̃₁(α, m^α, λ). Further, by Proposition 3.7, m^α is the unique element of C⁰([0, T], P(X)) that is a weak solution of (CEQ_α). Thus, (α, m^α) is admissible for Problem (P̃_λ) and the problems (P_λ) and (P̃_λ) are equivalent.
By the previous lemma, it is straightforward that if Problem (P̃_λ) has a solution (α, m^α), then α is a solution of Problem (P_λ). For any λ ∈ L²(0, T), we define φ_λ as the unique function in C (defined in (1.1)) satisfying, for any (t, i, z) ∈ [0, T] × X, the equation (3.11), where H is defined in Assumption 2.4 and the flow ϕ in (2.1). The existence and uniqueness of φ_λ is proved in Lemma A.2 in Appendix A and is closely related to [27], Proposition 4.2. The next lemma provides a solution of Problem (P̃_λ) based on φ_λ.

Lemma 3.9. For any λ ∈ L²(0, T), the control α[λ] defined for any (t, i, j, z) ∈ [0, T] × I × I × [0, 1] by

α_j[λ](t, i, z) := H′(φ_λ(t, i, z) − φ_λ(t, j, z))   (3.12)

is a solution of Problem (P_λ).
Proof. By Assumption 2.4, the map H′ is Lipschitz continuous and, by the uniform bound on {∥∂_z φ_λ(t, ·)∥_∞}_{t∈[0,T]}, there exists a constant C > 0 such that, for any t ∈ [0, T], ∥α[λ](t, ·)∥_∞ ≤ C. By Assumption 2.4, one can show that H′(0) = 0 and therefore that α[λ] belongs to A. One then checks that (α[λ], m^{α[λ]}) is a solution of Problem (P̃_λ). The conclusion follows from Lemma 3.8.
Remark 3.10. Lemma 3.9 states the existence of a solution of an optimal control problem of PDMPs without any boundedness assumption on the set A. Although the boundedness of the space of controls is a common assumption in the literature [2, 9], some recent results partially relax this assumption [18]. Contrary to the literature previously cited, our existence and regularity results do not rely on a dynamic programming approach but follow from the direct analysis of an optimal control problem of a continuity equation [27].
We are now ready to prove Theorem 3.3.
Proof of Theorem 3.3. By Lemma 3.9, for any λ ∈ L²(0, T), α[λ] defined in (3.12) is a solution of Problem (P_λ). By the proof of Lemma 3.9, α[λ] belongs to B. Let k > 0 and λ, µ ∈ L²(0, T) be such that ∥λ∥₂ ≤ k and ∥µ∥₂ ≤ k. Then, by the same arguments as in the proof of [27], Lemma 4.5, there exists a constant K(k, T) > 0 such that for any (t, i, j, z) ∈ [0, T] × I × I × [0, 1]:

|α_j[λ](t, i, z) − α_j[µ](t, i, z)| ≤ K(k, T) ∥λ − µ∥₂.

The Lipschitz continuity of the map λ ↦ α[λ] follows by taking the supremum over [0, T] × X of the l.h.s. of the previous inequality.

Remark 3.11. Note that, rewriting Problem (2.5) w.r.t. the distribution of the states, as in Lemma 3.8 for Problem (P_λ), one can prove Theorem 3.4 by using a change of variables, as was done in [27] for a similar problem. To avoid additional notation, we have preferred to prove Theorem 3.4 by means of Theorem 3.3.
We now turn to the proof of Theorem 3.4.

Stochastic Uzawa algorithm
While the existence of a unique solution λ̄ of the dual problem (3.6) is established in Lemma 3.2, we propose in this section an iterative algorithm converging to λ̄. The convergence of the algorithm relies on the same arguments as those developed for the analysis of the gradient descent algorithm in Hilbert spaces [16]. This algorithm is an adaptation of the Uzawa Algorithm [1] to stochastic optimization settings. The subgradient of the opposite of the dual function W is estimated by Monte Carlo simulations. The convexity assumptions introduced in [28] are not satisfied here.

We consider a sequence {ρ_k}_{k∈N} of positive step sizes satisfying Σ_k ρ_k = ∞ and Σ_k ρ_k² < ∞ to ensure the convergence. Typically, consider the sequence defined by ρ_k = a/(b + k) with a > 0 and b > 0 chosen empirically to accelerate the convergence.
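The dual loop described in this section can be sketched as follows. This is a hedged illustration, not the paper's implementation: `best_response`, `simulate_p` and `v_star` are hypothetical placeholders for the HJB-based computation of α[λ], the PDMP simulation of one agent, and the minimizer of L₂(·, λ) obtained from F*, respectively.

```python
def stochastic_uzawa(best_response, simulate_p, v_star, K, M, nt, a=500.0, b=1.0):
    """Hedged sketch of the Stochastic Uzawa iteration.

    best_response(lam) -> a minimizer alpha[lam] of L1(., lam);
    simulate_p(alpha)  -> one sampled path (p(t_s, X_{t_s}))_s of length nt;
    v_star(lam)        -> the minimizer v[lam] of L2(., lam).
    The multiplier lam is a step function on a regular grid of nt time points.
    """
    lam = [0.0] * nt                                         # lambda_0 = 0
    alpha_k = None
    for k in range(K):
        alpha_k = best_response(lam)                         # agents' subproblem
        # Monte Carlo estimate of t -> E[p(t, X_t^{alpha_k})] over M agents
        mean_p = [0.0] * nt
        for _ in range(M):
            sample = simulate_p(alpha_k)
            mean_p = [mp + s / M for mp, s in zip(mean_p, sample)]
        grad = [mp - vs for mp, vs in zip(mean_p, v_star(lam))]  # estimate U_{k+1}
        rho = a / (b + k)                                    # Robbins-Monro step
        lam = [l + rho * g for l, g in zip(lam, grad)]       # dual update
    return lam, alpha_k
```

The update ascends the dual function: the gradient estimate is the gap between the simulated aggregate response of the agents and the minimizer of the v-subproblem.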

The next lemma shows that it is possible, at any iteration k, to compute α[λ_k] defined in line 5 of Algorithm 1.
By the previous inequality, one deduces the claimed bound. The conclusion follows from the definition of λ_{k+1} at line 8 of Algorithm 1, the previous inequality and the fact that the estimate holds a.s.
The next lemma provides an estimate of the norm of the gradient of W, which is crucial to show the convergence of Algorithm 1.

Lemma 4.2. There exists C > 0 such that the sequence {U_{k+1}}_k generated by Algorithm 1 (line 7) satisfies, for any k ∈ N, E[∥U_{k+1}∥₂²] ≤ C.

Proof. By Assumption 2.2, p is in C¹([0, T] × X, R); thus, there exists C > 0 such that, for any k ∈ N and j ∈ {1, . . ., M}, the j-th simulated trajectory at iteration k satisfies sup_{t∈[0,T]} |p(t, X_t)| ≤ C. The conclusion follows from the previous inequality, inequality (4.2) in the proof of Lemma 4.1 and the definition of U_{k+1} at line 7 of Algorithm 1.
The convergence of Algorithm 1 is stated in the next lemma.
Proof. Since W is convex and Gâteaux differentiable (proof of Thm. 3.4), by Lemma 4.2 the proof is a simple adaptation of the one of [16], Theorem 3.6, as was done in [28].
Before giving the proof of the main result of this paper, we need the following regularity result on the function J.

Lemma 4.4. The function J is continuous on B w.r.t. the norm ∥·∥_∞.

Proof. Let α ∈ B, X^α be a PDMP(b, α) and m^α be the law of X^α. By the definition of m^α and of the map J in (2.4), J can be rewritten as a functional of m^α. By the regularity of the functions p, c, g and L in Assumptions 2.2-2.4 and the continuity of the map α ↦ m^α on B, stated in Lemma A.1 in Appendix A, one obtains the continuity of J over B w.r.t. the norm ∥·∥_∞.
We can now prove Theorem 2.7.

Simulations
This section illustrates the results with an example of smart charging. In this use case, the discrete variable taking values in I represents the charging mode and the continuous variable taking values in [0, 1] represents the state of charge (SoC) of an electric vehicle relative to the maximum energy capacity of the battery. The quantities are dimensionless.

Definition of the use case
We consider a large fleet of EVs controlled by a central planner during their charging period [0, T]. The goal of the central planner is to provide ancillary services to the transmission grid by controlling the aggregate consumption profile of the fleet on the time horizon [0, T]. More specifically, the central planner aims at keeping the consumption profile of the fleet close to a given profile r = (r_t)_{0≤t≤T}, supposed to be known over the whole period [0, T]. In our simulations we compare two situations. In the first case, known as V1G, it is assumed that the vehicle batteries can only draw electricity from the grid; the set of charging modes is then I = {0, 1}, where 0 stands for the idle mode and 1 for charging. In the second case, known as V2G, it is assumed that the batteries can either draw electricity from or inject it into the grid; the charging modes are then I = {−1, 0, 1}, where −1 corresponds to the injection mode. In each situation, each mode i ∈ I is characterized by its charging rate b(i, ·). The charging rate b of the batteries has been designed based on industrial data [15]. The state of each EV, controlled by α, is represented by a PDMP X^α = (Y^α, Z^α), with Y^α_t the charging mode and Z^α_t the SoC of the vehicle at time t. We assume that the power consumption of a vehicle at state (i, z) ∈ X is equal to its instantaneous charging rate b(i, z). At time t = 0, Y^α_0 equals 0 or 1 with probability 1/2, while Z^α_0 = min(1, |z_0| + 0.15) where z_0 ∼ N(0, 0.2).
To ensure customer satisfaction, the final cost g(i, z) := 30 × (1 − e^{5(z−0.75)})_+ penalizes vehicles with a final SoC z lower than 75%. As previously stated, high values of α are penalized through the cost α²/2, in order to avoid high jump frequencies that damage the batteries. For any t ∈ [0, T], the coupling cost is quadratic in the distance to the target, with κ = 1000, in order to ensure that the overall consumption stays close to the profile r.
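The individual costs of this use case can be written down directly. The sketch below encodes the terminal cost g and the jump penalty L(α) = α²/2 stated above; the quadratic tracking form of the coupling cost f is an assumption suggested by the text (strictly convex, twice differentiable with bounded second derivative, hence compatible with Assumption 2.2), not a formula taken from the paper:

```python
import math

KAPPA = 1000.0  # weight of the tracking cost, as stated in the text

def g(z):
    """Terminal comfort cost: penalizes a final SoC below 75%."""
    return 30.0 * max(0.0, 1.0 - math.exp(5.0 * (z - 0.75)))

def jump_cost(alpha):
    """Running penalty alpha^2 / 2 on the jump intensity."""
    return 0.5 * alpha ** 2

def f(t, x, r):
    """Assumed coupling cost: squared distance of the aggregate
    consumption x to the target profile r(t), weighted by kappa."""
    return KAPPA * (x - r(t)) ** 2
```

Note that g vanishes for any SoC at or above 0.75 (the positive part clips the exponential term), so only under-charged vehicles are penalized.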
The target profile r is a slight modification of the nominal behaviour of the fleet of EVs (without control), with the same energy consumed over the fixed time horizon [0, T]. The nominal behaviour corresponds to the situation where the EVs are not required to fit the target profile and seek only to satisfy their own comfort (namely F(v) = 0). The idea is to define a realistic target profile, so that it is possible for the fleet to follow this profile while satisfying its charging needs. We consider a population of N = 10⁵ EVs. The optimal control of Problem (2.5) is computed using Algorithm 1.

Algorithm parameters
For the implementation of the algorithm, line 5 is computed by discretization of the Hamilton-Jacobi-Bellman equation (3.11) associated to α[λ], defined in (3.12), with N_t = 1000 regular points in time and N_s = 400 regular points of SoC. The Lagrangian multipliers λ_k are obtained after k = 1000 iterations with M = 1000 realizations and are displayed in Figure 4. The initial multiplier is λ_0 = 0 and the step-size sequence is ρ_k = min(30, 500/(1 + k)). For each multiplier λ, the associated optimal strategy α[λ] is computed.

Results
Four periods are distinguished on Figure 3, depending on whether the target profile is above or below the nominal consumption. The reference profile is displayed in green. EVs are encouraged to consume more than the nominal consumption during periods P1 and P3, and to consume less during periods P2 and P4. First, we observe that both the V1G (blue line) and V2G (red line) consumptions are close to the profile r (green line). The main difference lies in P4, where the possibility for the EVs to perform V2G allows the fleet consumption to stay closer to the target profile. As expected, the profile is better tracked in the V2G mode, due to the additional degree of freedom. On Figure 6 one can observe that, compared to the case with only V1G on Figure 5, V2G is used during periods P2 and P4, when the profile r is lower than the nominal consumption. Ten trajectories of the SoC, generated in the V2G use case, are represented in Figure 1. The green line is an example of a PDMP switching from mode 0 to mode 1 around t = 0.2, from mode 1 to mode −1 around t = 2.6 and then from mode −1 to mode 1 around t = 4. The initial and final distributions of the SoC of the fleet are displayed on Figures 7 and 8. While the initial distribution is the same for both scenarios, one can observe that the final distributions are very close, and that very few EVs have a SoC lower than 75% in both cases. Thus, the comfort of each agent is weakly impacted by the V2G mode.

A.3 Characterization of an optimal control
This section aims at giving an existence and uniqueness result for equation (3.11), for any λ ∈ L²(0, T).

Remark 2.5. Due to the assumptions on l (in particular lim_{x→0} l(x) = 0), L is convex. We denote by H the convex conjugate of L, i.e. H is defined for any x ∈ R by H(x) := sup_{y∈R} (xy − L(y)). The main role of Assumption 2.1 is to ensure that the flow defined in (2.1) exists and takes values in [0, 1]. Assumption 2.2 ensures that the coupling cost term in J is well defined.
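For the quadratic penalty l(x) = x²/2 used later in the simulations (taking L(y) = y²/2 for y ≥ 0 and, by convention, +∞ for y < 0 — the restriction to nonnegative intensities is our assumption here), the conjugate is available in closed form: H(x) = max(x, 0)²/2, hence H′(x) = max(x, 0) is Lipschitz with H′(0) = 0, matching the properties of H used in Section 3. The sketch below checks the closed form against a brute-force supremum:

```python
def H_closed(x):
    """Closed-form conjugate of L(y) = y^2/2 restricted to y >= 0:
    H(x) = sup_{y >= 0} (x*y - y^2/2) = max(x, 0)^2 / 2."""
    return 0.5 * max(x, 0.0) ** 2

def H_bruteforce(x, y_max=10.0, n=100000):
    """Approximate the supremum on a fine grid of [0, y_max]."""
    best = 0.0  # y = 0 always yields the value 0
    for k in range(n + 1):
        y = y_max * k / n
        best = max(best, x * y - 0.5 * y * y)
    return best
```

For x ≤ 0 the supremum is attained at y = 0, which is why H vanishes on the negative half-line and H′(0) = 0.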

Remark 2.6. The boundary condition on b may seem restrictive in Assumption 2.1. However, if supp(m_0) ⊂ I × (0, 1), any function b such that, for any i ∈ I, b(i, 0) ≥ 0, b(i, 1) ≤ 0 and the sign of b(i, ·) is constant, can be approximated by a function b̃ satisfying the boundary condition. Indeed, for such a function b, there exists ε > 0 such that P(Z^α_t ∈ (ε, 1 − ε)) = 1 for any α ∈ A and any t ∈ [0, T]. Therefore, there exists an open subset containing the boundaries that is a.s. never attained, and it is possible to approximate b by a smooth function b̃, vanishing at the boundary, without modifying the trajectory of Z^{n,α}_t.

The main results of the paper are summarized in the following theorem, which provides the convergence of both the sequence of iterates and the sequence of function values generated by Algorithm 1 stated in Section 4.

Theorem 2.7. Problem (2.5) has a solution. Let {α_k}_{k∈N} be a sequence in A generated by Algorithm 1 (line 5); then the following assertions hold: (i) The sequence {α_k}_{k∈N} converges a.s. to an element of argmin_{α∈A} J(α) w.r.t. the norm ∥·∥_∞. (ii) The sequence {J(α_k)}_{k∈N} converges a.s. to min_{α∈A} J(α).

Proof of Theorem 2.7. The existence of a solution to Problem (2.5) is a direct consequence of Theorem 3.4. Indeed, the control α[λ̄], where by Lemma 3.2 λ̄ is the solution of Problem (3.6), is a solution of Problem (2.5). (i) By Lemma 4.3, the sequence {λ_k}_k generated by Algorithm 1 converges to λ̄ a.s. in L²(0, T). Then, by Theorem 3.3, the sequence {α[λ_k]}_k converges to α[λ̄] a.s. w.r.t. the norm ∥·∥_∞. Since α[λ̄] is a solution of Problem (2.5) and since, by definition, α_k = α[λ_k] ∈ B for any k ∈ N, the conclusion follows. (ii) Since the sequence {α_k}_k converges a.s. to a solution of Problem (2.5) w.r.t. the norm ∥·∥_∞ and, by Lemma 4.4, the function J is continuous on B, one deduces that the sequence {J(α_k)}_{k∈N} converges a.s. to min_{α∈A} J(α).

Figure 6. Evolution of the proportion of vehicles for V2G.

Figure 7. Initial and final SoC in the V1G case.

Figure 8. Initial and final SoC in the V2G case.