On the linear convergence of the multi-marginal Sinkhorn algorithm

The aim of this short note is to give an elementary proof of linear convergence of the Sinkhorn algorithm for the entropic regularization of multi-marginal optimal transport. The proof simply relies on: i) the fact that the Sinkhorn iterates are bounded, ii) the strong convexity of the exponential on bounded intervals, and iii) the convergence analysis of the coordinate descent (Gauss-Seidel) method of Beck and Tetruashvili [1].


Introduction
Even though the Sinkhorn algorithm is more than 50 years old [16], it has attracted considerable attention in recent years. It is now at the heart of efficient solvers for the entropic regularization of optimal transport problems, a field on which Cuturi's paper [6] had a tremendous impact (see Cuturi and Doucet [5], Cuturi and Peyré [13], Benamou et al. [2]...). The Sinkhorn algorithm remains fascinating by its simplicity and its connections with the Schrödinger bridge problem, first addressed by Schrödinger in [15], and with large deviations theory, see Dawson and Gärtner [7], Föllmer [9], Léonard [12, 11].
The linear convergence of the Sinkhorn algorithm for two marginals is well known. A very elegant proof consists in using a celebrated theorem of Birkhoff to show that the Sinkhorn algorithm amounts to iterating a contraction for the Hilbert projective metric; see Franklin and Lorenz [10] and, more recently, Chen, Georgiou and Pavon [4].
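As a purely illustrative aside (not part of the argument of this note), the two-marginal contraction can be observed numerically. The sketch below, on a random discrete instance with made-up names, runs the classical Sinkhorn scaling updates and checks that the Hilbert-metric gap between successive scaling iterates shrinks geometrically:

```python
import numpy as np

def hilbert_dist(u, v):
    """Hilbert projective metric between positive vectors
    (invariant under positive rescaling of u or v)."""
    r = np.log(u) - np.log(v)
    return r.max() - r.min()

rng = np.random.default_rng(1)
n = 6
K = np.exp(-rng.random((n, n)))      # Gibbs kernel e^{-c}, entrywise positive
mu = np.ones(n) / n                  # first marginal
nu = rng.random(n)
nu /= nu.sum()                       # second marginal
u = np.ones(n)

gaps = []
for _ in range(10):
    v = nu / (K.T @ u)               # scale columns to match nu
    u_next = mu / (K @ v)            # scale rows to match mu
    gaps.append(hilbert_dist(u_next, u))
    u = u_next

# by Birkhoff's theorem, each full sweep is a strict contraction in the
# Hilbert metric, so the successive gaps shrink geometrically
ratios = [gaps[t + 1] / gaps[t] for t in range(len(gaps) - 1)]
```

After the loop, the row-scaled coupling `diag(u) K diag(v)` matches `mu` exactly, and every ratio of successive Hilbert-metric gaps is strictly below 1, reflecting the contraction property that drives the two-marginal linear convergence.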
To the best of our knowledge, the elegant Hilbert metric proof does not carry over to the multi-marginal case, for which an annoying N − 1 factor (N being the number of marginals) appears in the Lipschitz constant for the Hilbert metric. Convergence of the multi-marginal Sinkhorn algorithm was recently obtained by Di Marino and Gerolin [8], and the well-posedness (existence, uniqueness and smooth dependence on the data) of the Schrödinger system (see (2.3) below) was addressed by completely different arguments (local and global inversion theorems) by the author and Laborde in [3]. In the analysis of [8], a key ingredient is that the Sinkhorn iterates are coordinate descent updates for a convex minimization problem (dual to an entropy minimization subject to multi-marginal constraints); see the definition of F in (2.4) below. In this note, we slightly improve the results of Di Marino and Gerolin by showing linear convergence. The proof relies on the convergence analysis of the coordinate descent method of Beck and Tetruashvili [1], which can easily be used here since the Sinkhorn iterates are bounded in L^∞ and therefore remain in a set where the functional F is uniformly convex.

Multi-marginal Sinkhorn algorithm
We are given an integer N ≥ 2 and N probability spaces (X_i, F_i, m_i), i = 1, ..., N, and set

$$ X := \prod_{i=1}^N X_i, \qquad \mathcal{F} := \bigotimes_{i=1}^N \mathcal{F}_i, \qquad m := \bigotimes_{i=1}^N m_i. $$

Given i ∈ {1, ..., N}, we will denote by X_{-i} := ∏_{j≠i} X_j, m_{-i} := ⊗_{j≠i} m_j, and will always identify X with X_i × X_{-i}, i.e. we will write x = (x_1, ..., x_N) ∈ X as x = (x_i, x_{-i}). The set of measures on (X, F) having m_1, ..., m_N as marginals will be denoted Π(m_1, ..., m_N). Given p ∈ [1, ∞] and (φ_1, ..., φ_N) ∈ ∏_{i=1}^N L^p(X_i, F_i, m_i), we will use the notations

$$ \oplus_{i=1}^N \varphi_i (x) := \sum_{i=1}^N \varphi_i(x_i), \qquad \oplus_{j \neq i} \varphi_j (x_{-i}) := \sum_{j \neq i} \varphi_j(x_j). $$

Given a cost c ∈ L^∞(X, F, m), the associated Gibbs kernel is

$$ K := e^{-c}, $$

so that e^{−‖c‖_∞} ≤ K ≤ e^{‖c‖_∞}, m-almost everywhere. We look for potentials φ = (φ_1, ..., φ_N) such that the measure Q_φ := e^{⊕_{i=1}^N φ_i} K m belongs to Π(m_1, ..., m_N), i.e. solve the Schrödinger system:

$$ e^{\varphi_i(x_i)} \int_{X_{-i}} e^{\oplus_{j \neq i} \varphi_j(x_{-i})} K(x_i, x_{-i}) \, \mathrm{d}m_{-i}(x_{-i}) = 1, \qquad (2.3) $$

for every i and m_i-a.e. x_i. The system (2.3) is well known to be the Euler-Lagrange optimality condition for the convex minimization problem

$$ \inf_{\varphi} F(\varphi) := \int_X e^{\oplus_{i=1}^N \varphi_i} K \, \mathrm{d}m - \sum_{i=1}^N \int_{X_i} \varphi_i \, \mathrm{d}m_i, \qquad (2.4) $$

and if φ solves (2.3), the measure Q_φ solves the multi-marginal entropy minimization:

$$ \inf \Big\{ H(Q \mid K m) \;:\; Q \in \Pi(m_1, \ldots, m_N) \Big\}. $$

Let us observe that whenever λ_1, ..., λ_N are constants which sum to 0, then F(φ_1 + λ_1, ..., φ_N + λ_N) = F(φ_1, ..., φ_N), so that one can impose the N − 1 normalizing constraints

$$ \int_{X_i} \varphi_i \, \mathrm{d}m_i = 0, \qquad i = 1, \ldots, N-1. $$

Denoting by L^p_⋄(X_i, F_i, m_i) the space of zero-mean L^p potentials, we set E := ∏_{i=1}^{N−1} L^∞_⋄(X_i, F_i, m_i) × L^∞(X_N, F_N, m_N) and consider

$$ \inf_{\varphi \in E} F(\varphi). \qquad (2.6) $$

The Sinkhorn algorithm is nothing but block coordinate descent for the minimization (2.6) of F over E. Starting from φ^0 ∈ E, the update of the Sinkhorn algorithm consists, given φ^t, in defining φ^{t+1}_i successively for i = 1, ..., N as the minimizer of φ_i ↦ F(φ^{t+1}_1, ..., φ^{t+1}_{i−1}, φ_i, φ^t_{i+1}, ..., φ^t_N) (over L^∞_⋄ for i ≤ N − 1, which introduces a normalizing constant λ^t_i, and over L^∞ for i = N), i.e.
$$ e^{\varphi^{t+1}_i(x_i)} \int_{X_{-i}} e^{\oplus_{j<i} \varphi^{t+1}_j \oplus_{j>i} \varphi^t_j} K(x_i, x_{-i}) \, \mathrm{d}m_{-i}(x_{-i}) = e^{\lambda^t_i}, \qquad (2.14) $$

for m_i-a.e. x_i, with λ^t_i = 0 for i = N. The convergence of the Sinkhorn iterates to a solution of (2.3) (hence a minimizer of (2.4)) was established by Di Marino and Gerolin [8]. The aim of the next paragraph is to slightly improve this result by showing that this convergence is linear.
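For readers who want to experiment, here is a minimal discrete sketch of these block updates (the function names and toy setup are ours, not from the note): each step rescales one block a_i = e^{φ_i} so that the i-th marginal of the current coupling matches m_i.

```python
import numpy as np

def mm_sinkhorn(cost, marginals, sweeps=200):
    """Multi-marginal Sinkhorn: block-coordinate updates of the scalings
    a_i = e^{phi_i}. cost is an ndarray of shape (n_1, ..., n_N);
    marginals is a list of N probability vectors."""
    K = np.exp(-cost)                       # Gibbs kernel
    N = cost.ndim
    a = [np.ones(m.size) for m in marginals]
    for _ in range(sweeps):
        for i in range(N):                  # one block update per marginal
            T = K
            for j in range(N):
                if j != i:
                    shape = [1] * N
                    shape[j] = -1
                    T = T * (a[j] * marginals[j]).reshape(shape)
            # integrate out every variable but x_i, then enforce the
            # i-th marginal condition by inverting the resulting factor
            S = T.sum(axis=tuple(k for k in range(N) if k != i))
            a[i] = 1.0 / S
    return a

def coupling(cost, marginals, a):
    """The discrete analogue of Q_phi = e^{sum_i phi_i} K m as a tensor."""
    Q = np.exp(-cost)
    for j in range(cost.ndim):
        shape = [1] * cost.ndim
        shape[j] = -1
        Q = Q * (a[j] * marginals[j]).reshape(shape)
    return Q
```

After enough sweeps, all N marginals of the resulting tensor match the prescribed ones; the block just updated always matches its marginal exactly, the others only in the limit.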
Linear convergence

Since F is bounded from below on E, the left-hand side of (3.5) converges to 0. Note also that, since φ^t and φ^{t+1} belong to E, one has the corresponding identity, and we deduce from (3.5), together with the uniform bounds from Lemma 3.1, that φ^t_i − φ^{t+1}_i, as well as e^{φ^t_i} − e^{φ^{t+1}_i}, converge strongly to 0 in L^2(m_i).

Theorem 3.3. The Sinkhorn iterates φ^t converge, strongly in every L^p(m_i), p ∈ [1, +∞), to the unique solution φ of (2.6). Moreover, the convergence is linear.

Proof. The convergence of φ^t in every L^p was obtained by Di Marino and Gerolin [8]; we include a short proof for the sake of completeness. Setting a^t_i := e^{φ^t_i} and passing to a subsequence if necessary, we may assume that the constants λ^t_i in (2.9)-(2.11) converge and that a^t_i and a^{t+1}_i converge weakly to some a_i in L^2(m_i). Hence, for every i, ⊗_{j<i} a^{t+1}_j ⊗_{j>i} a^t_j converges weakly in L^2(m_{−i}) to ⊗_{j≠i} a_j. By construction of the Sinkhorn iterates, e^{λ^t_i} / a^{t+1}_i is expressed as a Hilbert-Schmidt, hence compact, integral operator applied to ⊗_{j<i} a^{t+1}_j ⊗_{j>i} a^t_j, hence 1/a^{t+1}_i converges strongly in L^2(m_i). Since a^t_i is uniformly bounded and uniformly bounded away from 0, a^{t+1}_i converges strongly in L^2(m_i), as well as in L^p(m_i) for any p ∈ [1, +∞), by the bounds from Lemma 3.1. Using again that a^t_i is uniformly bounded and uniformly bounded away from 0, φ^t_i also converges strongly in L^p(m_i) to φ_i := log a_i, and of course φ ∈ E. Observing that, by construction, (⊗_{j≤i} a^{t+1}_j ⊗_{j>i} a^t_j) K m admits e^{λ^t_i} m_i as its i-th marginal for i = 1, ..., N − 1 and m_N as its N-th marginal, one easily checks that e^{⊕_{i=1}^N φ_i} K m = (⊗_{i=1}^N a_i) K m admits m_1, ..., m_N as marginals (and all the constants λ^t_i, i = 1, ..., N − 1, tend to 0). Thus φ solves the system (2.3), hence minimizes F over E; since F is strictly convex over E, this minimizer is unique, and in fact the whole sequence φ^t converges strongly in L^p to φ for every p ∈ [1, +∞).
Since φ satisfies the bounds of Lemma 3.1, using (3.3) as we did in the proof of Lemma 3.2, we arrive at the desired estimate, where ν is the constant in (3.4). Defining φ^t_i by (3.6), by construction of the Sinkhorn iterates, for i = 1, ..., N we have the corresponding chain of inequalities, where we have used Young's inequality in the last line. We have thus shown the geometric decay of F(φ^t) − F(φ). Using the second inequality in (3.3), together with the L^∞ bounds on φ^t from Lemma 3.1 and Jensen's inequality, then yields the claimed linear rate.

Remark 3.4. We also have linear convergence of φ^t to φ in L^2 and in every L^p, p ∈ [1, +∞).
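The linear convergence can also be observed numerically. The following self-contained sketch (a toy discrete 3-marginal instance of our own, not taken from the note) runs the Sinkhorn sweeps far past convergence to obtain a reference limit, then tracks the sup-norm distance of the log-scalings (i.e. the potentials) to that limit, which should decay geometrically:

```python
import numpy as np

# toy discrete 3-marginal instance (assumed setup, for illustration only)
rng = np.random.default_rng(2)
n = 5
c = rng.random((n, n, n))
K = np.exp(-c)
m = [np.full(n, 1.0 / n) for _ in range(3)]

def sweep(a):
    """one full Sinkhorn sweep over the three coordinate blocks"""
    a = [x.copy() for x in a]
    for i in range(3):
        T = K
        for j in range(3):
            if j != i:
                sh = [1, 1, 1]
                sh[j] = -1
                T = T * (a[j] * m[j]).reshape(sh)
        a[i] = 1.0 / T.sum(axis=tuple(k for k in range(3) if k != i))
    return a

# reference limit: run the same iteration far past convergence
a_ref = [np.ones(n)] * 3
for _ in range(2000):
    a_ref = sweep(a_ref)

# track the sup-norm error of the potentials along the orbit
a = [np.ones(n)] * 3
errs = []
for _ in range(10):
    a = sweep(a)
    errs.append(max(np.abs(np.log(a[i]) - np.log(a_ref[i])).max()
                    for i in range(3)))
```

On such instances the successive errors shrink by a roughly constant factor per sweep, the numerical signature of linear convergence.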