Interpolation inequalities, nonlinear flows, boundary terms, optimality and linearization

This paper is devoted to the computation of the asymptotic boundary terms in entropy methods applied to a fast diffusion equation with weights associated with Caffarelli-Kohn-Nirenberg interpolation inequalities. So far, only elliptic equations have been considered and our goal is to justify, at least partially, an extension of the carré du champ / Bakry-Emery / Rényi entropy methods to parabolic equations. This makes sense because evolution equations are at the core of the heuristics of the method even when only elliptic equations are considered, but this also raises difficult questions on the regularity and on the growth of the solutions in the presence of weights. We also investigate the relations between the optimal constant in the entropy - entropy production inequality, the optimal constant in the information - information production inequality, the asymptotic growth rate of generalized Rényi entropy powers under the action of the evolution equation and the optimal range of parameters for symmetry breaking issues in Caffarelli-Kohn-Nirenberg inequalities, under the assumption that the weights do not introduce singular boundary terms at x=0. These considerations are new even in the case without weights. For instance, we establish the equivalence of carré du champ and Rényi entropy methods and explain why entropy methods produce optimal constants in entropy - entropy production and Gagliardo-Nirenberg inequalities in the absence of weights, or optimal symmetry ranges when weights are present.


INTRODUCTION
In this paper we consider the Gagliardo-Nirenberg inequality (1), that is, the case without weights of (3) below, in relation with the nonlinear diffusion equation (2) $v_t = \Delta v^m$ in $\mathbb{R}^d$, $d \ge 1$, in the fast diffusion regime $m \in [1 - 1/d, 1)$. We also consider more general interpolation inequalities with weights. With the norm defined by $\|w\|_{q,\gamma} := \big(\int_{\mathbb{R}^d} |w|^q\,|x|^{-\gamma}\,dx\big)^{1/q}$, which extends the case without weight $\|w\|_q = \|w\|_{q,0}$, let us consider the family of Caffarelli-Kohn-Nirenberg interpolation inequalities given by (3) $\|w\|_{2p,\gamma} \le C_{\beta,\gamma,p}\,\|\nabla w\|_{2,\beta}^{\vartheta}\,\|w\|_{p+1,\gamma}^{1-\vartheta}$ in a suitable functional space $H^p_{\beta,\gamma}(\mathbb{R}^d)$ obtained by completion of smooth functions with compact support in $\mathbb{R}^d \setminus \{0\}$, w.r.t. the norm given by $\|w\|^2 := (p_\star - p)\,\|w\|_{p+1,\gamma}^2 + \|\nabla w\|_{2,\beta}^2$. Here $C_{\beta,\gamma,p}$ denotes the optimal constant, the parameters $\beta$, $\gamma$ and $p$ are subject to the restrictions (4), and the exponent $\vartheta$ is determined by the scaling invariance. These inequalities were introduced, among other inequalities, by L. Caffarelli, R. Kohn and L. Nirenberg in [10]. The evolution equation associated with (3) is the weighted nonlinear diffusion equation (5) $v_t = |x|^{\gamma}\,\nabla\cdot\big(|x|^{-\beta}\,\nabla v^m\big)$, $(t,x) \in \mathbb{R}^+ \times \mathbb{R}^d$, with exponent $m = \frac{p+1}{2p} \in [m_1, 1)$. Details about the existence of solutions for the above evolution equation and their properties can be found in [7]. Our first goal is to give a proof of (3) with an integral remainder term using (5) whenever the optimal function in (3) is radially symmetric. This requires some parabolic estimates. As in the elliptic proof of (3) given in [21] and [22], the main difficulty arises from the justification of the integrations by parts. We also investigate why the method provides the optimal constant in (1) and the optimal range of symmetry in (3).
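The scaling argument that determines $\vartheta$ can be made explicit. The following is a sketch, assuming only that (3) has to hold for every dilation $w_\lambda(x) := w(\lambda x)$, $\lambda > 0$ (the notation $w_\lambda$ is introduced here for illustration only), so that both sides of (3) must scale with the same power of $\lambda$.

```latex
% Scaling of the weighted norms under w_\lambda(x) := w(\lambda x), \lambda > 0
\begin{align*}
\|w_\lambda\|_{q,\gamma} &= \lambda^{-\frac{d-\gamma}{q}}\,\|w\|_{q,\gamma}\,,
&
\|\nabla w_\lambda\|_{2,\beta} &= \lambda^{1-\frac{d-\beta}{2}}\,\|\nabla w\|_{2,\beta}\,.
\end{align*}
% Matching the powers of \lambda on both sides of (3):
\begin{equation*}
-\frac{d-\gamma}{2\,p}
= \vartheta\left(1-\frac{d-\beta}{2}\right) - (1-\vartheta)\,\frac{d-\gamma}{p+1}
\quad\Longrightarrow\quad
\vartheta = \frac{(d-\gamma)\,(p-1)}{p\,\big(d+\beta+2-2\,\gamma-p\,(d-\beta-2)\big)}\,.
\end{equation*}
```

For $(\beta,\gamma) = (0,0)$ this reduces to $\vartheta = \frac{d\,(p-1)}{p\,(d+2-p\,(d-2))}$, the usual exponent of the Gagliardo-Nirenberg inequality without weights.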

The symmetry breaking issue. Equality in (3) is achieved by Aubin-Talenti type functions $w_\star$ if we know that symmetry holds, that is, if we know that the equality is achieved among radial functions. In this case it is not very difficult to check that $w_\star$ is the unique radial critical point, up to the transformations associated with the invariances of the equation. Of course, any element of the set of functions generated by the dilations and the multiplication by an arbitrary constant is also optimal if $w_\star$ is optimal. Conversely, there is symmetry breaking if equality in (3) is not achieved among radial functions. Deciding whether symmetry or symmetry breaking holds is a central problem in physics, and it is also a difficult mathematical question. It is well known that symmetric energy functionals may have states of lowest energy that may or may not have these symmetries. In our example (3) the weights are radial and the functional is invariant under rotation. In the language of physics, a broken symmetry means that the symmetry group of the minimizer is strictly smaller than the symmetry group of the functional. For computing the optimal value of the functional it is of great advantage that an optimizer is symmetric: the optimal constant $C_{\beta,\gamma,p}$ can then be explicitly computed in terms of the $\Gamma$ function. Otherwise, this is a difficult question which has only numerical solutions so far and involves a delicate energy minimization as shown in [18,19]. In other contexts the breaking of symmetry leads to various interesting phenomena and this is why it is important to decide what symmetry types, if any, an optimizer has. Our problem is a model case, in which homogeneity and scaling properties are essential to obtain a clear-cut answer to these symmetry issues.
To show that symmetry is broken in (3), one can minimize the associated functional in the class of symmetric functions and then check whether the value of the functional can be lowered by perturbing the minimizer away from the symmetric situation. This is the standard method, and it has been used to establish that symmetry breaking occurs in (3) if (6) $\gamma < 0$ and $\beta_{\mathrm{FS}}(\gamma) < \beta$, within the admissible range of parameters. In the critical case $p = p_\star$, the method was implemented by F. Catrina and Z.-Q. Wang in [14], and the sharp result was obtained by V. Felli and M. Schneider in [28]. The same condition was recently obtained in the sub-critical case $p < p_\star$, in [7]. Here by critical we simply mean that $\|w\|_{2p,\gamma}$ scales like $\|\nabla w\|_{2,\beta}$. One has to observe that proving symmetry breaking by establishing the linear instability is a local method, which is based on a painful but rather straightforward linearization around the special function $w_\star$. When the minimizer in the symmetric class is stable, i.e., all local perturbations that break the symmetry (in our case, non-radial perturbations) increase the energy, the problem of deciding whether the optimizer is symmetric is much more difficult. It is obvious that, in general, one cannot conclude that the minimizer is symmetric by using a local perturbation, because the minimizer in the symmetric class and the actual minimizer might not be close in any notion of distance adapted to the functional space $H^p_{\beta,\gamma}(\mathbb{R}^d)$. In general it is extremely difficult to decide, assuming stability, whether the minimizer is symmetric or not. This is a global problem and not amenable to linear methods.
General techniques for establishing symmetry of optimizers are rearrangement inequalities and the moving plane method. These methods, however, can only be applied to functionals that are in one way or another related to the isoperimetric problem. Outside this context there are no general techniques available for understanding the symmetry of minimizers. This is quite obvious when the weights and the nonlinearity do not cooperate to decrease the energy under symmetrization, and in these cases moving planes and related comparison techniques fail. As usual in nonlinear analysis, advances have always been made by studying relevant and non-trivial examples, such as finding the sharp constant in Sobolev's inequality [1,36], the Hardy-Littlewood-Sobolev inequality [33] or the logarithmic Sobolev inequality [31], to mention classical examples. In all these cases symmetrization and the moving plane methods work. Likewise, these techniques can be applied, in the case of Caffarelli-Kohn-Nirenberg inequalities, to prove that symmetry holds if $p = p_\star$ and $\beta > 0$. In fact, using symmetrization methods, fairly good ranges have been achieved in [4]. The results, however, are not optimal and can be improved by direct energy and spectral estimates as in [20]. Various perturbation techniques have also been implemented, as in [23,24], to extend the region of the parameters for which symmetry is known but the method, at least in [23] and related papers, is not constructive. To establish the optimal symmetry range in (3), and thus determine the sharp constant in the Caffarelli-Kohn-Nirenberg inequalities, a new method had to be designed. What has been proved in [21] in the critical case $p = p_\star$, and extended in [22] to the sub-critical case $1 < p < p_\star$, is that the symmetry breaking range given in (6) is optimal, i.e., symmetry holds in the region of admissible parameters that is complementary to the region in which symmetry breaking was established.
The strategy used in [21,22] to prove symmetry in the desired parameter range consists of perturbing the functional about the (unknown) critical point in a particular direction. Notwithstanding what has been said before about perturbations being local, the direction depends in a non-linear fashion on the critical point. It turns out that this perturbation vanishes precisely if the critical point is a radial optimizer. Of course, this begs the question how this direction can be found. In the case at hand it turns out that the functional is monotone under the action of a particular non-linear flow, and the derivative of the functional at a critical point turns out to be strictly negative unless the critical point is a radial optimizer. In carrying out this program one has to perform integration by parts and a good deal of work enters in proving the necessary regularity properties of the critical points that justify these computations.
A more appealing possibility is to use the fact that the non-linear flow, written in suitable variables, converges to a Barenblatt profile. Starting with any reasonable initial condition one would, as above, differentiate the functional along the flow and, in a formal fashion, see that the functional decreases towards its minimal value as time tends to infinity. In addition to having an intuitive approach, one would potentially obtain correction terms to the inequality. This can be carried out, but so far the corresponding computations are formal because they rely on various integrations by parts that have to be justified. It is the first purpose of this paper to (partially) fill this gap and establish the optimal symmetry range using the full picture of entropy methods, at least as far as integration by parts on unbounded domains is concerned. In the case of non-constant coefficients, the singularities of the weights at $x = 0$ pose additional difficulties which are not studied in this paper, so that our results are still formal in the weighted case. But at least we make what we think is a significant step towards a complete parabolic proof.
Additionally, a method based on a parabolic flow provides for free an integral remainder term, and sheds a fresh light on the method used in [21,22]. The results in [21,22] are surprising in the sense that the locally stable radial optimizers are precisely the global optimizers. From the flow perspective, however, this can be understood, because stationarity under the flow characterizes all critical points. The flow monotonically decreases the functional associated with (3): this also explains why we are able to extend a local property (the linear stability of radial solutions) to a global stability result (the uniqueness, up to the invariances, of the critical point).
The parabolic approach is based on an inequality between the Fisher information and its time derivative, i.e., the production of Fisher information, and provides us with the optimal range of symmetry. This is a remarkable fact, common to various nonlinear diffusion equations, that can be explained as follows. When there are no weights, the optimality in the entropy - entropy production inequality is achieved through a linearization which also provides us with large-time asymptotic rates of convergence. As a consequence the best constant in the inequality is equal to the optimal constant which arises from the computation of Fisher information - production of Fisher information (see [3,32]), and which is also reached in the large-time asymptotics. With general weights, the picture is actually slightly more complex, as discussed in [7,8], but by studying the large-time asymptotics, one can at least understand why the optimal symmetry range is achieved in our flow approach. This will be detailed in the last section of this paper. One more comment has to be made at this point. Quite generally, computations based on the second derivative of an entropy with respect to time, along a flow, are known as Bakry-Emery or carré du champ methods. The geometry or the presence of an external potential usually allows us to relate a positivity estimate of the curvature or a uniform convexity bound of the potential with a rate of decay of the Fisher information. In the Rényi entropy powers approach, as can be seen from [27], there is no such bound, either on the curvature or on the potential: what matters is only the fact that we apply a nonlinear flow to some nonlinear quantities. The interplay of the various quantities that are generated by integrations by parts and by derivatives hitting powers of functions delivers nontrivial coefficients that allow us to relate the Fisher information with its time derivative, i.e., the production of Fisher information.
As explained below, in order to control the boundary terms, we are dealing with the more classical setting of relative entropies and self-similar variables. By making the link with Rényi entropy powers, we finally get rid of any geometry or convexity requirements on an external potential. Although this is a side remark of our paper, we believe that it is of interest by itself.
In [21,22], we analyzed the symmetry properties not only of the extremal functions of (3), but also of all positive solutions in $H^p_{\beta,\gamma}(\mathbb{R}^d)$ of the corresponding Euler-Lagrange equations, i.e., up to a multiplication by a constant and a dilation, of equation (7).
Theorem 1. [21,22] Assume that $d \ge 2$ and $(\beta,\gamma) \ne (0,0)$. Under Condition (4), assume that (8) holds. Then all positive solutions of (7) in $H^p_{\beta,\gamma}(\mathbb{R}^d)$ are radially symmetric and, up to a scaling and a multiplication by a constant, equal to $w_\star$.
Theorem 1 determines the optimal symmetry range, as shown by (6). Our first result is actually a more precise version of Theorem 1, under a regularity assumption at $x = 0$ that still has to be proved.

Main result.
The Rényi entropy power functional relates (2) with (1). We adopt a similar approach in the weighted case. Let us consider the derivative $G$ of the generalized Rényi entropy power functional, defined up to a multiplicative constant. The exponent $m$ is the one which appears in (5), and it is such that $m \in [m_1, 1)$. With $n$ defined as in (9), we observe that $m_1 = 1 - 1/n$, so that, when $(\beta,\gamma) = (0,0)$, we have $n = d$ and $m_1 = 1 - 1/d$. As we will see later using scalings, $n$ plays the role of a dimension. In [21,22] we proved Theorem 1 using elliptic methods and well chosen multipliers inspired by the heuristics arising from the parabolic equation (5). However, so far, we were not able to deal with the time-dependent solution, for lack of estimates justifying the integrations by parts, and this is why we only worked with the elliptic equation. In this paper we study the evolution problem. When $(\beta,\gamma) \ne (0,0)$, we are not yet able to deal with the possible singularities of the solutions to (5) at the origin, but otherwise we can handle all integrations by parts. The method is based on the approximation of the solution on a ball in self-similar variables, with no-flux boundary conditions, and then on letting the radius of the ball go to infinity. By using parabolic methods to prove Theorem 1, we obtain improvements of (1) and (3), with a remainder term computed as an integral term along the flow.
Theorem 2. Assume that (4) and (8) hold. With $h$ and $\mu$ defined by (10), there exists a positive constant $C$ depending only on $\beta$, $\gamma$ and $d$ such that the following property holds. If $v_0$ satisfies $\|v_0\|_{1,\gamma} = M_\star$ and if there exist two positive constants $C_1$ and $C_2$ such that (11) holds, then any positive solution of (5) with initial datum $v_0$ satisfies an estimate of the derivative of $G$ along the flow, with a remainder term.
Here $\omega = x/|x|$, and $\nabla_\omega$ denotes the gradient with respect to angular derivatives. The explicit expression of $C$ and further remainder terms will be given in Theorem 9, in Section 3. The condition that $v$ is smooth at $x = 0$ means that integrations by parts can be done without paying attention to the weight in a neighborhood of $x = 0$. Condition (11) may seem very restrictive, but it is probably not since, as explained in the introduction of [7], it is expected that for any smooth initial datum $v_0$ with finite mass, condition (11) will be satisfied by any solution after some finite time $t > 0$. At least this is known from [9] when $(\beta,\gamma) = (0,0)$.
The mass normalization $\|v_0\|_{1,\gamma} = M_\star$ simplifies the computations but the result can easily be generalized to any positive mass. The smoothness condition at $x = 0$ simply means that all computations can be carried out in $\mathbb{R}^d \setminus B_\varepsilon$, where $B_\varepsilon$ is the ball of radius $\varepsilon > 0$ centered at the origin, and that the boundary terms on $\partial B_\varepsilon$ vanish as $\varepsilon \to 0$. Up to the smoothness assumption, the result of Theorem 2 is stronger than the result of Theorem 1. Indeed, if $m$ and $p$ are related by $p = \frac{1}{2m-1}$ and if $w$ solves (7), then we obtain a relation which, as shown in [22], implies the result of Theorem 1.
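The relation between the exponents $m$ and $p$ used above can be checked directly. The short computation below uses only $m = \frac{p+1}{2p}$ and $m_1 = 1 - 1/n$ from the text; the symbol $p_1$, for the value of $p$ corresponding to $m = m_1$, is introduced here for illustration and assumes $n > 2$.

```latex
\begin{align*}
m = \frac{p+1}{2\,p} \quad&\Longleftrightarrow\quad p = \frac{1}{2\,m-1}\,,\\
m = m_1 = 1-\frac1n \quad&\Longleftrightarrow\quad
p = p_1 := \frac{1}{1-\frac2n} = \frac{n}{n-2}\,,
\qquad 2\,p_1 = \frac{2\,n}{n-2}\,.
\end{align*}
```

The endpoint $m = m_1$ therefore corresponds to $2p = 2n/(n-2)$, which has the form of a critical Sobolev exponent in dimension $n$. This is one way to see why $n$ plays the role of a dimension, while $m \to 1$ corresponds to $p \to 1$.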
1.3. Outline of the paper. Our goals are:
(1) To give a proof of the monotonicity of $G$ for the solution to the evolution equation and establish the remainder term of Theorem 9 under the smoothness assumption of the solutions of (5) at $x = 0$.
(2) To study the outer boundary terms by using self-similar variables and an approximation scheme on large balls. The novelty here is that we are able to justify the integrations by parts away from the origin for the solution to the evolution problem (5).
(3) To study the role of large time asymptotics and of the linearized problem, and consequently explain why the method in [21,22] determines the optimal range for symmetry breaking. Corresponding results are stated in Section 4.
Weights induce various technicalities, so that, in order to emphasize the strategy, we also consider the case without weights. In that case Theorem 9 is rigorous without any smoothness assumption on the solution at $x = 0$. This is not by itself new, but at least two observations are new:
• The equivalence of the Rényi entropy powers introduced by G. Savaré and G. Toscani in [35] and the computation based on the relative Fisher information in self-similar variables.
• The characterization of the optimality case in the Fisher information - production of Fisher information inequality, which explains why computations based on flows provide us with the optimal constant in the entropy - entropy production inequality and, as a consequence, in the Gagliardo-Nirenberg inequality (1), when there are no weights.

A proof of Gagliardo-Nirenberg inequalities based on the Rényi entropy powers.
2.1.1. Variation of the Fisher information along the flow. Let $v$ be a smooth function on $\mathbb{R}^d$ and define the Fisher information $I$ in terms of $v$ and of $P := \frac{m}{1-m}\,v^{m-1}$. Here $P$ is the pressure variable. If $v$ solves (2), in order to compute $I'$, we will use the identity (12). Using (2) and (12), we can compute $I'$. The key computation relies on integrations by parts and requires a sufficient decay of the solutions as $|x| \to +\infty$ to ensure that all integrals are finite, including the boundary integrals. In the next result, we focus on the algebra of the integrations by parts used to deal with the resulting expression, and time plays no role. How to apply this computation to a solution of the parabolic problem will be explained afterwards.
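The evolution of the pressure variable can be derived in a few lines. The sketch below assumes that (2) is the fast diffusion equation $\partial_t v = \Delta v^m$ and that $P = \frac{m}{1-m}\,v^{m-1}$, as in the weighted case of Section 3. That the resulting identity has exactly the form of the numbered equation (12) is an assumption.

```latex
% Assumptions: \partial_t v = \Delta v^m and P = \frac{m}{1-m}\,v^{m-1}, with 0 < m < 1.
\begin{align*}
\nabla P &= -\,m\,v^{m-2}\,\nabla v\,,
\qquad v\,\nabla P = -\,\nabla (v^m)\,,
\qquad \Delta v^m = -\,\nabla\cdot(v\,\nabla P) = -\,\nabla v\cdot\nabla P - v\,\Delta P\,,\\
\frac{\partial P}{\partial t}
&= -\,m\,v^{m-2}\,\frac{\partial v}{\partial t}
= m\,v^{m-2}\,\big(\nabla v\cdot\nabla P + v\,\Delta P\big)
= (1-m)\,P\,\Delta P - |\nabla P|^2\,.
\end{align*}
```

The last step uses $m\,v^{m-1} = (1-m)\,P$ and $m\,v^{m-2}\,\nabla v = -\,\nabla P$.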

Lemma 3. Assume that $v$ is a smooth and rapidly decaying function on $\mathbb{R}^d$, as well as its derivatives. Then, with $P$ defined as above, identity (13) holds.
Proof. We follow the computation of [27] or [35, Appendix B]. A first step is given by the observation that $v\,\nabla P = -\,\nabla(v^m)$ and an integration by parts. Using the elementary identity $\frac12\,\Delta\,|\nabla P|^2 = \|D^2 P\|^2 + \nabla P\cdot\nabla\Delta P$, a further integration by parts gives the remaining terms. Collecting terms establishes (13).

The result of Lemma 3 can be applied to a solution of (2): this is the content of Corollary 4.
Proof. If we perform the same computations as in the proof of Lemma 3 in a ball $B_R$ (instead of $\mathbb{R}^d$), we find additional boundary terms on $\partial B_R$. Here $d\sigma$ denotes the measure induced on $\partial B_R$ by Lebesgue's measure. It follows from [5, Theorem 2, (iii)] that these boundary terms vanish as $R \to +\infty$, which completes the proof.
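The elementary identity used in the proof of Lemma 3 is usually combined with the splitting of the Hessian into its trace and trace-free parts. Both facts are standard for a smooth function $P$ on $\mathbb{R}^d$. The notation $\mathrm{Id}$ for the identity matrix and the convention $\|A\|^2 = \sum_{i,j} A_{ij}^2$ are assumptions made here for illustration.

```latex
\begin{align*}
\tfrac12\,\Delta\,|\nabla P|^2 &= \|\mathrm D^2 P\|^2 + \nabla P\cdot\nabla\Delta P\,,\\
\|\mathrm D^2 P\|^2 &= \tfrac1d\,(\Delta P)^2
 + \big\|\mathrm D^2 P - \tfrac1d\,\Delta P\,\mathrm{Id}\big\|^2\,.
\end{align*}
```

The second line shows that $\|\mathrm D^2 P\|^2 \ge \frac1d\,(\Delta P)^2$, with equality if and only if $\mathrm D^2 P$ is a multiple of the identity. This is consistent with the quadratic pressure profiles $P(x) = a + b\,|x - x_0|^2$ which appear in the equality case of Proposition 5.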

Concavity of the Rényi entropy powers and consequences. Lemma 3 provides an expression of $I'$ and, as a consequence, an estimate of $I'$. In the sub-critical range $m_1 < m < 1$, let us define the entropy as $E := \int_{\mathbb{R}^d} v^m\,dx$ and observe that, if $v$ solves (2), $E' = (1-m)\,I$. Next we introduce the Rényi entropy power given by $F := E^\sigma$, for an appropriate exponent $\sigma$. Using Lemma 3, $F'' = (E^\sigma)''$ can be computed and estimated. This proves that $\sigma\,(1-m)\,G = F'$ is nonincreasing (so that the function $t \mapsto F(t)$ is concave).
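The first derivative of the entropy and the structure of $F''$ can be spelled out. The sketch below assumes $\partial_t v = \Delta v^m$, $P = \frac{m}{1-m}\,v^{m-1}$ and the normalization $I = \int_{\mathbb{R}^d} v\,|\nabla P|^2\,dx$ of the Fisher information. This normalization is an assumption, chosen to be consistent with the relation $\sigma\,(1-m)\,G = F'$ when $G = E^{\sigma-1}\,I$. All boundary terms are assumed to vanish.

```latex
% Uses \nabla v^{m-1} = \frac{1-m}{m}\,\nabla P and \nabla v^m = -\,v\,\nabla P.
\begin{align*}
E' &= m\int_{\mathbb R^d} v^{m-1}\,\Delta v^m\,dx
    = -\,m\int_{\mathbb R^d}\nabla v^{m-1}\cdot\nabla v^m\,dx
    = (1-m)\int_{\mathbb R^d} v\,|\nabla P|^2\,dx = (1-m)\,I\,,\\
F' &= \sigma\,E^{\sigma-1}\,E' = \sigma\,(1-m)\,E^{\sigma-1}\,I\,,\\
F'' &= \sigma\,(1-m)\,E^{\sigma-2}\,\Big((\sigma-1)\,(1-m)\,I^2 + E\,I'\Big)\,.
\end{align*}
```

For $\sigma > 0$, the concavity of $F$ thus amounts to $(\sigma-1)\,(1-m)\,I^2 + E\,I' \le 0$, which is where the expression of $I'$ given by Lemma 3 enters.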
We recall that $(\beta,\gamma) = (0,0)$ and, as a consequence, $n = d$ and $\mu = \mu_\star$: notations are consistent with those of (9). To obtain the expression of $v_\star$, it is standard to rephrase the evolution equation (2) in self-similar variables as follows. If we consider a solution $v$ of (2) and make the change of variables (15), then the function $u$ solves (16). It is straightforward to check that $B_\star$ is a stationary solution of (16) and it is well known that $B_\star$ attracts all nonnegative solutions with mass $M_\star$, at least if $m \in (m_c, 1)$. This determines the behaviour of $v(t,x)$ as $t \to +\infty$. We refer to [5] for details and further references. As a consequence, we obtain for any $t \ge 0$ an inequality which is exactly equivalent to (1), as noted in [27]. By keeping track of the remainder term, we get an improved inequality.
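The stationarity of the Barenblatt profile can be checked in one line. The sketch below assumes that (16) has the form $\partial_\tau u + \nabla\cdot\big(u\,(\nabla u^{m-1} - 2\,x)\big) = 0$ and that $B_\star(x) = (1+|x|^2)^{1/(m-1)}$. Both are assumptions, suggested by the quantities $z = \nabla u^{m-1} - 2\,x$ and $q = u^{m-1} - 1 - |x|^2$ used in Section 2.2.

```latex
% Assumed form of (16): \partial_\tau u + \nabla\cdot\big(u\,(\nabla u^{m-1} - 2\,x)\big) = 0
\begin{align*}
B_\star(x) = \big(1+|x|^2\big)^{\frac{1}{m-1}}
\quad&\Longrightarrow\quad
B_\star^{m-1} = 1+|x|^2\,,
\qquad \nabla B_\star^{m-1} - 2\,x = 0\,,\\
&\Longrightarrow\quad
\nabla\cdot\Big(B_\star\,\big(\nabla B_\star^{m-1}-2\,x\big)\Big) = 0\,.
\end{align*}
```

Since the flux vanishes identically, the conclusion does not depend on the sign convention in front of the divergence. Under the same assumption, $B_\star(x) \sim |x|^{-2/(1-m)}$ as $|x| \to +\infty$, so that $B_\star$ has finite mass if and only if $m > 1 - 2/d$.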

Proposition 5.
Under the assumptions of Corollary 4, with $R$ defined by (14), an improved inequality holds for all $t \ge 0$. If we write $v_0^{m-1/2} = M_\star^{m-1/2}\,w/\|w\|_{2q}$ with $q = 1/(2m-1)$, then this inequality can be rewritten as an inequality for $w$. In this way, we recover the Gagliardo-Nirenberg inequality that was established in [15], with an additional remainder term.
Proof. The only point that deserves a discussion is the equality case. Solving simultaneously the conditions for equality shows that $P(x) = a + b\,|x - x_0|^2$ for some real constants $a$ and $b$, and for some $x_0 \in \mathbb{R}^d$.
Let us now consider computations in self-similar variables (see [33, 34, 35, 36]) and emphasize the role of the boundary terms when the problem is restricted to a ball. The major advantage of self-similar variables is that we control the sign of these boundary terms. Such computations can be traced back to [12,11,13] and are directly inspired by the carré du champ or Bakry-Emery method introduced in [2]. The algebra is slightly more involved than the one of Section 2.1 because of the presence of a drift term. The main advantage of this framework is that boundary terms have a definite sign, which is important in preparation of the computations of Section 3, in the weighted case. For a while we will consider Eq. (16) written on a ball $B_R$ instead of $\mathbb{R}^d$, with no-flux boundary conditions. Let $u = u(\tau,x)$ be a solution of (17), that is, of Eq. (16) restricted to $B_R$, where $B_R$ is a centered ball in $\mathbb{R}^d$ with radius $R > 0$, and assume that $u$ satisfies no-flux boundary conditions. On $\partial B_R$, $\omega = x/|x|$ denotes the unit outgoing normal vector to $\partial B_R$. We define $z := \nabla u^{m-1} - 2\,x$, so that Eq. (17) can be rewritten in terms of $z$, with the boundary condition $z\cdot\omega = 0$ on $\partial B_R$. We recall that $m \in [m_1, 1)$ where $m_1 = 1 - 1/d$. With these definitions, the time-derivative of the relative Fisher information can be computed using the above equations. Let us denote by $d\sigma$ the measure induced by Lebesgue's measure on $\partial B_R$. Taking into account the boundary condition $z\cdot\omega = 0$ on $\partial B_R$, we integrate by parts. Using the elementary identity $\frac12\,\Delta\,|\nabla q|^2 = \|D^2 q\|^2 + \nabla q\cdot\nabla\Delta q$, with $q := u^{m-1} - 1 - |x|^2$ so that $z = \nabla q$, we obtain an expression which involves a boundary integral. Moreover, since $z\cdot\omega = 0$ on $\partial B_R$, we know from [29, Lemma 5.2], [34, Proposition 4.2] or [30] (also see [25] or [32, Lemma A.3]) that $\int_{\partial B_R} u^m\,\omega\cdot\nabla|z|^2\,d\sigma \le 0$.
Therefore, for any $m \in [m_1, 1)$, we have shown an estimate of the time-derivative of the relative Fisher information. This allows us to prove a result similar to the one of Corollary 6, in terms of the relative entropy and according to [15,5]. For functions on $\mathbb{R}^d$, let us define the relative entropy and the relative Fisher information with respect to $B_\star$.
Proposition 7. An improved entropy - entropy production inequality holds for all $\tau \ge 0$ if $u$ solves (16) with initial datum $u_0 \in L^1_+(\mathbb{R}^d)$ such that $u_0^m$ and $u_0\,|x|^2$ are integrable.
Proof. To prove the result, one has to approximate a solution of (16) by the solution of (17) on the centered ball B R of radius R, and extend it to R d \ B R by B ⋆ . By passing to the limit as R → +∞, the result follows.
To conclude this subsection, let us list a few comments.
(1) The entropy - entropy production method in rescaled variables is not as accurate as the method of Rényi entropy powers. Boundary terms have a sign and can be dropped, but in the end we get an inequality instead of an equality. On the other hand, the method is very robust and applicable not only to large balls but also to any convex domain. This is the method that we shall extend to the case of the weighted evolution equation in Section 3.
(2) The inequality obtained by this method is a non scale-invariant, but optimal, form of the Gagliardo-Nirenberg inequality, as shown in [15]. The inequality written with $w$ replaced by $M_\star^{m-1/2}\,w/\|w\|_{2q}$ is, after optimization under scalings, equivalent to inequality (1). However $R[u]$ is not invariant under scaling. In order to replace the improved inequality of Proposition 7 by an improved inequality similar to the one of Corollary 6, one should use delicate scaling properties involving the best matching Barenblatt instead of $B_\star$. See [26,27] for further considerations in this direction.
(3) An interesting remark, which is important in our computations and results, is an identity which holds for any function $p$. As a consequence, the remainder terms in the entropy - entropy production method in rescaled variables are very similar to the remainder terms in the Rényi entropy powers method, and the $|x|^2$ term plays essentially no role.
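The following elementary computation illustrates why the $|x|^2$ term drops out of trace-free Hessians. That this is precisely the identity meant in comment (3) is an assumption; the notation $\mathrm{Id}$ for the identity matrix is also introduced here for illustration.

```latex
\begin{align*}
\mathrm D^2 |x|^2 = 2\,\mathrm{Id}\,,\qquad \Delta |x|^2 = 2\,d
\quad&\Longrightarrow\quad
\mathrm D^2 |x|^2 - \tfrac1d\,\Delta |x|^2\,\mathrm{Id} = 0\,,\\
\mathrm D^2 \big(p - |x|^2\big) - \tfrac1d\,\Delta \big(p - |x|^2\big)\,\mathrm{Id}
&= \mathrm D^2 p - \tfrac1d\,\Delta p\,\mathrm{Id}\,.
\end{align*}
```

Any remainder term built on the trace-free part of the Hessian therefore takes the same value for $p = u^{m-1}$ and for $q = u^{m-1} - 1 - |x|^2$.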
2.3. The two methods are identical. The computations of Sections 2.1 and 2.2 look similar and are actually the same, if we do not take into consideration the boundary terms. Let us give some details.

A computation based on the time-dependent rescaling.
If $v$ is a solution of (2), then the function $u$ defined by the time-dependent rescaling (15) solves (16). With the choice $R_0 = 1/\kappa$, the initial data are identical. As in Section 2.2, let us define $z(x,\tau) := \nabla u^{m-1} - 2\,x$ and consider the relative Fisher information.
• If $m = m_1 = 1 - \frac1d$, then $\frac{1-m}{m}\,d = \frac1m$ and, by undoing the time-dependent rescaling (15), we obtain a quantity which is nonpositive by Corollary 4. By the computations of Section 2.1, this provides the corresponding estimate in the variable $u$.
• If $m \in [m_1, 1)$, using an additional observation on the one hand, and Corollary 4 on the other hand, we end up with an estimate involving a remainder term $R_\star[u]$. Notice that $R_\star[u]$ does not depend on $\tau$ explicitly, because of the form of the time-dependent rescaling (15).
With these considerations, we obtain an improvement of Proposition 7, which goes as follows.
Proposition 8. The improved inequality involving $R_\star[u]$ holds if $u$ solves (16) with initial datum $u_0 \in L^1_+(\mathbb{R}^d)$ such that $u_0^m$ and $u_0\,|x|^2$ are integrable.

A direct computation in rescaled variables.
Although this is equivalent to the computations of the previous subsection, it is instructive to redo the computation in the rescaled variables. Let us define $p := u^{m-1}$ and observe that it solves an evolution equation. For simplicity, we consider only the case $m = m_1$. If we write $P := \frac{m}{1-m}\,p$, then the r.h.s. of the resulting identity can be rewritten in terms of $P$ and we are back to the computations of Section 2.1. Using (13) with $v = u$ and $P = \frac{m}{1-m}\,u^{m-1}$, we recover the same estimate if $u$ solves (16).
This concludes the section on Gagliardo-Nirenberg inequalities (1) and fast diffusion equations (2). So far proofs are rigorous. From now on, we shall work with weights, that is, on Caffarelli-Kohn-Nirenberg inequalities (3) and weighted parabolic equations (5), and assume that integrations by parts can be carried out at $x = 0$ without any precaution. Corresponding results will henceforth be formal.

THE CASE OF THE WEIGHTED DIFFUSION EQUATION
In order to study the weighted evolution equation (5), it is convenient to introduce, as in [21,22], a change of variables which amounts to rephrasing our problem in a space of higher, artificial dimension $n \ge d$ (here $n$ is a dimension at least from the point of view of the scaling properties), or, to be precise, to considering a weight $|x|^{n-d}$ which is the same in all norms. With appropriate parameters $\alpha$ and $n$, we claim that Inequality (3) can be rewritten as an inequality for a function $W$ which involves the operator $D_\alpha$, where $\partial_r = \partial/\partial r$ and $\nabla_\omega$ is the gradient in the angular derivatives, $\omega \in \mathbb{S}^{d-1}$. The optimal constant $K_{\alpha,n,p}$ is explicitly computed in terms of $C_{\beta,\gamma,p}$ and the condition (4) can be rephrased in terms of $\alpha$, $n$ and $p$. By our change of variables, $w_\star$ is changed into its counterpart in the new variables, and the symmetry condition (8) becomes a condition on $\alpha$. For any $\alpha \ge 1$, note that the operator $D_\alpha$ can be rewritten in terms of $\nabla$ and $\partial_r$. If $D_\alpha^*$ is the adjoint operator of $D_\alpha$, with respect to the measure $d\mu_n := r^{n-1}\,dr\,d\omega$, then $D_\alpha^* Z = -\,|x|^{\delta}\,\nabla\cdot(|x|^{-\delta}\,Z) - (\alpha-1)\,r^{1-n}\,\omega\cdot\partial_r(r^{n-1}\,Z)$ for any vector-valued function $Z$, and moreover a useful integration by parts formula holds if $W$ and $Z$ are respectively scalar- and vector-valued functions. Let us define the operator $L_\alpha$ by an expression in which $\Delta_\omega$ denotes the Laplace-Beltrami operator on $\mathbb{S}^{d-1}$.
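The role of the exponent $\alpha$ and of the artificial dimension $n$ can be illustrated by a short computation. The sketch below assumes that the change of variables is $w(x) = W(|x|^{\alpha-1}\,x)$, of the same form as the one relating $v$ and $g$ in this section, and that $D_\alpha W = \big(\alpha\,\partial_s W, \tfrac1s\,\nabla_\omega W\big)$ in polar coordinates $(s,\omega)$. Both are assumptions made for illustration, as are the names $r = |x|$ and $s = r^\alpha$.

```latex
% Polar coordinates: x = r\,\omega, new radial variable s = r^\alpha, w(r\,\omega) = W(s\,\omega).
\begin{align*}
|\nabla w|^2 &= |\partial_r w|^2 + \tfrac{1}{r^2}\,|\nabla_\omega w|^2
 = r^{2(\alpha-1)}\Big(\alpha^2\,|\partial_s W|^2 + \tfrac{1}{s^2}\,|\nabla_\omega W|^2\Big)
 = r^{2(\alpha-1)}\,|\mathsf D_\alpha W|^2\,,\\
\int_{\mathbb R^d} |w|^q\,|x|^{-\gamma}\,dx
 &= \frac1\alpha \int_0^{\infty}\!\!\int_{\mathbb S^{d-1}} |W|^q\,
    s^{\frac{d-\gamma}{\alpha}-1}\,ds\,d\omega\,,\\
\int_{\mathbb R^d} |\nabla w|^2\,|x|^{-\beta}\,dx
 &= \frac1\alpha \int_0^{\infty}\!\!\int_{\mathbb S^{d-1}} |\mathsf D_\alpha W|^2\,
    s^{\frac{d-\beta-2}{\alpha}+1}\,ds\,d\omega\,.
\end{align*}
% Both weights are equal to s^{n-1} if and only if
\begin{equation*}
\frac{d-\gamma}{\alpha} = n = \frac{d-\beta-2}{\alpha} + 2
\quad\Longleftrightarrow\quad
\alpha = 1+\frac{\beta-\gamma}{2}\,,\qquad n = \frac{2\,(d-\gamma)}{\beta+2-\gamma}\,.
\end{equation*}
```

With $(\beta,\gamma) = (0,0)$ this gives $\alpha = 1$ and $n = d$, and the resulting value of $n$ is consistent with $m_1 = 1 - 1/n$ and with the statement that the same weight $|x|^{n-d}$ appears in all norms.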
We introduce the weighted equation for $g$ which is obtained from (5) by the change of variables $v(t,x) = g\big(t, |x|^{\alpha-1}\,x\big)$. Next we use a self-similar change of variables similar to (15), but with a scaling which corresponds to the artificial dimension $n$. With $\mu = 2 + n\,(m-1)$ and $\kappa = \big(\frac{2\,m}{1-m}\big)^{1/\mu}$, let (18) $g(t,x) = \frac{1}{\kappa^n R^n}\,u\big(\tau, \frac{x}{\kappa R}\big)$, where $R$ and $\tau$ are functions of $t$. We observe that $\mu_\star = \alpha\,\mu$ with the notations of (9) in Section 1.2.
In self-similar variables the function $u$ solves (19). The exponent $m$ is now in the range $m_1 \le m < 1$ with $m_1 = 1 - 1/n$. As in the case without weights, i.e., the case $n = d$, we also consider the problem restricted to a ball $B_R$ and assume no-flux boundary conditions, that is, $z\cdot\omega = 0$ on $\partial B_R$, where $z = D_\alpha q$. It is straightforward to compute $\partial z/\partial\tau$ and, as a consequence, the time derivative of the relative Fisher information. Taking into account the boundary condition $z\cdot\omega = 0$ on $\partial B_R$, the computation proceeds by expanding the various terms and integrating by parts several times. Next, since $z\cdot\omega = 0$ on $\partial B_R$, exactly for the same reasons as in Section 2.2, we know that the boundary term has a sign. Let us define $p := u^{m-1}$. By [22, Lemma 4.2], the second-order terms can be rewritten and, according to [22, Lemma 4.3] (see details in the proof), the resulting quantity $Q[p]$ satisfies an estimate with explicit constants $a$ and $b$ which depend only on $\alpha$, $n$ and $d$. Here $g$ denotes the standard metric on $\mathbb{S}^{d-1}$. The case $d = 2$ has to be treated separately: according to [22, Lemma 4.3], there exists also an explicit constant, that we still denote by $b$, such that a similar estimate holds. We collect these observations and use the facts that $x\cdot D_\alpha = \alpha\,r\,\partial_r$, $x\cdot z = \alpha\,r\,\partial_r q$, $x\cdot\nabla_\omega = 0$, and $z\cdot\partial_r z = z\cdot\partial_r(D_\alpha q) = z\cdot D_\alpha\partial_r q - \frac{1}{r^2}\,|\nabla_\omega q|^2$. We can then extend the function $u$ outside $B_R$ by the function $B_\alpha$ and pass to the limit as $R$ goes to $+\infty$.
This inequality implies (3) in a non scale-invariant form (as in Section 2.2 when there are no weights), but also provides an additional integral remainder term. The following statement is written in terms of $P = \frac{m}{1-m}\,g^{m-1}$ and $v(t,x) = g\big(t, r^{\alpha-1}\,x\big)$, $r = |x|$.
Theorem 9. Assume that (4) and (8) hold. Then there are two positive constants $C_1$, $C_2$, and a constant $b$, depending only on $\beta$, $\gamma$ and $d$, such that the following property holds. Assume that $v_0$ satisfies $\|v_0\|_{1,\gamma} = M_\star$ and that there exist two positive constants $C_1$ and $C_2$ such that (11) holds. Let us consider a positive solution of (5) with initial datum $v_0$ such that $v$ is smooth at $x = 0$ for any $t \ge 0$. Then, with the above notations, (20) holds for any $t \ge 0$. Here $\mu$ and $h$ are given by (10).
The expressions of the constants $b$, $C_1$ and $C_2$ are explicit. See [22] for details. Since $\alpha < \alpha_{\mathrm{FS}}$ is equivalent to $\beta < \beta_{\mathrm{FS}}$, Theorem 2 is a straightforward consequence of Theorem 9. In the opposite direction, by keeping all terms in $Q[p]$, it is possible to give a sharper estimate than (20), which however has no simple expression.
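The equivalence between the conditions on $\alpha$ and on $\beta$ can be checked by elementary algebra. The sketch below assumes the notation $\alpha = 1 + \frac{\beta-\gamma}{2}$, $n = \frac{d-\gamma}{\alpha}$, $\alpha_{\mathrm{FS}} = \sqrt{\frac{d-1}{n-1}}$ and $\beta_{\mathrm{FS}}(\gamma) = d - 2 - \sqrt{(d-\gamma)^2 - 4\,(d-1)}$. These expressions are assumptions, as are the auxiliary roots $\alpha_\pm$; the computation also requires $\alpha > 0$, $n > 1$ and $(d-\gamma)^2 \ge 4\,(d-1)$, which holds for instance if $\gamma \le 0$.

```latex
% \alpha_\pm := \tfrac12\big((d-\gamma) \pm \sqrt{(d-\gamma)^2-4\,(d-1)}\big)
% are the two roots of \alpha^2 - (d-\gamma)\,\alpha + d-1 = 0.
\begin{align*}
\alpha \le \alpha_{\rm FS}
&\iff \alpha^2\,(n-1) \le d-1
\iff \alpha^2 - (d-\gamma)\,\alpha + d-1 \ge 0\,,\\
\beta \le \beta_{\rm FS}(\gamma)
&\iff 2\,\alpha = 2+\beta-\gamma \le (d-\gamma) - \sqrt{(d-\gamma)^2-4\,(d-1)}
\iff \alpha \le \alpha_-\,.
\end{align*}
```

The quadratic inequality of the first line holds if and only if $\alpha \le \alpha_-$ or $\alpha \ge \alpha_+$. The two conditions therefore coincide as soon as $\alpha < \alpha_+$, which is the case in particular if $\beta < d - 2$. The same computation applies with strict inequalities.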
Under the above assumptions, Theorem 1 is a consequence of Theorem 9. Indeed, if we take $w^{2p} = v_0$, then, by differentiating (20) at $t = 0$, we obtain that $R[v_0] = 0$, and this is enough to conclude. We can also notice that (20) implies (3) simply by dropping the right hand side and using a density argument, if necessary.
Proof of Theorem 9. For any solution $v$ of (5), we apply the above computations with $v(t,x) = g\big(t, |x|^{\alpha-1}\,x\big)$ and $u$ given by (18), where $R$ and $h$ are given by (10) and (18). It is then enough to undo the above changes of variables to obtain (20).

LINEARIZATION AND OPTIMALITY
4.1. The linearized fast diffusion flow and the spectral gap. Let us perform a formal linearization of (19) around the Barenblatt profile $B_\alpha$ by considering a solution $u_\varepsilon$ with mass $\int_{\mathbb{R}^d} u_\varepsilon\,dx = M_\star$ such that $u_\varepsilon = B_\alpha\,\big(1 + \varepsilon\,f\,B_\alpha^{1-m}\big)$, and by taking formally the limit as $\varepsilon \to 0$. We obtain that $f$ solves a linear evolution equation governed by an operator $L_\alpha$. We define the scalar products $\langle\cdot,\cdot\rangle$ and $\langle\langle\cdot,\cdot\rangle\rangle$, with associated spaces $X$ and $Y$, and denote by $\lambda_1$ the first positive eigenvalue, with eigenfunction $f_1$. According to [7], we know that $f_1 \in Y \subset X$, so that $\langle\langle f_1, f_1\rangle\rangle = -\,\langle f_1, L_\alpha f_1\rangle = \lambda_1\,\langle f_1, f_1\rangle$ and $f_1$ yields the equality case in the Hardy-Poincaré inequality $\langle\langle g, g\rangle\rangle = -\,\langle g, L_\alpha g\rangle \ge \lambda_1\,\|g - \bar g\|^2$ for all $g \in Y$.
See [5,6] and [16,17] for more details on the results in $X$ and $Y$ respectively. It has been observed in [6] that the operator $L_\alpha$ on $X$ and its restriction to $Y$ are unitarily equivalent when $(\alpha, n) = (1, d)$, and the extension to the general case is straightforward. The kernel of $L_\alpha$ is generated by $f_0(x) = 1$, and the eigenspaces corresponding to the next two eigenvalues are generated by $f_{1,k}(x) = x_k$ and $f_2(x) = |x|^2 - c$, for some explicit constant $c$. If $(\beta,\gamma) = (0,0)$, the eigenvalues $\lambda_1$ and $\lambda_2$ are strictly ordered if $1 - 1/d < m < 1$ and coincide if $m = 1 - 1/d$, but the spectrum is more complicated in the general case: see [7, Appendix B] for details. The key observation for our analysis is the fact that (21) $\lambda_1 \ge 4 \iff \alpha \le \alpha_{\mathrm{FS}} := \sqrt{\frac{d-1}{n-1}}$.
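The formal linearization described at the beginning of this subsection can be made slightly more explicit. The computation below uses only the ansatz $u_\varepsilon = B_\alpha\,(1 + \varepsilon\,f\,B_\alpha^{1-m})$ and the mass constraint from the text, together with the assumption that $B_\alpha$ itself has mass $M_\star$; the expansions are formal and pointwise.

```latex
\begin{align*}
u_\varepsilon &= B_\alpha\,\big(1+\varepsilon\,f\,B_\alpha^{1-m}\big)
 = B_\alpha + \varepsilon\,f\,B_\alpha^{2-m}\,,\\
u_\varepsilon^{m-1} &= B_\alpha^{m-1}\,\big(1+\varepsilon\,f\,B_\alpha^{1-m}\big)^{m-1}
 = B_\alpha^{m-1} + (m-1)\,\varepsilon\,f + O(\varepsilon^2)\,,\\
\int_{\mathbb R^d} u_\varepsilon\,dx = \int_{\mathbb R^d} B_\alpha\,dx
\quad&\Longrightarrow\quad \int_{\mathbb R^d} f\,B_\alpha^{2-m}\,dx = 0\,.
\end{align*}
```

At leading order, the difference $u_\varepsilon^{m-1} - B_\alpha^{m-1}$ is thus $(m-1)\,\varepsilon\,f$, so that quadratic functionals of this difference and of its derivatives are of order $\varepsilon^2$. The mass constraint becomes an orthogonality condition on $f$ with respect to the weight $B_\alpha^{2-m}$.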

Symmetry breaking in Caffarelli-Kohn-Nirenberg inequalities.
It has been shown in [22] that the best constant in (3) is determined by the infimum of a functional.
4.2.4. Optimality of the information - production of information inequality. From Section 4.2.2, we know that the infimum of $K/I$ is achieved in the asymptotic regime as $u \to B_\alpha$ and determined by the spectral gap of $L_\alpha$ when $\lambda_1 = 4$. This covers in particular the case without weights of the Gagliardo-Nirenberg inequalities (1) and of the fast diffusion equation (2) studied in Section 2.1. We also know that the infimum is bounded from above by $\lambda_1 < 4$ if $\alpha > \alpha_{\mathrm{FS}}$, and that it is strictly larger than $4$ if $\alpha < \alpha_{\mathrm{FS}}$. The inequality is strict because, otherwise, if $4$ was optimal, it would be achieved in the asymptotic regime and therefore would be equal to $\lambda_1 > 4$, a contradiction.

4.2.5.
Optimality of the entropy - production of entropy inequality. Arguing as in Section 4.2.3, we obtain a comparison between the optimal constants $C_1$ and $C_2$, and the expansion of the functionals at $u_\varepsilon = B_\alpha\,\big(1 + \varepsilon\,f\,B_\alpha^{1-m}\big)$ relates them to $\lambda_1$. If $\alpha = \alpha_{\mathrm{FS}}$, then $\lambda_1 = 4 = C_1 = C_2$. Again this covers in particular the case without weights of the Gagliardo-Nirenberg inequalities (1) and of the fast diffusion equation (2). If $\alpha < \alpha_{\mathrm{FS}}$, then $C_1 \ge C_2 > 4$. Conversely, if $\alpha > \alpha_{\mathrm{FS}}$, then $C_1 \le \lambda_1 < 4$. We know from [7] that $C_1 > 0$, and also that the optimal constant is achieved, but the precise value of $C_1$ is so far unknown.