Variance Reduction in Actor Critic Methods (ACM)
Benhamou, Éric (2019), Variance Reduction in Actor Critic Methods (ACM). https://basepub.dauphine.fr/handle/123456789/21200
Type: Document de travail / Working paper
External document link: https://hal.archives-ouvertes.fr/hal-02886487
Series title: Preprint Lamsade
Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision [LAMSADE]
Abstract (EN): After presenting Actor Critic Methods (ACM), we show that ACMs are control variate estimators. Using the projection theorem, we prove that the Q Actor Critic (QAC) and Advantage Actor Critic (A2C) methods are optimal, in the sense of the L2 norm, among the control variate estimators spanned by functions conditioned on the current state and action. This straightforward application of Pythagoras' theorem provides a theoretical justification for the strong performance of QAC and AAC (most often referred to as A2C) methods in deep policy gradient methods. It also enables us to derive a new formulation of the Advantage Actor Critic method that has lower variance and improves on the traditional A2C method.
Subjects / Keywords: Actor critic method; Variance reduction; Projection; Deep RL
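The control-variate mechanism the abstract describes — subtracting a state-conditioned baseline from the policy-gradient estimator leaves it unbiased while reducing its variance — can be sketched numerically. The following toy example (not from the paper; the bandit setting, action values, and sample counts are all assumptions for illustration) compares the plain estimator grad log pi(a) * Q(a) with the advantage version grad log pi(a) * (Q(a) - V), where V is the expected Q under the policy:

```python
import numpy as np

# Toy one-step (bandit-like) illustration of the A2C control variate:
# the baseline b(s) = V(s) is subtracted from Q(s, a), which keeps the
# policy-gradient estimator unbiased but lowers its variance.
rng = np.random.default_rng(0)

n_actions = 4
theta = rng.normal(size=n_actions)           # softmax policy parameters (assumed)
q_values = np.array([1.0, 2.0, 3.0, 4.0])    # assumed true action values Q(a)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

pi = softmax(theta)
baseline = pi @ q_values                     # V = E_pi[Q], the A2C baseline

def grad_log_pi(a):
    """Gradient of log pi(a) for a softmax policy: e_a - pi."""
    g = -pi.copy()
    g[a] += 1.0
    return g

n_samples = 100_000
actions = rng.choice(n_actions, size=n_samples, p=pi)

plain = np.array([grad_log_pi(a) * q_values[a] for a in actions])
advantage = np.array([grad_log_pi(a) * (q_values[a] - baseline) for a in actions])

# Both estimators target the same policy gradient (the baseline term has
# zero expectation, since E_pi[grad log pi] = 0)...
print("mean gap:", np.abs(plain.mean(axis=0) - advantage.mean(axis=0)).max())
# ...but the advantage (control-variate) version has lower total variance.
print("plain var:", plain.var(axis=0).sum(),
      "advantage var:", advantage.var(axis=0).sum())
```

The variance gap is the quantity the paper bounds: among baselines spanned by functions of the current state, V is the L2-optimal choice, which is why subtracting it here shrinks the estimator's spread without biasing the mean.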