
Variance Reduction in Actor Critic Methods (ACM)
Benhamou, Éric (2019), Variance Reduction in Actor Critic Methods (ACM). https://basepub.dauphine.fr/handle/123456789/21200
Type
Working paper
External document link
https://hal.archives-ouvertes.fr/hal-02886487
Date
2019
Series title
Preprint Lamsade
Published in
Paris
Author(s)
Benhamou, Éric (Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision [LAMSADE])
Abstract (EN)
After presenting Actor Critic Methods (ACM), we show that ACM are control variate estimators. Using the projection theorem, we prove that the Q Actor Critic (QAC) and Advantage Actor Critic (A2C) methods are optimal, in the sense of the L2 norm, among the control variate estimators spanned by functions conditioned on the current state and action. This straightforward application of Pythagoras' theorem provides a theoretical justification for the strong performance of QAC and AAC, most often referred to as A2C, in deep policy gradient methods. This enables us to derive a new formulation for Advantage Actor Critic methods that has lower variance and improves on the traditional A2C method.
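As a rough illustration of the control variate argument summarised in the abstract, the following LaTeX sketch spells out the standard baseline/projection reasoning. The notation (policy \pi_\theta, action value Q^\pi, state value V^\pi, baseline b) is the usual policy gradient notation and is an assumption here, not taken verbatim from the paper.

% A minimal sketch of the control variate reading of the policy gradient;
% symbols are standard policy gradient notation, not copied from the paper.
\begin{align*}
\nabla_\theta J(\theta)
  &= \mathbb{E}\bigl[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi}(s,a)\bigr] \\
  &= \mathbb{E}\bigl[\nabla_\theta \log \pi_\theta(a \mid s)\,\bigl(Q^{\pi}(s,a) - b(s)\bigr)\bigr],
  \qquad \text{since } \mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}\bigl[\nabla_\theta \log \pi_\theta(a \mid s)\bigr] = 0 .
\end{align*}
% Taking b(s) = V^pi(s) turns the bracket into the advantage
% A^pi(s,a) = Q^pi(s,a) - V^pi(s), i.e. the A2C estimator. More generally, if
% \hat{X} = E[X | s, a] is the L^2 projection (conditional expectation) of a raw
% estimator X onto functions of the current state and action, orthogonality
% (Pythagoras) and the law of total variance give
\begin{align*}
\operatorname{Var}\bigl(X - \hat{X}\bigr)
  = \operatorname{Var}(X) - \operatorname{Var}\bigl(\hat{X}\bigr)
  \le \operatorname{Var}(X),
\end{align*}
% so subtracting the projection can only reduce variance, which is the flavour of
% optimality statement the abstract refers to.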
Subjects / Keywords
Actor critic method; Variance reduction; Projection; Deep RL

Related items
Showing items related by title and author.
- Benhamou, Éric (2019), Working paper
- Benhamou, Eric (2018), Article accepted for publication or published
- Benhamou, Eric; Guez, Beatrice; Paris, Nicolas (2018), Article accepted for publication or published
- Benhamou, Eric (2018), Working paper
- Benhamou, Eric (2018), Working paper