Similarities between policy gradient methods (PGM) in reinforcement learning (RL) and supervised learning (SL)

Benhamou, Éric (2019), Similarities between policy gradient methods (PGM) in reinforcement learning (RL) and supervised learning (SL). https://basepub.dauphine.fr/handle/123456789/21202

View/Open
policy_gradient.pdf (345.7Kb)
Type
Document de travail / Working paper
External document link
https://hal.archives-ouvertes.fr/hal-02886505
Date
2019
Series title
Preprint Lamsade
Published in
Paris
Author(s)
Benhamou, Éric
Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision [LAMSADE]
Abstract (EN)
Reinforcement learning (RL) is about sequential decision making and is traditionally contrasted with supervised learning (SL) and unsupervised learning (USL). In RL, given the current state, the agent makes a decision that may influence the next state, as opposed to SL (and USL), where the next state remains the same regardless of the decisions taken, in either batch or online learning. Although this difference between SL and RL is fundamental, there are connections that have been overlooked. In particular, we prove in this paper that policy gradient methods can be cast as a supervised learning problem where true labels are replaced with discounted rewards. We provide a new proof of policy gradient methods (PGM) that emphasizes the tight link with cross entropy and supervised learning. We provide a simple experiment where we interchange labels and pseudo-rewards. We conclude that other relationships with SL could be made if we modify the reward functions wisely.
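The link stated in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names, toy logits, rewards, and discount factor are all hypothetical, and the paper's exact formulation may differ. The sketch treats the actions taken as "labels" and computes a cross-entropy loss in which each sample is weighted by its discounted return; with unit weights the same function reduces to the ordinary supervised cross entropy.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Return G_t = sum_{k>=t} gamma^(k-t) * r_k for every step t."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def weighted_cross_entropy(logits, targets, weights):
    """Mean over steps of -weights[t] * log p(targets[t])."""
    logp = log_softmax(logits)
    picked = logp[np.arange(len(targets)), targets]
    return -(weights * picked).mean()

# Toy episode (hypothetical values): policy logits, actions taken, rewards.
logits = np.array([[2.0, 0.5], [0.1, 1.2], [0.3, 0.3]])
actions = np.array([0, 1, 0])
rewards = np.array([1.0, 0.0, 2.0])

# Supervised view: actions are labels, every sample has weight 1.
sl_loss = weighted_cross_entropy(logits, actions, np.ones(len(actions)))

# Policy-gradient view: same loss, samples weighted by discounted returns.
returns = discounted_returns(rewards, gamma=0.5)
pg_loss = weighted_cross_entropy(logits, actions, returns)
```

Minimizing `pg_loss` by gradient descent with respect to the logits corresponds to a REINFORCE-style update under these assumptions; the only difference from `sl_loss` is the per-sample weight.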
Subjects / Keywords
Policy gradient; Supervised learning; Cross entropy; Kullback-Leibler divergence; Entropy

Related items

Showing items related by title and author.

  • Bridging the gap between Markowitz planning and deep reinforcement learning
    Benhamou, Éric; Saltiel, David; Ungari, Sandrine; Mukhopadhyay, Abhishek (2020) Document de travail / Working paper
  • Trade Selection with Supervised Learning and OCA
    Saltiel, David; Benhamou, Eric (2018) Document de travail / Working paper
  • Trade Selection with Supervised Learning and Optimal Coordinate Ascent (OCA)
    Saltiel, David; Benhamou, Eric; Laraki, Rida; Atif, Jamal (2021) Communication / Conférence
  • Time your hedge with Deep Reinforcement Learning
    Benhamou, Éric; Saltiel, David; Ungari, Sandrine; Mukhopadhyay, Abhishek (2020) Document de travail / Working paper
  • Distinguish the indistinguishable: a Deep Reinforcement Learning approach for volatility targeting models
    Benhamou, Éric; Saltiel, David; Tabachnik, Serge; Wong, Sui Kai; Chareyron, François (2021) Document de travail / Working paper