Improving Generalization of Reinforcement Learning with Minimax Distributional Soft Actor-Critic
@article{Ren2020ImprovingGO, title={Improving Generalization of Reinforcement Learning with Minimax Distributional Soft Actor-Critic}, author={Yangang Ren and Jingliang Duan and Yang Guan and Shengbo Eben Li}, journal={2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC)}, year={2020}, pages={1-6}, url={https://api.semanticscholar.org/CorpusID:211096594} }
The minimax formulation and distributional framework are introduced to improve the generalization ability of RL algorithms, and the Minimax Distributional Soft Actor-Critic (Minimax DSAC) algorithm is developed.
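As a hedged sketch of the formulation (the symbols here are illustrative rather than quoted from the paper): the protagonist policy \pi picks actions a while an adversary policy \nu injects disturbances u, the critic learns the full state-action return distribution Z rather than only its mean, and the protagonist optimizes the worst-case expected return:

\[ \max_{\pi}\,\min_{\nu}\; \mathbb{E}\!\left[ Z^{\pi,\nu}(s,a,u) \right], \qquad Z^{\pi,\nu}(s,a,u) = \sum_{t \ge 0} \gamma^{t}\, r(s_t, a_t, u_t). \]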
Topics
Reinforcement Learning, Generalization Ability, State-action Return Distribution, Autonomous Vehicles, Soft Actor-Critic, Safety-critical Systems, Sequential Decision Making, Generalization, Action-value Function, Adversary Policy
31 Citations
Distributional Soft Actor-Critic With Three Refinements
- 2025
Computer Science
Three key refinements to DSAC-v1 are introduced to overcome limitations and further improve Q-value estimation accuracy: expected value substitution, twin value distribution learning, and variance-based critic gradient adjustment.
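As a rough sketch of the twin value distribution learning refinement, by analogy with clipped double-Q learning (the exact target construction is the paper's; the form below is an assumption): two distributional critics with means \mu_{\theta_1}, \mu_{\theta_2} are maintained, and the more conservative one forms the bootstrap target,

\[ y_t = r_t + \gamma\, \min_{i \in \{1,2\}} \mu_{\bar{\theta}_i}(s_{t+1}, a_{t+1}), \]

which curbs the overestimation that a single learned value distribution can exhibit.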
DSAC-T: Distributional Soft Actor-Critic with Three Refinements
- 2023
Computer Science
An off-policy RL algorithm called distributional soft actor-critic (DSAC, or DSAC-v1) is presented, which can effectively improve value estimation accuracy by learning a continuous Gaussian value distribution; its performance is systematically evaluated on a diverse set of benchmark tasks.
Robust Reinforcement Learning for Shifting Dynamics During Deployment
- 2021
Computer Science
A new adversarial variant of soft actor-critic is proposed, which produces policies on MuJoCo continuous control tasks that are simultaneously more robust across various environment shifts, such as changes to friction and body mass.
Smoothing Policy Iteration for Zero-sum Markov Games
- 2022
Computer Science, Mathematics
The smoothing policy iteration (SPI) algorithm is proposed to approximately solve zero-sum Markov games (MGs), where the maximum operator is replaced by the weighted LogSumExp (WLSE) function to obtain nearly optimal equilibrium policies.
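For reference, the weighted LogSumExp that replaces the max operator can be sketched as (weight and temperature notation assumed here, not quoted from the paper):

\[ \mathrm{WLSE}_{\beta}(x) = \frac{1}{\beta} \log \sum_{i} w_i\, e^{\beta x_i}, \]

a smooth, differentiable surrogate that approaches \max_i x_i as \beta \to \infty, which makes the resulting Bellman backup amenable to gradient-based policy optimization.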
Multi-Style Distributional Soft Actor-Critic: Learning a Unified Policy for Diverse Control Behaviors
- 2025
Computer Science, Engineering
This work proposes the multi-style distributional soft actor-critic (M-DSAC) algorithm, capable of learning a single policy that supports multiple control behaviors, and develops a multi-style policy iteration (MPI) framework that learns the entire distribution of returns, known as the value distribution.
A Survey of Generalisation in Deep Reinforcement Learning
- 2021
Computer Science
It is argued that taking a purely procedural content generation approach to benchmark design is not conducive to progress in generalisation, and fast online adaptation and tackling RL-specific problems are suggested as areas for future work on methods for generalisation.
Model-Based Chance-Constrained Reinforcement Learning via Separated Proportional-Integral Lagrangian
- 2024
Computer Science
This article proposes a separated proportional-integral Lagrangian (SPIL) algorithm that can reduce the oscillations and conservatism of the RL policy in a car-following simulation, and applies it to a real-world mobile robot navigation task.
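A minimal sketch of the proportional-integral multiplier update behind this family of methods (gains and notation assumed for illustration): with constraint violation \Delta_k = J_c(\pi_k) - d at iteration k, the Lagrange multiplier is set from proportional and integral terms rather than pure integration,

\[ \lambda_k = \big[ K_P\, \Delta_k + K_I \textstyle\sum_{j \le k} \Delta_j \big]_{+}, \]

where [\cdot]_{+} clips to nonnegative values; the proportional term damps the oscillations that a purely integral (standard Lagrangian) update produces.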
A Survey of Zero-shot Generalisation in Deep Reinforcement Learning
- 2023
Computer Science
It is argued that taking a purely procedural content generation approach to benchmark design is not conducive to progress in ZSG, and fast online adaptation and tackling RL-specific problems are suggested as areas for future work on methods for ZSG.
Feasible Actor-Critic: Constrained Reinforcement Learning for Ensuring Statewise Safety
- 2021
Computer Science
The feasible actor-critic (FAC) algorithm is introduced, which is the first model-free constrained RL method that considers statewise safety, i.e., safety for each initial state, and provides theoretical guarantees that FAC outperforms previous expectation-based constrained RL methods in terms of both constraint satisfaction and reward optimization.
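A hedged sketch of what makes the constraint statewise (notation illustrative): instead of a single scalar multiplier, a state-dependent multiplier \lambda(s) enforces the cost bound d from every initial state,

\[ \max_{\pi}\, \min_{\lambda(\cdot) \ge 0}\; \mathbb{E}_{s}\!\left[ v^{\pi}(s) - \lambda(s)\big( c^{\pi}(s) - d \big) \right], \]

where v^{\pi} and c^{\pi} are the reward and cost value functions; an expectation-based method would instead constrain only \mathbb{E}_s[c^{\pi}(s)].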
Improve Generalization of Driving Policy at Signalized Intersections with Adversarial Learning
- 2023
Computer Science, Engineering
An adversarial learning paradigm is introduced to boost the intelligence and robustness of driving policies at signalized intersections with dense traffic flow; it substantially improves resistance to abnormal behaviors and thus ensures a high safety level for the autonomous vehicle.
28 References
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
- 2018
Computer Science
This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, and achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
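For reference, the maximum entropy objective that soft actor-critic optimizes (standard form):

\[ J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_{\pi}} \big[ r(s_t, a_t) + \alpha\, \mathcal{H}\big( \pi(\cdot \mid s_t) \big) \big], \]

where \alpha trades off reward against policy entropy \mathcal{H}; the entropy bonus encourages exploration and yields the stochastic actor named in the title.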
Risk Averse Robust Adversarial Reinforcement Learning
- 2019
Computer Science, Engineering
It is shown through experiments that a risk-averse agent is better equipped to handle a risk-seeking adversary, and experiences substantially fewer crashes compared to agents trained without an adversary.
Robust Adversarial Reinforcement Learning
- 2017
Computer Science, Engineering
RARL is proposed, where an agent is trained to operate in the presence of a destabilizing adversary that applies disturbance forces to the system, and the jointly trained adversary is reinforced, that is, it learns an optimal destabilization policy.
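The underlying two-player zero-sum game can be sketched as (standard RARL form): the protagonist \mu and adversary \nu act on the same system, optimizing the shared return in opposite directions,

\[ \max_{\mu}\, \min_{\nu}\; \mathbb{E}\Big[ \sum_{t} \gamma^{t}\, r\big( s_t, a^{\mu}_t, a^{\nu}_t \big) \Big], \]

trained in practice by alternating gradient updates: the protagonist improves with the adversary frozen, then vice versa.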
Addressing Value Estimation Errors in Reinforcement Learning with a State-Action Return Distribution Function
- 2020
Computer Science
This work combines the distributional return function with the maximum entropy RL framework to develop what it calls the Distributional Soft Actor-Critic algorithm, DSAC, an off-policy method for continuous control settings, and proposes a new Parallel Asynchronous Buffer-Actor-Learner architecture to improve learning efficiency.
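A hedged sketch of the Gaussian value distribution (parameterization assumed for illustration): the critic outputs a mean and standard deviation per state-action pair,

\[ Z_{\theta}(s,a) \sim \mathcal{N}\big( \mu_{\theta}(s,a),\, \sigma_{\theta}(s,a)^{2} \big), \qquad Q(s,a) = \mathbb{E}\big[ Z_{\theta}(s,a) \big] = \mu_{\theta}(s,a), \]

trained so that the distribution matches a bootstrapped target return rather than only its mean, which is the source of the improved value estimation accuracy.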
Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint
- 2018
Computer Science
This article focuses on the combination of risk criteria and reinforcement learning in a constrained optimization framework, i.e., a setting where the goal is to find a policy that optimizes the usual objective of infinite-horizon discounted/average cost while ensuring that an explicit risk constraint is satisfied.
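In its generic form, the constrained problem this viewpoint studies can be written as

\[ \max_{\pi}\; \mathbb{E}\big[ G^{\pi} \big] \quad \text{s.t.} \quad \rho\big( G^{\pi} \big) \le \beta, \]

where G^{\pi} is the discounted (or average) return and \rho a risk measure such as variance or CVaR; the constraint is then typically handled via a Lagrangian relaxation.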
Robust Reinforcement Learning
- 2005
Computer Science
A new reinforcement learning paradigm, called robust reinforcement learning (RRL), is proposed that explicitly takes input disturbances as well as modeling errors into account; it is tested on the control task of an inverted pendulum.
Assessing Generalization in Deep Reinforcement Learning
- 2018
Computer Science
The key finding is that 'vanilla' deep RL algorithms generalize better than specialized schemes that were proposed specifically to tackle generalization.
Continuous control with deep reinforcement learning
- 2016
Computer Science
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
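For reference, the deterministic policy gradient that DDPG ascends (standard form):

\[ \nabla_{\theta} J \approx \mathbb{E}_{s \sim \rho^{\beta}} \Big[ \nabla_{a} Q(s,a)\big|_{a = \mu_{\theta}(s)}\; \nabla_{\theta}\, \mu_{\theta}(s) \Big], \]

where \mu_{\theta} is the deterministic actor, Q the learned critic, and \rho^{\beta} the state distribution of the behavior policy; chaining critic gradients through the actor is what lets the method operate over continuous action spaces.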
Investigating Generalisation in Continuous Deep Reinforcement Learning
- 2019
Computer Science
It is shown that, if generalisation is the goal, then the common practice of evaluating algorithms based on their training performance leads to the wrong conclusions about algorithm choice, and a new benchmark and thorough empirical evaluation of generalisation challenges for state-of-the-art deep RL methods are provided.
Composable Deep Reinforcement Learning for Robotic Manipulation
- 2018
Computer Science, Engineering
This paper shows that policies learned with soft Q-learning can be composed to create new policies, and that the optimality of the resulting policy can be bounded in terms of the divergence between the composed policies.
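A hedged sketch of the composition rule (stated from memory, so treat the constants as assumptions): for maximum entropy policies with soft Q-functions Q_1 and Q_2, a composed policy can be read off the averaged soft Q-function,

\[ \pi_{\Sigma}(a \mid s) \propto \exp\!\Big( \tfrac{1}{\alpha} \cdot \tfrac{1}{2}\big( Q_1(s,a) + Q_2(s,a) \big) \Big), \]

and the paper bounds how far \pi_{\Sigma} falls short of the truly optimal policy for the combined task in terms of the divergence between the constituent policies.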