Improving Generalization of Reinforcement Learning with Minimax Distributional Soft Actor-Critic
@article{Ren2020ImprovingGO, title={Improving Generalization of Reinforcement Learning with Minimax Distributional Soft Actor-Critic}, author={Yangang Ren and Jingliang Duan and Yang Guan and Shengbo Eben Li}, journal={2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC)}, year={2020}, pages={1-6}, url={https://api.semanticscholar.org/CorpusID:211096594} }
The minimax formulation and distributional framework are introduced to improve the generalization ability of RL algorithms, and the Minimax Distributional Soft Actor-Critic (Minimax DSAC) algorithm is developed.
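As a hedged sketch of the formulation (the symbols here are illustrative rather than quoted from the paper): the protagonist policy \pi picks actions a while an adversary policy \nu injects disturbances u, the critic learns the full state-action return distribution Z rather than only its mean, and the protagonist optimizes the worst-case expected return:

\[ \max_{\pi}\,\min_{\nu}\; \mathbb{E}\!\left[ Z^{\pi,\nu}(s,a,u) \right], \qquad Z^{\pi,\nu}(s,a,u) = \sum_{t \ge 0} \gamma^{t}\, r(s_t, a_t, u_t). \]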
Topics
Reinforcement Learning, Generalization Ability, State-action Return Distribution, Autonomous Vehicles, Soft Actor-Critic, Safety-critical Systems, Sequential Decision Making, Generalization, Action-value Function, Adversary Policy
31 Citations
Distributional Soft Actor-Critic With Three Refinements
- 2025
Computer Science
Three key refinements to DSAC-v1 are introduced to overcome limitations and further improve Q-value estimation accuracy: expected value substitution, twin value distribution learning, and variance-based critic gradient adjustment.
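As a rough sketch of the twin value distribution learning refinement, by analogy with clipped double-Q learning (the exact target construction is the paper's; the form below is an assumption): two distributional critics with means \mu_{\theta_1}, \mu_{\theta_2} are maintained, and the more conservative one forms the bootstrap target,

\[ y_t = r_t + \gamma\, \min_{i \in \{1,2\}} \mu_{\bar{\theta}_i}(s_{t+1}, a_{t+1}), \]

which curbs the overestimation that a single learned value distribution can exhibit.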
DSAC-T: Distributional Soft Actor-Critic with Three Refinements
- 2023
Computer Science
An off-policy RL algorithm called distributional soft actor-critic (DSAC, or DSAC-v1) is presented, which can effectively improve value estimation accuracy by learning a continuous Gaussian value distribution; its performance is systematically evaluated on a diverse set of benchmark tasks.
Robust Reinforcement Learning for Shifting Dynamics During Deployment
- 2021
Computer Science
A new adversarial variant of soft actor-critic is proposed, which produces policies on MuJoCo continuous control tasks that are simultaneously more robust across various environment shifts, such as changes to friction and body mass.
Smoothing Policy Iteration for Zero-sum Markov Games
- 2022
Computer Science, Mathematics
The smoothing policy iteration (SPI) algorithm is proposed to approximately solve zero-sum Markov games (MGs), where the maximum operator is replaced by the weighted LogSumExp (WLSE) function to obtain nearly optimal equilibrium policies.
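For reference, the weighted LogSumExp that replaces the max operator can be sketched as (weight and temperature notation assumed here, not quoted from the paper):

\[ \mathrm{WLSE}_{\beta}(x) = \frac{1}{\beta} \log \sum_{i} w_i\, e^{\beta x_i}, \]

a smooth, differentiable surrogate that approaches \max_i x_i as \beta \to \infty, which makes the resulting Bellman backup amenable to gradient-based policy optimization.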
Multi-Style Distributional Soft Actor-Critic: Learning a Unified Policy for Diverse Control Behaviors
- 2025
Computer Science, Engineering
This work proposes the multi-style distributional soft actor-critic (M-DSAC) algorithm, capable of learning a single policy that supports multiple control behaviors, and develops a multi-style policy iteration (MPI) framework that learns the entire distribution of returns, known as the value distribution.
A Survey of Generalisation in Deep Reinforcement Learning
- 2021
Computer Science
It is argued that taking a purely procedural content generation approach to benchmark design is not conducive to progress in generalisation, and fast online adaptation and tackling RL-specific problems are suggested as areas for future work on methods for generalisation.
Model-Based Chance-Constrained Reinforcement Learning via Separated Proportional-Integral Lagrangian
- 2024
Computer Science
This article proposes a separated proportional-integral Lagrangian (SPIL) algorithm that can reduce the oscillations and conservatism of the RL policy in a car-following simulation, and applies it to a real-world mobile robot navigation task.
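A minimal sketch of the proportional-integral multiplier update behind this family of methods (gains and notation assumed for illustration): with constraint violation \Delta_k = J_c(\pi_k) - d at iteration k, the Lagrange multiplier is set from proportional and integral terms rather than pure integration,

\[ \lambda_k = \big[ K_P\, \Delta_k + K_I \textstyle\sum_{j \le k} \Delta_j \big]_{+}, \]

where [\cdot]_{+} clips to nonnegative values; the proportional term damps the oscillations that a purely integral (standard Lagrangian) update produces.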
A Survey of Zero-shot Generalisation in Deep Reinforcement Learning
- 2023
Computer Science
It is argued that taking a purely procedural content generation approach to benchmark design is not conducive to progress in ZSG, and fast online adaptation and tackling RL-specific problems are suggested as areas for future work on methods for ZSG.
Feasible Actor-Critic: Constrained Reinforcement Learning for Ensuring Statewise Safety
- 2021
Computer Science
The feasible actor-critic (FAC) algorithm is introduced, which is the first model-free constrained RL method that considers statewise safety, i.e., safety for each initial state, and provides theoretical guarantees that FAC outperforms previous expectation-based constrained RL methods in terms of both constraint satisfaction and reward optimization.
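A hedged sketch of what makes the constraint statewise (notation illustrative): instead of a single scalar multiplier, a state-dependent multiplier \lambda(s) enforces the cost bound d from every initial state,

\[ \max_{\pi}\, \min_{\lambda(\cdot) \ge 0}\; \mathbb{E}_{s}\!\left[ v^{\pi}(s) - \lambda(s)\big( c^{\pi}(s) - d \big) \right], \]

where v^{\pi} and c^{\pi} are the reward and cost value functions; an expectation-based method would instead constrain only \mathbb{E}_s[c^{\pi}(s)].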
Improve Generalization of Driving Policy at Signalized Intersections with Adversarial Learning
- 2023
Computer Science, Engineering
An adversarial learning paradigm is introduced to boost the intelligence and robustness of driving policies at signalized intersections with dense traffic flow; it substantially improves resistance to abnormal behaviors and thus ensures a high safety level for the autonomous vehicle.
28 References
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
- 2018
Computer Science
This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, and achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
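For reference, the maximum entropy objective that soft actor-critic optimizes (standard form):

\[ J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_{\pi}} \big[ r(s_t, a_t) + \alpha\, \mathcal{H}\big( \pi(\cdot \mid s_t) \big) \big], \]

where \alpha trades off reward against policy entropy \mathcal{H}; the entropy bonus encourages exploration and yields the stochastic actor named in the title.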
Risk Averse Robust Adversarial Reinforcement Learning
- 2019
Computer Science, Engineering
It is shown through experiments that a risk-averse agent is better equipped to handle a risk-seeking adversary, and experiences substantially fewer crashes compared to agents trained without an adversary.
Robust Adversarial Reinforcement Learning
- 2017
Computer Science, Engineering
RARL is proposed, where an agent is trained to operate in the presence of a destabilizing adversary that applies disturbance forces to the system, and the jointly trained adversary is reinforced, that is, it learns an optimal destabilization policy.
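The underlying two-player zero-sum game can be sketched as (standard RARL form): the protagonist \mu and adversary \nu act on the same system, optimizing the shared return in opposite directions,

\[ \max_{\mu}\, \min_{\nu}\; \mathbb{E}\Big[ \sum_{t} \gamma^{t}\, r\big( s_t, a^{\mu}_t, a^{\nu}_t \big) \Big], \]

trained in practice by alternating gradient updates: the protagonist improves with the adversary frozen, then vice versa.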
Addressing Value Estimation Errors in Reinforcement Learning with a State-Action Return Distribution Function
- 2020
Computer Science
This work combines the distributional return function with the maximum entropy RL framework to develop what it calls the Distributional Soft Actor-Critic algorithm, DSAC, an off-policy method for continuous control settings, and proposes a new Parallel Asynchronous Buffer-Actor-Learner architecture to improve learning efficiency.
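A hedged sketch of the Gaussian value distribution (parameterization assumed for illustration): the critic outputs a mean and standard deviation per state-action pair,

\[ Z_{\theta}(s,a) \sim \mathcal{N}\big( \mu_{\theta}(s,a),\, \sigma_{\theta}(s,a)^{2} \big), \qquad Q(s,a) = \mathbb{E}\big[ Z_{\theta}(s,a) \big] = \mu_{\theta}(s,a), \]

trained so that the distribution matches a bootstrapped target return rather than only its mean, which is the source of the improved value estimation accuracy.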
Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint
- 2018
Computer Science
This article focuses on the combination of risk criteria and reinforcement learning in a constrained optimization framework, i.e., a setting where the goal is to find a policy that optimizes the usual objective of infinite-horizon discounted/average cost while ensuring that an explicit risk constraint is satisfied.
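In its generic form, the constrained problem this viewpoint studies can be written as

\[ \max_{\pi}\; \mathbb{E}\big[ G^{\pi} \big] \quad \text{s.t.} \quad \rho\big( G^{\pi} \big) \le \beta, \]

where G^{\pi} is the discounted (or average) return and \rho a risk measure such as variance or CVaR; the constraint is then typically handled via a Lagrangian relaxation.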
Robust Reinforcement Learning
- 2005
Computer Science
A new reinforcement learning paradigm, called robust reinforcement learning (RRL), is proposed that explicitly takes input disturbances as well as modeling errors into account; it is tested on the control task of an inverted pendulum.
Assessing Generalization in Deep Reinforcement Learning
- 2018
Computer Science
The key finding is that 'vanilla' deep RL algorithms generalize better than specialized schemes that were proposed specifically to tackle generalization.
Continuous control with deep reinforcement learning
- 2016
Computer Science
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
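For reference, the deterministic policy gradient that DDPG ascends (standard form):

\[ \nabla_{\theta} J \approx \mathbb{E}_{s \sim \rho^{\beta}} \Big[ \nabla_{a} Q(s,a)\big|_{a = \mu_{\theta}(s)}\; \nabla_{\theta}\, \mu_{\theta}(s) \Big], \]

where \mu_{\theta} is the deterministic actor, Q the learned critic, and \rho^{\beta} the state distribution of the behavior policy; chaining critic gradients through the actor is what lets the method operate over continuous action spaces.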
Investigating Generalisation in Continuous Deep Reinforcement Learning
- 2019
Computer Science
It is shown that, if generalisation is the goal, then the common practice of evaluating algorithms based on their training performance leads to the wrong conclusions about algorithm choice, and a new benchmark and thorough empirical evaluation of generalisation challenges for state-of-the-art deep RL methods are provided.
Composable Deep Reinforcement Learning for Robotic Manipulation
- 2018
Computer Science, Engineering
This paper shows that policies learned with soft Q-learning can be composed to create new policies, and that the optimality of the resulting policy can be bounded in terms of the divergence between the composed policies.
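A hedged sketch of the composition rule (stated from memory, so treat the constants as assumptions): for maximum entropy policies with soft Q-functions Q_1 and Q_2, a composed policy can be read off the averaged soft Q-function,

\[ \pi_{\Sigma}(a \mid s) \propto \exp\!\Big( \tfrac{1}{\alpha} \cdot \tfrac{1}{2}\big( Q_1(s,a) + Q_2(s,a) \big) \Big), \]

and the paper bounds how far \pi_{\Sigma} falls short of the truly optimal policy for the combined task in terms of the divergence between the constituent policies.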