Reinforcement Learning for Sequential Decision and Optimal Control
@article{Li2023ReinforcementLF,
  title={Reinforcement Learning for Sequential Decision and Optimal Control},
  author={Sheng Li},
  journal={Reinforcement Learning for Sequential Decision and Optimal Control},
  year={2023},
  url={https://api.semanticscholar.org/CorpusID:257928563}
}
76 Citations
Feasible Policy Iteration With Guaranteed Safe Exploration
- 2025
Computer Science
A feasible policy iteration framework is proposed that can guarantee absolute safety during online exploration, i.e., constraint violations never occur in real-world interactions. The results suggest substantial potential for applying feasible policy iteration to real-world tasks, enabling the online evolution of intricate systems.
LipsNet: A Smooth and Robust Neural Network with Adaptive Lipschitz Constant for High Accuracy Optimal Control
- 2023
Computer Science, Engineering
This work proposes LipsNet, a neural network that addresses the action fluctuation problem at the network level rather than the algorithm level. LipsNet can serve as the actor network in most RL algorithms, making it more accessible and user-friendly than previous works.
Soft Actor-Critic Deep Reinforcement Learning for Train Timetable Collaborative Optimization of Large-Scale Urban Rail Transit Network Under Dynamic Demand
- 2025
Engineering, Computer Science
An adaptive real-time control framework with flexible train scheduling capabilities is proposed, based on the Soft Actor-Critic (SAC) deep reinforcement learning (DRL) method. It shows superior performance compared with other reinforcement learning algorithms and traditional heuristic optimization algorithms.
Model-Free Safe Reinforcement Learning Through Neural Barrier Certificate
- 2023
Computer Science, Engineering
A model-free safe RL algorithm is proposed that achieves near-zero constraint violations and high performance compared with the baselines, and the learned barrier certificates successfully identify the feasible regions on multiple tasks.
Smoothing Policy Iteration for Zero-sum Markov Games
- 2022
Computer Science, Mathematics
The smoothing policy iteration (SPI) algorithm is proposed to solve zero-sum Markov games (MGs) approximately: the maximum operator is replaced by the weighted LogSumExp (WLSE) function to obtain nearly optimal equilibrium policies.
Approximate Optimal Filter Design for Vehicle System through Actor-Critic Reinforcement Learning
- 2022
Engineering, Computer Science
This paper proposes to approximate the optimal filter gain on the basis of estimation-control duality, by considering the effect factors within an infinite time horizon, and shows that the filter policy obtained via RL with different discount factors converges to the theoretical optimal gain with an error within 5%.
Zeroth-Order Actor–Critic: An Evolutionary Framework for Sequential Decision Problems
- 2025
Computer Science
A novel evolutionary framework, zeroth-order actor-critic (ZOAC), is proposed, which uses stepwise exploration in parameter space and theoretically derives the zeroth-order policy gradient to effectively leverage the Markov property of sequential decision problems (SDPs) and reduce the variance of gradient estimators.
MA-HRL: Multi-Agent Hierarchical Reinforcement Learning for Medical Diagnostic Dialogue Systems
- 2025
Medicine, Computer Science
This work proposes MA-HRL, a multi-agent hierarchical reinforcement learning framework that decomposes the diagnostic task into specialized agents, and designs an information entropy-based reward function that encourages agents to acquire maximally informative symptoms.
AI-Driven Optimization Framework for Smart EV Charging Systems Integrated with Solar PV and BESS in High-Density Residential Environments
- 2025
Engineering, Environmental Science
The rapid growth of electric vehicle (EV) adoption necessitates advanced energy management strategies to ensure sustainable, reliable, and efficient operation of charging infrastructure. This study…
Boosting Exploration in Reinforcement Learning for Sparse Reward Tasks
- 2025
Computer Science
This paper proposes a Distributional Soft Actor-Critic algorithm for Sparse reward tasks (DSAC-S), which comprises three modules: dual-policy guided exploration, curiosity signal learning, and actor-critic training.
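The SPI entry above replaces the maximum operator with a weighted LogSumExp (WLSE) function. The sketch below illustrates that kind of smoothing, assuming the common form (1/λ)·log Σ_i w_i·exp(λ·x_i) with nonnegative weights that sum to one. The function name `wlse`, the uniform weights, the temperature values, and the example action values are illustrative assumptions, not the exact definitions used in the SPI paper.

```python
import numpy as np


def wlse(x, w, lam):
    """Weighted LogSumExp: (1 / lam) * log(sum_i w_i * exp(lam * x_i)).

    Illustrative assumptions: w_i >= 0, sum(w) == 1, lam > 0.
    As lam grows, the value approaches max(x) over entries with w_i > 0;
    as lam shrinks toward 0, it approaches the weighted mean sum_i w_i * x_i.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    support = w > 0  # entries with zero weight do not contribute
    m = np.max(x[support])
    # Factor out exp(lam * m) so the exponentials cannot overflow.
    return m + np.log(np.sum(w[support] * np.exp(lam * (x[support] - m)))) / lam


# Hypothetical action values for one state, with uniform weights.
q = [1.0, 2.0, 3.0]
w = [1 / 3, 1 / 3, 1 / 3]
for lam in (1.0, 10.0, 100.0):
    print(f"lam={lam:>6}: WLSE={wlse(q, w, lam):.4f}  max={max(q)}")
```

With weights summing to one, the smoothed value never exceeds max(x) and stays within log(1/w_k)/λ of it, where k is the maximizing entry. The temperature λ therefore trades smoothness (small λ) against approximation error to the hard maximum (large λ).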