Zap Q-Learning
- Adithya M. Devraj, Sean P. Meyn
- 2017
Computer Science, Mathematics
The Zap Q-learning algorithm introduced in this paper improves on Watkins' original algorithm and recent competitors in several respects, and the analysis suggests that the approach leads to stable and efficient computation even in non-ideal parameterized settings.
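To make the idea concrete, here is a minimal tabular sketch of the two-time-scale Zap update: a running estimate of the mean linearization matrix is maintained on a faster time scale and used as a Newton-Raphson-style gain. The toy `RandomMDP`, the step-size exponents, and the absence of regularization on the gain matrix are illustrative assumptions, not the paper's prescriptions.

```python
import numpy as np

class RandomMDP:
    """Toy random MDP (illustrative only): random transitions and rewards."""
    def __init__(self, n_states, n_actions, seed=0):
        self.rng = np.random.default_rng(seed)
        self.P = self.rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
        self.R = self.rng.standard_normal((n_states, n_actions))
        self.x = 0

    def step(self, u):
        r = self.R[self.x, u]
        self.x = int(self.rng.choice(self.P.shape[-1], p=self.P[self.x, u]))
        return self.x, r

def zap_q_learning(env, n_states, n_actions, beta=0.9, n_iter=20_000, rho=0.85):
    d = n_states * n_actions
    theta = np.zeros(d)     # tabular Q-function: Q(x, u) = theta[x*n_actions + u]
    A_hat = -np.eye(d)      # gain-matrix estimate; -I keeps early solves well posed
    rng = np.random.default_rng(1)
    x = env.x
    for n in range(1, n_iter + 1):
        alpha = 1.0 / n     # slow step size for the parameter update
        gamma = n ** (-rho) # faster step size for the matrix estimate (two time scales)
        u = int(rng.integers(n_actions))   # exploratory randomized policy
        x_next, r = env.step(u)
        i = x * n_actions + u
        q_next = theta[x_next * n_actions:(x_next + 1) * n_actions]
        j = x_next * n_actions + int(np.argmax(q_next))
        phi = np.zeros(d); phi[i] = 1.0
        phi_next = np.zeros(d); phi_next[j] = 1.0
        td = r + beta * theta[j] - theta[i]        # temporal-difference term
        A_n = np.outer(phi, beta * phi_next - phi) # one-sample linearization matrix
        A_hat += gamma * (A_n - A_hat)             # running mean of A_n
        theta -= alpha * np.linalg.solve(A_hat, phi * td)  # Newton-Raphson-style step
        x = x_next
    return theta.reshape(n_states, n_actions)

Q = zap_q_learning(RandomMDP(5, 2), n_states=5, n_actions=2)
```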
Model-Free Primal-Dual Methods for Network Optimization with Application to Real-Time Optimal Power Flow
- Yue-Chun Chen, A. Bernstein, Adithya M. Devraj, Sean P. Meyn
- 28 September 2019
Engineering, Computer Science
This paper examines the problem of real-time optimization of networked systems and develops online algorithms that steer the system towards the optimal trajectory without explicit knowledge of the system model, leveraging an online zero-order primal-dual projected-gradient method.
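As an illustration of the general mechanism (not the paper's exact algorithm), the sketch below runs a zero-order primal-dual projected-gradient iteration on a toy constrained problem, using a standard two-point gradient estimator; the step sizes, box projection, and toy objective are assumptions.

```python
import numpy as np

def zo_grad(h, x, delta=1e-2, rng=None):
    """Two-point zero-order gradient estimate of h at x (model-free)."""
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)
    return len(x) * (h(x + delta * u) - h(x - delta * u)) / (2 * delta) * u

def primal_dual_zo(f, g, x0, n_iter=2000, alpha=1e-2, seed=0):
    """Minimize f(x) subject to g(x) <= 0 using only function evaluations."""
    rng = np.random.default_rng(seed)
    x, lam = x0.copy(), 0.0
    for _ in range(n_iter):
        # Lagrangian L(x, lam) = f(x) + lam * g(x): descend in x, ascend in lam
        grad_x = zo_grad(f, x, rng=rng) + lam * zo_grad(g, x, rng=rng)
        x = np.clip(x - alpha * grad_x, -10.0, 10.0)   # projection onto a box
        lam = max(0.0, lam + alpha * g(x))             # projection onto lam >= 0
    return x, lam

# Toy instance: minimize ||x - c||^2 subject to sum(x) <= 1.
c = np.array([1.0, 2.0])
x_opt, lam_opt = primal_dual_zo(lambda x: np.sum((x - c) ** 2),
                                lambda x: np.sum(x) - 1.0,
                                x0=np.zeros(2))
```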
Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation
- Shuhang Chen, Adithya M. Devraj, Ana Bušić, Sean P. Meyn
- 7 February 2020
Mathematics, Computer Science
It is shown that the mean-square error achieves the optimal rate of $O(1/n)$, subject to conditions on the step-size sequence, which is of great value in algorithm design.
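The rate claim is easy to visualize in the simplest setting: the sketch below simulates a scalar linear stochastic-approximation recursion with step size $\alpha_n = g/n$ and checks that $n \cdot \mathrm{MSE}$ flattens to a constant. The scalar model, the gain $g$, and the noise model are illustrative, not the paper's general setup.

```python
import numpy as np

def linear_sa_mse(theta_star=1.0, g=1.0, n_iter=10_000, n_runs=200, seed=0):
    """MSE over independent runs of theta += (g/n) * (-(theta - theta*) + W_n)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_runs)           # independent runs, vectorized
    mse = np.empty(n_iter)
    for n in range(1, n_iter + 1):
        alpha = g / n                  # step size; g > 1/2 is needed for O(1/n)
        noise = rng.standard_normal(n_runs)
        theta += alpha * (-(theta - theta_star) + noise)
        mse[n - 1] = np.mean((theta - theta_star) ** 2)
    return mse

mse = linear_sa_mse()
# n * MSE should flatten to a constant, consistent with the O(1/n) rate:
print(mse[999] * 1_000, mse[9_999] * 10_000)
```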
Fastest Convergence for Q-learning
- Adithya M. Devraj, Sean P. Meyn
- 12 July 2017
Computer Science, Mathematics
The Zap Q-learning algorithm introduced in this paper improves on Watkins' original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its…
The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning
- V. Borkar, Shuhang Chen, Adithya M. Devraj, Ioannis Kontoyiannis, S. Meyn
- 27 October 2021
Computer Science, Mathematics
The main results now allow for parameter-dependent noise, as is often the case in applications to reinforcement learning.
Q-Learning With Uniformly Bounded Variance
- Adithya M. Devraj, S. Meyn
- 24 February 2020
Computer Science, Mathematics
It is shown that the asymptotic covariance of the tabular Q-learning algorithm with an optimized step-size sequence is a quadratic function of a factor that tends to infinity as the discount factor approaches 1; this is an essentially known result.
On Matrix Momentum Stochastic Approximation and Applications to Q-learning
- Adithya M. Devraj, Ana Bušić, Sean P. Meyn
- 1 September 2019
Computer Science, Mathematics
It is shown that the parameter estimates obtained from the PolSA algorithm couple with those of the optimal-variance SNR (stochastic Newton-Raphson) algorithm at a rate of $O(1/n^{2})$, and numerical results confirm the coupling of PolSA and SNR.
Zap Q-Learning With Nonlinear Function Approximation
- Shuhang Chen, Adithya M. Devraj, Ana Bušić, Sean P. Meyn
- 11 October 2019
Computer Science
This class of algorithms is generalized in this paper, and stability is established under very general conditions; the result applies to a wide range of algorithms found in reinforcement learning.
Stochastic Variance Reduced Primal Dual Algorithms for Empirical Composition Optimization
- Adithya M. Devraj, Jianshu Chen
- 1 July 2019
Computer Science, Mathematics
This work reformulates the original minimization objective into an equivalent min-max objective, which brings out all the empirical averages that are originally inside the nonlinear loss functions, and develops a stochastic primal-dual algorithm, SVRPDA-I, which is shown to converge at a linear rate when the problem is strongly convex.
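Schematically, the reformulation is the standard Fenchel-conjugate device: each outer loss $f_i$ is replaced by its variational representation, so the inner empirical average appears linearly in the dual variables. The display below sketches that step under an assumed form of the composition objective; the paper's indexing and regularization may differ.

```latex
% Schematic min-max reformulation via the convex (Fenchel) conjugate f_i^*,
% assuming the composition objective takes the form shown on the left.
\min_{\theta} \frac{1}{N}\sum_{i=1}^{N}
  f_i\!\Big(\frac{1}{M}\sum_{j=1}^{M} g_j(\theta)\Big)
\;=\;
\min_{\theta} \max_{y_1,\dots,y_N} \frac{1}{N}\sum_{i=1}^{N}
  \Big[\Big\langle \frac{1}{M}\sum_{j=1}^{M} g_j(\theta),\, y_i \Big\rangle
       - f_i^{*}(y_i)\Big]
```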
Revisiting the ODE Method for Recursive Algorithms: Fast Convergence Using Quasi Stochastic Approximation
- Shuhang Chen, Adithya M. Devraj, A. Bernstein, S. Meyn
- 1 October 2021
Mathematics, Computer Science
A brief survey of recent research in machine learning that shows the power of algorithm design in continuous time, followed by careful approximation to obtain a practical recursive algorithm.
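A minimal discrete-time sketch of the quasi-stochastic idea: replace the i.i.d. samples in a stochastic-approximation recursion with a deterministic, equidistributed probing sequence. The Weyl sequence used here is an illustrative stand-in; the probing signals in this line of work are typically sums of sinusoids in continuous time.

```python
import numpy as np

def qsa_mean(h, n_iter=100_000):
    """Estimate the mean of h over [0, 1] with a deterministic probing signal."""
    phi = (np.sqrt(5) - 1) / 2      # golden-ratio increment: equidistributed mod 1
    theta = 0.0
    for k in range(1, n_iter + 1):
        xi = (k * phi) % 1.0        # deterministic "quasi-stochastic" sample
        theta += (1.0 / k) * (h(xi) - theta)   # SA recursion with gain 1/k
    return theta

# Example: estimate the integral of h(x) = x^2 over [0, 1] (true value 1/3).
print(qsa_mean(lambda x: x * x))
```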
...