Linear Least-Squares Algorithms for Temporal Difference Learning
@article{Bradtke2005LinearLA,
  title={Linear Least-Squares Algorithms for Temporal Difference Learning},
  author={Steven J. Bradtke and Andrew G. Barto},
  journal={Machine Learning},
  year={1996},
  volume={22},
  pages={33-57},
  url={https://api.semanticscholar.org/CorpusID:20327856}
}
Two new temporal difference algorithms based on the theory of linear least-squares function approximation, LS TD and RLS TD, are introduced, and probability-one convergence is proved when they are used with a function approximator that is linear in the adjustable parameters.
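The batch LSTD solution the abstract refers to can be sketched as follows; the feature map, the toy transition format, and the function name are illustrative assumptions, not details from the paper:

```python
import numpy as np

def lstd(transitions, phi, gamma=0.9):
    """Batch LSTD: solve A @ theta = b so that V(s) is approximated by phi(s) @ theta.

    transitions: list of (state, reward, next_state) samples;
    phi: feature map from a state to a length-d vector (an assumption here).
    """
    d = len(phi(transitions[0][0]))
    A = np.zeros((d, d))
    b = np.zeros(d)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)  # accumulate the LSTD matrix
        b += r * f                            # accumulate reward-weighted features
    return np.linalg.solve(A, b)
```

The recursive variant (RLS TD) would instead maintain the inverse of A with a rank-one Sherman-Morrison update after each sample, avoiding a full linear solve at the end.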
47 Citations
An efficient L2-norm regularized least-squares temporal difference learning algorithm
- 2013
Computer Science, Mathematics
Stochastic approximation for efficient LSTD and least squares regression
- 2014
Computer Science, Mathematics
This paper considers a "big data" regime where both the dimension, d, of the data and the number, T, of training samples are large, and proposes stochastic-approximation-based methods with randomization of samples in two different settings: one for policy evaluation using the least-squares temporal difference (LSTD) algorithm, and the other for solving the least-squares regression problem.
Least-squares methods for policy iteration
- 2016
Computer Science
This chapter reviews least-squares methods for policy iteration, an important class of algorithms for approximate reinforcement learning, and discusses three techniques for solving its core policy-evaluation component: least-squares temporal difference, least-squares policy evaluation, and Bellman residual minimization.
Sparse Temporal Difference Learning via Alternating Direction Method of Multipliers
- 2015
Computer Science, Mathematics
This paper proposes a new algorithm for approximating the fixed point, based on the Alternating Direction Method of Multipliers (ADMM), and demonstrates with experimental results that the proposed algorithm is more stable for policy iteration than prior work.
Fastest Convergence for Q-learning
- 2017
Computer Science, Mathematics
The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins' original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its…
Optimization methods for structured machine learning problems
- 2019
Computer Science, Mathematics
This thesis attempts to solve the ℓ1-regularized fixed-point problem with the help of the Alternating Direction Method of Multipliers (ADMM) and argues that the proposed method is well suited to the structure of the aforementioned fixed-point problem.
Q-learning algorithms for optimal stopping based on least squares
- 2007
Computer Science, Mathematics
This work considers the solution of discounted optimal stopping problems using linear function approximation and proposes alternative algorithms based on projected value iteration ideas and least squares, proving the convergence of some of these algorithms.
Compressed Conditional Mean Embeddings for Model-Based Reinforcement Learning
- 2016
Computer Science
It is demonstrated that the loss function for the CME model suggests a principled approach to compressing the induced (pseudo-)MDP, leading to faster planning while maintaining guarantees, and to performance superior to existing methods in this class of model-based approaches on a range of MDPs.
Off-Policy Neural Fitted Actor-Critic
- 2016
Computer Science
A new off-policy, offline, model-free, actor-critic reinforcement learning algorithm is presented that handles environments continuous in both states and actions and allows trading off data efficiency against scalability.
Applying Q(λ)-learning in Deep Reinforcement Learning to Play Atari Games
- 2017
Computer Science
Empirical results on a range of games show that the deep Q(λ) network significantly reduces learning time, providing faster learning than the DQN method.
25 References
The convergence of TD(λ) for general λ
- 2004
Mathematics, Computer Science
Watkins' probability-one convergence theorem for Q-learning, his closely related prediction-and-action learning method, is adapted to demonstrate this strong form of convergence for a slightly modified version of TD.
Practical issues in temporal difference learning
- 2004
Computer Science, Psychology
It is found that, with zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, clearly better than conventional commercial programs and surpassing comparable networks trained on a massive human expert data set.
On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
- 1994
Computer Science
A rigorous proof of convergence of DP-based learning algorithms is provided by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem, which establishes a general class of convergent algorithms to which both TD(λ) and Q-learning belong.
Incremental dynamic programming for on-line adaptive optimal control
- 1995
Computer Science
This dissertation expands the theoretical and empirical understanding of IDP algorithms and increases their domain of practical application, and proves convergence of a DP-based reinforcement learning algorithm to the optimal policy for any continuous domain.
Consistency of HDP applied to a simple reinforcement learning problem
- 1990
Computer Science, Engineering
Q-learning
- 2004
Computer Science
This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimal action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
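As a rough illustration of the tabular setting this theorem covers, here is a minimal Q-learning sketch; the exploring-start sampling and the polynomial step-size schedule are assumptions chosen to satisfy the repeated-sampling and decaying-step-size conditions, not details from the paper:

```python
import random

def q_learning(n_states, n_actions, step, reward, gamma=0.9, iters=5000):
    """Tabular Q-learning with every (state, action) pair repeatedly sampled."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        s = random.randrange(n_states)    # exploring starts: every pair is
        a = random.randrange(n_actions)   # sampled again and again
        s_next = step(s, a)
        visits[s][a] += 1
        alpha = visits[s][a] ** -0.6      # decaying step size (Robbins-Monro)
        target = reward(s, a) + gamma * max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])
    return Q
```

On a one-state, one-action problem with reward 1 and gamma = 0.9, the single Q-value converges toward the discounted return 1 / (1 - 0.9) = 10, matching the discrete-representation convergence the theorem guarantees.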
Expectation Driven Learning with an Associative Memory
- 1990
Computer Science
In these experiments, the automaton has expectations of the minimum future cost of actions leading to a goal state; learning occurs when expectations in the associative memory are modified and the effect on learning is noted.
Generalization of backpropagation with application to a recurrent gas market model
- 1988
Computer Science
Recursive estimation and time-series analysis
- 1986
Mathematics, Computer Science
Learning rate schedules for faster stochastic gradient search
- 1992
Computer Science
The authors propose a new methodology for creating the first automatically adapting learning rates that achieve the optimal rate of convergence for stochastic gradient descent. Empirical tests agree…