Linear Least-Squares Algorithms for Temporal Difference Learning

@article{Bradtke2005LinearLA,
  title={Linear Least-Squares Algorithms for Temporal Difference Learning},
  author={Steven J. Bradtke and Andrew G. Barto},
  journal={Machine Learning},
  year={1996},
  volume={22},
  pages={33-57},
  url={https://api.semanticscholar.org/CorpusID:20327856}
}
Two new temporal difference algorithms based on the theory of linear least-squares function approximation, LS TD and RLS TD, are introduced, and probability-one convergence is proved when they are used with a function approximator that is linear in the adjustable parameters.
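
As a concrete reference point for the batch LS TD solution with linear features, here is a minimal numpy sketch; the ridge term, function names, and data layout are illustrative assumptions rather than details from the paper.

```python
import numpy as np

def lstd(transitions, phi, d, gamma=0.95, ridge=1e-6):
    """Batch LS TD estimate of linear value-function weights theta.

    transitions: iterable of (s, r, s_next) samples from a fixed policy
    phi: feature map, phi(s) -> np.ndarray of length d
    ridge: small regularizer so A is invertible on short trajectories (assumption)
    """
    A = np.zeros((d, d))
    b = np.zeros(d)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)   # A = sum phi(s) (phi(s) - gamma*phi(s'))^T
        b += r * f                             # b = sum r * phi(s)
    theta = np.linalg.solve(A + ridge * np.eye(d), b)
    return theta                               # V(s) is approximated by phi(s) . theta
```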

Stochastic approximation for efficient LSTD and least squares regression

This paper considers a "big data" regime where both the dimension d of the data and the number T of training samples are large, and proposes stochastic approximation based methods with randomization of samples in two different settings: one for policy evaluation using the least squares temporal difference (LSTD) algorithm, and the other for solving the least squares regression problem.
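
For the regression setting described here, the randomized-sample stochastic approximation idea can be sketched as follows; the step-size schedule, names, and uniform sampling are illustrative assumptions, and the LSTD setting would replace the regression gradient with a TD-style update.

```python
import numpy as np

def sa_least_squares(X, y, steps=100_000, c=1.0, seed=0):
    """Stochastic approximation for least squares regression: at each step,
    draw one (x_i, y_i) uniformly at random and take a gradient step on
    0.5 * (x_i . theta - y_i)^2 with a decaying step size c / sqrt(t)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for t in range(1, steps + 1):
        i = rng.integers(n)                  # randomization of samples
        grad = (X[i] @ theta - y[i]) * X[i]  # single-sample gradient
        theta -= (c / np.sqrt(t)) * grad
    return theta
```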

Least-squares methods for policy iteration

This chapter reviews least-squares methods for policy iteration, an important class of algorithms for approximate reinforcement learning, and discusses three techniques for solving the core policy evaluation component of policy iteration: least-squares temporal difference learning, least-squares policy evaluation, and Bellman residual minimization.
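
As a rough sketch of how two of these policy evaluation techniques differ, the sample-based linear systems for LSTD and for Bellman residual minimization can be assembled from the same transitions; the single-sample BRM estimator below omits the double-sampling correction (and is biased for stochastic transitions), and all names are illustrative assumptions.

```python
import numpy as np

def policy_evaluation_systems(transitions, phi, d, gamma=0.95):
    """Build the LSTD and naive Bellman-residual systems from (s, r, s') samples."""
    A_lstd = np.zeros((d, d)); b_lstd = np.zeros(d)
    A_brm = np.zeros((d, d));  b_brm = np.zeros(d)
    for s, r, s_next in transitions:
        f = phi(s)
        df = f - gamma * phi(s_next)
        A_lstd += np.outer(f, df);  b_lstd += r * f    # projected TD fixed point
        A_brm  += np.outer(df, df); b_brm  += r * df   # squared Bellman residual
    theta_lstd = np.linalg.solve(A_lstd, b_lstd)       # assumes enough samples for invertibility
    theta_brm = np.linalg.solve(A_brm, b_brm)
    return theta_lstd, theta_brm
```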

Sparse Temporal Difference Learning via Alternating Direction Method of Multipliers

This paper proposes a new algorithm for approximating the fixed-point based on the Alternating Direction Method of Multipliers (ADMM), and demonstrates, with experimental results, that the proposed algorithm is more stable for policy iteration compared to prior work.
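
As a hedged illustration of the ADMM machinery, below is the standard ADMM iteration for a generic l1-regularized least-squares problem; the paper's actual TD fixed-point formulation differs, so this is only a stand-in sketch with assumed names and constants.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Elementwise shrinkage operator used in the z-update."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_l1_least_squares(A, b, lam=0.1, rho=1.0, iters=200):
    """ADMM for min_theta 0.5*||A theta - b||^2 + lam*||theta||_1."""
    d = A.shape[1]
    theta = np.zeros(d); z = np.zeros(d); u = np.zeros(d)
    M = np.linalg.inv(A.T @ A + rho * np.eye(d))  # cached factor for the quadratic step
    for _ in range(iters):
        theta = M @ (A.T @ b + rho * (z - u))     # quadratic subproblem
        z = soft_threshold(theta + u, lam / rho)  # sparsity-inducing shrinkage
        u = u + theta - z                         # scaled dual update
    return z
```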

Fastest Convergence for Q-learning

The Zap Q-learning algorithm introduced in this paper improves on Watkins' original algorithm and on recent competitors in several respects. It is a matrix-gain algorithm designed so that its…

Optimization methods for structured machine learning problems

This thesis attempts to solve the ℓ1-regularized fixed-point problem with the help of the Alternating Direction Method of Multipliers (ADMM) and argues that the proposed method is well suited to the structure of the aforementioned fixed-point problem.

Q-learning algorithms for optimal stopping based on least squares

This work considers the solution of discounted optimal stopping problems using linear function approximation methods, proposes alternative algorithms based on projected value iteration ideas and least squares, and proves the convergence of some of these algorithms.

Compressed Conditional Mean Embeddings for Model-Based Reinforcement Learning

It is demonstrated that the loss function for the CME model suggests a principled approach to compressing the induced (pseudo-)MDP, leading to faster planning while maintaining guarantees, and to superior performance over existing methods in this class of model-based approaches on a range of MDPs.

Off-Policy Neural Fitted Actor-Critic

A new off-policy, offline, model-free, actor-critic reinforcement learning algorithm for environments that are continuous in both states and actions is presented, which allows a trade-off between data efficiency and scalability.

Applying Q(λ)-learning in Deep Reinforcement Learning to Play Atari Games

Empirical results on a range of games show that the deep Q(λ) network significantly reduces learning time, providing faster learning than the DQN method.

The convergence of TD(λ) for general λ

Watkins' theorem that Q-learning, his closely related prediction and action learning method, converges with probability one is adapted to demonstrate this strong form of convergence for a slightly modified version of TD.
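
For reference, a textbook tabular TD(λ) update with accumulating eligibility traces looks roughly as follows; this is an illustrative sketch, not the precise variant analyzed in the paper.

```python
import numpy as np

def td_lambda(episodes, n_states, alpha=0.1, gamma=0.99, lam=0.8):
    """Tabular TD(lambda) with accumulating traces; each episode is a list of (s, r, s_next)."""
    V = np.zeros(n_states)
    for episode in episodes:
        e = np.zeros(n_states)                     # eligibility traces
        for s, r, s_next in episode:
            delta = r + gamma * V[s_next] - V[s]   # TD error
            e[s] += 1.0                            # accumulate the trace for s
            V += alpha * delta * e                 # credit all recently visited states
            e *= gamma * lam                       # decay the traces
    return V
```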

Practical issues in temporal difference learning

It is found that, with zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance that is clearly better than conventional commercial programs and that surpasses comparable networks trained on a massive human expert data set.

On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

A rigorous proof of convergence of DP-based learning algorithms is provided by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem, which establishes a general class of convergent algorithms to which both TD(λ) and Q-learning belong.

Incremental dynamic programming for on-line adaptive optimal control

This dissertation expands the theoretical and empirical understanding of IDP algorithms and increases their domain of practical application, and proves convergence of a DP-based reinforcement learning algorithm to the optimal policy for any continuous domain.

Q-learning

This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
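
The tabular update whose convergence is established can be sketched as follows; the exploration policy and the decaying step-size conditions required by the theorem are left outside this snippet, and the names are illustrative.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step; repeated sampling of every (state, action)
    pair with suitably decaying alpha is what the convergence theorem requires."""
    target = r + gamma * np.max(Q[s_next])   # bootstrapped one-step target
    Q[s, a] += alpha * (target - Q[s, a])    # move the estimate toward the target
    return Q
```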

Expectation Driven Learning with an Associative Memory

In these experiments, the automaton has expectations of the minimum future cost of actions leading to a goal state; learning occurs when expectations in the associative memory are modified and the effect on learning is noted.

Recursive estimation and time-series analysis


Learning rate schedules for faster stochastic gradient search

The authors propose a new methodology for creating the first automatically adapting learning rates that achieve the optimal rate of convergence for stochastic gradient descent. Empirical tests agree…
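
A "search-then-converge" style schedule of the kind associated with this line of work can be sketched as below; the exact functional form and constants used in the paper are assumptions for illustration.

```python
def search_then_converge(t, eta0=0.1, tau=1000.0):
    """Step size eta_t = eta0 / (1 + t / tau): roughly constant while t << tau
    (the 'search' phase), then decaying like 1/t (the 'converge' phase), which
    matches the optimal asymptotic rate for stochastic gradient descent."""
    return eta0 / (1.0 + t / tau)
```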