Exploiting the structural properties of the underlying Markov decision problem in the Q-learning algorithm

Kunnumkal, S and Topaloglu, H (2008) Exploiting the structural properties of the underlying Markov decision problem in the Q-learning algorithm. INFORMS Journal on Computing, 20 (2). pp. 288-301.

Full text not available from this repository.

Abstract

This paper shows how to exploit the structural properties of the underlying Markov decision problem to improve the convergence behavior of the Q-learning algorithm. In particular, we consider infinite-horizon discounted-cost Markov decision problems where there is a natural ordering between the states of the system and the value function is known to be monotone in the state. We propose a new variant of the Q-learning algorithm that ensures that the value function approximations obtained during the intermediate iterations are also monotone in the state. We establish the convergence of the proposed algorithm and experimentally show that it significantly improves the convergence behavior of the standard version of the Q-learning algorithm.
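The sketch below illustrates the general idea described in the abstract, not the authors' exact algorithm: after each standard Q-learning update for a discounted-cost problem, the implied value function V(s) = min_a Q(s, a) is projected onto the set of functions that are nondecreasing in the state index, and the adjustment is pushed back into the Q-factors. The callbacks sample_transition and sample_cost, the step-size rule, and the simple isotonic projection are all assumptions made for illustration.

import numpy as np

def monotone_q_learning(n_states, n_actions, sample_transition, sample_cost,
                        gamma=0.95, n_iters=10000, seed=0):
    """Q-learning with a monotonicity-enforcing projection on the value function.

    Illustrative sketch only: after every Q-learning update, the implied value
    function is forced to be nondecreasing in the state index. The MDP is
    simulated through the user-supplied (hypothetical) callbacks
    sample_transition(s, a) -> next state and sample_cost(s, a) -> cost.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for k in range(1, n_iters + 1):
        s = rng.integers(n_states)
        a = rng.integers(n_actions)
        s_next = sample_transition(s, a)
        c = sample_cost(s, a)
        step = 1.0 / k                        # diminishing step size (global, for simplicity)
        target = c + gamma * Q[s_next].min()  # discounted-cost Bellman target
        Q[s, a] += step * (target - Q[s, a])
        # Projection step: make V(s) = min_a Q(s, a) monotone in s
        V = Q.min(axis=1)
        V_monotone = np.maximum.accumulate(V)  # simple isotonic adjustment
        Q += (V_monotone - V)[:, None]         # shift each row so its minimum matches V_monotone
    return Q

The paper establishes convergence for its own projection scheme within the stochastic approximation iterations; the cumulative-maximum adjustment above is only one straightforward way to keep the intermediate value function approximations monotone.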

ISB Creators: Kunnumkal, S (ORCiD: UNSPECIFIED)
Item Type: Article
Uncontrolled Keywords: Markov decision processes; Q-learning; Stochastic approximation methods
Subjects: Business and Management
Depositing User: Veeramani R
Date Deposited: 01 Nov 2014 17:12
Last Modified: 14 Apr 2015 07:14
URI: http://eprints.exchange.isb.edu/id/eprint/119
Publisher URL: http://dx.doi.org/10.1287/ijoc.1070.0240
Publisher OA policy: http://www.sherpa.ac.uk/romeo/issn/1091-9856/
Related URLs:
