Authors

Manasa Mandava

Type

Dissertation (Text)

Advisor

Feinberg, Eugene | Rachev, Svetlozar | Takhtajan, Leon | Hu, Jiaqiao

Date

2015-05-01

Keywords

Compensator, continuous-time Markov decision process, jump Markov process, Kolmogorov's equation, Markov policies | Applied mathematics

Department

Department of Applied Mathematics and Statistics.

Language

en_US

Source

This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of the degree.

Identifier

http://hdl.handle.net/11401/77454

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Format

application/pdf

Abstract

A basic fact in the theory of discrete-time Markov decision processes is that for any policy there exists a Markov policy with the same marginal state-action distributions. This fact implies that the study of control problems with multiple criteria and constraints determined by marginal distributions (e.g., expected total discounted and undiscounted costs, or average cost per unit time) can be restricted to the set of Markov policies. This dissertation presents a similar result for continuous-time Markov decision processes (CTMDPs). For CTMDPs with Borel state and action spaces and unbounded transition and cost rates, we construct, for an arbitrary policy, a Markov policy such that the marginal distribution on state-action pairs is the same for both policies. In particular, the expected cost rates at each time instant are equal under the two policies, so the constructed Markov policy performs as well as the original policy for problems with multiple criteria and constraints determined by marginal distributions.

The proof consists of two major steps. The first step describes the properties of solutions to Kolmogorov's equations for jump Markov processes. In particular, for given transition intensities, the three approaches to constructing a jump Markov process, namely (i) via the compensator of the random measure of a multivariate point process, (ii) as the minimal solution of Kolmogorov's backward equation, and (iii) as the minimal solution of Kolmogorov's forward equation, all define the same transition function. If the jump Markov process associated with this transition function has no accumulation points of jumps, then it is the unique solution of both of Kolmogorov's equations. The second step applies these results to CTMDPs and establishes that the marginal distributions on the states under both policies satisfy Kolmogorov's forward equation defined by the Markov policy. This immediately implies that the marginal distributions on the states coincide when the transition intensities corresponding to the Markov policy are bounded. In the general case, the original policy can be approximated by a sequence of policies with bounded transition intensities that converge to it, and the proof follows from these approximations.

70 pages
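The abstract refers to Kolmogorov's backward and forward equations for nonhomogeneous jump Markov processes. As a brief sketch of what these equations typically look like in this setting (the notation below is an assumption; the dissertation should be consulted for its exact definitions), let X be the Borel state space and let q(x,t,B) be a Q-function: a signed transition kernel with q(x,t,B) \ge 0 for measurable B with x \notin B and q(x,t,X) = 0. A transition function P(s,x;t,B) with P(t,x;t,B) = \mathbf{1}\{x \in B\} then satisfies, for s < t,

\frac{\partial}{\partial s} P(s,x;t,B) = -\int_X q(x,s,dy)\, P(s,y;t,B) \qquad \text{(backward equation)}

\frac{\partial}{\partial t} P(s,x;t,B) = \int_X P(s,x;t,dy)\, q(y,t,B) \qquad \text{(forward equation)}

In the CTMDP application described above, the Q-function induced by a Markov policy would play the role of q, and the marginal state distribution \mu_t would satisfy the forward equation in integral form, \mu_t(B) = \mu_0(B) + \int_0^t \int_X q(y,u,B)\, \mu_u(dy)\, du, which is the kind of equation invoked in the second step of the proof.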
