Authors 唐昊 (Tang Hao), 袁继彬 (Yuan Jibin), 陆阳 (Lu Yang), 程文娟 (Cheng Wenjuan)
Affiliation School of Computer and Information
Abstract An alpha-uniformized Markov chain is defined, via the concept of an equivalent infinitesimal generator, for a semi-Markov decision process (SMDP) under both average and discounted criteria. Based on the relations between the performance measures and performance potentials of the two processes, an SMDP can be optimized by simulating the chain. For the critic model of neuro-dynamic programming (NDP), a neuro-policy iteration (NPI) algorithm is presented, and a performance error bound is derived for the case where each iteration step incurs both approximation error and improvement error. The results may be extended to Markov systems and are widely applicable. Finally, a numerical example is provided.
Journal Acta Automatica Sinica (自动化学报)
Keywords Semi-Markov decision processes, performance potentials, neuro-dynamic programming