Learning is an important reinforcement learning algorithm.
An average-reward reinforcement learning algorithm for controlled Markov chains is discussed.
The thesis mainly studies dynamic scheduling methods based on average-reward reinforcement learning algorithms.