The method uses an actor-critic architecture: during critic training, a recursive least-squares temporal-difference (RLS-TD) method estimates the parameters of the value function, and during actor training, a value-gradient descent method improves the control policy.
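The critic step mentioned above can be illustrated with a minimal sketch of the standard RLS-TD(0) recursion for a linear value function V(s) = theta · phi(s). The sentence does not give the exact variant, so this is an assumption: the function name `rls_td_update`, the discount `gamma`, the initial scale of `P`, the absence of a forgetting factor, and the two-state toy chain with one-hot features are all hypothetical choices made for illustration. The actor's value-gradient step is not shown.

```python
def rls_td_update(theta, P, phi, reward, phi_next, gamma=0.95):
    """One RLS-TD(0) step for a linear critic (illustrative sketch).

    theta    : list of value-function weights
    P        : square matrix (list of lists), running inverse-correlation estimate
    phi      : feature vector of the current state
    phi_next : feature vector of the next state
    """
    n = len(theta)
    # Temporal difference of the features: d = phi - gamma * phi_next
    d = [phi[i] - gamma * phi_next[i] for i in range(n)]
    # P @ phi and d^T @ P, both needed by the Sherman-Morrison style update
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    dT_P = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
    denom = 1.0 + sum(d[i] * P_phi[i] for i in range(n))
    gain = [x / denom for x in P_phi]
    # Prediction error: reward - (phi - gamma * phi_next)^T theta
    err = reward - sum(d[i] * theta[i] for i in range(n))
    new_theta = [theta[i] + gain[i] * err for i in range(n)]
    new_P = [[P[i][j] - gain[i] * dT_P[j] for j in range(n)] for i in range(n)]
    return new_theta, new_P


# Hypothetical toy problem: a deterministic two-state cycle with one-hot features.
# State 0 -> state 1 gives reward 1; state 1 -> state 0 gives reward 0.
features = [[1.0, 0.0], [0.0, 1.0]]
theta = [0.0, 0.0]
P = [[1e4, 0.0], [0.0, 1e4]]  # large initial P is an assumed, common choice
for _ in range(50):
    theta, P = rls_td_update(theta, P, features[0], 1.0, features[1], gamma=0.5)
    theta, P = rls_td_update(theta, P, features[1], 0.0, features[0], gamma=0.5)
print(theta)
```

With gamma = 0.5 the exact values of this toy chain satisfy V0 = 1 + 0.5·V1 and V1 = 0.5·V0, so the estimated weights should approach 4/3 and 2/3. In an actor-critic loop, weights estimated this way would then feed the actor's policy-improvement step.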