The objective is to find an optimal policy that maximizes the expected average reward per time step over an infinite horizon.

  • The goal is to find the optimal control policy that maximizes the long-run expected average reward per stage.
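For a finite Markov decision process (MDP), this objective is the gain g = lim_{T→∞} (1/T) E[r_0 + … + r_{T-1}]. One standard way to compute an optimal policy for it is relative value iteration. That method is not named in the sentence above, so treat what follows as an illustrative sketch only. It assumes a finite MDP in which every stationary policy is unichain and aperiodic. The function name `relative_value_iteration` and the two-state example MDP are hypothetical.

```python
from typing import List, Tuple


def relative_value_iteration(
    P: List[List[List[float]]],  # P[a][s][s2]: probability of moving s -> s2 under action a
    R: List[List[float]],        # R[a][s]: expected one-step reward of action a in state s
    ref_state: int = 0,          # state whose relative value is pinned to 0
    tol: float = 1e-10,
    max_iter: int = 100_000,
) -> Tuple[float, List[float], List[int]]:
    """Hypothetical sketch: return (gain, relative values h, greedy policy).

    Assumes every stationary policy is unichain and aperiodic, which is
    what makes plain relative value iteration converge.
    """
    n_actions, n_states = len(P), len(P[0])
    h = [0.0] * n_states
    gain = 0.0
    q: List[List[float]] = []
    for _ in range(max_iter):
        # Bellman backup: q[s][a] = r(s, a) + sum_s2 P(s2 | s, a) * h(s2)
        q = [
            [R[a][s] + sum(P[a][s][s2] * h[s2] for s2 in range(n_states))
             for a in range(n_actions)]
            for s in range(n_states)
        ]
        th = [max(row) for row in q]
        # The backed-up value at the reference state estimates the average reward
        gain = th[ref_state]
        # Subtract it so the relative values stay bounded
        new_h = [v - gain for v in th]
        converged = max(abs(x - y) for x, y in zip(new_h, h)) < tol
        h = new_h
        if converged:
            break
    # Greedy policy with respect to the final backup
    policy = [max(range(n_actions), key=lambda a: q[s][a]) for s in range(n_states)]
    return gain, h, policy


# Hypothetical 2-state, 2-action MDP. Every transition row is strictly positive,
# so every stationary policy is unichain and aperiodic.
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # action 0 ("stay")
    [[0.2, 0.8], [0.9, 0.1]],  # action 1 ("switch")
]
R = [
    [1.0, 2.0],  # action 0 rewards in states 0 and 1
    [0.0, 0.0],  # action 1 earns nothing
]
gain, h, policy = relative_value_iteration(P, R)
print(round(gain, 6), policy)
```

In this toy example the greedy policy switches out of state 0 and stays in state 1. Its average reward per step is 0.2·0 + 0.8·2 = 1.6, which is the best of the four deterministic policies. A discounted criterion would instead weight early rewards more heavily. The average-reward criterion ignores any finite prefix of the reward stream, which is why it fits the infinite-horizon objective stated above.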