Traditional multi-agent reinforcement learning algorithms let each agent learn only independently, without cooperation, while heuristic algorithms have considered only a single agent and have not been extended to the multi-agent case. To address both problems, this paper presents heuristic-based multi-agent reinforcement learning algorithms for symmetric and asymmetric environments. The algorithm uses communication between agents to obtain the other agents' historical information and action-selection policies. Combined with the idea of heuristic algorithms, this lets the agents cooperate during learning and ultimately improves learning efficiency. An example with 2 agents, 2 states, and 3 candidate actions shows that the algorithm converges faster than the traditional distributed reinforcement learning algorithm.
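The abstract does not give the update rules, so the following is only a minimal sketch of how communicated information could bias action selection, not the paper's algorithm. It assumes a heuristically accelerated selection rule of the form argmax over a of Q(s, a) + xi * H(s, a), where H is built from a peer agent's communicated Q-table, and a standard Q-learning update. The class `Agent`, the weight `xi`, and the form of the heuristic are all hypothetical; only the sizes (2 states, 3 actions) come from the abstract's example.

```python
import random

N_STATES, N_ACTIONS = 2, 3  # sizes from the abstract's example


class Agent:
    """Hypothetical Q-learning agent whose greedy choice is biased by a peer's Q-table."""

    def __init__(self, alpha=0.1, gamma=0.9, xi=1.0, epsilon=0.1, seed=0):
        self.q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.xi = xi            # weight of the heuristic term (assumed)
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)

    def heuristic(self, state, peer_q):
        """Assumed form of H(s, a): 1 for the peer's greedy action in `state`, else 0.

        Returns all zeros when the peer has no preference yet.
        """
        row = peer_q[state]
        if max(row) == min(row):
            return [0.0] * N_ACTIONS
        best = row.index(max(row))
        return [1.0 if a == best else 0.0 for a in range(N_ACTIONS)]

    def select_action(self, state, peer_q):
        """Epsilon-greedy selection over Q(s, a) + xi * H(s, a)."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(N_ACTIONS)
        h = self.heuristic(state, peer_q)
        return max(range(N_ACTIONS),
                   key=lambda a: self.q[state][a] + self.xi * h[a])

    def update(self, state, action, reward, next_state):
        """Standard one-step Q-learning update; the heuristic does not enter here."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

In a two-agent run, each agent would pass the other's `q` table as `peer_q` at every step. Copying the peer's preference directly is only plausible in the symmetric setting; an asymmetric environment would need some mapping between the agents' states and actions, which the abstract does not specify.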