A Method for Learning Macro-Actions for Virtual Characters Using Programming by Demonstration and Reinforcement Learning


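ㆍ Authors: Sung. Yun-Sick, Cho. Kyun-Geun
ㆍ Journal: Journal of Information Processing Systems
ㆍ Volume/Issue: 2012, Vol. 8, No. 3, pp. 409-420 (12 pages)
ㆍ Publisher: Korea Information Processing Society (한국정보처리학회)
ㆍ File information: Periodical | ENG | PDF text
ㆍ Subject area: Other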

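The full text of this paper is provided free of charge through a paper-linking arrangement with the Korea Institute of Science and Technology Information (한국과학기술정보연구원).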


Abstract

Decision-making by agents in games is commonly based on reinforcement learning. To improve the quality of agents, it is necessary to reduce both the time and the state space that learning requires. Both problems can be addressed with Macro-Actions, which are defined as sequences of primitive actions and executed as a unit. In this line of research, the learning time is reduced by cutting down the number of policy decisions that agents make. Macro-Actions were originally defined as combinations of the same primitive actions. Following studies that showed Macro-Actions can be generated by learning, Macro-Actions are now thought to consist of diverse kinds of primitive actions. However, generating Macro-Actions by learning itself requires an enormous amount of learning time and state space. To resolve these issues, insights from studies on learning tasks through Programming by Demonstration (PbD) can be applied to generate Macro-Actions that reduce the learning time and state space. In this paper, we propose a method to define and execute Macro-Actions. Macro-Actions are learned from a human subject via PbD, and a policy is learned by reinforcement learning. In an experiment, the proposed method was applied to a car simulation to verify its scalability. Data was collected from the driving control of a human subject, and the Macro-Actions required for running a car were then generated. Furthermore, the policy necessary for driving on a track was learned. Acquiring Macro-Actions by PbD reduced the driving time by about 16% compared to the case in which Macro-Actions were directly defined by a human subject. In addition, the learning time was reduced because the optimal policies converged faster.
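The idea that Macro-Actions cut down the number of policy decisions can be illustrated with a minimal sketch. It is not the method proposed in the paper: the frequency-based extraction of fixed-length action subsequences, the function names `extract_macro_actions` and `count_decisions`, and the toy driving trace are all assumptions made for illustration only.

```python
from collections import Counter


def extract_macro_actions(demo, length, top_k):
    """Return the top_k most frequent contiguous subsequences of `length`
    primitive actions in a demonstration trace.
    (Assumed stand-in for learning Macro-Actions from a demonstration.)"""
    windows = (tuple(demo[i:i + length]) for i in range(len(demo) - length + 1))
    counts = Counter(windows)
    return [macro for macro, _ in counts.most_common(top_k)]


def count_decisions(trace, macros):
    """Cover the trace greedily with Macro-Actions, falling back to a single
    primitive action when none matches. Each chosen item is one decision."""
    decisions = 0
    i = 0
    while i < len(trace):
        for macro in macros:
            if tuple(trace[i:i + len(macro)]) == macro:
                i += len(macro)  # the whole Macro-Action runs as one decision
                break
        else:
            i += 1  # no Macro-Action matched: one primitive action
        decisions += 1
    return decisions


# Hypothetical demonstration trace of primitive driving actions.
demo = ["accel", "accel", "left", "accel", "accel", "left",
        "brake", "accel", "accel", "left"]

macros = extract_macro_actions(demo, length=3, top_k=1)
print(macros)                         # [('accel', 'accel', 'left')]
print(count_decisions(demo, []))      # 10 decisions with primitives only
print(count_decisions(demo, macros))  # 4 decisions with one Macro-Action
```

In this toy trace a single Macro-Action reduces the number of decisions from 10 to 4, which is the kind of reduction that shortens learning, since the policy is consulted less often per episode.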

Keywords

Reinforcement Learning, Monte Carlo Method, Behavior Generation Model, Programming By Demonstration, Macro-Action, Multi-Step Action
