KR20210126236A

KR20210126236A - AI's action intention realization system in a reinforced learning system, and method thereof

Info

Publication number: KR20210126236A
Application number: KR1020200043773A
Authority: KR
Inventors: 권판검
Original assignee: 오산대학교 산학협력단
Priority date: 2020-04-10
Filing date: 2020-04-10
Publication date: 2021-10-20
Also published as: KR102342458B1

Abstract

The present invention relates to a system and method for implementing the behavioral intention of AI in a system to which reinforcement learning is applied. More specifically, the system (1) comprises: an AI terminal (100) including an input/output unit (110), a control unit (120), and an external data interworking unit (140); and at least one AI agent (10) that exchanges signals and data with the control unit (120) of the AI terminal (100) through the external data interworking unit (140) of the AI terminal (100). The control unit (120) includes an XAII providing module (120c) that receives the behavioral intention implementation of the AI agent (10) on the system to which reinforcement learning by artificial intelligence, performed by the AI agent (10), is applied, outputs it to the input/output unit (110), and receives a result value back from the input/output unit (110) and applies it. According to an embodiment of the present invention, the system and method provide a model in which the AI explains its intention to a given person in advance, before it acts, can be implemented simply by adding lines to existing reinforcement learning pseudo code, and enable swift decision-making by showing the intention to the user through movement.

Description

System and method for implementing the behavioral intention of AI in a system to which reinforcement learning is applied {AI's action intention realization system in a reinforced learning system, and method thereof}

The present invention relates to a system and method for implementing the behavioral intention of AI in a system to which reinforcement learning is applied. More specifically, it relates to a system and method that not only make it possible to explain the behavioral intention of AI in such a system, but also provide a model in which the AI explains its intention to a given person in advance, before it acts.

Technical challenges concerning many aspects of artificial intelligence (hereinafter 'AI'), such as its reliability, necessity, and effectiveness, are ongoing, and one of them is Explainable AI (hereinafter 'XAI').

AI has excellent problem-solving ability, but it can also produce side effects such as algorithms that can be manipulated and biased decision-making. In addition, although AI is good at producing the results people want, it does not explain why it produced those results or what its intention was.

For example, if an AI in the pharmaceutical field examines a patient and prescribes a pill, the pill may well be judged effective for the disease, but because the AI does not explain how it arrived at the prescription, it is not easy for a person to take the pill without hesitation. Such concerns converge on the view that 'trustworthy AI must be built.'

A representative research institution for XAI is the Defense Advanced Research Projects Agency (hereinafter 'DARPA') under the US Department of Defense. DARPA proposed XAI, shown in FIG. 1, as a technology for solving the AI reliability problem. Referring to FIG. 1, XAI secures the reliability of AI by explaining to people the grounds and reasons for the results the AI presents. Its development goals are as follows.

Produce more explainable models, while maintaining a high level of learning performance (prediction accuracy).
Enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners.

Meanwhile, DARPA proposed three methods for implementing the related technology.

The first is "attaching explanatory labels to neural network nodes." This method verifies the AI's logic by exposing the factors considered in the process of deriving a result.

The second is "building an explanatory model using a decision tree." This involves developing a system that monitors the AI's reasoning and checks its consistency in conjunction with the learning method.

The third is "inferring an explanatory model using statistical methods." Here, a person observes the results the AI produces in a given environment and statistically infers its way of thinking.

However, these methods are expected to become more accurate the more people intervene in the algorithm, and their reliability is expected to be higher in fields derived from supervised learning (SL) and unsupervised learning (UL).

For systems that require rapid decision-making by the user, the applicability of these methods may be limited. No matter how much data is available, data curating suited to the situation at hand is limited, and because the AI does not inform people of its intention regarding subsequent actions, the time people need to reach a decision may increase.

DARPA's XAI, meanwhile, is mainly concerned with the process by which the AI arrives at a result. That is, it is a concept for explaining how, and because of which factors, a result was derived. What is required in this technical field, however, is technology that explains and applies AI-based near-future intention rather than examining the past process.

As related art, Korean Patent Application No. 10-2017-0093256 (a reinforcement learning based vehicle number recognition method for CCTV) relates to a method that can improve the vehicle number recognition rate by segmenting and recognizing the characters within the boundaries and edges detected in image data of a vehicle license plate, and by accurately identifying the characters on the plate through reinforcement learning, an artificial intelligence learning method, and tilt correction.

In addition, Korean Patent Application No. 10-2016-0104905 (a task-oriented service personalization method for a smart environment using reinforcement learning) relates to a method for automatically creating an environment using a system that includes smart objects and a server controlling them. The method comprises: a recording step in which the server records the user's usage history and environment from the smart objects; a learning step in which the server learns the user's preferences based on the recorded usage history and environment; and an environment creation step in which the server creates the preferred environment of the smart objects based on the learned preferences. The smart objects thereby learn the environment suited to the user's task and create the preferred environment automatically.

In addition, Korean Patent Application No. 10-2017-0057305 (an intelligent home energy management device using an AI module for HEMS) relates to a device that, through an AI module for HEMS, provides a home energy monitoring and management service that runs devices at optimal power efficiency even when no one is home, and also performs a crisis management service function that detects hazards in the home caused by fire or overvoltage and automatically notifies the homeowner and a remote central server.

In addition, Korean Patent Application No. 10-2019-0096272 (a method for associating an AI (Artificial Intelligence) device with a device based on a behavior pattern of a user, and an apparatus therefor) relates to technology that receives from a first camera a preset behavior pattern of the user detected by that camera, receives from the user a voice command for controlling the operation of a device, and transmits the voice command to the device, so that devices without an AI function can also be used in conjunction with an AI device.

However, although all of the above technologies use AI, or reinforcement learning among AI learning methods, they are limited in that they do not provide technology for explaining and applying AI-based near-future intention.

Korean Patent Application No. 10-2017-0093256, "A reinforcement learning based vehicle number recognition method for CCTV"
Korean Patent Application No. 10-2016-0104905, "A task-oriented service personalization method for smart environment using reinforcement learning"
Korean Patent Application No. 10-2017-0057305, "The apparatus of home network for using AI module"
Korean Patent Application No. 10-2019-0096272, "A method for associating an AI device with a device based on a behavior pattern of a user and an apparatus therefor"

The present invention is intended to solve the above problems, and provides a system and method for implementing the behavioral intention of AI in a system to which reinforcement learning is applied, so that the behavioral intention of the AI can be explained in such a system.

The present invention also provides a system and method for implementing the behavioral intention of AI in a system to which reinforcement learning is applied that offer a model in which the AI explains its intention to a given person in advance, before it acts.

The present invention further provides a system and method for implementing the behavioral intention of AI in a system to which reinforcement learning is applied that can be implemented simply by adding lines to existing reinforcement learning pseudo code, and that enable rapid decision-making by showing the AI's intention to the user through movement.

However, the objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the description below.

To achieve the above objects, a system (1) for implementing the behavioral intention of AI in a system to which reinforcement learning is applied according to an embodiment of the present invention comprises an AI terminal (100) including an input/output unit (110), a control unit (120), and an external data interworking unit (140), and at least one AI agent (10) that transmits and receives signals and data to and from the control unit (120) of the AI terminal (100) through the external data interworking unit (140) of the AI terminal (100), wherein the control unit (120) includes an XAII providing module (120c) that receives the behavioral intention implementation of the AI agent (10) on the system to which reinforcement learning by artificial intelligence, performed by the AI agent (10), is applied, outputs it to the input/output unit (110), and receives a result value returned from the input/output unit (110) and applies it.

In one embodiment of the present invention, the XAII providing module (120c) provides pseudo code for explaining the behavioral intention of AI in a system to which reinforcement learning is applied.

In addition, in one embodiment of the present invention, because not only the mission of an object but also its process must conform to rules and the like, human intervention is required; accordingly, in order to implement an action system using reinforcement learning (RL), the XAII providing module (120c) presents a modified reinforcement learning (RL) model that allows for constraints and human intervention in decisions.

In addition, in one embodiment of the present invention, the control unit (120) further includes: an AI mode providing module (120b); and an inference module (120a) that performs action inference in an AI Mode based on the reinforcement learning performed by the AI mode providing module (120b), receives the data stored in the data storage unit (130) that is needed for the object to carry out the inferred action, and causes the AI agent to perform the action. The action is thereby performed on the basis of a reinforcement learning model in which a definition of the AI's level of intervention is added to the existing reinforcement learning model and that definition can intervene, so that a modified reinforcement learning model and a method of implementing it are provided.

In addition, in one embodiment of the present invention, the AI agent (10) may be a computer that learns by itself as the learner in reinforcement learning (RL).

To achieve the above objects, a method for implementing the behavioral intention of AI in a system to which reinforcement learning is applied according to an embodiment of the present invention comprises: a step in which the inference module (120a) of the control unit (120) performs action inference in an AI Mode based on the reinforcement learning performed by the AI mode providing module (120b), receives the data stored in the data storage unit (130) that is needed for the object to carry out the inferred action, and causes the AI agent (10) to perform the action; and a step in which the XAII providing module (120c) of the control unit (120) receives the behavioral intention implementation of the AI agent (10) on the system to which reinforcement learning by artificial intelligence, performed by the AI agent (10), is applied, outputs it to the input/output unit (110), and receives a result value returned from the input/output unit (110) and applies it.

The system and method for implementing the behavioral intention of AI in a system to which reinforcement learning is applied according to an embodiment of the present invention have the effect of providing a model in which the AI explains its intention to a given person in advance, before it acts.

Furthermore, the system and method for implementing the behavioral intention of AI in a system to which reinforcement learning is applied according to another embodiment of the present invention can be implemented simply by adding lines to existing reinforcement learning pseudo code, and have the effect of enabling rapid decision-making by showing the AI's intention to the user through movement.

FIG. 1 is a view for explaining an XAI model according to the prior art.
FIG. 2 is a diagram showing a system (1) for implementing the behavioral intention of AI in a system to which reinforcement learning is applied, according to an embodiment of the present invention.
FIG. 3 is a block diagram for explaining the reinforcement learning (RL) model of the system (1) according to an embodiment of the present invention.
FIG. 4 is a diagram showing the artificial intelligence model (XAII), implemented by the XAII providing module (120c) of the system (1) according to an embodiment of the present invention, in which the AI can explain its own intention.
FIG. 5 is a diagram showing the XAII pseudo code provided by the system (1) according to an embodiment of the present invention.
FIG. 6 is a diagram showing the XAII Presentation to Operations in the system (1) according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In describing the present invention below, detailed descriptions of related known functions or configurations will be omitted where they are judged to unnecessarily obscure the gist of the present invention.

In this specification, when one component 'transmits' data or a signal to another component, this means that the component may transmit the data or signal to the other component either directly or through at least one further component.

FIG. 2 is a diagram showing a system (1) for implementing the behavioral intention of AI in a system to which reinforcement learning is applied, according to an embodiment of the present invention. FIG. 3 is a block diagram for explaining the reinforcement learning (RL) model of the system (1). FIG. 4 is a diagram showing the artificial intelligence model (eXplainable AI's Intention, hereinafter 'XAII'), implemented by the XAII providing module (120c) of the system (1), in which the AI can explain its own intention. FIG. 5 is a diagram showing the XAII pseudo code provided by the system (1). FIG. 6 is a diagram showing the XAII Presentation to Operations in the system (1).

First, referring to FIG. 2, the system (1) for implementing the behavioral intention of AI in a system to which reinforcement learning is applied may include an AI terminal (100) comprising an input/output unit (110) consisting of a display and an input device, a control unit (120), a data storage unit (130), and an external data interworking unit (140), and at least one AI agent (10) that transmits and receives signals and data to and from the control unit (120) of the AI terminal (100) through the external data interworking unit (140) of the AI terminal (100).

Accordingly, the control unit (120) includes an inference module (120a), an AI mode providing module (120b), and an XAII providing module (120c), so that the implementation of the AI's behavioral intention in the system to which reinforcement learning by artificial intelligence is applied, performed by the XAII providing module (120c), can be output to the input/output unit (110), and the result value can be returned from the input/output unit (110) and applied.
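
The flow described for the XAII providing module (120c) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the pseudo code of FIG. 5: the `Intention` structure, the `xaii_step` function, the action names, and the convention that the input/output unit (110) returns either nothing (approval) or a replacement action are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Intention:
    """What the agent plans to do next (hypothetical structure)."""
    agent_id: int
    planned_action: str
    rationale: str


def xaii_step(
    intention: Intention,
    ask_operator: Callable[[Intention], Optional[str]],
) -> str:
    """Show the intention before acting and apply the operator's reply.

    ask_operator stands in for the input/output unit (110): it displays the
    intention and returns None to approve it, or an action that replaces it.
    """
    reply = ask_operator(intention)
    return intention.planned_action if reply is None else reply


# Usage: one operator approves the plan, another overrides it.
plan = Intention(agent_id=10, planned_action="turn_left",
                 rationale="highest estimated value")
approved = xaii_step(plan, lambda shown: None)
overridden = xaii_step(plan, lambda shown: "hold_position")
```

The point of the sketch is the ordering: the intention reaches the person before the action is executed, and the value returned from the input/output unit decides what is actually applied.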

The inference module (120a) performs action inference in an AI Mode based on the reinforcement learning performed by the AI mode providing module (120b), receives the data stored in the data storage unit (130) that is needed for the object to carry out the inferred action, and causes the AI agent to perform the action. The action is thereby performed on the basis of a reinforcement learning model in which a definition of the AI's level of intervention is added to the existing reinforcement learning model and that definition can intervene, so that a modified reinforcement learning model and a method of implementing it can be provided.

That is, the system (1) having this configuration selects an action system as the target for applying reinforcement learning (hereinafter 'RL'), a learning field within AI, to object actions, and includes the AI mode providing module (120b) in order to present a modified RL model, together with a definition of AI Modes, that minimizes the risks that may arise in applying RL to the action system. It can thereby provide AI Modes and a modified RL model centered on the action system.

Next, referring to FIG. 3, in reinforcement learning (RL) the learner, that is, the computer that learns by itself, is called the AI agent (10). The AI agent (10) is an autonomous entity that perceives the environment through sensors and acts on the environment (20) through effectors. As the subject that learns, it learns through the reward (r_t) in accordance with reinforcement learning, one of the AI fields provided by the AI mode providing module (120b).

Here, the reward (r_t) serves as an indirect answer rather than a direct one: the AI agent performing RL behaves so as to take more and more of the actions that earn a reward, as a kind of 'reinforcement', and the reward (r_t) is received from the environment (20). That is, RL is a model that learns continuously so as to maximize the reward (r_t) accumulated from the environment (20).
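
The interaction between the AI agent (10) and the environment (20), and the accumulated reward that RL seeks to maximize, can be illustrated with a minimal sketch. It shows only the interaction loop and the cumulative reward, not a learning rule. The corridor environment, the fixed policy, and the discount factor `gamma` are assumptions introduced for illustration; the source speaks only of accumulated reward.

```python
from typing import Callable, List, Tuple

State = int
Action = int


def run_episode(
    step: Callable[[State, Action], Tuple[State, float, bool]],
    policy: Callable[[State], Action],
    start: State,
    max_steps: int = 100,
) -> List[float]:
    """Run the agent-environment loop and collect the rewards r_t."""
    rewards: List[float] = []
    state = start
    for _ in range(max_steps):
        action = policy(state)                     # agent acts on the environment
        state, reward, done = step(state, action)  # environment answers with r_t
        rewards.append(reward)
        if done:
            break
    return rewards


def discounted_return(rewards: List[float], gamma: float = 1.0) -> float:
    """Cumulative reward; gamma < 1 weights later rewards less."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def corridor_step(state: State, action: Action) -> Tuple[State, float, bool]:
    """Toy environment: positions 0..3, reward 1.0 on reaching position 3."""
    next_state = max(0, min(3, state + action))
    done = next_state == 3
    return next_state, (1.0 if done else 0.0), done


# A fixed policy that always moves right reaches the goal in three steps.
rewards = run_episode(corridor_step, policy=lambda s: 1, start=0)
```

A learning algorithm would adjust the policy so that the value computed by `discounted_return` grows over repeated episodes.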

Meanwhile, where the object is a naval vessel, the action system is an integrated system for maximizing the effect of action against threats: it connects all detection equipment, armament, navigation support equipment, and the like mounted on the vessel over a network to create and share integrated action situation information, and automates command and armament control from target detection and tracking through threat analysis, armament assignment, engagement, and hit assessment. It is one of the naval weapon systems, performing command and control, armament control, and the exchange and display of action data, and it serves as the brain of the vessel.

In the present invention, because the AI agent (10) has learned under given constraints without human intervention, its learning can be regarded as mission-oriented or objective-oriented learning.

That is, during reinforcement learning that steps up sequentially from AI Mode 0 among the multiple levels of AI Mode provided by the AI mode providing module (120b), if the desired result is not obtained even when learning with the AI Mode provided by the AI mode providing module (120b), the AI agent (10) can use pseudocode to be provided with the AI Mode that corresponds to the desired result.

The AI agent (10) determines the AI Mode before taking an action (a_t). Through the AI Mode, the AI agent (10) performs specific actions under constraints and, as in conventional reinforcement learning (RL), infers and refines the optimal policy through exploration and exploitation by means of the inference module (120a).
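The interplay between AI Mode constraints and exploration/exploitation can be sketched as follows. This is only an illustration under stated assumptions: the text does not name a specific exploration strategy, so epsilon-greedy is used here as a stand-in, and the names `select_action`, `q_values`, `allowed`, and `epsilon` are hypothetical.

```python
import random
from typing import Dict, Set


def select_action(q_values: Dict[str, float], allowed: Set[str],
                  epsilon: float, rng: random.Random) -> str:
    """Pick an action among those the current AI Mode allows.

    With probability epsilon a random allowed action is tried (exploration);
    otherwise the allowed action with the highest estimated value is taken
    (exploitation). Epsilon-greedy is an assumption, not the patent's method.
    """
    candidates = sorted(a for a in q_values if a in allowed)
    if not candidates:
        raise ValueError("no action is allowed under the current AI Mode")
    if rng.random() < epsilon:
        return rng.choice(candidates)
    return max(candidates, key=lambda a: q_values[a])


# Hypothetical value estimates and mode constraint.
q = {"hold_course": 0.2, "turn_starboard": 0.6, "launch_sam": 0.9}
allowed_now = {"hold_course", "turn_starboard"}
print(select_action(q, allowed_now, epsilon=0.0, rng=random.Random(0)))  # turn_starboard
```

Filtering by `allowed` before choosing keeps both exploration and exploitation inside the constraints of the current mode.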

More specifically, in the present invention human intervention is required because, for the object, not only the mission but also the process must conform to rules and the like. Therefore, to implement the action system using reinforcement learning (RL), the XAII providing module (120c) may be provided as a modified reinforcement learning (RL) model that accommodates constraints and human decision intervention.

Referring to FIG. 4, the XAII providing module (120c) sets as the horizontal axis the time T over which, on the AI agent (10), the inference module (120a) infers an action in the AI Mode based on reinforcement learning through the AI mode providing module (120b) and the action is performed.

Meanwhile, with respect to the time T, the prior-art XAI of DARPA is concerned mainly with the process by which the AI arrives at a result. That is, it is a concept that explains how, and because of which factors, the result was derived.

In contrast, the XAII approach performed by the XAII providing module (120c) of the system (1) for implementing the behavioral intention of AI in a system to which reinforcement learning is applied does not look back at the past process. Instead, it explains the AI's near-future intention to a person, which distinguishes it from DARPA's approach.

To distinguish it from XAI, the approach of the XAII providing module (120c) of the system (1) for implementing the behavioral intention of AI in a system to which reinforcement learning is applied is named XAII. In FIG. 4, assuming the current time is t, the execution time of the action (●) that the AI agent (10) recommends to the person is t^*, and the difference between the current time and the recommended execution time (t^* - t) is the time available for the person's decision. From the current time t to the time T at which the episode ends, the actions the AI can select may depend on constraints, operational orders, and the like, in addition to the AI Mode provided by the inference module (120a). Expressed as pseudocode, this may be as shown in FIG. 5.

In FIG. 5, the portion in blue text concerns 'ethics and risk', and the portion in red text is the XAII algorithm provided by the XAII providing module (120c).

If, at a given time t, the AI agent (10) selects an action (a) that falls outside the given AI Mode (the action has been selected but not yet committed), the AI agent (10) recommends a second-best alternative a^- together with a to the XAII providing module (120c) through the external data linkage unit (140), so that the XAII providing module (120c) outputs the recommendation on the input/output unit (110). If the person selects a^- through the input/output unit (110), the XAII providing module (120c) adopts a^- as the new a and continues the algorithm. If the person does not select a^- through the input/output unit (110), or if there is no response from the person within the allotted time (t^* - t), the XAII providing module (120c) regards the a^- proposed by the AI agent (10) as rejected by the person and may notify the AI agent (10) accordingly.

The AI agent (10) can give the person a decision time (t^* - t) in which to consider a^-. In other words, the algorithm explains the AI's intention by granting the person the decision time t^* - t and asking what the person thinks, given that the intention of the AI agent (10) is a^-.
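The exchange described above (propose a^-, wait up to t^* - t, then accept or treat as rejected) can be sketched as a single function. This is a hedged illustration, not the pseudocode of FIG. 5: the names `xaii_step`, `Outcome`, and the `ask_human` callback are hypothetical, and the callback is assumed to return `True` if the person selects a^-, `False` if the person declines, and `None` if no response arrives within the decision time.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class Outcome(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"


def xaii_step(a: str, a_alt: str, a_in_mode: bool, t: float, t_star: float,
              ask_human: Callable[[str, str, float], Optional[bool]]
              ) -> Tuple[Optional[str], Optional[Outcome]]:
    """Resolve one intended action a with its alternative a_alt (a^-).

    Returns (action to proceed with, outcome). If a is within the AI Mode,
    no consultation is needed and the outcome is None.
    """
    if a_in_mode:
        return a, None
    decision_time = t_star - t          # time granted to the person
    reply = ask_human(a, a_alt, decision_time)
    if reply is True:
        return a_alt, Outcome.ACCEPTED  # a^- becomes the new a
    # Declined or no response within the decision time: treated as rejected,
    # and the caller is expected to notify the agent.
    return None, Outcome.REJECTED


# Hypothetical usage: the person accepts the alternative.
result = xaii_step("Track R/D On", "Search R/D On", False, 10.0, 25.0,
                   lambda a, alt, dt: True)
print(result[0])  # Search R/D On
```

What the agent does after a rejection is left to the caller in this sketch, since the passage above only states that the agent is notified.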

If the AI's intention does not match the person's intention, the person can select the next second-best alternative provided by the AI agent (10). The AI agent (10) can select and provide, as second-best alternatives, low-level recommendations when the AI Mode of Table 2 below is lower and high-level recommendations when the AI Mode is higher.

To this end, the XAII providing module (120c) can provide the AI Mode model before the AI agent (10) of the reinforcement learning model selects a second-best alternative (a') different from the action (a) provided by the inference module (120a). Table 2 below divides the level of the AI mode providing module (120b) into six levels, from 0 to 5, and gives the definition of each.

Table 2

| AI Mode | Concept | Class | Contents | Constraints |
|---|---|---|---|---|
| 0 | Humans monitor the situation | No AI | Humans monitor all phases; no AI algorithm intervenes | Not applicable |
| 1 | Humans monitor the situation | Specific-function AI | AI is applied to performing one action; can be switched up to Mode 3 during an operation | Cannot be released |
| 2 | Humans monitor the situation | Combination AI | AI is applied to performing two or more actions; can be switched up to Mode 3 during an operation | Cannot be released |
| 3 | AI monitors the operational situation | Conditional AI | AI controls all functions under a specific environment; humans intervene only in special cases | Can be released by a human |
| 4 | AI monitors the operational situation | Advanced AI | AI controls and monitors all functions; humans stand by in case a major situation arises | Can be released by a human |
| 5 (A) | AI monitors the operational situation | Full AI | AI performs the given mission on its own without human intervention | Can be released by a human |
| 5 (B) | AI monitors the operational situation | Full AI | AI performs the given mission on its own without human intervention | Can be released by AI |

The AI Mode in Table 2 above indicates the level at which AI intervenes in the model. AI Mode 0 represents the current action system, and the higher the AI Mode level, the higher the level of intervention by the AI agent (10). AI Mode 0 through AI Mode 2 are concepts in which a person monitors the situation, and the constraints cannot be released.

Here, a constraint is a directive issued by a superior object other than the unit object, and releasing or modifying a constraint (operational directive) requires that the superior object issue a modified directive. This concept is preferably presented with scalability in mind, for command and control under a multi-agent concept usable by large-unit object groups and by small and medium-sized unit object groups.

AI Mode 0 is the same as the current action system. AI Mode 1 is a concept in which AI is used when performing one action group, and each component action corresponds to it. It is similar to the component-operation module concept of action systems currently operated by large units and small and medium-sized units, except that AI is applied.

AI Mode 2 is combination AI, in which two or more component operations are combined. AI Mode 1 and AI Mode 2 can be switched up to AI Mode 3, automatically or manually, when necessary. The reason is that from AI Mode 3 onward the AI monitors the action group situation, which allows rapid mission switching, mission addition, and the like.

In AI Mode 3, the AI controls all functions under a specific environment, and the person on the object who carries out the operation together with the AI agent (10) intervenes only in special cases (for example, use of weapons or entry into a specific area).

In AI Mode 4, the AI monitors the entire environment and operational personnel stand by in case a major situation arises. A major situation means one in which the opposing force uses a specific weapon system, or in which information or intelligence related to the military situation has been collected, so that action is required.

In AI Mode 5, everything is performed by the AI, but in AI Mode 5(A) constraints can be released only by a person. AI Mode 5(B) is a concept in which the AI releases constraints by itself; what is commonly thought of as a 'killer robot' or 'lethal autonomous weapon' falls into this category. In light of concepts for building future military forces, AI Mode 5(B) is unlikely to be implemented, and use of this mode requires strict control and restriction.
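The AI Mode levels and their constraint-release rules from Table 2 can be captured in a small lookup structure. The sketch below is one possible encoding; the names `AIMode`, `CONSTRAINT_RELEASE`, `ai_monitors_situation`, and `may_switch_to_mode_3` are illustrative and do not appear in the patent.

```python
from enum import Enum


class AIMode(Enum):
    NO_AI = "0"
    SPECIFIC_FUNCTION = "1"
    COMBINATION = "2"
    CONDITIONAL = "3"
    ADVANCED = "4"
    FULL_A = "5A"
    FULL_B = "5B"


# Who may release constraints in each mode, following Table 2.
CONSTRAINT_RELEASE = {
    AIMode.NO_AI: "not applicable",
    AIMode.SPECIFIC_FUNCTION: "not releasable",
    AIMode.COMBINATION: "not releasable",
    AIMode.CONDITIONAL: "human",
    AIMode.ADVANCED: "human",
    AIMode.FULL_A: "human",
    AIMode.FULL_B: "ai",
}


def ai_monitors_situation(mode: AIMode) -> bool:
    """True for Modes 3 to 5, where the AI rather than a person monitors."""
    return mode in {AIMode.CONDITIONAL, AIMode.ADVANCED,
                    AIMode.FULL_A, AIMode.FULL_B}


def may_switch_to_mode_3(mode: AIMode) -> bool:
    """Modes 1 and 2 can be switched up to Mode 3 during an operation."""
    return mode in {AIMode.SPECIFIC_FUNCTION, AIMode.COMBINATION}


print(CONSTRAINT_RELEASE[AIMode.FULL_B])  # ai
```

Keeping Mode 5(A) and 5(B) as separate members makes the one difference between them, who may release constraints, explicit.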

The AI Mode defined in Table 2 is applied, and the AI agent (10) determines the AI Mode before taking an action (a_t). Through the AI Mode, the AI agent (10) performs specific actions under constraints and, as in conventional reinforcement learning (RL), can recommend a second-best alternative by inferring and refining the optimal policy through exploration and exploitation by means of the inference module (120a).

Meanwhile, if the desired result is not obtained even with second-best recommendations made through learning in the current AI Mode, the AI agent (10) can request from the AI mode providing module (120b) a recommendation to raise the AI Mode by one level.
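The one-level escalation can be sketched in a few lines. The function name `recommend_mode_upgrade`, the integer encoding of modes, and the cap at Mode 5 are assumptions made for illustration; the text only says that a one-level increase is requested when the desired result is not obtained.

```python
def recommend_mode_upgrade(current_mode: int, result_achieved: bool,
                           max_mode: int = 5) -> int:
    """Return the AI Mode to recommend next.

    Stays at the current mode when the desired result was achieved,
    otherwise moves up exactly one level, never beyond max_mode.
    """
    if not 0 <= current_mode <= max_mode:
        raise ValueError("current_mode must be between 0 and max_mode")
    if result_achieved:
        return current_mode
    return min(current_mode + 1, max_mode)


print(recommend_mode_upgrade(1, result_achieved=False))  # 2
print(recommend_mode_upgrade(5, result_achieved=False))  # 5
```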

Here, the AI agent (10) ordinarily accumulates data through self-training in AI Mode 5, and in training situations under the MUM-T (Manned and Unmanned Teaming) concept it accumulates data learned from people.

The reason for training at the sixth level (AI Mode 5, the highest of the six levels) is that predicting the various behaviors of the opposing force is in practice limited, and the recommendations of the AI agent (10) are more likely to be accepted by people when the agent has trained in diverse environments. In addition, training under diverse environments (20) gives the AI agent (10) the ability to make recommendations that cope with special situations. This is the same as the concept of exploration and exploitation in reinforcement learning (RL).

In the present invention, the inference module (120a) receives a captured image provided by a camera sensor among the sensors of the agent (10), which characterizes the current state of the environment (20); processes an input including the captured image so as to generate an action selection output according to the current values of the parameters contained in the captured image (environment parameters such as recognition of enemy naval guns or the appearance of a patrol aircraft); and, using a geometry prediction neural network, processes the output generated by the action selection policy neural network so as to predict the values of geometric features of the environment parameters in the current state. Based on the reinforcement learning performed by the AI mode providing module (120b), it can then extract tactics suited to the AI Mode from the data storage unit (130) and use them.

As a result, the XAII providing module (120c) provides the AI Mode model before the AI agent (10) of the reinforcement learning model selects a second-best alternative (a') different from the action (a) provided by the inference module (120a), then provides a second-best alternative (a^-) according to the current AI Mode, and, if that alternative is rejected by the person, can provide a second-best alternative with a lower state-action value of the reinforcement learning function.

More specifically, suppose the action intention a of the AI agent (10) is to activate the tracking radar. Because this action does not satisfy the given Mode 1, the agent can recommend to the person the first alternative, the search radar, the second alternative, the electro-optical tracking system, and the third alternative, the navigation radar. The alternatives a^- can be written as a^-1, a^-2, a^-3, and so on. For example: a^-1: Search R/D On, a^-2: EOTS On, a^-3: SPS R/D On. Low-level recommendations may include a recommended course and speed or activation of the navigation R/D, while high-level recommendations may include preparing to launch an anti-ship missile or activating SONAR Active Mode.
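Ordering the alternatives by state-action value, and moving to the next lower-valued one after a rejection, can be sketched as follows. The numeric values, the label "Track R/D On", and the function names `ranked_alternatives` and `next_alternative` are hypothetical; only the three alternative labels come from the example above.

```python
from typing import Dict, List, Optional, Set


def ranked_alternatives(q_values: Dict[str, float], allowed: Set[str]) -> List[str]:
    """Allowed actions ordered from highest to lowest state-action value."""
    permitted = [a for a in q_values if a in allowed]
    return sorted(permitted, key=lambda a: q_values[a], reverse=True)


def next_alternative(ranked: List[str], rejected: Set[str]) -> Optional[str]:
    """First alternative not yet rejected, or None when all were rejected."""
    for action in ranked:
        if action not in rejected:
            return action
    return None


# Made-up state-action values; the tracking radar is outside Mode 1.
q = {"Track R/D On": 0.9, "Search R/D On": 0.7, "EOTS On": 0.5, "SPS R/D On": 0.3}
allowed_in_mode_1 = {"Search R/D On", "EOTS On", "SPS R/D On"}

ranked = ranked_alternatives(q, allowed_in_mode_1)
print(ranked)                                                # ['Search R/D On', 'EOTS On', 'SPS R/D On']
print(next_alternative(ranked, rejected={"Search R/D On"}))  # EOTS On
```

Each rejection simply adds the declined action to `rejected`, so the next call yields the alternative with the next lower value.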

Meanwhile, when the object is a naval vessel or the like, using XAII also requires building a user interface (UI) that human military operators can easily understand. Judging from the DARPA XAI research published so far, that work studies ways of presenting explanations to the user as text, which does not appear convenient for an operator who must read the text, understand it, and consider the next action.

To mitigate this anticipated problem, the present invention can present XAII as shown in FIG. 6, so that operators' reaction time and decision-making time are reduced and they can judge intuitively.

That is, FIG. 6 shows the XAII presentation to operations, in which the sequence numbers ①, ②, ..., ⑧ are assigned in order from the upper left, following the arrows, to the lower left.

The XAII providing module (120c) can control the input/output unit (110) so that, as in FIG. 6, black text and arrows indicate the vessel's action at the current time t, and blue indicates the AI's intention in the near future. In FIG. 6, the AI's intention at sequence ① is to turn to starboard and reduce speed, and at sequence ② it intends to launch a SAM to port. At sequence ③ the SAM is actually launched and the AI intends to maintain the current course and speed, and at sequence ④ it intends to turn to starboard.

At sequence ⑤ the AI intends to launch chaff to starboard, at sequence ⑥ to enter the chaff cloud, at sequence ⑦ to turn to starboard in the chaff cloud, and at sequence ⑧ to launch a SAM to starboard. Because in a TSCE the AI can use platform capabilities such as propulsion machinery and navigation equipment, the higher the AI Mode, the more the intentions and recommendations shift upward from low-level ones such as course and speed to high-level ones (for example, SAM launch, chaff launch, or EA).

The person can intervene at the time step corresponding to the action the AI intends (marked in FIG. 6) and convey their own decision to the AI. In other words, the XAII algorithm of the XAII providing module (120c) is realized by having the AI explain its intention to the person in advance, and it can prevent AI actions that the person does not intend.

The present invention can also be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes all kinds of recording devices in which data readable by a computer system is stored.

Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include media implemented in the form of a carrier wave (for example, transmission over the Internet).

The computer-readable recording medium can also be distributed over network-connected computer systems so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present invention can be readily inferred by programmers in the technical field to which the present invention pertains.

As described above, this specification and the drawings disclose preferred embodiments of the present invention. Although specific terms are used, they are used only in a general sense to explain the technical content of the present invention clearly and to aid understanding of the invention, and they are not intended to limit its scope. It is apparent to those of ordinary skill in the art to which the present invention pertains that, in addition to the embodiments disclosed here, other modifications based on the technical idea of the present invention can be practiced.

1: System for implementing the behavioral intention of AI in a system to which reinforcement learning is applied
10: AI agent 100: AI terminal
110: Input/output unit 120: Control unit
120a: Inference module 120b: AI mode providing module
120c: XAII providing module 130: Data storage unit
140: External data linkage unit

Claims

A system (1) for implementing the behavioral intention of AI in a system to which reinforcement learning is applied, the system comprising: an AI terminal (100) including an input/output unit (110), a control unit (120), and an external data linkage unit (140); and at least one AI agent (10) that exchanges signals and data with the control unit (120) of the AI terminal (100) through the external data linkage unit (140) of the AI terminal (100), wherein the control unit (120) comprises an XAII (eXplainable AI's Intention) providing module (120c) that receives the behavioral intention implementation of the AI agent (10) on the system to which reinforcement learning by artificial intelligence, performed by the AI agent (10), is applied, outputs it to the input/output unit (110), and receives a result value returned from the input/output unit (110) and applies it.

The system according to claim 1, wherein the XAII providing module (120c) provides pseudocode for explaining the behavioral intention of AI in a system to which reinforcement learning is applied.

The system according to claim 1, wherein, because human intervention is required in that not only the mission but also the process must conform to rules and the like for the object, the XAII providing module (120c) presents a modified reinforcement learning (RL) model that accommodates constraints and human decision intervention in order to implement the action system using reinforcement learning (RL).

The system according to claim 1, wherein the control unit (120) further comprises:
an AI mode providing module (120b); and
an inference module (120a) that performs action inference in the AI Mode based on the reinforcement learning performed by the AI mode providing module (120b), is provided with the data, stored in the data storage unit (130), that the object needs in order to carry out the inferred action, and causes the AI agent to perform the action, so that the action is performed on the basis of a reinforcement learning model in which a definition of the AI's level of intervention is added to an existing AI reinforcement learning model and in which that definition can intervene, thereby providing a modified reinforcement learning model and an implementation method.

The system according to claim 1, wherein the AI agent (10) is a computer that learns by itself as the learner in reinforcement learning (RL).

A method for implementing the behavioral intention of AI in a system to which reinforcement learning is applied, the method comprising:
a step in which the inference module (120a) of the control unit (120) performs action inference in the AI Mode based on the reinforcement learning performed by the AI mode providing module (120b), is provided with the data, stored in the data storage unit (130), that the object needs in order to carry out the inferred action, and causes the AI agent (10) to perform the action; and
a step in which the XAII providing module (120c) of the control unit (120) receives the behavioral intention implementation of the AI agent (10) on the system to which reinforcement learning by artificial intelligence, performed by the AI agent (10), is applied, outputs it to the input/output unit (110), and receives a result value returned from the input/output unit (110) and applies it.