KR20190094068A - Learning method of classifier for classifying behavior type of gamer in online game and apparatus comprising the classifier

Info

Publication number: KR20190094068A
Application number: KR1020180059538A
Authority: KR
Inventors: 권형진; 양성일; 지형근
Original assignee: 한국전자통신연구원
Priority date: 2018-01-11
Filing date: 2018-05-25
Publication date: 2019-08-12

Abstract

According to the present invention, a gamer's behavior type is classified by reflecting both changes in the game environment and changes in the gamer's behavioral tendency. The classification result allows the gamer's in-game behavior to be understood accurately and dynamically, which provides an appropriate guideline for game operation.

Description

LEARNING METHOD OF CLASSIFIER FOR CLASSIFYING BEHAVIOR TYPE OF GAMER IN ONLINE GAME AND APPARATUS COMPRISING THE CLASSIFIER

The present invention relates to a technique for classifying a gamer's behavior type in order to predict the gamer's behavior in an online game.

Recently, behavior prediction models that predict gamers' in-game behavior have been actively studied as a basis for deciding the operation policy of a game service. Predicting a gamer's behavior requires a classifier that classifies the gamer's behavior type using, as input variables, the various causes that change the gamer's behavior.

However, conventional classifiers only provide the result of classifying a gamer's behavior type with changes in the in-game environment as input variables; they do not provide a classification result that takes changes in the gamer's behavioral tendency into account.

In other words, when the in-game environment changes, the classification result of a conventional classifier does not accurately reflect whether a change in the gamer's in-game behavior is caused by the change in the in-game environment or by a change in the gamer's behavioral tendency. A gamer's behavior may change because of the gamer's intrinsic propensity (or personality), regardless of whether the in-game environment has changed.

In addition, conventional classifiers classify a gamer's behavior type by reflecting only changes in the internal environment of the game; they do not provide a classification result that reflects changes in the external environment of the game.

As such, because conventional classifiers do not classify a gamer's behavior type in consideration of all of the gamer's behavioral tendency, changes in the internal environment of the game, and changes in the external environment of the game, behavior predictions based on their classification results are of limited use in establishing an optimal game operation policy.

Accordingly, an object of the present invention is to provide a learning method of a classifier that classifies a gamer's behavior type in an online game service by reflecting all of the changes in the game's internal environment, the changes in the game's external environment, and the changes in the gamer's behavioral tendency, and an apparatus comprising the classifier.

A learning method of a classifier according to one aspect of the present invention for achieving the above object includes: extracting feature vectors from collected data related to behavioral attributes of gamers and environmental attributes of an online game, using a feature vector extraction algorithm; generating training data clusters by clustering the extracted feature vectors, using a clustering algorithm; labeling the training data clusters with data labels; and training the classifier with the data labels, using a machine learning algorithm.
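The four steps above can be sketched as follows. This is only a minimal illustration under stated assumptions: the two features (actions per minute and play hours), the toy k-means, the label names "hardcore" and "casual", and the nearest-centroid classifier are all hypothetical stand-ins, since the specification does not fix any particular algorithm.

```python
import math

def extract_features(record):
    # Step 1 (assumed features): actions per minute and daily play hours.
    return (record["actions"] / record["minutes"], record["play_hours"])

def kmeans(vectors, k, iterations=20):
    # Step 2: deterministic toy k-means; the first k vectors seed the centroids.
    centroids = [vectors[i] for i in range(k)]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment

def train_classifier(vectors, labels):
    # Step 4: "training" stores one mean vector per label (nearest-centroid).
    model = {}
    for label in set(labels):
        members = [v for v, lab in zip(vectors, labels) if lab == label]
        model[label] = tuple(sum(x) / len(members) for x in zip(*members))
    return model

def classify(model, vector):
    # Predict the label whose stored mean vector is closest.
    return min(model, key=lambda label: math.dist(vector, model[label]))

records = [
    {"actions": 600, "minutes": 10, "play_hours": 5.0},
    {"actions": 100, "minutes": 10, "play_hours": 1.0},
    {"actions": 620, "minutes": 10, "play_hours": 6.0},
    {"actions": 120, "minutes": 10, "play_hours": 0.5},
]
vectors = [extract_features(r) for r in records]
centroids, assignment = kmeans(vectors, k=2)
# Step 3: hypothetical labels attached to each cluster id.
cluster_names = {0: "hardcore", 1: "casual"}
labels = [cluster_names[a] for a in assignment]
model = train_classifier(vectors, labels)
```

In this sketch the cluster labels are assigned by hand; how the labels are chosen in practice is not specified here.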

An apparatus comprising a classifier according to another aspect of the present invention includes: a communication module that communicates with a game server servicing the online game, another online game server servicing another online game, and an SNS server, each connected to a communication network; a data collection module that collects, through the communication module, data related to behavioral attributes of gamers and environmental attributes of the online game; and a processor module that executes a feature vector extraction algorithm to extract feature vectors from the collected data, executes a clustering algorithm to generate training data clusters in which the extracted feature vectors are clustered, labels the training data clusters with data labels, and trains the classifier using the labeled data labels.

According to the present invention, gamers' behavior types are clustered using data that allows gamers' behavior to be understood dynamically (data related to gamers' behavioral attributes and to the internal and external environmental attributes of the online game), and a classifier trained on the clustering result is provided. The gamer's behavior type can therefore be predicted accurately from the classifier's classification result, and the gamer's behavior within the online game can be understood dynamically, so that an appropriate guideline for game operation can be provided.

FIG. 1 is a block diagram illustrating an apparatus for classifying gamer behavior types in an online game service according to an embodiment of the present invention.
FIGS. 2 and 3 are diagrams schematically illustrating the clustering concept executed in the clustering unit shown in FIG. 1.
FIG. 4 is a flowchart illustrating a learning method of a classifier according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a learning method of a classifier according to another embodiment of the present invention.
FIG. 6 is a diagram schematically illustrating the concept of a common area between a training data cluster and a test data cluster according to an embodiment of the present invention.
FIG. 7 is a detailed flowchart of step S520 shown in FIG. 5.
FIG. 8 is a diagram schematically illustrating a case in which a test data cluster maintains both the cluster members and the cluster characteristics of a training data cluster according to an embodiment of the present invention.
FIG. 9 is a diagram schematically illustrating a case in which a test data cluster maintains the cluster members of a training data cluster but does not maintain the cluster characteristics of the training data cluster according to an embodiment of the present invention.
FIGS. 10 and 11 are diagrams schematically illustrating various examples in which a plurality of training data clusters are merged after the point in time at which an environmental attribute changes according to an embodiment of the present invention.

Various embodiments of the present invention may be modified in various ways and may take various forms; specific embodiments are illustrated in the drawings and described in the related detailed description. This is not intended to limit the various embodiments of the present invention to particular forms, and they should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the various embodiments of the present invention. In the description of the drawings, similar reference numerals are used for similar components.

Expressions such as "includes" or "may include" that may be used in various embodiments of the present invention indicate the presence of the disclosed function, operation, or component, and do not limit one or more additional functions, operations, or components. In addition, in various embodiments of the present invention, terms such as "include" or "have" are intended to specify the presence of a feature, number, step, operation, component, part, or combination thereof described in the specification, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In various embodiments of the present invention, expressions such as "or" include any and all combinations of the words listed together. For example, "A or B" may include A, may include B, or may include both A and B.

Expressions such as "first" and "second" used in various embodiments of the present invention may modify various components of the various embodiments but do not limit those components. For example, the expressions do not limit the order and/or importance of the components. The expressions may be used to distinguish one component from another. For example, a first user device and a second user device are both user devices and denote different user devices. For example, without departing from the scope of rights of the various embodiments of the present invention, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component.

When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to the other component, but it should be understood that another component may exist between them. In contrast, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no other component exists between them.

The terms used in the embodiments of the present invention are used only to describe specific embodiments and are not intended to limit the embodiments of the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the embodiments of the present invention belong.

Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in the various embodiments of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the drawings.

FIG. 1 is a block diagram illustrating a schematic configuration of an apparatus for classifying a gamer's behavior type in an online game service according to an embodiment of the present invention.

Referring to FIG. 1, the apparatus 100 for classifying a gamer's behavior type according to an embodiment of the present invention (hereinafter, the "classification apparatus") is a general electronic device having a CPU, memory such as ROM and RAM, a display device such as a display, input devices such as a keypad and a mouse, and a data storage device. FIG. 1, however, shows the configuration of the classification apparatus with a focus on the functions related to the present invention.

An electronic device according to various embodiments of the present invention may include, for example, at least one of a smartphone, a tablet PC (tablet personal computer), a mobile phone, a video phone, an e-book reader, a desktop PC (desktop personal computer), a laptop PC (laptop personal computer), a netbook computer, a workstation, a server, a PDA (personal digital assistant), a PMP (portable multimedia player), an MP3 player, a mobile medical device, a camera, or a wearable device.

The classification apparatus 100, implemented as an electronic device, performs a process of classifying a gamer's behavior type in order to predict the gamer's behavior in an online game service.

To this end, the classification apparatus 100 according to an embodiment of the present invention may be configured to include a communication module 110, a data collection module 120, a storage module 130, a processor module 140, and a display module 150. Here, the expression "configured to" as used in this specification may, depending on the situation, be used interchangeably with, for example, "suitable for," "having the capacity to," "designed to," "adapted to," "made to," or "capable of."

Communication module 110

The communication module 110 is a component capable of connecting to the network 50, and may also be referred to as a "communication interface." The communication module 110 may establish communication between the classification apparatus 100 and an external device. For example, the communication module 110 may communicate, through wireless or wired communication, with an external device that can connect to the network 50.

The wireless communication may include, for example, at least one of LTE (long-term evolution), LTE-A (LTE Advance), CDMA (code division multiple access), WCDMA (wideband CDMA), UMTS (universal mobile telecommunications system), WiBro (Wireless Broadband), or GSM (Global System for Mobile Communications).

The wireless communication may also include, for example, short-range communication. The short-range communication may include, for example, at least one of WiFi (wireless fidelity), Bluetooth, NFC (near field communication), MST (magnetic stripe transmission), or GNSS (global navigation satellite system).

The wired communication may include, for example, at least one of USB (universal serial bus), HDMI (high definition multimedia interface), RS-232 (recommended standard 232), or POTS (plain old telephone service).

The network 50 may include, for example, at least one of a computer network (e.g., a LAN or WAN), the Internet, or a telephone network.

The external devices with which the classification apparatus 100 communicates through the communication module 110 may include, for example, a game server 10 that provides the online game service, another game server 20 that provides an online game service different from that online game service, a social networking service (SNS) server 30, and an electronic device 40 of a gamer who runs the online game provided by the online game service.

Data collection module 120

The data collection module 120 may collect, in real time, data related to the gamer's behavioral attributes from the game server 10 through the communication module 110, build a database 132 by processing the collected data into database form, and store the database 132 in a first storage area of the storage module 130.

The data related to the gamer's behavioral attributes may include log data related to character behavior and raw data related to gamer behavior.

The log data related to the gamer's character behavior may be data related to the gamer's character hunting monsters at a specific place in the game. Such log data may be, for example, data on the number of times the character attacks other game users' characters or monsters, and data on the number of times the character evades attacks by other game users' characters or monsters. The data on the number of actions may be expressed as "Actions Per Minute." The log data may also be data related to the gamer's game play time.
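As an illustration of the "Actions Per Minute" measure mentioned above, the following sketch counts logged attack and evade actions and normalizes them by play time. The log record layout (`t`, `action`) and the function name are assumptions made for this example and do not come from the specification.

```python
from collections import Counter

def actions_per_minute(events, play_seconds):
    """Count logged actions by type and normalize to a per-minute rate."""
    if play_seconds <= 0:
        raise ValueError("play_seconds must be positive")
    counts = Counter(e["action"] for e in events)
    minutes = play_seconds / 60
    return {action: n / minutes for action, n in counts.items()}

# Hypothetical character log: timestamp in seconds and action type.
log = [
    {"t": 3, "action": "attack"},
    {"t": 10, "action": "attack"},
    {"t": 14, "action": "evade"},
    {"t": 50, "action": "attack"},
]
apm = actions_per_minute(log, play_seconds=120)
```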

The raw data related to gamer behavior may be data that can be collected from an input device (not shown) of the gamer's electronic device 40. Although not shown, the input device of the gamer's electronic device 40 may include, for example, a mouse, a keyboard, a microphone, and a camera. In this case, the raw data related to gamer behavior may include data on the speed at which the gamer moves the mouse, data on the speed at which the gamer presses keys on the keyboard, data on the gamer's voice tone collected through the microphone, and data on images of the gamer captured by the camera.
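One of the raw-data features named above, mouse movement speed, could be derived from sampled pointer positions roughly as follows. The `(t, x, y)` sample format and the pixels-per-second unit are assumptions for this sketch; the specification does not say how the speed is measured.

```python
import math

def mean_mouse_speed(samples):
    """Average pointer speed in pixels per second from (t, x, y) samples."""
    if len(samples) < 2:
        return 0.0
    distance = 0.0
    # Sum the straight-line distance between consecutive samples.
    for (_, x0, y0), (_, x1, y1) in zip(samples, samples[1:]):
        distance += math.hypot(x1 - x0, y1 - y0)
    elapsed = samples[-1][0] - samples[0][0]
    return distance / elapsed if elapsed > 0 else 0.0

# Hypothetical samples: time in seconds, position in pixels.
samples = [(0.0, 0, 0), (0.5, 30, 40), (1.0, 30, 100)]
speed = mean_mouse_speed(samples)
```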

In other words, the log data related to the gamer's character behavior is data from which the gamer's behavioral attributes can be identified through an intermediary, namely the character, whereas the raw data related to gamer behavior is data from which the gamer's behavioral attributes can be identified through a different intermediary, namely the input device of the gamer's electronic device 40.

Meanwhile, the game server 10 may collect the raw data related to gamer behavior from the electronic device 40 with which it communicates over the network 50.

The data collection module 120 may also collect, in real time through the communication module 110, data related to the internal environmental attributes of the online game serviced by the game server 10, build a database 134 by processing the collected data into database form, and store the database 134 in a second storage area of the storage module 130.

In one example, the data related to the internal environmental attributes of the online game may be log data on the change in game money held by a single character or the change in the total game money held by all characters.
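The per-character and total game money changes described in this example might be computed from balance logs as in the sketch below. The log structure (a mapping from character id to chronologically ordered balances) and the character ids are hypothetical.

```python
def money_changes(balance_log):
    """Return the per-character change and the total change across characters.

    balance_log maps a character id to its chronologically ordered balances.
    """
    per_character = {
        cid: balances[-1] - balances[0]
        for cid, balances in balance_log.items()
        if balances
    }
    return per_character, sum(per_character.values())

# Hypothetical balances recorded over one observation window.
balance_log = {"char_a": [100, 150, 130], "char_b": [500, 420]}
per_character, total = money_changes(balance_log)
```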

In another example, the data related to the internal environmental attributes of the online game may be data related to updates of the online game, specific game events, and the like.

The data collection module 120 may also collect, in real time through the communication module 110, data related to the external environmental attributes of the online game service from at least two servers among the other game server 20 and the SNS server 30, build a database 136 by processing the collected data into database form, and store the database 136 in a third storage area of the storage module 130.

In one example, the data related to the external environmental attributes of the online game is data that affects how the online game serviced by the game server 10 is evaluated, such as data on posts and comments evaluating the online game that are published online on SNS, mass media, communities, blogs, online cafes, and the like. The data on posts and comments may be specific keywords contained in the posts and comments.
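Extracting specific keywords from posts and comments could look like the following sketch, which counts whole-word, case-insensitive matches. The keyword list and the sample posts are invented for illustration; a real system would likely need language-specific tokenization, which is outside what the text describes.

```python
import re
from collections import Counter

def keyword_counts(texts, keywords):
    """Count case-insensitive whole-word occurrences of each keyword."""
    wanted = {k.lower() for k in keywords}
    counts = Counter({k: 0 for k in wanted})
    for text in texts:
        for token in re.findall(r"\w+", text.lower()):
            if token in wanted:
                counts[token] += 1
    return dict(counts)

# Hypothetical posts and keywords.
posts = [
    "The new patch is fun but the lag is terrible",
    "Lag again. Boring event.",
]
counts = keyword_counts(posts, ["lag", "fun", "boring", "refund"])
```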

In another example, the data related to the external environmental attributes of the online game service is data related to trends in the online game market, such as data on the release schedule or update schedule of another online game service provided by the other game server 20.

In yet another example, the data related to the external environmental attributes of the online game serviced by the game server 10 may be data related to weather, holidays, and seasons.

Note that the data related to the external environmental attributes is collected not from the game server 10 but from the SNS server 30, the other game server 20 that provides another online game service, or an institutional server (not shown) that provides information on weather, holidays, and seasons.

Meanwhile, the data related to the gamer's behavioral attributes, the data related to the internal environmental attributes of the online game, and the data related to the external environmental attributes may be numerical or categorical data having a relational data structure, unstructured or structured text data, or a combination thereof.

Storage module 130

The storage module 130 may store the databases 132, 134, and 136 built by the data collection module 120.

The storage module 130 may output the data stored in the databases 132, 134, and 136 to the processor module 140 in response to a query from the processor module 140, which is described below.
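The query-and-response relationship between the processor module and the three databases can be pictured with an in-memory SQLite sketch. The table names, columns, and the choice to join the tables by day are all assumptions made for illustration; the specification does not describe a schema or a database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Three hypothetical tables standing in for databases 132, 134, and 136.
conn.executescript("""
    CREATE TABLE behavior (gamer_id TEXT, day TEXT, apm REAL);
    CREATE TABLE internal_env (day TEXT, game_event TEXT);
    CREATE TABLE external_env (day TEXT, is_holiday INTEGER);
""")
conn.executemany(
    "INSERT INTO behavior VALUES (?, ?, ?)",
    [("g1", "2018-01-01", 60.0), ("g2", "2018-01-01", 10.0)],
)
conn.execute("INSERT INTO internal_env VALUES ('2018-01-01', 'update')")
conn.execute("INSERT INTO external_env VALUES ('2018-01-01', 1)")

def query_training_rows(connection, day):
    """Return behavior rows for one day joined with that day's environment data."""
    sql = """
        SELECT b.gamer_id, b.apm, i.game_event, e.is_holiday
        FROM behavior AS b
        JOIN internal_env AS i ON i.day = b.day
        JOIN external_env AS e ON e.day = b.day
        WHERE b.day = ?
        ORDER BY b.gamer_id
    """
    return connection.execute(sql, (day,)).fetchall()

rows = query_training_rows(conn, "2018-01-01")
```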

The storage module 130 may store intermediate data and final data processed by the processor module 140.

The storage module 130 may store a plurality of algorithms that are called and executed at the request of the processor module 140. The algorithms may include, for example, a feature vector extraction algorithm that converts the features of the data stored in the databases 132, 134, and 136 into a mathematical representation, such as a feature vector in an n-dimensional vector space; a clustering algorithm that clusters similar feature vectors extracted by the feature vector extraction algorithm; and a machine learning algorithm that trains the classifier using the data clustered by the clustering algorithm.
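One simple way to turn a mixed numerical and categorical record into a vector in an n-dimensional space is to concatenate the numeric fields with one-hot encodings of the categorical fields. The sketch below shows this; the field names and category lists are hypothetical, and one-hot encoding is a common technique chosen here for illustration, not one named by the specification.

```python
def to_feature_vector(record, numeric_keys, categories):
    """Concatenate numeric fields with one-hot encodings of categorical fields."""
    vector = [float(record[k]) for k in numeric_keys]
    for key, values in categories.items():
        # One position per allowed value; 1.0 marks the value present in the record.
        vector.extend(1.0 if record[key] == v else 0.0 for v in values)
    return vector

# Hypothetical record combining behavior and external environment fields.
record = {"apm": 45, "play_hours": 2.5, "season": "winter", "is_holiday": "yes"}
vector = to_feature_vector(
    record,
    numeric_keys=["apm", "play_hours"],
    categories={
        "season": ["spring", "summer", "autumn", "winter"],
        "is_holiday": ["no", "yes"],
    },
)
```

Here the resulting vector has eight dimensions: two numeric values, four positions for the season, and two for the holiday flag.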

The storage module 130 may be implemented with various hardware, which may include, for example, RAM (random access memory), ROM (read only memory), EPROM (erasable programmable ROM), EEPROM (electrically erasable programmable ROM), flash memory, a hard disk, a removable disk, or any form of computer-readable recording medium well known in the art to which the present invention belongs.

프로세서 모듈(140)Processor module 140

프로세서 모듈(140)은 데이터베이스들(132, 134, 136)로부터 제공된 게이머의 행동 속성과 관련된 데이터(이하, 게이머의 행동 데이터들), 온라인 게임의 내부적 환경 속성과 관련된 데이터들(이하, 온라인 게임의 내부 환경 데이터들) 및 외부적 환경 속성과 관련된 데이터들(이하, 온라인 게임의 외부 환경 데이터들)로부터 특징 벡터들을 추출하는 제1 프로세스를 수행할 수 있다.The processor module 140 may include data related to the gamers 'behavioral attributes (hereinafter referred to as gamers' behavioral data) provided from the databases 132, 134, and 136, data related to the internal environment attributes of the online game (hereinafter referred to as the A first process may be performed to extract feature vectors from internal environmental data) and data related to external environmental attributes (hereinafter, external environmental data of an online game).

또한, 프로세서 모듈(140)은 추출 특징 벡터들을 유사한 특징 벡터들끼리 군집화하는 제2 프로세스를 수행할 수 있다.In addition, the processor module 140 may perform a second process of grouping the extracted feature vectors with similar feature vectors.

또한, 프로세서 모듈(140)은 군집화한 결과 데이터를 이용하여 게이머의 행동 유형을 분류하는 분류기(147A)를 학습시키는 제3 프로세스를 수행할 수 있다.In addition, the processor module 140 may perform a third process of learning the classifier 147A classifying the gamers' behavior types using the clustered result data.

The processor module 140 may be implemented by one or more general-purpose microprocessors, digital signal processors (DSPs), hardware cores, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or any combination thereof, capable of performing the first to third processes.

To perform the first to third processes, the processor module 140 may be configured to include a first feature extractor 141, a second feature extractor 143, a clustering unit 145, and a learning unit 147.

The first feature extractor 141 may execute the feature vector extraction algorithm to extract feature vectors from the gamer behavior data input from the database 132 stored in the storage module 130.

The second feature extractor 143 may execute the feature vector extraction algorithm to extract feature vectors from the internal environment data and the external environment data of the online game, input respectively from the databases 134 and 136 stored separately in the storage module 130.

The technical core of the present invention does not lie in limiting the feature vector extraction algorithm, so its description is omitted. Any feature vector extraction algorithm may be used, provided it is usable in the data mining fields that analyze and process numerical or categorical data having a relational data structure and structured or unstructured text data.

Based on a clustering algorithm, the clustering unit 145 performs a cluster analysis process that groups similar feature vectors, among those extracted by the first feature extractor 141 and those extracted by the second feature extractor 143, into one cluster and assigns a data label to each resulting cluster, thereby clustering the gamer's behavior types.

Clustering algorithms include, for example, hierarchical clustering algorithms (also called hierarchical agglomerative clustering algorithms), partitioning clustering algorithms, and density-based spatial clustering algorithms.

In this embodiment, the clustering unit 145 may perform various cluster analysis processes that cluster the gamer's behavior types by appropriately combining a hierarchical clustering algorithm, a partitioning clustering algorithm, and a density-based spatial clustering algorithm.

A hierarchical clustering algorithm groups data having similar features into one cluster, and is repeated until all the data are grouped into a single cluster.

The hierarchical clustering algorithm may consist of a process of calculating the distance between each cluster of similar objects and every other such cluster (hereinafter, the distance calculation process) and a process of merging the two clusters separated by the smallest of all the calculated distances into one cluster (hereinafter, the re-clustering process).

The distance between clusters can be calculated by a distance function. Distance functions may include, for example, the Euclidean, Manhattan, Minkowski, and Canberra distance functions.
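
The four distance functions named above can be sketched as follows. This is a minimal illustration on plain tuples, not the specification's implementation; the function names are hypothetical, and skipping 0/0 terms in the Canberra distance is an assumed convention.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def euclidean(a, b):
    # Euclidean distance is the Minkowski distance of order 2.
    return minkowski(a, b, 2)


def manhattan(a, b):
    # Manhattan distance is the Minkowski distance of order 1.
    return minkowski(a, b, 1)


def canberra(a, b):
    # Sum of |x - y| / (|x| + |y|); terms where both coordinates are zero
    # are skipped (assumed convention).
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom > 0:
            total += abs(x - y) / denom
    return total
```

For example, `euclidean((0, 0), (3, 4))` evaluates to 5 and `manhattan((0, 0), (3, 4))` to 7.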

The distance calculation process and the re-clustering process may be repeated until all the data (for example, feature vectors) are grouped into one cluster.

The distance between clusters may represent the similarity between them. If the distance between two clusters is smaller than a preset value, the two clusters can be regarded as highly similar; conversely, if the distance is greater than or equal to the preset value, they can be regarded as having low similarity.

Hierarchical clustering algorithms can be divided, according to how the distance between clusters is calculated, into the single linkage, complete linkage, average linkage, and centroid linkage methods.

The single linkage method calculates the shortest distance between a point (or object) of one cluster and a point (or object) of another cluster, and merges the two clusters for which this distance is the smallest of all those calculated.

The complete linkage method calculates the longest distance between a point of one cluster and a point of another cluster, and merges the two clusters for which this distance is the smallest of all those calculated.

The average linkage method calculates the average distance between the points of one cluster and the points of another cluster, and merges the two clusters with the smallest of the calculated average distances.

The centroid linkage method calculates the distance between the centroid of one cluster and the centroid of another, and merges the two clusters with the smallest of the calculated distances. Here, the centroid is the center point of a cluster; it may be defined as the mean of the variables belonging to the cluster, or as the variable whose value is closest to that mean.
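
The four linkage methods and the repeated merge step can be sketched as below. This is an illustrative sketch that assumes Euclidean distance between tuples; the names `linkage_distance`, `merge_closest`, and `agglomerate` are hypothetical and do not come from the specification.

```python
import math
from itertools import combinations


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))


def linkage_distance(c1, c2, method):
    """Distance between two clusters under the chosen linkage method."""
    pair_dists = [math.dist(p, q) for p in c1 for q in c2]
    if method == "single":
        return min(pair_dists)
    if method == "complete":
        return max(pair_dists)
    if method == "average":
        return sum(pair_dists) / len(pair_dists)
    if method == "centroid":
        return math.dist(centroid(c1), centroid(c2))
    raise ValueError(f"unknown linkage method: {method}")


def merge_closest(clusters, method="single"):
    """One re-clustering step: merge the two clusters with the smallest distance."""
    i, j = min(
        combinations(range(len(clusters)), 2),
        key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]], method),
    )
    merged = clusters[i] + clusters[j]
    return [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]


def agglomerate(points, method="single"):
    """Repeat the merge step until one cluster remains; return every stage."""
    clusters = [[p] for p in points]
    history = [clusters]
    while len(clusters) > 1:
        clusters = merge_closest(clusters, method)
        history.append(clusters)
    return history
```

The returned history plays the role of a dendrogram: cutting it at an intermediate stage yields a chosen number of clusters.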

A partitioning clustering algorithm first fixes the number of clusters K, randomly assigns the data to K groups, and then refines the assignment into clusters through a specific calculation process. Partitioning clustering algorithms may include, for example, the K-means clustering method and the PAM (partitioning around medoids) method.
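
A minimal K-means sketch that follows the description above (fix K, assign the data to K groups at random, then refine) might look like this. The seed, the iteration cap, and the re-seeding of an empty group with a random point are assumptions made for the illustration.

```python
import math
import random


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Random initial assignment of every point to one of the K groups.
    labels = [rng.randrange(k) for _ in points]
    centers = []
    for _ in range(max_iter):
        centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty group is re-seeded with a random point (assumed fallback).
            centers.append(mean(members) if members else rng.choice(points))
        # Reassign each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
    return labels, centers
```

The function returns one group index per input point together with the final centers.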

A density-based spatial clustering algorithm spatially clusters objects according to the density of the data, in order to cluster concave-shaped data, which is a limitation of the K-means clustering method.
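
As one concrete instance of density-based clustering, a compact DBSCAN-style sketch is shown below. The specification does not name a particular algorithm; choosing DBSCAN, and the parameter names `eps` and `min_pts`, are assumptions made for the illustration.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster index, or NOISE (-1)."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster_id += 1
    return labels
```

Points labeled `NOISE` correspond to feature vectors that do not meet the clustering condition and therefore form no cluster.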

The learning unit 147 performs a process of training the classifier 147A, which classifies the gamer's behavior type, using the result data of the cluster analysis performed by the clustering unit 145 (hereinafter, cluster analysis result data).

To train the classifier, the learning unit 147 may execute a machine learning algorithm.

Machine learning algorithms may include, for example, depending on whether labels are available, supervised learning algorithms, unsupervised learning algorithms, semi-supervised learning algorithms, reinforcement learning algorithms, and appropriate combinations of these.

A supervised learning algorithm trains the classifier 147A using labeled training data. An unsupervised learning algorithm trains the classifier 147A using unlabeled training data. A semi-supervised learning algorithm trains the classifier 147A using both labeled and unlabeled training data.

In the present invention, the training data used to train the classifier 147A that classifies the gamer's behavior type, that is, the cluster analysis result data, may include both labeled data and unlabeled data.

Display module 150

The display module 150 may display the result data of classifying the gamer's behavior type, input from the classifier 147A trained by the learning unit 147. The result data displayed by the display module 150 may take the form of text, an image, a video, an icon, a symbol, or the like.

The display module 150 may include, for example, a liquid crystal display (LCD), a light-emitting diode (LED) display, an organic light-emitting diode (OLED) display, a microelectromechanical systems (MEMS) display, or an electronic paper display.

The display module 150 may include a touch screen and may receive, for example, a touch input or a gesture input made with an electronic pen or a part of the user's body.

Hereinafter, the clustering process performed by the clustering unit 145 described above is explained in detail with reference to FIG. 2.

FIGS. 2 and 3 are diagrams illustrating the clustering process performed by the clustering unit shown in FIG. 1.

Referring to FIG. 2, the clustering process according to an embodiment of the present invention clusters the gamer's behavior types by clustering the object data (or feature vectors) extracted from the data related to the gamer's behavior attributes and the data related to the environment attributes of the online game.

This clustering process includes a clustering process 22 performed during each unit time 20 and a re-clustering process 32 that re-clusters, over a given period 30, the clusters generated by the clustering process of each unit time 20. Here, the unit time may be set variously, for example to an hour, a day, a week, or a month.

In the re-clustering process performed within the given period, the similarity (or distance) between the clusters generated by the per-unit-time clustering process is calculated; clusters that are highly similar (that is, close together) are merged, and, among the remaining unmerged clusters, those that satisfy a specific clustering condition are created as new clusters.
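
The re-clustering step can be sketched as follows. The similarity measure (centroid distance below `merge_dist`) and the condition for keeping an unmerged cluster (at least `min_size` points) are assumptions chosen for illustration; the specification leaves both open.

```python
import math


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))


def recluster(unit_clusters, merge_dist, min_size):
    """Re-cluster the clusters pooled from every unit time of a given period.

    unit_clusters: list of clusters, each a list of feature vectors (tuples).
    """
    groups = []  # each entry: [points, number of unit-time clusters absorbed]
    for cluster in unit_clusters:
        for group in groups:
            if math.dist(centroid(group[0]), centroid(cluster)) < merge_dist:
                group[0].extend(cluster)  # similar enough: merge
                group[1] += 1
                break
        else:
            groups.append([list(cluster), 1])
    # Merged groups are kept; an unmerged cluster becomes a new cluster only
    # if it meets the (assumed) size condition.
    return [pts for pts, count in groups if count > 1 or len(pts) >= min_size]
```

With a daily unit time and a one-week period, `unit_clusters` would hold the clusters produced on each of the seven days.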

Meanwhile, in the re-clustering process performed within the given period, the distribution characteristics of the clusters into which the gamer's behavior types are grouped may change substantially around the time point T at which an environment attribute changes. That is, the distribution characteristics of the clusters formed in the re-clustering process 32 for behavior types before time T may differ from those of the clusters formed in the re-clustering process 34 for behavior types after time T. Here, a distribution characteristic may be expressed as a representative value of the cluster, for example the cluster's centroid or covariance. A change in distribution characteristics may therefore mean a change in the centroid or the covariance.
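
The two representative values mentioned above can be computed as in the following sketch. Measuring the change as the Euclidean distance between centroids is an assumption for illustration; the function names are hypothetical.

```python
import math


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))


def covariance(cluster):
    """Sample covariance matrix (list of rows); needs at least two points."""
    n = len(cluster)
    mu = centroid(cluster)
    dim = len(mu)
    return [
        [
            sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in cluster) / (n - 1)
            for j in range(dim)
        ]
        for i in range(dim)
    ]


def centroid_shift(before, after):
    """How far the cluster centroid moved across the change point T."""
    return math.dist(centroid(before), centroid(after))
```

Comparing `centroid_shift(before, after)` against a reference value is one way to decide whether a cluster's distribution has changed enough to matter.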

For example, when an environment attribute changes, such as an update to the online game, an in-game event, the release of another online game service, or posts about the online game published online, the distribution characteristics of the clusters of gamer behavior types may change substantially. Here, an update refers to an environment attribute, such as a patch update, in which bugs or the game environment change substantially, and an event refers to an environment attribute that grants a reward when a specific condition is met during the event period.

However, even if the distribution characteristics of clusters generated before time T change after time T, if the amount of change is at or below a reference value, the result of re-clustering those clusters falls short of being sufficient data for analyzing changes in the gamer's behavior type.

That is, as shown in FIG. 3, when clusters C₁ and C₂ from before the environment attribute change time T are merged by re-clustering into cluster C₃ after time T, comparing the cluster distribution characteristics of cluster C₃ with those of cluster C₁ and cluster C₂′ respectively shows that, depending on the clustering conditions, cluster C₃ may be analyzed either as cluster C₁ from before time T or as a new cluster (C₃).

If it is analyzed as a new cluster (C₃), the new cluster C₃ is treated as a cluster different from clusters C₁ and C₂ from before the environment attribute change time T. In this case, part of the new cluster C₃ retains the distribution characteristics of the existing cluster C₁, and the cluster C₂′, whose distribution characteristics have changed, may return to cluster C₂ with its original distribution characteristics after a later environment attribute change time T′.

Tracking clusters that return to their original distribution characteristics (their original behavior attributes) as the environment attributes change can thus also serve as important data for analyzing changes in the gamer's behavior type.

To track clusters that return to their original distribution characteristics, it is not enough to re-cluster cluster C₁, whose distribution characteristics have not changed, and cluster C₂, whose distribution characteristics have changed, into the new cluster C₃ and to store only information about the re-clustering result.

The present invention compares the new cluster C₃, generated after the environment attribute change time T, with clusters C₁ and C₂ from before time T, and divides the new cluster C₃ into a sub-cluster C₁′ that retains cluster distribution characteristics similar to those of the previous cluster C₁ and a sub-cluster C₂′ that retains cluster distribution characteristics similar to those of the previous cluster C₂. By storing information about the sub-clusters C₁′ and C₂′, it provides a way to track the correlation with the previous clusters C₁ and C₂.
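
One simple way to divide a merged cluster into sub-clusters that follow the previous clusters is sketched below. Assigning each point to the previous cluster with the nearest centroid is an assumed similarity rule, and `split_by_previous` is a hypothetical name; the specification only requires that the sub-clusters keep similar distribution characteristics.

```python
import math


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))


def split_by_previous(new_cluster, previous_clusters):
    """Divide a merged cluster into sub-clusters, one per previous cluster.

    previous_clusters: dict mapping a cluster name to its list of points.
    """
    centers = {name: centroid(c) for name, c in previous_clusters.items()}
    subs = {name: [] for name in previous_clusters}
    for p in new_cluster:
        nearest = min(centers, key=lambda name: math.dist(p, centers[name]))
        subs[nearest].append(p)
    return subs
```

Storing the returned sub-clusters alongside the merged cluster keeps the link to the earlier clusters, so a later return to the original distribution can be detected.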

The clustering result produced by the clustering process shown in FIGS. 2 and 3 is used to train the classifier that classifies the gamer's behavior type.

Classifying the gamer's behavior type requires the ground truth for that behavior type, but this ground truth is not known. In the present invention, therefore, the clustering result may be used as the data for training the classifier that classifies the gamer's behavior type. The clustering result may be the label assigned to each generated cluster.

As one example, the label may be data indicating the attribute, class, or category of a cluster generated by the clustering unit 145.

As another example, the label may be a representative value of the generated cluster. Here, the representative value may be, for example, a value indicating the centroid of the cluster or a covariance value calculated from the density of the cluster.

As yet another example, the label may be an arbitrary value assigned by a game data expert according to the application. Such a value may be, for example, one that identifies gamers who show the behavior type expected in response to a game update.

FIG. 4 is a flowchart illustrating a method of training a classifier that classifies the gamer's behavior type in an online game according to an embodiment of the present invention.

Unless otherwise specified, each step below is assumed to be performed by the processor module 140 shown in FIG. 1. In describing each step, content that duplicates the description given with reference to FIGS. 1 to 3 is omitted or described only briefly.

Referring to FIG. 4, first, in step S410, feature vectors are extracted, using the feature vector extraction algorithm, from the collected data related to the gamer's behavior attributes and the environment attributes of the online game. The gamer behavior attribute data include log data related to character behavior and raw data related to gamer behavior. The environment attribute data of the online game include internal environment attribute data and external environment attribute data.

Next, in step S420, the feature vectors extracted in step S410 are clustered using a clustering algorithm in order to cluster the gamer's behavior types. Clustering algorithms include, for example, hierarchical clustering algorithms (also called hierarchical agglomerative clustering algorithms), partitioning clustering algorithms, density-based spatial clustering algorithms, and appropriate combinations of these.

Next, in step S430, a data label is assigned (labeled, or tagged) to each of the clusters generated by the clustering process of step S420.

As one example, with a density-based spatial clustering algorithm, the data (feature vectors) in a cluster have similar features, so all data belonging to one cluster are labeled with the same data label.

As another example, with a partitioning clustering algorithm, a representative value of the cluster is assigned to the cluster as its data label.

The representative value may be a value indicating the centroid of the cluster or a covariance value calculated from the density of the cluster.

The representative value may also be a value set arbitrarily by a game data expert.

In this specification, the clustering process of step S420 and the labeling process of step S430 are described separately for ease of understanding, but labeling a cluster with a data label may be handled as part of executing the clustering algorithm. Steps S420 and S430 may therefore be combined into a single step.

Next, in step S440, the classifier is trained using the data labels assigned to the clusters in step S430. A machine learning algorithm may be used to train the classifier.

Next, in step S450, the classifier trained in step S440 is used to predict data labels for the feature vectors that formed no cluster in steps S420 and S430 because they did not satisfy the clustering conditions of the clustering algorithm (hereinafter, unlabeled data).

Next, in step S460, it is determined whether the result of the training performed in step S440 satisfies the learning end condition defined by the machine learning algorithm.

If the training result satisfies the learning end condition, the whole series of processes related to training the classifier ends. Otherwise, the learning unit repeats the training process for the classifier.

As one example of a learning end condition, if the condition is that data labels have been predicted for all unlabeled data, the training process for the classifier is repeated as long as unlabeled data remain.

Other examples of learning end conditions include the case where the number of training iterations reaches a preset count, the case where the cost function of the trained classifier falls below a threshold, and the case where no data lie on the classification boundary defined by a specific classifier parameter (for example, the soft margin value of a support vector machine).

If the training result does not satisfy the learning end condition, then in step S470 approximate label values are assigned to the remaining unlabeled data.

Approximate label values can be assigned in various ways. For example, for a binary classifier, a value between 0 and 1 may be assigned according to the distance from the classification decision boundary.
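
For a binary classifier with a linear decision boundary, the mapping from distance to a value between 0 and 1 can be sketched as below. The linear boundary `w·x + b = 0`, the logistic squashing, and the `scale` parameter are all assumptions for the illustration; the specification fixes none of them.

```python
import math


def signed_distance(x, w, b):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def soft_label(x, w, b, scale=1.0):
    """Approximate label in (0, 1): 0.5 on the boundary, near 0 or 1 far away."""
    return 1.0 / (1.0 + math.exp(-signed_distance(x, w, b) / scale))
```

A point lying exactly on the boundary receives 0.5, and the value moves toward 0 or 1 as the point moves away from the boundary on either side.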

As another example, the label may be approximated using the values of the data labels in the neighborhood of the unlabeled data.
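
A neighborhood-based approximation can be sketched as the average label of the k nearest labeled points. Numeric labels, Euclidean distance, and the choice of `k` are assumptions made for the illustration, and `knn_soft_label` is a hypothetical name.

```python
import math


def knn_soft_label(x, labeled, k=3):
    """Average the labels of the k labeled points nearest to x.

    labeled: list of (vector, label) pairs with numeric labels, e.g. 0 or 1.
    """
    nearest = sorted(labeled, key=lambda item: math.dist(x, item[0]))[:k]
    return sum(label for _, label in nearest) / len(nearest)
```

With binary labels the result is the fraction of the k nearest neighbors that carry label 1.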

The classifier is then trained again using the approximated label values, and this repetition continues until the learning end condition described above is satisfied.
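
The loop of steps S440 to S470 resembles self-training and can be sketched as below. The nearest-centroid classifier is only a stand-in for whatever learner is used, the margin threshold `min_margin` is an assumed confidence rule, and step S470 is simplified to accepting the best available guess when no prediction is confident.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def train(labeled):
    """Nearest-centroid classifier: one centroid per label (stand-in learner)."""
    by_label = {}
    for x, y in labeled:
        by_label.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_label.items()}


def predict(model, x):
    """Return (label, margin); margin is the gap between the two nearest centroids."""
    ranked = sorted(model, key=lambda y: math.dist(x, model[y]))
    best = ranked[0]
    if len(ranked) == 1:
        return best, float("inf")
    return best, math.dist(x, model[ranked[1]]) - math.dist(x, model[best])


def self_train(labeled, unlabeled, min_margin=1.0, max_iter=10):
    labeled, unlabeled = list(labeled), list(unlabeled)
    model = train(labeled)                                    # S440
    for _ in range(max_iter):            # iteration cap: one end condition
        if not unlabeled:                # S460: every point has a label
            break
        preds = [(x, *predict(model, x)) for x in unlabeled]  # S450
        accepted = [(x, y) for x, y, m in preds if m >= min_margin]
        if not accepted:
            # S470 (simplified): take the best guess for the remaining points.
            accepted = [(x, y) for x, y, _ in preds]
        labeled.extend(accepted)
        taken = {x for x, _ in accepted}
        unlabeled = [x for x in unlabeled if x not in taken]
        model = train(labeled)                                # repeat S440
    return model, labeled, unlabeled
```

Feature vectors are assumed to be tuples so that they can be tracked in a set.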

FIG. 5 is a flowchart illustrating an adaptive method of training a classifier that classifies the gamer's behavior type in an online game according to another embodiment of the present invention, and FIG. 6 is a diagram explaining the concept of the common region in step S510 shown in FIG. 5.

Unless otherwise specified, each step below is assumed to be performed by the processor module 140 shown in FIG. 1 or by the learning unit 147 within the processor module 140.

In describing each step below, content that duplicates the foregoing description is omitted or described only briefly.

Referring to FIG. 5, the method starts from the assumption that clusters formed per day unit before the environment attribute change time are regarded as training data clusters, and clusters formed after the change time are regarded as test data clusters. The data contained in the test data clusters are further assumed to be unlabeled.

In step S510, as shown in FIG. 6, the common region C shared by the training data clusters 61, 63, 65, and 67, formed per day unit over a given period before the environment attribute change time, and the test data cluster 71, formed after the change time, is modeled as a hypersphere model 69 (hereinafter, hypersphere modeling).

Hypersphere modeling models the data (feature vectors) contained in the common region C as the set of points at a fixed radius from a given point called the center; that is, it approximates the data (feature vectors) contained in the common region C by a geometric shape such as a ball.

The hypersphere model 69 generated by hypersphere modeling can be expressed mathematically, for example by the equation of a sphere.

To express the hypersphere model in such a mathematical form, a process of estimating the center and the radius may be performed.

The center and the radius may be estimated through machine learning (or supervised learning) on the data (feature vectors) in the common region (C).

The estimated center and radius may be used as the specific parameters that represent the hypersphere (69).
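As a non-limiting illustration, the estimation of the center and the radius might be sketched as follows. The description does not specify the estimator, so this sketch assumes a simple one: the center is the mean of the feature vectors in the common region, and the radius is the distance that covers a chosen share of them. The function name `fit_hypersphere` and the parameter `coverage` are hypothetical.

```python
import math


def fit_hypersphere(vectors, coverage=0.95):
    """Estimate (center, radius) of a ball around the given feature vectors.

    Assumption for illustration only: center = mean vector, radius = the
    smallest distance that covers `coverage` of the vectors.
    """
    if not vectors:
        raise ValueError("need at least one feature vector")
    dim = len(vectors[0])
    # Center: component-wise mean of all feature vectors.
    center = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    # Radius: distance of the vector at the requested coverage rank.
    distances = sorted(math.dist(v, center) for v in vectors)
    rank = math.ceil(coverage * len(distances)) - 1
    rank = min(len(distances) - 1, max(0, rank))
    return center, distances[rank]


center, radius = fit_hypersphere([(0, 0), (2, 0), (0, 2), (2, 2)], coverage=1.0)
```

A learned boundary, for example one obtained from a one-class method, could replace this estimator without changing how the two parameters are used afterwards.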

Next, in step S520, the specific parameters of the hypersphere model (69) learned during hypersphere modeling are used to label the part of the test data cluster contained in the hypersphere model (69) (71', the hatched region in FIG. 6). For the part of the test data cluster not contained in the hypersphere model (69), labels may be approximated using the distance from the hypersphere boundary, similarly to the labeling process described with reference to FIG. 4.
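As a non-limiting illustration, labeling with the hypersphere parameters might be sketched as follows. A vector inside the ball receives the training label with full weight, and a vector outside receives the same label with a weight that shrinks with its distance from the boundary. The exponential decay and the name `label_with_hypersphere` are assumptions made for this sketch; the description only states that the distance from the boundary is used.

```python
import math


def label_with_hypersphere(vector, center, radius, label):
    """Return (label, weight) for one test feature vector.

    Inside the hypersphere: the training label applies as-is (weight 1.0).
    Outside: the label is only an approximation, weighted by how far the
    vector lies beyond the boundary (hypothetical exponential decay).
    """
    distance = math.dist(vector, center)
    if distance <= radius:
        return label, 1.0
    overshoot = distance - radius  # distance from the hypersphere boundary
    weight = math.exp(-overshoot / radius) if radius > 0 else 0.0
    return label, weight


inside = label_with_hypersphere((0.5, 0.0), (0.0, 0.0), 1.0, "A")
outside = label_with_hypersphere((2.0, 0.0), (0.0, 0.0), 1.0, "A")
```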

Next, in step S530, the classifier is trained using the data labels assigned to the test data cluster.
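As a non-limiting illustration, training on the labeled feature vectors might be sketched as follows. The description leaves the machine learning algorithm open, so this sketch assumes a nearest-centroid classifier purely for brevity; the names `train_nearest_centroid` and `classify` are hypothetical.

```python
import math
from collections import defaultdict


def train_nearest_centroid(labeled_vectors):
    """Build a model mapping each data label to the centroid of its vectors."""
    groups = defaultdict(list)
    for vector, label in labeled_vectors:
        groups[label].append(vector)
    return {
        label: [sum(v[i] for v in vs) / len(vs) for i in range(len(vs[0]))]
        for label, vs in groups.items()
    }


def classify(model, vector):
    """Return the label whose centroid is closest to the feature vector."""
    return min(model, key=lambda label: math.dist(model[label], vector))


model = train_nearest_centroid([((0, 0), "A"), ((0, 2), "A"), ((10, 10), "B")])
predicted = classify(model, (1, 1))
```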

Meanwhile, data in the test data cluster (71) that are not contained in the hypersphere model (69) may be grouped into a new sub-cluster formed after the change point (T) of the environment attribute. The data grouped into such a sub-cluster therefore also need to be labeled.

FIG. 6 shows the case in which the test data cluster (71) partially overlaps the training data clusters modeled by the hypersphere model (69). However, training data clusters that existed before the change point (T) of the environment attribute may also merge or split after the change point (T). Clusters produced by such merging or splitting need to be labeled as well.

Hereinafter, the process of labeling a test data cluster that is analyzed as a merge or split of training data clusters is described in detail with reference to FIG. 7.

FIG. 7 is a detailed flowchart of step S520 shown in FIG. 5.

Referring to FIG. 7, first, in step S521, the test data cluster is compared with the training data cluster modeled by the hypersphere model.

This comparison may consist of a first comparison, which determines whether the test data cluster retains the members (feature vectors) of the training data cluster, and a second comparison, which determines whether the test data cluster retains the cluster characteristics of the training data cluster.

Here, cluster members are the feature vectors belonging to a cluster. A cluster characteristic is a representative value of the cluster (a centroid value or a covariance value).

In the first comparison, whether the test data cluster retains the members of the training data cluster may be determined from the fraction of all members (feature vectors) of the training data cluster that belong to both the training data cluster and the test data cluster.

For example, if the fraction is greater than a threshold, the test data cluster is determined to retain the cluster members of the training data cluster; otherwise, the test data cluster is determined not to retain them.
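As a non-limiting illustration, the first comparison might be sketched as follows. Members are identified here by gamer identifiers, which is an assumption of this sketch, as are the function name `retains_members` and the default threshold of 0.5.

```python
def retains_members(training_ids, test_ids, threshold=0.5):
    """First comparison: does the test cluster keep the training members?

    The fraction is (members in both clusters) / (all training members).
    """
    training_ids = set(training_ids)
    if not training_ids:
        return False
    shared = training_ids & set(test_ids)
    return len(shared) / len(training_ids) > threshold


kept = retains_members({"g1", "g2", "g3", "g4"}, {"g1", "g2", "g3", "g9"})
```

Because the denominator is the size of the training data cluster, new members that appear only in the test data cluster do not lower the fraction.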

In the second comparison, whether the test data cluster retains the cluster characteristics of the training data cluster may be determined by comparing the representative value of the training data cluster with the representative value of the test data cluster.

For example, if the difference between the representative value of the training data cluster and that of the test data cluster is within a tolerance range, the test data cluster is determined to retain the cluster characteristics of the training data cluster; otherwise, the test data cluster is determined not to retain them.
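As a non-limiting illustration, the second comparison might be sketched as follows, taking the centroid as the representative value and the Euclidean distance between centroids as the difference. Both choices, and the names `centroid` and `retains_characteristics`, are assumptions of this sketch; a covariance-based comparison would follow the same pattern.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def retains_characteristics(training_vectors, test_vectors, tolerance):
    """Second comparison: are the two representative values close enough?"""
    gap = math.dist(centroid(training_vectors), centroid(test_vectors))
    return gap <= tolerance


same = retains_characteristics([(0, 0), (2, 2)], [(1, 1), (1, 3)], tolerance=1.5)
```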

Meanwhile, even when the second comparison determines that the cluster characteristics are not retained, the feature vectors of the test data cluster and those of the training data cluster can be compared in finer detail, component by component, for each component related to the gamer's behavior attributes. This comparison further distinguishes the case in which the test data cluster retains some of the cluster characteristics of the training data cluster from the case in which it retains none.

If the test data cluster is labeled with all of these cases taken into account and the classifier is trained on the resulting labels, the classifier can classify gamer behavior types more precisely.

FIG. 7 illustrates four comparison results (521_1, 521_2, 521_3, and 521_4) obtained from the first and second comparisons.

The first comparison result (521_1) is the case in which the test data cluster retains both the cluster members and the cluster characteristics of the training data cluster. This case is illustrated schematically in FIG. 8, which shows a test data cluster (94) that retains both the cluster members and the cluster characteristics of a training data cluster (92).

When the first comparison result (521_1) is confirmed, the test data cluster is labeled in step S523A using the data labels assigned to the training data cluster.

The second comparison result (521_2) is the case in which the test data cluster retains the cluster members of the training data cluster but does not retain its cluster characteristics. This case is illustrated schematically in FIG. 9.

As shown in FIG. 9, even when the test data cluster (84) does not retain the cluster characteristics of the training data cluster (82) because their representative values differ, the two clusters can be compared in finer detail for each component related to the gamer's behavior attributes. The feature vectors (F1, F2, F3') of the test data cluster (84) can then be divided, relative to the feature vectors (F1, F2, F3) of the training data cluster (82), into feature vectors (F1, F2) whose feature values after the change point of the environment attribute remain within the tolerance range and a feature vector (F3') whose feature values fall outside the tolerance range.

In this case, the feature vectors (F1, F2) within the tolerance range may be split off from the test data cluster (84) into a first sub-cluster (86), and the feature vector (F3') outside the tolerance range may be split off from the test data cluster (84) into a second sub-cluster (88).

Of the resulting sub-clusters (86, 88), labeling the sub-cluster (86), that is, the feature vectors (F1, F2) that keep their earlier feature values even after the environment attribute has changed, is an important step because it makes it possible to analyze how the cluster characteristics of gamers, which depend on the gamers' behavior attributes and the environment attributes of the online game, change over time.

This is because, when a gamer's behavior attribute temporarily changes from behavior attribute A to behavior attribute B around the change point (T) of the environment attribute and then returns to behavior attribute A, determining whether the gamer has returned to the earlier behavior attribute requires a record (data label) for the sub-cluster (86), which groups behavior attributes that are similar (similar feature values of the feature vectors) before and after the change point (T).

Accordingly, for the second comparison result (521_2), step S523B proceeds as follows. The feature vectors contained in both the training data cluster and the test data cluster are detected. Among the detected feature vectors, those (F1, F2) for which the difference between the feature value before the change point (T) of the environment attribute and the feature value after the change point (T) is within the tolerance range are distinguished from those (F3) for which the difference is outside the tolerance range. The test data cluster is then divided into a first sub-cluster containing the feature vectors within the tolerance range and a second sub-cluster containing the feature vectors outside the tolerance range.

Next, in step S525B, the first sub-cluster is labeled with the data label assigned to the training data cluster, and the second sub-cluster is labeled with a data label approximated from the data label assigned to the training data cluster.
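As a non-limiting illustration, steps S523B and S525B might be sketched together as follows. Feature vectors are keyed by a gamer identifier and the change in feature value is measured by Euclidean distance; both are assumptions of this sketch. The approximated label of the second sub-cluster is represented simply by a flag, because the description does not detail the approximation here. The name `split_and_label` is hypothetical.

```python
import math


def split_and_label(before, after, label, tolerance):
    """Split shared feature vectors into two labeled sub-clusters.

    before/after: dicts mapping gamer id -> feature vector, taken before
    and after the change point T. Returns (first, second), where each maps
    gamer id -> (label, is_approximate).
    """
    first, second = {}, {}
    for gamer_id in before.keys() & after.keys():  # members of both clusters
        drift = math.dist(before[gamer_id], after[gamer_id])
        if drift <= tolerance:
            first[gamer_id] = (label, False)   # training label as-is
        else:
            second[gamer_id] = (label, True)   # approximated label
    return first, second


before = {"F1": (1, 1), "F2": (2, 2), "F3": (3, 3)}
after = {"F1": (1.1, 1), "F2": (2, 2.1), "F3": (6, 7)}
first, second = split_and_label(before, after, "A", tolerance=0.5)
```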

The third comparison result (521_3) is the case in which the test data cluster is determined to be a merge of multiple training data clusters and retains the cluster characteristics of every one of those training data clusters.

When the third comparison result (521_3) is confirmed, the test data cluster is labeled in step S523C with the data labels assigned to the corresponding training data clusters.

The fourth comparison result (521_4) is the case in which the test data cluster is a merge of multiple training data clusters but retains the cluster characteristics of only some of them, so that the cluster characteristics are not fully retained. This case is illustrated schematically in FIGS. 10 and 11.

FIG. 10 schematically shows an example in which multiple training data clusters are merged, according to an embodiment of the present invention.

Referring to FIG. 10, the test data cluster (97) is assumed to be a merge of a first training data cluster (95A), whose cluster members and cluster characteristics are retained after the change point (T) of the environment attribute, and a second training data cluster (95B), whose cluster members are retained after the change point but whose cluster characteristics have changed.

For the fourth comparison result (521_4), the test data cluster (97) is divided in step S523D into first and second sub-clusters (95A', 95B') in order to analyze how the cluster characteristics of gamers, which depend on the gamers' behavior attributes and the environment attributes of the online game, change over time.

The first sub-cluster (95A') may be generated by detecting, from the test data cluster (97), the feature vectors contained in both the first training data cluster (95A) and the test data cluster (97).

The second sub-cluster (95B') may be generated by detecting, among the feature vectors contained in both the second training data cluster (95B) and the test data cluster (97), those for which the difference between the feature value before the change point (T) of the environment attribute and the feature value after the change point (T) is within the tolerance range.

Once the first and second sub-clusters (95A', 95B') have been split from the test data cluster (97), step S525D labels the first sub-cluster (95A') with the data label assigned to the first training data cluster (95A) and labels the second sub-cluster (95B') with the data label assigned to the second training data cluster (95B).
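As a non-limiting illustration, steps S523D and S525D for the merged case might be sketched as follows. Feature vectors are again keyed by a gamer identifier and compared by Euclidean distance, which are assumptions of this sketch, and the name `split_merged_cluster` is hypothetical.

```python
import math


def split_merged_cluster(train_a, train_b, test, label_a, label_b, tolerance):
    """Label the members of a merged test cluster by their origin.

    train_a, train_b, test: dicts mapping gamer id -> feature vector.
    First sub-cluster: members shared by train_a and test.
    Second sub-cluster: members shared by train_b and test whose feature
    value changed by no more than `tolerance` across the change point T.
    """
    labels = {}
    for gamer_id in train_a.keys() & test.keys():
        labels[gamer_id] = label_a
    for gamer_id in train_b.keys() & test.keys():
        if math.dist(train_b[gamer_id], test[gamer_id]) <= tolerance:
            labels[gamer_id] = label_b
    return labels


labels = split_merged_cluster(
    train_a={"g1": (0, 0)},
    train_b={"g2": (5, 5), "g3": (5, 6)},
    test={"g1": (0, 0.1), "g2": (5, 5.2), "g3": (9, 9)},
    label_a="A",
    label_b="B",
    tolerance=0.5,
)
```

Members of the second training data cluster whose feature values moved beyond the tolerance, such as `g3` above, are left unlabeled by this sketch.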

FIG. 11 schematically shows another example in which multiple training data clusters are merged, according to an embodiment of the present invention.

Referring to FIG. 11, two training data clusters (71, 72) and two test data clusters (81, 82) are assumed on either side of the change point (T) of the environment attribute.

Here, the first test data cluster (81) is assumed to contain feature vectors that retain both the cluster members and the cluster characteristics of the first training data cluster (71), together with feature vectors that retain the cluster members of the second training data cluster (72) but not its cluster characteristics. The second test data cluster (82) is assumed to contain feature vectors that retain both the cluster members and the cluster characteristics of the second training data cluster (72).

Assuming this fourth comparison result (521_4), step S523D of FIG. 7 divides the first test data cluster (81) into first and second sub-clusters (81A, 81B) in order to analyze how the cluster characteristics of gamers, which depend on the gamers' behavior attributes and the environment attributes of the online game, change over time.

The first sub-cluster (81A) may be generated by detecting the feature vectors contained in both the first training data cluster (71) and the first test data cluster (81).

The second sub-cluster (81B) may be generated by detecting, among the feature vectors contained in both the second training data cluster (72) and the first test data cluster (81), those for which the difference between the feature value before the change point (T) of the environment attribute and the feature value after the change point is within the tolerance range.

Once the first and second sub-clusters (81A, 81B) have been split from the first test data cluster (81), step S525D of FIG. 7 labels the first sub-cluster (81A) with the data label assigned to the first training data cluster and labels the second sub-cluster (81B) with the data label assigned to the second training data cluster (72).

Meanwhile, because the second test data cluster (82) retains both the cluster members and the cluster characteristics of the second training data cluster (72), it may be labeled with the data label assigned to the second training data cluster.

The data labels assigned according to the first to fourth comparison results are used as training data for the classifier that classifies gamer behavior types. A classifier trained in this way can accurately predict a gamer's behavior type, and by capturing the gamer's behavior in the online game dynamically it can provide appropriate guidelines for game operation.

In addition, even when clusters that existed before the change point (T) of the environment attribute fail to retain their cluster members and cluster characteristics after the change point (T) and are regrouped into new clusters, the feature vectors in a newly formed cluster that retain the cluster characteristics from before the change point (T) are gathered into a sub-cluster, and the attributes of that sub-cluster are recorded by labeling it. This makes it possible to analyze accurately how the cluster characteristics of gamers, which depend on the gamers' behavior attributes and the environment attributes of the online game, change over time.

Furthermore, the record of sub-clusters that retain their cluster characteristics makes it possible to determine whether a gamer who temporarily changed from existing behavior type A to behavior type B returns to behavior type A or instead forms a cluster of a new behavior type.

The present invention has been described above with reference to embodiments, but these are merely examples and do not limit the invention. Those of ordinary skill in the art to which the invention pertains will appreciate that various modifications and applications not illustrated above are possible without departing from the essential characteristics of the invention. For example, each component specifically shown in the embodiments may be modified in practice. Differences relating to such modifications and applications should be construed as falling within the scope of the invention defined in the appended claims.

Claims

A method of training a classifier that classifies gamer behavior types in an online game, the method comprising:
extracting feature vectors, using a feature vector extraction algorithm, from collected data related to behavior attributes of gamers and environment attributes of the online game;
generating a training data cluster by clustering the extracted feature vectors using a clustering algorithm;
labeling the training data cluster with data labels; and
training the classifier with the data labels using a machine learning algorithm.

The method of claim 1, wherein, in the extracting, the data related to the behavior attributes of the gamers comprise
log data related to character behavior and raw data related to gamer behavior.

The method of claim 2, wherein the log data related to character behavior comprise
data on the number of actions in which a character attacks characters of other game users or monsters, and data on the number of actions in which the character evades attacks by characters of other game users or monsters,
wherein the raw data related to gamer behavior are
data collected from an input device of an electronic device on which the classifier is mounted, and
wherein the data collected from the input device comprise
data on the speed at which the gamer moves a mouse, data on the speed at which the gamer touches a keyboard, data on the gamer's voice tone collected through a microphone, and data on images of the gamer captured by a camera.

The method of claim 1, wherein, in the extracting, the data related to the environment attributes of the online game comprise
data related to internal environment attributes of the online game and data related to external environment attributes of the online game.

The method of claim 4, wherein the data related to the internal environment attributes comprise
data related to an update schedule of the online game, and
wherein the data related to the external environment attributes comprise
data related to posts and comments about the online game collected through an SMS server, data related to release schedules or update schedules of other online games, and data related to weather, holidays, and seasons.

The method of claim 1, wherein the clustering
is performed in day units over a given period.

The method of claim 1, wherein the training of the classifier comprises:
comparing a test data cluster, clustered after a change point of the environment attribute, with the training data cluster, clustered before the change point of the environment attribute, to determine whether the test data cluster retains the cluster members and the cluster characteristics of the training data cluster;
dividing the test data cluster into a plurality of sub-clusters according to a result of comparing the cluster members and the cluster characteristics, and labeling the plurality of sub-clusters using the data labels assigned to the training data cluster; and
training the classifier using the data labeled in the plurality of sub-clusters.

The method of claim 7, wherein the determining comprises:
determining that the test data cluster retains the cluster members of the training data cluster if, among all feature vectors in the training data cluster, the fraction of feature vectors contained in both the training data cluster and the test data cluster is greater than a threshold; and
determining that the test data cluster retains the cluster characteristics of the training data cluster if the difference between a representative value of the training data cluster and a representative value of the test data cluster is within a tolerance range.

The method of claim 7, wherein the labeling comprises,
when the test data cluster is determined, according to the result of the determining, to retain the cluster members and the cluster characteristics of the training data cluster,
labeling the test data cluster using the data labels assigned to the training data cluster.

The method of claim 7, wherein the labeling comprises,
when the test data cluster is determined, according to the result of the determining, to retain the cluster members of the training data cluster and not to retain the cluster characteristics of the training data cluster:
detecting feature vectors contained in both the training data cluster and the test data cluster, and detecting, among the detected feature vectors, feature vectors for which the difference between the feature value before the change point of the environment attribute and the feature value after the change point of the environment attribute is within a tolerance range and feature vectors for which the difference is outside the tolerance range;
dividing the test data cluster into a first sub-cluster containing the feature vectors within the tolerance range and a second sub-cluster containing the feature vectors outside the tolerance range; and
labeling the first sub-cluster with the data label assigned to the training data cluster, and labeling the second sub-cluster with a data label approximated from the data label assigned to the training data cluster.

The method of claim 7, wherein the labeling step comprises,
when it is determined that the test data cluster is a merger of a first training data cluster whose cluster members and cluster characteristics are maintained after the change point of the environment attribute and a second training data cluster whose cluster members are maintained but whose cluster characteristics have changed after the change point of the environment attribute:
dividing, from the test data cluster, a first sub-cluster including the feature vectors included in both the first training data cluster and the test data cluster, and a second sub-cluster including, among the feature vectors included in both the second training data cluster and the test data cluster, the feature vectors for which the difference between the feature value before the change point of the environment attribute and the feature value after the change point of the environment attribute is within a tolerance range; and
labeling the first sub-cluster with the data label assigned to the first training data cluster, and labeling the second sub-cluster with the data label assigned to the second training data cluster.
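
The merged-cluster case in this claim can be sketched roughly as below. The sketch assumes, as an illustration only, that each cluster is a mapping from a gamer identifier to a feature vector and that the difference between feature values is the largest per-feature absolute change; `split_merged_cluster`, the labels, and the data are hypothetical. The claim does not say what happens to vectors from the second training data cluster that fall outside the tolerance range, so the sketch leaves them unlabeled.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def drift(before: Vector, after: Vector) -> float:
    """Largest per-feature absolute change (an assumed difference measure)."""
    return max(abs(a - b) for a, b in zip(after, before))


def split_merged_cluster(
    test: Dict[str, Vector],
    first_training: Dict[str, Vector],
    second_training: Dict[str, Vector],
    tolerance: float,
) -> Tuple[List[str], List[str]]:
    """Recover two sub-clusters from a test cluster that merged two training clusters."""
    # First sub-cluster: every gamer shared with the first training cluster.
    first_sub = sorted(test.keys() & first_training.keys())
    # Second sub-cluster: gamers shared with the second training cluster
    # whose features moved no more than the tolerance.
    second_sub = sorted(
        gamer_id
        for gamer_id in test.keys() & second_training.keys()
        if drift(second_training[gamer_id], test[gamer_id]) <= tolerance
    )
    return first_sub, second_sub


first_training = {"g1": [1.0], "g2": [1.1]}   # labeled "explorer" (made up)
second_training = {"g3": [3.0], "g4": [3.2]}  # labeled "raider" (made up)
merged_test = {"g1": [1.0], "g2": [1.2], "g3": [2.9], "g4": [1.5]}

first_sub, second_sub = split_merged_cluster(
    merged_test, first_training, second_training, tolerance=0.5
)
labels = {gamer_id: "explorer" for gamer_id in first_sub}
labels.update({gamer_id: "raider" for gamer_id in second_sub})
print(labels)
```

Here `g4` moved by more than the tolerance, so it belongs to neither sub-cluster and receives no label in this sketch.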

The method of claim 7, wherein the labeling step comprises,
when it is determined that the training data cluster includes first and second training data clusters, that the test data cluster includes first and second test data clusters, that the first test data cluster includes feature vectors that maintain both the cluster members and the cluster characteristics of the first training data cluster and feature vectors that maintain the cluster members of the second training data cluster but do not maintain the cluster characteristics of the second training data cluster, and that the second test data cluster includes feature vectors that maintain both the cluster members and the cluster characteristics of the second training data cluster:
dividing the first test data cluster into a first sub-cluster including the feature vectors included in both the first training data cluster and the first test data cluster, and a second sub-cluster including, among the feature vectors included in both the second training data cluster and the first test data cluster, the feature vectors for which the difference between the feature value before the change point of the environment attribute and the feature value after the change point of the environment attribute is within a tolerance range; and
labeling the first sub-cluster with the data label assigned to the first training data cluster, labeling the second sub-cluster with the data label assigned to the second training data cluster, and labeling the second test data cluster with the data label assigned to the second training data cluster.

A device including a classifier that classifies gamer behavior types in an online game, the device comprising:
a communication module that communicates with a game server that is connected to a communication network and services the online game, another online game server that services another online game, and an SNS server;
a data collection module that collects, through the communication module, data related to behavior attributes of gamers and environment attributes of the online game; and
a processor module that executes a feature vector extraction algorithm to extract feature vectors from the collected data related to the behavior attributes of the gamers and the environment attributes of the online game, executes a clustering algorithm to generate a training data cluster in which the extracted feature vectors are clustered, labels the training data cluster with a data label, and trains the classifier using the labeled data label.
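
The processor module's sequence of extraction, clustering, labeling, and training can be sketched end to end as below. The claim names no concrete algorithms, so every choice here is an assumption made for illustration: the action names, the normalized action counts used as features, the grouping by dominant action that stands in for a clustering algorithm, and the nearest-centroid rule that stands in for the classifier.

```python
from collections import Counter
from typing import Dict, List

ACTIONS = ("combat", "trade", "chat")  # hypothetical behavior attributes


def extract_features(log: List[str]) -> List[float]:
    """Feature vector: share of each known action in a gamer's log."""
    counts = Counter(log)
    total = sum(counts[a] for a in ACTIONS) or 1
    return [counts[a] / total for a in ACTIONS]


def cluster_by_dominant_action(vectors: Dict[str, List[float]]) -> Dict[int, List[str]]:
    """Placeholder clustering: group gamers by their largest feature."""
    clusters: Dict[int, List[str]] = {}
    for gamer_id, vec in sorted(vectors.items()):
        clusters.setdefault(vec.index(max(vec)), []).append(gamer_id)
    return clusters


def train_nearest_centroid(
    vectors: Dict[str, List[float]], labels: Dict[str, str]
) -> Dict[str, List[float]]:
    """Placeholder classifier: one centroid per data label."""
    grouped: Dict[str, List[List[float]]] = {}
    for gamer_id, label in labels.items():
        grouped.setdefault(label, []).append(vectors[gamer_id])
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def classify(model: Dict[str, List[float]], vec: List[float]) -> str:
    """Return the label whose centroid is closest in squared distance."""
    return min(model, key=lambda lb: sum((a - b) ** 2 for a, b in zip(model[lb], vec)))


logs = {
    "g1": ["combat", "combat", "chat"],
    "g2": ["trade", "trade", "trade", "chat"],
    "g3": ["combat", "combat", "combat"],
}
vectors = {gamer_id: extract_features(log) for gamer_id, log in logs.items()}
clusters = cluster_by_dominant_action(vectors)
cluster_labels = {0: "fighter", 1: "trader"}  # hypothetical data labels
labels = {g: cluster_labels[c] for c, members in clusters.items() for g in members}
model = train_nearest_centroid(vectors, labels)
print(classify(model, extract_features(["combat", "trade", "combat", "combat"])))
```

A production system would more likely use an established clustering and classification library; the hand-written stand-ins only keep the sketch self-contained.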

The device of claim 13, wherein the data related to the behavior attributes of the gamers includes log data related to character behavior and raw data related to gamer behavior.

The device of claim 13, wherein the data related to the environment attributes of the online game includes data related to internal environment attributes of the online game and data related to external environment attributes of the online game.

The device of claim 13, wherein the processor module clusters the extracted feature vectors in day units over a given period.
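
Clustering in day units can be sketched by first grouping the feature vectors by calendar day and then running a clustering function on each day's group. The record layout, the function names, and the threshold rule used in place of a real clustering algorithm are assumptions made for illustration and do not come from the claim.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Record = Tuple[datetime, str, Vector]  # (timestamp, gamer id, feature vector)


def group_by_day(records: List[Record]) -> Dict[date, Dict[str, Vector]]:
    """Bucket feature vectors by the calendar day of their timestamp."""
    days: Dict[date, Dict[str, Vector]] = defaultdict(dict)
    for timestamp, gamer_id, vec in records:
        # A later record for the same gamer on the same day overwrites the earlier one.
        days[timestamp.date()][gamer_id] = vec
    return dict(days)


def threshold_cluster(vectors: Dict[str, Vector], threshold: float = 0.5) -> Dict[int, List[str]]:
    """Placeholder clustering: two clusters split on the first feature."""
    clusters: Dict[int, List[str]] = {0: [], 1: []}
    for gamer_id, vec in sorted(vectors.items()):
        clusters[int(vec[0] >= threshold)].append(gamer_id)
    return clusters


def cluster_per_day(
    records: List[Record],
    cluster_fn: Callable[[Dict[str, Vector]], Dict[int, List[str]]] = threshold_cluster,
) -> Dict[date, Dict[int, List[str]]]:
    """Run the clustering function separately on each day's feature vectors."""
    return {day: cluster_fn(vecs) for day, vecs in sorted(group_by_day(records).items())}


records = [
    (datetime(2018, 5, 1, 10, 0), "g1", [0.9]),
    (datetime(2018, 5, 1, 23, 30), "g2", [0.2]),
    (datetime(2018, 5, 2, 0, 15), "g1", [0.3]),
]
daily_clusters = cluster_per_day(records)
for day, clusters in daily_clusters.items():
    print(day, clusters)
```

Keeping one cluster set per day makes it possible to compare the clusters formed before and after a given change point of the environment attribute.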

The device of claim 1, wherein the processor module executes:
a process of comparing the test data cluster with the training data cluster to determine whether the test data cluster, clustered after the change point of the environment attribute, maintains the cluster members and the cluster characteristics of the training data cluster, clustered before the change point of the environment attribute;
a process of dividing the test data cluster into a plurality of sub-clusters according to the result of comparing the cluster members and the cluster characteristics, and labeling the plurality of sub-clusters using the data label assigned to the training data cluster; and
a process of training the classifier using the data labeled in the plurality of sub-clusters.
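
The comparison process in this claim can be sketched as two separate checks, one for cluster members and one for cluster characteristics. The claim does not define either test, so the sketch assumes that members are "maintained" when a sufficient share of the training cluster's gamers reappear in the test cluster, and that characteristics are "maintained" when the cluster centroid moves by no more than a tolerance. The thresholds, names, and data are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def centroid(vectors: List[Vector]) -> Vector:
    """Per-feature mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def compare_clusters(
    training: Dict[str, Vector],
    test: Dict[str, Vector],
    member_ratio: float = 0.8,
    centroid_tolerance: float = 0.1,
) -> Tuple[bool, bool]:
    """Return (members_maintained, characteristics_maintained)."""
    shared = training.keys() & test.keys()
    members_maintained = len(shared) / len(training) >= member_ratio
    before = centroid(list(training.values()))
    after = centroid(list(test.values()))
    shift = max(abs(a - b) for a, b in zip(after, before))
    return members_maintained, shift <= centroid_tolerance


# Same gamers before and after the change point, but their behavior has shifted.
training_cluster = {"g1": [0.9, 0.1], "g2": [0.8, 0.2]}
test_cluster = {"g1": [0.5, 0.5], "g2": [0.4, 0.6]}

members_ok, characteristics_ok = compare_clusters(training_cluster, test_cluster)
print(members_ok, characteristics_ok)
```

In this example the members are maintained but the characteristics are not, which is the situation in which the test data cluster would be divided into sub-clusters before labeling.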