KR102555733B1

KR102555733B1 - Object management for improving machine learning performance, control method thereof

Info

Publication number: KR102555733B1
Application number: KR1020200186393A
Authority: KR
Inventors: 남준; 정진호
Original assignee: 케이웨어 (주)
Priority date: 2020-12-29
Filing date: 2020-12-29
Publication date: 2023-07-14
Also published as: KR20220094796A

Abstract

The present invention relates to an object management server for improving machine learning performance and a control method thereof. The control method performed by the object management server according to the present invention comprises: clustering data by pattern by classifying original data into patterns according to criteria initially specified by a user; recognizing, for each cluster, data for which no label has been set and requesting primary labeling from the user; setting a label for each item of data with reference to the content input by the user; and training a model using the labeled data, determining whether additional labeling is necessary based on the resulting learning performance, and notifying the user accordingly.

Description

Object management server for improving machine learning performance and control method thereof {OBJECT MANAGEMENT FOR IMPROVING MACHINE LEARNING PERFORMANCE, CONTROL METHOD THEREOF}

The present invention relates to an object management server and a control method thereof, and more particularly, to an object management server that handles data labeling of objects within data in order to improve the learning performance of machine learning models, and a control method thereof.

As AI technology advances and AI is applied in a growing range of fields, demand is also increasing for methods of producing training data, which has a decisive effect on AI performance, and for methods of improving its quality.

To train AI, separate training data must be prepared, a task known as data labeling. Because a person must inspect each item of data and set its label by hand, labeling inevitably takes a great deal of time and money.

In addition, which data is learned, and how much of it, matters greatly for learning performance, so labeling data indiscriminately makes it difficult to expect learning performance that justifies the time and cost invested.

Accordingly, there is growing demand to improve labeling productivity by classifying the data to be analyzed according to purpose and managing the labeling status and expected learning performance, while ensuring learning performance when selecting labeling targets and labeling amounts. A solution to the above problems is urgently needed.

Registered Patent No. 10-2147097

The present invention has been devised to solve the conventional problems described above, and its object is to provide an object management server and a control method thereof that secure an appropriate number of labeling targets and an appropriate labeling amount for machine learning and thereby maximize the effect of machine learning.

To achieve the above object, an object management server for effectively labeling data according to the present invention comprises: a data management unit that acquires data information input through various channels, such as a user terminal, a data management server, and a separate database, and classifies and clusters the data according to criteria set by the user; an object analysis unit that analyzes the data of each cluster acquired and classified by the data management unit to recognize objects and checks whether a label has been set for each object; a label setting unit that visualizes for the user the status of labeled and unlabeled data in each cluster, presents the additional labeling amount and the data recommended for labeling based on learning performance analysis information and labeling target recommendation information obtained from a label analysis unit, and allows the user to set labels; and the label analysis unit, which compares labeled data with unlabeled data in real time to recommend labeling of unlabeled data whose form differs from that of the labeled data, and which analyzes machine learning performance using the labeled data and presents the result.

Here, the label setting unit may provide the user with the total amount of labeling target data (objects) in each cluster acquired by the data management unit and the object analysis unit, together with the proportion of that data for which labels have been set and object frequency information within the same cluster.

In addition, to achieve the above object, a control method performed by the object management server according to the present invention may comprise: clustering data by pattern by classifying original data into patterns according to criteria initially specified by a user; recognizing, for each cluster, data for which no label has been set and requesting primary labeling from the user; setting a label for each item of data with reference to the content input by the user; and training a model using the labeled data, determining whether additional labeling is necessary based on the resulting learning performance, and notifying the user accordingly.

Here, the method may further comprise providing the user, for each data cluster formed according to the classification criteria specified by the user, with the amount of data in the cluster, broken down into labeled data and unlabeled data.

As described above, according to the present invention, classifying data into clusters by pattern according to the user's criteria makes it possible to check the distribution of data patterns and assemble evenly balanced, unbiased training data, and training the model on labeled data in real time makes it possible to check learning performance. Consequently, when setting labels for objects and carrying out the labeling process, the data to be labeled can be selected effectively and accurately and an optimal amount of training data can be chosen, which increases both labeling productivity and learning performance.

That is, the patterns of the initially input data are classified according to criteria presented by the user so that the distribution of data across patterns can be identified and appropriate data can be selected for labeling. Once an operator sets labels, learning performance is automatically evaluated in real time based on those labels, and the machine determines which data is needed for learning and requests the operator to label it, so that training is carried out on evenly distributed labeled data.

FIG. 1 is a diagram showing a labeling management system according to an embodiment of the present invention;
FIG. 2 is a block diagram showing the configuration of an object management server according to an embodiment of the present invention;
FIG. 3 is a diagram showing a process of labeling data according to an embodiment of the present invention; and
FIG. 4 is a diagram showing a process of setting labels for each pattern cluster of original data according to an embodiment of the present invention.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.

Each embodiment described below is merely an example to aid understanding of the present invention, and the present invention is not limited to these embodiments. In particular, the present invention may be configured as a combination of at least one of the individual components, individual functions, or individual steps included in each embodiment.

In particular, although letters such as '(a)' are included in some of the claims for convenience, these letters do not prescribe the order of the steps.

In addition, each signal mentioned in the embodiments below may refer to a single signal transmitted over a single connection or the like, but may also refer to a group of signals transmitted for the purpose of performing a specific function described later. That is, in each embodiment, a plurality of signals transmitted at predetermined time intervals, or transmitted after a response signal has been received from the counterpart device, may be referred to by a single signal name for convenience.

That is, the present invention may be implemented in many different forms and is therefore not limited to the embodiments described herein. To describe the present invention clearly, parts of the drawings that are irrelevant to the description have been omitted, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "indirectly connected" with another member interposed between them. In addition, when a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram showing a labeling management system according to an embodiment of the present invention.

Referring to FIG. 1, an object management system according to an embodiment of the present invention may include a user terminal 100, an object management server 200, and a data management server 300, which can communicate with one another over a communication network.

First, the communication network may be configured regardless of its mode of communication, whether wired or wireless. It may consist of various networks such as a local area network (LAN), a metropolitan area network (MAN), or a wide area network (WAN).

The user terminal 100 may include any kind of handheld wireless communication device that can connect to an external server over a wireless network, such as a mobile phone, a smartphone, a PDA (Personal Digital Assistant), a PMP (Portable Multimedia Player), or a tablet PC. It may also include communication devices that can connect to an external server over a network, such as a desktop PC or a laptop PC.

The user terminal 100 may include a display device for displaying information, an interface device for inputting information, and the like.

When tag information is requested for data for which no label has been set, the user terminal 100 may display the request message on its screen. When tag information is entered through the interface device, the user terminal 100 may transmit the entered tag information to the object management server 200 so that the label of the object can be learned from the tag information.

The object management server 200 may classify original data according to criteria specified by the user, and may also recognize the characteristics of the original data and classify it accordingly.

When classifying data by characteristic, the object management server 200 may create and manage a data cluster for each characteristic.

The object management server 200 may recognize objects within specific image or text data in a cluster and may extract the recognized objects from the data.

When extracting objects, the object management server 200 may check whether a label has been set for each object recognized in an image, and may extract labeled objects and unlabeled objects separately.

When an unlabeled object is recognized in an image, the object management server 200 may extract that object and then set a label for it.

When setting labels for objects, the object management server 200 may check the amount of data and the number of objects in each data cluster.

The object management server 200 may assess the impact on learning performance and, for each data cluster, request the user terminal 100 to continue or stop labeling unlabeled objects.

The data management server 300 may store and manage, in a database, a large number of images, meta information set for the images, object information extracted from the images, label information set for the objects, and the like.

The data management server 300 may be implemented as a big data platform and may provide various information, such as image information, object information, and label information, to the object management server 200.

According to an embodiment of the present invention, the object management server 200 and the data management server 300 may each be implemented as a separate server, or a single integrated server may perform the functions of both the object management server 200 and the data management server 300.

FIG. 2 is a block diagram showing the configuration of the object management server 200 according to an embodiment of the present invention.

Referring to FIG. 2, the object management server 200 may include a data management unit 210, an object analysis unit 220, a label setting unit 230, and a label analysis unit 240.

First, the data management unit 210 may acquire data information input through various channels, such as the user terminal 100, the data management server 300, and a separate database.

The data management unit 210 may classify data according to criteria specified in advance by the user. For example, according to classification criteria such as domestic or overseas source, image type (photo or illustration), or day or night, it may divide the data into clusters such as a first image cluster, a second image cluster, and a third image cluster. When the user changes the criteria, the clusters may be formed anew.
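The criteria-based clustering described above can be sketched as a simple grouping over metadata. This is only an illustrative sketch, not the patented implementation: the record fields (`source`, `type`, `time`) and the function name `cluster_by_criteria` are assumptions introduced here, and the specification does not state how the criteria values are obtained for each image.

```python
from collections import defaultdict


def cluster_by_criteria(items, criteria):
    """Group metadata records by the values of the user-selected criteria.

    `items` is a list of dicts and `criteria` is a list of field names.
    Both shapes are hypothetical and used only for illustration.
    """
    clusters = defaultdict(list)
    for item in items:
        # The cluster key is the tuple of values for the chosen criteria.
        key = tuple(item.get(name) for name in criteria)
        clusters[key].append(item)
    return dict(clusters)


images = [
    {"id": 1, "source": "domestic", "type": "photo", "time": "day"},
    {"id": 2, "source": "overseas", "type": "photo", "time": "night"},
    {"id": 3, "source": "domestic", "type": "illustration", "time": "day"},
]

# Changing the criteria simply rebuilds the clusters from scratch.
by_time = cluster_by_criteria(images, ["time"])
by_type_and_time = cluster_by_criteria(images, ["type", "time"])
```

Calling the function again with a different criteria list mirrors the statement that clusters are formed anew when the user changes the criteria.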

The object analysis unit 220 may analyze the image information acquired by the data management unit 210 and recognize objects within an image.

For example, the object analysis unit 220 may recognize the objects contained in an image by analyzing its size, resolution, type, and the like to determine whether the image contains objects.

The object analysis unit 220 may extract the recognized objects from the image. For example, it may recognize and extract objects from an image using various algorithms such as R-CNN (Regions with Convolutional Neural Networks) and YOLOv3.

The object analysis unit 220 may check whether a label has been set for each object recognized in the image, and may extract objects with a previously set label and objects without a label separately.

For example, when a first image contains a first object for which no label has been set and a second object for which a label has already been set, the object analysis unit 220 may recognize the first object and the second object in the first image, check whether a label has been set for each, and extract them separately as the unlabeled first object and the labeled second object.

That is, when the unlabeled first object is recognized in the first image, the object analysis unit 220 may check whether a label has been set for the first object and, on confirming that none has been set, may classify the first object recognized in the first image as an unlabeled object and extract it.

Likewise, when the labeled second object is recognized in the first image, the object analysis unit 220 may check whether a label has been set for the second object and, on confirming that one has been set, may classify the second object recognized in the first image as a labeled object and extract it.
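The separation of labeled and unlabeled objects described above amounts to a partition on label status. The following is a minimal sketch under stated assumptions: the `DetectedObject` record and its fields are hypothetical, and object detection itself (for example by R-CNN or YOLOv3) is assumed to have already produced the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DetectedObject:
    # Hypothetical record for an object recognized in an image.
    object_id: str
    image_id: str
    label: Optional[str] = None  # None means no label has been set


def split_by_label_status(objects):
    """Return (labeled, unlabeled) lists, preserving input order."""
    labeled = [o for o in objects if o.label is not None]
    unlabeled = [o for o in objects if o.label is None]
    return labeled, unlabeled


objects = [
    DetectedObject("obj-1", "img-1"),               # first object, no label
    DetectedObject("obj-2", "img-1", label="car"),  # second object, labeled
]
labeled, unlabeled = split_by_label_status(objects)
```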

The label setting unit 230 may check the number of objects and the labeling ratio for each cluster, and may select objects in each cluster and set labels for them.

When setting labels, the label setting unit 230 may obtain learning performance analysis information and labeling target object recommendation information from the label analysis unit 240. Using the learning performance analysis information, it may present the amount of labeling required for the objects in each cluster; using the labeling target object recommendation information, it may request labeling of unlabeled objects that might otherwise be overlooked because they occur infrequently.

Specifically, the label setting unit 230 may request tag information for an unlabeled first object in a cluster that needs additional labeled data, for example by transmitting the tag information request to the user terminal 100.

When an object recognized in an image is confirmed to have a label, the label analysis unit 240 may train a learning model specified in advance by the user using the label information set for the object, and may analyze the resulting learning performance.

For example, the label analysis unit 240 may perform machine learning on the data containing the objects labeled so far and record the learning performance.

Specifically, the label analysis unit 240 may repeat machine learning each time a certain amount of labeled data is added. On each repetition, it may identify the clusters of the data used, analyze how the amount of labeled data used for training in each cluster affects learning performance, and present, for each cluster, the amount of labeled data needed to improve learning performance.

The required labeling amount may be presented differently for each cluster, and for clusters where additional labeling no longer has a significant effect on performance, the label analysis unit 240 may suggest stopping further labeling.
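One way to picture the per-cluster impact analysis is to record, for every retraining round, which cluster the newly added labels came from and how the score changed, then suggest stopping for clusters whose recent gains are negligible. The sketch below is an assumption-laden illustration: the round records, the `min_gain` threshold, and the `window` size are hypothetical, and the specification does not define how the impact is actually computed.

```python
def recommend_per_cluster(rounds, min_gain=0.005, window=2):
    """Suggest 'continue' or 'stop' for each cluster.

    `rounds` is a list of (cluster, score_before, score_after) tuples, one per
    retraining round triggered by a batch of new labels from that cluster.
    """
    gains = {}
    for cluster, before, after in rounds:
        gains.setdefault(cluster, []).append(after - before)

    advice = {}
    for cluster, deltas in gains.items():
        recent = deltas[-window:]  # only the most recent rounds matter
        mean_gain = sum(recent) / len(recent)
        advice[cluster] = "continue" if mean_gain >= min_gain else "stop"
    return advice


rounds = [
    ("day", 0.60, 0.70),
    ("night", 0.70, 0.78),
    ("day", 0.78, 0.781),
    ("day", 0.781, 0.782),
    ("night", 0.782, 0.80),
]
advice = recommend_per_cluster(rounds)
```

In this made-up history, further "day" labels barely move the score, so the sketch would suggest stopping labeling in that cluster while continuing in "night".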

[Example 1]

To illustrate the practical operation of the present invention, FIG. 3 is a diagram showing the process of labeling data. In FIG. 3, original data is received (S301), the data is automatically clustered (S303) according to classification criteria set by the user (S302), and information on each data cluster is visualized and provided to the user (S304). When the user selects data from each cluster and sets labels (S305), the system automatically evaluates the baseline performance of a machine learning model trained on the data labeled by the user (S308) and, according to a preset learning target value, sends the user a message indicating either that additional labeling is needed or that labeling is sufficient (S309). At the same time, the system compares the labeled data with the unlabeled data and picks out data whose characteristics differ (S306), thereby recommending low-frequency data as the next labeling target and presenting recommended labeling targets so that labeling does not become biased (S307).

Through the data dashboard (S304), the user can check labeling progress in real time, select data effectively from each cluster to label, and check the current learning performance presented by the system to decide whether to continue labeling (S305).

Hereinafter, the overall operation of the object management server 200 according to the embodiment of the present invention described above will be explained further with specific examples.

The object management server 200 classifies original data into patterns according to criteria initially specified by the user and clusters the data by pattern (which corresponds to a kind of classification). For example, when the user has selected day/night as the criterion in advance, the object management server 200 may determine, in accordance with that selection, that some of the original data are daytime photos and the rest are nighttime photos.

Alternatively, the object management server 200 may cluster the original data by pattern according to its own preset algorithm. For example, when the original data consists entirely of photos, it may analyze the brightness of the pixels in each photo to determine whether it is a daytime photo or a nighttime photo, and cluster the original data into daytime photos and nighttime photos accordingly.
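The brightness-based day/night decision can be sketched with a mean-brightness threshold. This is a deliberately simple illustration: the grayscale 2-D list representation and the threshold of 100 are assumptions, and the specification does not say which brightness statistic or cut-off the server's own algorithm uses.

```python
def mean_brightness(pixels):
    """Average of grayscale values (0..255) over a 2-D list of pixels."""
    total = sum(sum(row) for row in pixels)
    count = sum(len(row) for row in pixels)
    return total / count


def day_or_night(pixels, threshold=100):
    # The threshold is a hypothetical value chosen only for this sketch.
    return "day" if mean_brightness(pixels) >= threshold else "night"


bright = [[200, 220], [180, 240]]  # toy 2x2 "daytime" image
dark = [[10, 30], [20, 5]]         # toy 2x2 "nighttime" image

clusters = {"day": [], "night": []}
for name, img in (("bright", bright), ("dark", dark)):
    clusters[day_or_night(img)].append(name)
```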

In addition, the object management server 200 may recognize, for each cluster, data for which no label has been set and request primary labeling from the user (that is, from the user terminal 100).

For example, among the previously clustered photos, the object management server 200 may request labels for those that the user has not already labeled.

As another example, the object management server 200 may carry out the labeling itself.

For example, for unlabeled original data, the object management server 200 may search for labeled original data whose similarity to it is at or above a preset value and then apply the same label.
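The self-labeling step above can be sketched as nearest-match label propagation with a similarity threshold. The feature vectors, the use of cosine similarity, and the `min_similarity` value are all assumptions for illustration; the specification only requires that the similarity be at or above a preset value and does not name a similarity measure.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def propagate_label(unlabeled_vec, labeled, min_similarity=0.95):
    """Return the label of the most similar labeled item, or None.

    `labeled` is a list of (feature_vector, label) pairs. None is returned
    when no labeled item reaches the preset similarity, in which case the
    item would be left for the user to label.
    """
    best_label, best_sim = None, min_similarity
    for vec, label in labeled:
        sim = cosine_similarity(unlabeled_vec, vec)
        if sim >= best_sim:
            best_label, best_sim = label, sim
    return best_label


labeled = [([1.0, 0.0, 0.2], "car"), ([0.0, 1.0, 0.1], "person")]
close_match = propagate_label([0.9, 0.05, 0.2], labeled)
no_match = propagate_label([0.5, 0.5, 0.0], labeled)
```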

Thereafter, the object management server 200 trains a model using the labeled data (original data), and may determine whether additional labeling is necessary based on the resulting learning performance and notify the user accordingly.

For example, if the performance obtained from machine learning falls short of a preset level, further training is needed, so the server may ask the user to register more labeled data.
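The threshold check described above reduces to comparing the measured score with a preset target. In this sketch the metric, the target value, and the message strings are hypothetical placeholders rather than values taken from the specification.

```python
def labeling_guidance(score, target):
    """Compare a measured learning performance score with a preset target."""
    if score < target:
        return "additional labeling needed"
    return "labeling sufficient"


# Hypothetical scores measured after two successive training rounds.
first_round = labeling_guidance(0.72, target=0.85)
second_round = labeling_guidance(0.90, target=0.85)
```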

When additional labeling of the original data is needed in this way, the server may compare the unlabeled data with the already labeled data and request labeling of the unlabeled data whose format differs.

For example, when additional labeling is needed because further training is required within the same cluster, the object management server 200 may determine the data format of the labeled data on which training has already been performed, and may either ask the user to label data having a different format or label such data itself.
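Recommending unlabeled data that differs from what has already been labeled can be sketched by ranking each unlabeled item by its distance to the nearest labeled item. This is an illustrative stand-in, not the method of the specification: the 2-D feature vectors, Euclidean distance, and the `recommend_dissimilar` name are assumptions, and at least one labeled item is assumed to exist.

```python
import math


def recommend_dissimilar(unlabeled, labeled, top_k=1):
    """Rank unlabeled items by distance to their nearest labeled item.

    `unlabeled` and `labeled` map item ids to feature vectors. Items far
    from every labeled item are the least represented and are listed first.
    """
    def nearest_labeled_distance(vec):
        return min(math.dist(vec, lv) for lv in labeled.values())

    ranked = sorted(
        unlabeled,
        key=lambda item_id: nearest_labeled_distance(unlabeled[item_id]),
        reverse=True,
    )
    return ranked[:top_k]


labeled = {"a": (0.0, 0.0), "b": (1.0, 0.0)}
unlabeled = {"u1": (0.1, 0.0), "u2": (5.0, 5.0), "u3": (1.0, 0.5)}
recommended = recommend_dissimilar(unlabeled, labeled, top_k=2)
```

Here `u2` lies far from both labeled items, so it would be offered first as a labeling candidate, which is the kind of low-frequency data the passage aims to surface.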

여기서 데이터 형식의 판단은 객체 관리 서버(200)가 원본 데이터(예를 들어 사진 데이터)에 저장된 정보(예를 들어 픽셀 정보 등)를 분석하여 유사도 여부를 판단할 수 있는데, 이러한 데이터 형식의 판단 그 자체는 공지된 기술에 해당하므로 보다 상세한 설명은 생략한다.Here, the determination of the data format may be performed by analyzing the information (eg, pixel information, etc.) stored in the original data (eg, photo data) by the object management server 200 to determine the degree of similarity. Since it corresponds to a known technology, a detailed description thereof will be omitted.

이러한 과정은 각 클러스터별로 이루어질 수 있다.This process may be performed for each cluster.

즉, 객체 관리 서버(200)는 복수 개의 클러스터로 데이터가 구분되어 있는 경우, 각 클러스터별로 상술한 라벨 설정 요청 등의 처리를 수행할 수 있는 것이다.That is, when data is divided into a plurality of clusters, the object management server 200 can perform processing such as the above-described label setting request for each cluster.

이를 위해 기계 학습이 이루어진 경우, 객체 관리 서버(200)는 클러스터별 데이터 분포와 라벨링 비율 정보를 사용자에게 제공할 수 있다.When machine learning is performed for this purpose, the object management server 200 may provide data distribution and labeling ratio information for each cluster to the user.

That is, the object management server 200 may provide the user with, for each cluster, the total amount of data (objects) subject to labeling, the proportion of that data for which a label has been set, and object frequency information within the same cluster.

Accordingly, the user can decide which data to label additionally.
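The per-cluster information described above (total amount, labeling ratio, object frequency) can be sketched as a simple aggregation. This is only an illustration under stated assumptions: the `Item` record, its fields, and the `cluster_report` function are hypothetical names, and the cluster ids 401 and 402 are borrowed from the reference numerals purely as sample values.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    cluster: int                 # id of the cluster the item was assigned to
    object_type: str             # kind of object found in the item (assumed field)
    label: Optional[str] = None  # None means no label has been set yet


def cluster_report(items):
    """Summarize total count, labeled ratio, and object frequency per cluster."""
    report = {}
    for item in items:
        entry = report.setdefault(
            item.cluster, {"total": 0, "labeled": 0, "objects": Counter()}
        )
        entry["total"] += 1
        if item.label is not None:
            entry["labeled"] += 1
        entry["objects"][item.object_type] += 1
    for entry in report.values():
        entry["labeled_ratio"] = entry["labeled"] / entry["total"]
    return report


items = [
    Item(401, "car", "car"),
    Item(401, "car"),
    Item(401, "person"),
    Item(402, "tree", "tree"),
]
report = cluster_report(items)
for cluster_id, entry in sorted(report.items()):
    print(cluster_id, entry["total"], round(entry["labeled_ratio"], 2), dict(entry["objects"]))
```

A report of this kind is what would let the user see at a glance that, in this sample, cluster 401 is mostly unlabeled while cluster 402 is fully labeled.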

For example, when a great deal of training has been performed on a specific cluster, the object management server 200 may ask the user to label data belonging to other clusters so that additional training is performed on those clusters.
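One way to picture this decision is the following sketch, which combines the performance check with the choice of cluster. The rule used here is an assumption, not something the description specifies: if the measured score is below the target, the cluster with the fewest labeled items that still has unlabeled data is suggested. The function name `next_cluster_to_label` and the score values are hypothetical.

```python
def next_cluster_to_label(score, target, stats):
    """Suggest the cluster whose data should be labeled next, or None.

    score:  measured training performance (for example, accuracy)
    target: preset performance level
    stats:  dict mapping cluster id -> (labeled_count, unlabeled_count)
    """
    if score >= target:
        return None  # performance is sufficient, no further labeling requested
    # Only clusters that still contain unlabeled data can receive new labels.
    candidates = {c: s for c, s in stats.items() if s[1] > 0}
    if not candidates:
        return None
    # Assumed rule: prefer the cluster with the fewest labeled items so far.
    return min(candidates, key=lambda c: candidates[c][0])


stats = {401: (90, 10), 402: (5, 95), 403: (20, 0)}
print(next_cluster_to_label(0.70, 0.90, stats))  # 402
print(next_cluster_to_label(0.95, 0.90, stats))  # None
```

Cluster 403 is never suggested in this sample because it has no unlabeled data left, even though it has fewer labeled items than cluster 401.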

In addition, as described above, the object management server 200 may compare data patterns even within the same cluster so that additional labeling is performed on data on which little machine learning has been performed.

The purpose of this process is to prevent the original data used for machine learning from being skewed toward a specific cluster or a specific pattern.

If machine learning is performed only on skewed data, its performance inevitably suffers.

That is, these processes are important because the present invention makes it possible to select evenly distributed, unbiased training data.

In particular, when a label is added to given original data, the object management server 200 may determine whether to perform machine learning. It may make this decision by determining which cluster the labeled original data belongs to, or whether the additionally labeled original data has a pattern on which much training has already been performed.
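The decision on whether to train after a new label arrives might be sketched as follows. The description does not give a concrete criterion, so the threshold rule below is an assumption: training is triggered only when the cluster of the newly labeled item accounts for less than `max_share` of the data already trained on. The names `should_retrain`, `trained_counts`, and `max_share` are hypothetical.

```python
def should_retrain(trained_counts, new_cluster, max_share=0.5):
    """Decide whether a newly labeled item justifies another training run.

    trained_counts: dict mapping cluster id -> number of items already trained on
    new_cluster:    cluster id of the newly labeled item
    max_share:      assumed cutoff; clusters at or above this share are
                    treated as already well covered
    """
    total = sum(trained_counts.values())
    if total == 0:
        return True  # nothing has been trained yet
    share = trained_counts.get(new_cluster, 0) / total
    return share < max_share


trained_counts = {401: 80, 402: 20}
print(should_retrain(trained_counts, 402))  # True: cluster 402 is under-represented
print(should_retrain(trained_counts, 401))  # False: cluster 401 already dominates
```

The same check could be applied at the level of patterns within a cluster instead of whole clusters by keying `trained_counts` on a pattern id.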

Ultimately, these processes give concrete form to two concepts: Online-Learning, which addresses how much of the original data must be labeled for machine learning, and Active Learning, which addresses which data should be labeled. They provide clustering-based classification of the original data, real-time monitoring of labeling status, and corresponding monitoring of training performance, so that an operator can intuitively select labeling targets and carry out the work while confirming how the training effect is reflected.

Meanwhile, the process of carrying out each of the embodiments described above may of course be performed by a program or application stored in a given recording medium (for example, a computer-readable one). Here, the recording medium includes electronic recording media such as RAM (Random Access Memory), magnetic recording media such as a hard disk, and optical recording media such as a CD (Compact Disk).

The program stored in the recording medium may be executed on hardware such as a computer or a smartphone to carry out each of the embodiments described above. In particular, at least one of the functional blocks of the object management server according to the present invention may be implemented by such a program or application.

In addition, the present invention is not limited to the specific embodiments described above and may be practiced with various changes and modifications without departing from the gist of the present invention. It will be apparent that such changes and modifications are included in the present invention insofar as they fall within the scope of the appended claims.

100: user terminal
200: object management server
210: data management unit
220: object analysis unit
230: label setting unit
240: label analysis unit
300: data management server
401: first data cluster
402: second data cluster
403: third data cluster
404: fourth data cluster
405: fifth data cluster
406: sixth data cluster
501: labeled data
502: data for which labeling is requested
503: unlabeled data

Claims

A control method performed by an object management server, the method comprising:
(a) clustering original data by pattern, the patterns being distinguished according to criteria initially specified by a user;
(b) identifying, for each cluster, data for which no label has been set and requesting primary labeling from the user;
(c) setting a label for each item of data with reference to content input by the user; and
(d) training a model using the labeled data, determining whether additional labeling is needed based on the training performance result, and notifying the user,
wherein, in step (d), when it is determined that additional labeling is needed, data whose data format differs from that of the already labeled data is selected from among the unlabeled data, and labeling is requested for the selected data.

(Deleted)