KR20190082453A

KR20190082453A - Method, apparatus and computer program for analyzing new learning contents for machine learning modeling

Info

Publication number: KR20190082453A
Application number: KR1020180000026A
Authority: KR
Inventors: 신동민; 차영민; 허재위; 장영준
Original assignee: (주)뤼이드
Priority date: 2018-01-02
Filing date: 2018-01-02
Publication date: 2019-07-10
Also published as: KR102269606B1; KR102117908B1; KR20200084816A

Abstract

The present invention relates to a method for generating a modeling vector of new learning content, an apparatus thereof and a computer program thereof. According to the present invention, the method for generating a modeling vector of a new problem comprises the steps of: (a) vectorizing, for an arbitrary problem, one or more problem characteristic information indicating the characteristic of the problem; (b) combining the vectorized problem characteristic information to generate problem metadata; (c) training a data analysis framework by applying the one or more problem metadata to the data analysis framework; and (d) generating new problem metadata for the new problem through steps (a) and (b) and applying the new problem metadata to the learned data analysis framework to generate a modeling vector of the new problem. The problem characteristic information includes at least one of problem content, an image included in the problem, a sound source characteristic, a sound source length, a problem length, the number of combined problems, or unit information to which the problem belongs. According to the present invention, the characteristics of the problem can be modeled by the problem itself, thereby analyzing and utilizing the new problem without collecting the results of solving the new problem of a user.

Description

METHOD, APPARATUS AND COMPUTER PROGRAM FOR ANALYZING NEW LEARNING CONTENTS FOR MACHINE LEARNING MODELING

The present invention relates to a method, apparatus, and computer program for analyzing new learning content, and more specifically to a method, apparatus, and computer program for analyzing new learning content for machine learning modeling that make it possible to analyze the characteristics of a new problem from the problem itself, where analysis is otherwise difficult because no solving result data exists.

Recently, services that learn from problem-solving results to identify and present the characteristics of problems and of users, or that provide personalized content on that basis, have been increasing.

For example, Japanese Patent No. 4447411 (title: Learner Acquisition Characteristic Analysis System, Method and Program; published March 16, 2006) provides each learner with result information such as scores, deviation values, and teaching materials automatically selected according to the type to which the learner belongs, so that each learner can understand his or her own acquisition tendencies, the acquisition tendencies of the type closest to the learner, and the error characteristics of individual items.

Korean Patent No. 10-1773065 (title: Method, Apparatus and Computer Program for Providing Personalized Educational Content) collects result data for one or more problems solved by a plurality of users, calculates the similarity between problems, and indexes the calculated results to the problems.

That is, conventional machine-learning analysis of learning data is based on users' solving results for each problem. Consequently, for a newly introduced user or problem, no analysis result can be provided until data on that user or problem has accumulated, and a new problem cannot be placed into the pool of problems whose characteristics have already been analyzed. Because collecting users' solving results takes considerable time, a new method capable of modeling new problems is required.

One object of the present invention is to enable immediate use of new problems by modeling the characteristics of a problem from the problem itself.

Another object of the present invention is to reflect in the problem model, when analyzing the content of a problem, the implication that each word in the problem carries within its context.

To achieve these objects, the present invention provides a method for generating a modeling vector of a new problem, comprising: step (a) of vectorizing, for an arbitrary problem, each of one or more items of problem characteristic information indicating the characteristics of the problem; step (b) of combining the vectorized problem characteristic information to generate problem metadata; step (c) of training a data analysis framework by applying one or more items of problem metadata to the data analysis framework; and step (d) of generating new-problem metadata for the new problem through steps (a) and (b) and applying the new-problem metadata to the trained data analysis framework to generate the modeling vector of the new problem, wherein the problem characteristic information includes at least one of problem content, an image included in the problem, a sound source characteristic, a sound source length, a problem length, the number of combined problems, or information on the unit to which the problem belongs.

According to the present invention as described above, the characteristics of a problem can be modeled from the problem itself, so a new problem can be analyzed and used without collecting users' solving results for it.

In addition, according to the present invention, the implication that each word in a problem carries within its context can be reflected in the problem model when the content of the problem is analyzed.

FIG. 1 is a flowchart illustrating a method of generating a new problem modeling vector according to an embodiment of the present invention;
FIG. 2 is a diagram for explaining a method of vectorizing problem content according to an embodiment of the present invention;
FIG. 3 is a diagram for explaining a framework training method using solving result data according to an embodiment of the present invention; and
FIG. 4 is a diagram for explaining a framework training method according to an embodiment of the present invention.

The above objects, features, and advantages are described in detail below with reference to the accompanying drawings, so that a person of ordinary skill in the art to which the invention pertains can readily practice its technical idea. In describing the invention, detailed descriptions of related known technology are omitted where they would unnecessarily obscure the gist of the invention. Preferred embodiments of the invention are described in detail below with reference to the accompanying drawings. In the drawings, the same reference numerals denote the same or similar components, and all combinations described in the specification and claims may be combined in any manner. Unless otherwise specified, a reference to the singular may include one or more, and a reference to a singular expression may also include the plural.

Conventionally, to analyze content and users, experts manually defined the concepts of a subject and individually judged and tagged which concepts each problem in that subject contained. Each learner's ability was then analyzed from the results of solving the problems tagged with a given concept. However, this approach has the drawback that the tag information depends on human subjectivity. Because the criteria for assessing difficulty differ from person to person, the reliability of the resulting data was not high.

Recently, as machine learning using artificial neural networks has been applied in many fields, efforts to exclude human intervention from data processing by using machine learning frameworks have been spreading. That is, users and/or problems are modeled by collecting logs of users' problem-solving results, constructing a multidimensional space composed of users and problems, assigning values in that space according to whether each user answered each problem correctly or incorrectly, and computing a vector for each user and each problem.

Assigning a point in a multidimensional space that corresponds to the characteristics of a user or a problem in this way may be called embedding or vectorizing. The point in the multidimensional space assigned to an arbitrary object may be called a modeling vector, a vectorized object, or an object vector. That is, if the object to be embedded is a problem, the point representing the problem's features may be called a problem modeling vector, a vectorized problem, or a problem vector; if the object to be embedded is a user, the point representing the user's characteristics may be called a user modeling vector, a vectorized user, or a user vector.

Once user or problem characteristics are vectorized in this way, it becomes possible to compute mathematically the position of a particular problem among all problems, other problems that can be clustered into a group similar to that problem, the similarity between that problem and other problems, and so on. It is also possible to cluster users or problems by the attributes corresponding to the dimensions of the vector. The present invention is not to be construed as limited with respect to which characteristics or attributes the user vectors or problem vectors contain; as one example, a problem vector may include which concepts make up the problem (its concept composition).
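As an illustration of the similarity computation mentioned above, the following is a minimal sketch. The specification does not name a similarity measure; cosine similarity is an assumption made here, and the problem names and vector values are invented for the example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(target, candidates):
    """Return the key of the candidate vector most similar to `target`."""
    return max(candidates, key=lambda name: cosine_similarity(target, candidates[name]))

# Hypothetical problem modeling vectors (values invented for illustration).
problem_vectors = {
    "P1": [0.9, 0.1, 0.0],
    "P2": [0.8, 0.2, 0.1],
    "P3": [0.0, 0.1, 0.9],
}

query = problem_vectors["P1"]
others = {name: vec for name, vec in problem_vectors.items() if name != "P1"}
nearest = most_similar(query, others)
```

Any other distance, such as Euclidean distance, could be substituted without changing the overall idea.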

A method of generating a modeling vector that can represent the characteristics of an object according to an embodiment of the present invention is described in detail below. For convenience, this specification focuses on the case in which the object to be embedded into a modeling vector is a new problem. This is only one embodiment, however, and the invention can be applied to various objects such as users, problems, answer choices, and learning content.

FIG. 1 is a diagram for explaining a method of generating a modeling vector of a new problem according to an embodiment of the present invention.

Referring to FIG. 1, the server may vectorize, for an arbitrary problem, each of one or more items of problem characteristic information indicating the characteristics of the problem (S100). Here, the problem characteristic information may include at least one of: the problem content; the subject (field) of the problem; an image included in the problem; sound source characteristics (the nationality, accent, and frequency of the speaker who recorded the sound source); the sound source length; the problem length; the number of combined problems (problems in which several questions are attached to a single passage); the type of unit (part) to which the problem belongs; the type of language (English, Chinese, Japanese, Korean, etc.); and the type of problem (listening, reading, writing, grammar, etc.).

For example, to vectorize the characteristics of a sound source, the sound source characteristics may be embedded by assigning the average frequency value of each preset segment to one dimension. Each of the other items of problem characteristic information, such as the sound source length and the part type, may of course also be embedded in a preset manner. The vectorized items of problem characteristic information may have different dimensions. For example, the sound source length does not carry many characteristics and can therefore be embedded in a low-dimensional vector. For sound source characteristics or problem content, however, the dimension may increase greatly depending on how they are vectorized.
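The per-segment averaging described above can be sketched as follows. This is only an illustration under assumptions: the input is taken to be a list of per-frame frequency values that has already been extracted from the audio, and the function name `segment_means` and the sample values are hypothetical.

```python
def segment_means(values, num_segments):
    """Split `values` into contiguous segments and return the mean of each.

    Each mean becomes one dimension of the sound source vector.
    """
    n = len(values)
    if num_segments <= 0 or n < num_segments:
        raise ValueError("need at least one value per segment")
    means = []
    for i in range(num_segments):
        # Integer arithmetic keeps the segment boundaries exact.
        start = i * n // num_segments
        end = (i + 1) * n // num_segments
        chunk = values[start:end]
        means.append(sum(chunk) / len(chunk))
    return means

# Hypothetical per-frame frequency values in Hz.
frame_freqs = [100.0, 120.0, 200.0, 220.0, 300.0, 320.0]
audio_vector = segment_means(frame_freqs, 3)
```

The number of segments fixes the dimension of the resulting vector, which is one way the dimension of this feature can grow with the chosen vectorization.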

Consider the embodiment of FIG. 2, which vectorizes problem content.

Problem content contains a plurality of words. The server may assign an arbitrary n-dimensional vector to each of the words. The server may apply the first vector, assigned to a first word, to the encoding framework 130 to obtain a low-dimensional second vector (S103). The server may apply the second vector to the decoding framework 150 to obtain an n-dimensional third vector (S105), and trains the encoding framework 130 and the decoding framework 150 so that the third vector corresponds to a fourth vector arbitrarily assigned to a second word located within a preset distance of the first word.

Here, the encoding framework 130 and the decoding framework 150 have a deep neural network structure and model vector values through a combination of nonlinear transformations.

The encoding framework 130 and the decoding framework 150 use the input values to keep updating their weights so that the arbitrarily set initial weights approach the optimal weights. The encoding framework 130 can reduce the dimension of a vector, and the decoding framework 150 reconstructs the low-dimensional encoded vector back to its original dimension.

For example, if the problem content contains the sentence "I like an apple", the server may assign the four-dimensional vectors I = (1,0,0,0), like = (0,1,0,0), an = (0,0,1,0), and apple = (0,0,0,1). In embedding each word, one aim of the server of the present invention is to reflect in the modeling vector the relationship that the word, for example I, has with the other words in the sentence.
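The initial assignment in this example is a one-hot encoding. A minimal sketch follows; the function name `one_hot_vocabulary` is hypothetical, and whitespace tokenization is an assumption.

```python
def one_hot_vocabulary(sentence):
    """Assign each distinct word a one-hot vector, in order of first appearance."""
    vocab = list(dict.fromkeys(sentence.split()))  # de-duplicate, keep order
    n = len(vocab)
    return {
        word: tuple(1 if i == j else 0 for j in range(n))
        for i, word in enumerate(vocab)
    }

vectors = one_hot_vocabulary("I like an apple")
```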

Let w1 be the weight matrix of the encoding framework 130 and w2 the weight matrix of the decoding framework 150. When the first vector corresponding to I, v1 = (1,0,0,0), is input to the encoding framework 130, the second vector v2 output by the encoding framework 130 can be expressed as follows.

That is, applying a four-dimensional first vector to the encoding framework 130 yields a two-dimensional second vector. This is only an example for explaining the invention; in practice the number of words used in problems may be several hundred thousand or more, in which case the first vector input to the encoding framework 130 has several hundred thousand dimensions and, depending on how the weight matrix of the encoding framework 130 is set, the second vector output by the encoding framework 130 may be reduced to a dimension on the order of one-thousandth or one ten-thousandth of that.
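The equation for v2 is not reproduced in this text. A form consistent with the stated dimensions (a four-dimensional input and a two-dimensional output) is the matrix-vector product v2 = w1 · v1 with a 2 × 4 matrix w1. The sketch below assumes that form, ignores any nonlinear transformation, and uses invented weight values.

```python
def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a column vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

# Hypothetical 2 x 4 encoder weights (values invented for illustration).
w1 = [
    [0.5, 0.1, 0.3, 0.2],
    [0.4, 0.6, 0.7, 0.8],
]

v1 = [1, 0, 0, 0]    # one-hot vector assigned to the word "I"
v2 = matvec(w1, v1)  # two-dimensional encoding of "I"
```

With a one-hot input, the product simply selects the column of w1 that belongs to the word, which is why the trained weights can be read as a table of word vectors.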

The initial value of the weight matrix w2 of the decoding framework 150 may be the transpose of the weight matrix w1 of the encoding framework 130. That is, if w1 is an m × n matrix, w2 may be an n × m matrix, so applying the decoding framework 150 to the low-dimensional (k-dimensional) second vector can output an n-dimensional third vector. That is, the third vector can be expressed as follows.
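The equation for the third vector is likewise not reproduced in this text. Assuming the linear form v3 = w2 · v2 with w2 initialized to the transpose of w1, the decoding step can be sketched as follows; the weight values are invented, and the nonlinear transformations of a real deep network are omitted.

```python
def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a column vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def transpose(matrix):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(column) for column in zip(*matrix)]

# Hypothetical 2 x 4 encoder weights (values invented for illustration).
w1 = [
    [0.5, 0.1, 0.3, 0.2],
    [0.4, 0.6, 0.7, 0.8],
]
w2 = transpose(w1)          # 4 x 2 initial decoder weights

v2 = matvec(w1, [1, 0, 0, 0])  # low-dimensional second vector for "I"
v3 = matvec(w2, v2)            # four-dimensional third vector
```

Training then adjusts the weights so that v3 moves toward the vector assigned to a neighboring word.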

The server of the present invention may update the weights (w2) of the decoding framework 150 so that the third vector (v3) corresponds to another, adjacent word. For example, the decoding framework 150 may be trained by updating w2 so that the third vector v3 = (v31, v32, v33, v34) becomes (0,1,0,0), the vector assigned to like; updating w2 so that v3 becomes (0,0,1,0), the vector assigned to an; and updating w2 so that v3 corresponds to (0,0,0,1), the vector assigned to apple.

In step 100 of vectorizing the problem content, the server may, in order to vectorize the first word, repeat steps 101 to 105 described above for each of one or more second words located within a preset distance of the first word. For example, suppose that the three words following a word are used for its embedding. In the sentence "I like an apple", to embed I the server sequentially applies the first vector (1,0,0,0) assigned to I to the encoding framework 130 and the decoding framework 150 to obtain a third vector. The server trains the encoding framework 130 and the decoding framework 150 (hereinafter "framework 100") so that the third vector corresponds to (0,1,0,0), the vector assigned to like; in the next step it trains framework 100 so that the third vector corresponds to (0,0,1,0), the vector assigned to an; and in the following step it trains framework 100 so that the third vector corresponds to (that is, differs less from) (0,0,0,1), the vector assigned to apple.
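The sequence of training targets in this example can be enumerated as (first word, second word) pairs. The sketch below assumes a window of the three following words, as in the example; the function name `training_pairs` is hypothetical.

```python
def training_pairs(words, window=3):
    """Pair each word with up to `window` words that follow it."""
    pairs = []
    for i, center in enumerate(words):
        for context in words[i + 1 : i + 1 + window]:
            pairs.append((center, context))
    return pairs

pairs = training_pairs("I like an apple".split())
```

For the word I this yields the three targets like, an, and apple in the order described above.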

The server then inputs the first vector to the encoding framework 130 trained in this way, and may set the low-dimensional second vector to which the encoding framework 130 maps the first vector as the vector value of the word I.

In the same way that it obtained the vector value of I, the server can obtain the vector values of like, an, and apple. The vector value finally obtained for each word may be a low-dimensional vector corresponding to the second vector.

The server of the present invention differs from the conventional approach in that it does not train framework 100 so that its output equals its input; instead it trains framework 100 so that its output corresponds to the arbitrary vector values assigned to adjacent words, so that the relationship between each word and the other words is reflected in the embedding.
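The training scheme described above can be sketched end to end. This is a simplified illustration, not the specification's implementation: it assumes purely linear encoder and decoder matrices, a squared-error loss, plain stochastic gradient descent, and hypothetical hyperparameters (`hidden`, `lr`, `epochs`, `seed`).

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a column vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def train_word_embeddings(words, hidden=2, window=3, lr=0.1, epochs=200, seed=0):
    """Train encoder w1 and decoder w2 so decode(encode(word)) approaches a neighbor."""
    vocab = list(dict.fromkeys(words))
    n = len(vocab)
    index = {w: i for i, w in enumerate(vocab)}
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(hidden)]
    w2 = [list(col) for col in zip(*w1)]  # decoder starts as the transpose of w1
    pairs = [(index[c], index[t])
             for i, c in enumerate(words)
             for t in words[i + 1 : i + 1 + window]]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for c, t in pairs:
            x = [1.0 if i == c else 0.0 for i in range(n)]       # first vector
            target = [1.0 if i == t else 0.0 for i in range(n)]  # neighbor's vector
            h = matvec(w1, x)                                    # second vector
            y = matvec(w2, h)                                    # third vector
            err = [yi - ti for yi, ti in zip(y, target)]
            total += 0.5 * sum(e * e for e in err)
            # Gradient with respect to h, computed before w2 is modified.
            grad_h = [sum(w2[i][j] * err[i] for i in range(n)) for j in range(hidden)]
            for i in range(n):
                for j in range(hidden):
                    w2[i][j] -= lr * err[i] * h[j]
            for j in range(hidden):
                w1[j][c] -= lr * grad_h[j]  # only the input word's column changes
        losses.append(total)
    # The embedding of a word is the encoder output for its one-hot vector.
    embeddings = {w: [w1[j][index[w]] for j in range(hidden)] for w in vocab}
    return embeddings, losses

embeddings, losses = train_word_embeddings("I like an apple".split())
```

Because the target is a neighboring word rather than the input itself, words that appear in similar contexts are pushed toward similar low-dimensional vectors.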

The example above describes, for convenience, how to embed the words of a single sentence. When the invention is applied to actual learning content, however, the server may assign an arbitrary vector value to each of the several hundred words contained in the problem content and train framework 100 to reflect the relationships with all the words contained in a paragraph or in the entire problem content. Through this process, the server can model each word, each sentence, each paragraph, and each problem content as a single low-dimensional vector.

Furthermore, for a listening problem that includes an image and answer choices describing the image, the server can model the relationship between the image and the choices in the same manner. In this case, the server sets initial values by assigning arbitrary vectors to the image feature information contained in the image, and then trains framework 100 so that the vectors assigned to the words of the choices and the vector assigned to the image correspond to each other, thereby embedding the image or the choices into a low-dimensional modeling vector.

In addition to vectorizing the problem content as described above, the server may map the subject (field) of the problem to a preset vector value, or vectorize an image included in the problem using its pixel values, descriptors, and the like. The server may also vectorize the problem length by mapping the length, or the range into which it falls, to a vector value. The type of language, the type of problem, and the type of unit (part) may likewise be mapped to preset vector values.
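Mapping categorical and length features to preset vector values might look like the sketch below. The lookup table, the length boundaries, and the function names are all hypothetical; the specification does not fix particular values.

```python
# Hypothetical preset vectors for the problem type (values are assumptions).
PROBLEM_TYPE_VECTORS = {
    "listening": [1.0, 0.0, 0.0, 0.0],
    "reading": [0.0, 1.0, 0.0, 0.0],
    "writing": [0.0, 0.0, 1.0, 0.0],
    "grammar": [0.0, 0.0, 0.0, 1.0],
}

# Hypothetical word-count boundaries separating length ranges.
LENGTH_BOUNDARIES = (20, 50, 100)

def vectorize_problem_type(kind):
    """Look up the preset vector for a problem type."""
    return list(PROBLEM_TYPE_VECTORS[kind])

def vectorize_length(num_words, boundaries=LENGTH_BOUNDARIES):
    """Map a problem length to a one-dimensional value in [0, 1] by range."""
    bucket = sum(1 for b in boundaries if num_words >= b)
    return [bucket / len(boundaries)]

type_vector = vectorize_problem_type("listening")
length_vector = vectorize_length(150)
```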

Referring again to FIG. 1, the server may combine the items of problem characteristic information vectorized in this way to generate problem metadata (S200). The problem metadata may take the form of the vectorized items of problem characteristic information concatenated together. For example, for an arbitrary problem A, if the vectorized problem content is [0.3, 0.6, 0.3, 0.1], the vectorized sound source characteristic is [0.1, 0.2, 0.3], the vectorized problem length is [0.2], and the vectorized part information is [0.7], the problem metadata of problem A may be generated as [0.3, 0.6, 0.3, 0.1, 0.1, 0.2, 0.3, 0.2, 0.7].
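The concatenation in this example can be written in a few lines. The function name `build_metadata` is hypothetical; the vector values are the ones given for problem A.

```python
from itertools import chain

def build_metadata(*feature_vectors):
    """Concatenate vectorized feature items into one metadata vector."""
    return list(chain.from_iterable(feature_vectors))

content = [0.3, 0.6, 0.3, 0.1]  # vectorized problem content
audio = [0.1, 0.2, 0.3]         # vectorized sound source characteristic
length = [0.2]                  # vectorized problem length
part = [0.7]                    # vectorized part information

metadata = build_metadata(content, audio, length, part)
# metadata == [0.3, 0.6, 0.3, 0.1, 0.1, 0.2, 0.3, 0.2, 0.7]
```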

Next, the server may train the data analysis framework by applying one or more items of problem metadata to it (S300). The training in step 300 may be performed using problems for which solving results exist (see the description of FIG. 3) or using new problems only (see the description of FIG. 4).

Referring to FIG. 3, in step 300 the server may train the data analysis framework so that a first problem modeling vector, obtained by applying the problem metadata to the data analysis framework, corresponds to a second modeling vector of the problem generated in advance from users' solving result data for that problem.

For example, to generate a modeling vector for a new problem that has no solving result data at all, the server may train the data analysis framework 300 using those problems in the problem database that do have solving result data. The data analysis framework 50 is a conventional data analysis framework that uses solving result data, and the data analysis framework 300 is a framework according to an embodiment of the present invention for modeling problems using problem metadata.

The server may first apply the data analysis framework 50 to problem A, for which users' solving result data exists, to obtain the solving-result modeling vector 55 of problem A. The server may then apply steps 100 to 200 of the present invention to problem A to obtain its metadata, and apply the data analysis framework 300 to that metadata to obtain the meta modeling vector 305 corresponding to the metadata of problem A.

That is, the server obtains the solving-result modeling vector 55 of problem A in the conventional manner, obtains the meta modeling vector 305 of problem A according to the embodiment of the present invention that generates modeling vectors from metadata, and may train the data analysis framework 300 in the direction that reduces the difference between the two vectors.
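The FIG. 3 training can be sketched as a regression from metadata to the solving-result modeling vector. The sketch assumes a single linear layer standing in for the data analysis framework 300, a squared-error loss, and stochastic gradient descent; all data values and the function name `train_framework` are invented for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a column vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def train_framework(metadata_list, target_list, lr=0.5, epochs=500):
    """Fit weights so that weights @ metadata approaches the target vector."""
    d_in, d_out = len(metadata_list[0]), len(target_list[0])
    weights = [[0.0] * d_in for _ in range(d_out)]
    for _ in range(epochs):
        for x, t in zip(metadata_list, target_list):
            y = matvec(weights, x)        # meta modeling vector
            for i in range(d_out):
                err = y[i] - t[i]         # difference from the result vector
                for j in range(d_in):
                    weights[i][j] -= lr * err * x[j]
    return weights

# Hypothetical metadata of three problems that already have solving results.
metadata_list = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.0, 0.2, 0.9]]
# Hypothetical solving-result modeling vectors for the same problems.
target_list = [[0.2, 0.7], [0.5, 0.1], [0.9, 0.4]]

weights = train_framework(metadata_list, target_list)

# A new problem needs only its metadata to obtain a modeling vector.
new_problem_vector = matvec(weights, [0.5, 0.4, 0.1])
```

Once trained, the mapping is applied to the metadata of a new problem that has no solving results.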

After training is complete, when a new problem arrives the server may perform steps 100 to 200 on it to generate new-problem metadata, and apply the new-problem metadata to the data analysis framework trained in step 300 to generate the modeling vector of the new problem (S400). Thus, for new problems created thereafter, a modeling vector can be generated from the problem itself without users' solving result data, and once the modeling vector has been generated, problems with similar features can be clustered on that basis. The present invention further makes it easier to provide customized problems, to compose diagnostic problem sets for collecting users' solving result data, and so on.

Another embodiment of step 300, in which the data analysis framework 300 is trained, is described with reference to FIG. 4. FIG. 4 shows an embodiment of a method by which the data analysis framework 300 of the present invention can be trained without users' solving result data. According to the embodiment of FIG. 4, therefore, the server can generate the modeling vector of a new problem from new problems alone, without using any existing data.

Referring to FIG. 4, the server may apply the problem metadata generated through steps 100 to 200 to the encoding framework to obtain a low-dimensional first problem vector (S303), and apply the first problem vector to the decoding framework to obtain a k-dimensional second problem vector (S355). The server may train the encoding framework 330 and the decoding framework 350 so that the second problem vector output from the data analysis framework 300 corresponds to (that is, differs less from) the problem metadata originally input to the data analysis framework 300.

As training proceeds, the weights of each layer of the deep neural networks that make up the encoding framework 330 and the decoding framework 350 are continually updated, and the server can derive a low-dimensional modeling vector for a new problem by applying the trained encoding framework 330 to the new problem's metadata (S400).
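The encode, decode, and compare loop of FIG. 4 can be illustrated with a small autoencoder. The sketch below is deliberately simplified and makes several assumptions that are not in the specification: each framework is a single linear layer rather than a deep neural network, training uses plain stochastic gradient descent on squared reconstruction error, and the 4-dimensional metadata vectors, the function names, and the hyperparameters are hypothetical.

```python
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def reconstruction_loss(W_enc, W_dec, data):
    """Mean squared difference between metadata and its reconstruction."""
    total = 0.0
    for x in data:
        y = matvec(W_dec, matvec(W_enc, x))
        total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return total / len(data)

def train_autoencoder(data, k, d, epochs=300, lr=0.02, seed=0):
    """Train a linear encoder (k -> d) and decoder (d -> k) with SGD."""
    rng = random.Random(seed)
    W_enc = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(d)]
    W_dec = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(k)]
    for _ in range(epochs):
        for x in data:
            z = matvec(W_enc, x)   # low-dimensional first problem vector
            y = matvec(W_dec, z)   # k-dimensional second problem vector
            err = [yi - xi for yi, xi in zip(y, x)]
            # Gradient with respect to z, computed before the decoder changes.
            grad_z = [sum(W_dec[i][j] * err[i] for i in range(k))
                      for j in range(d)]
            for i in range(k):
                for j in range(d):
                    W_dec[i][j] -= lr * err[i] * z[j]
            for j in range(d):
                for m in range(k):
                    W_enc[j][m] -= lr * grad_z[j] * x[m]
    return W_enc, W_dec

# Hypothetical 4-dimensional problem metadata driven by two latent factors.
rng = random.Random(1)
data = []
for _ in range(50):
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data.append([a, b, a + b, a - b])

# epochs=0 returns the untrained weights for the same seed.
loss_before = reconstruction_loss(*train_autoencoder(data, 4, 2, epochs=0), data)
W_enc, W_dec = train_autoencoder(data, 4, 2)
loss_after = reconstruction_loss(W_enc, W_dec, data)

# Only the trained encoder is needed to model a new problem.
new_metadata = [0.3, -0.2, 0.1, 0.5]
modeling_vector = matvec(W_enc, new_metadata)
print(loss_before, loss_after, modeling_vector)
```

Note that no solving-result data appears anywhere in the loop: the training target is the input metadata itself, which is what allows the embodiment of FIG. 4 to work from problems alone.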

A low-dimensional vector is used as the new-problem modeling vector because lower-dimensional vector values reduce the resources consumed by subsequent data processing that uses the modeling vector (clustering, data classification, and so on). In addition, when an artificial neural network is trained on modeling vectors, a lower vector dimension also reduces the amount of training required.

Furthermore, in one embodiment of the present invention, each item of learning content is represented as vectorized metadata, which is in turn embedded into a low-dimensional vector, so that each item of content corresponds to a single vector. Through this process, the modeling vector representing the features of each problem becomes more robust, and noise contained in the metadata is also removed.

The above-described method for generating a modeling vector of new learning content can be implemented on a server or a terminal through a new-problem modeling vector generation program stored in a computer-readable medium to execute any one of the embodiments.

Some embodiments omitted from this specification apply equally where the implementing entity is the same. In addition, since those of ordinary skill in the art to which the present invention pertains can make various substitutions, modifications, and changes without departing from the technical idea of the present invention, the present invention is not limited by the above-described embodiments and the accompanying drawings.

Claims

A method for generating a modeling vector of a new problem, the method comprising:
step (a) of vectorizing, for an arbitrary problem, each of one or more pieces of problem characteristic information indicating a characteristic of the problem;
step (b) of combining the vectorized problem characteristic information to generate problem metadata;
step (c) of training a data analysis framework by applying one or more pieces of problem metadata to the data analysis framework; and
step (d) of generating new-problem metadata for the new problem through steps (a) and (b), and applying the new-problem metadata to the trained data analysis framework to generate the modeling vector of the new problem,
wherein the problem characteristic information includes at least one of problem content, an image included in the problem, a sound source feature, a sound source length, a problem length, the number of combined problems, or information on the unit to which the problem belongs.

The method according to claim 1,
wherein step (c) comprises
training the data analysis framework so that a first problem modeling vector, obtained by applying the problem metadata to the data analysis framework, corresponds to a second modeling vector of the problem previously generated using users' solving-result data for the problem.

The method according to claim 1,
wherein step (c) comprises:
step (c-1) of applying the k-dimensional problem metadata to an encoding framework to obtain a low-dimensional first problem vector;
step (c-2) of applying the first problem vector to a decoding framework to obtain a k-dimensional second problem vector;
step (c-3) of training the encoding framework and the decoding framework by repeating steps (c-1) to (c-2) so that the second problem vector corresponds to the problem metadata; and
step (c-4) of setting the second problem vector at the time training is completed as the modeling vector of the problem metadata.

The method according to claim 1,
wherein, when the problem characteristic information is problem content and the problem content includes a plurality of words,
step (a) comprises:
step (a-1) of assigning an arbitrary n-dimensional vector to each of the plurality of words;
step (a-2) of applying a first vector assigned to a first word to an encoding framework to obtain a low-dimensional second vector;
step (a-3) of applying the second vector to a decoding framework to obtain an n-dimensional third vector;
step (a-4) of training the encoding framework and the decoding framework so that the third vector corresponds to a fourth vector arbitrarily assigned to a second word located within a preset distance from the first word;
step (a-5) of repeating steps (a-2) to (a-4) for each of one or more second words and setting the finally derived second vector as the vector value of the first word; and
a step of performing steps (a-2) to (a-5) for each of the plurality of words to obtain a vector value for each word, and vectorizing the problem content using the per-word vector values.

A new-problem modeling vector generation application program stored in a computer-readable medium to execute the method of any one of claims 1 to 5.