KR20180075368A

KR20180075368A - Dropout method for improving training speed and memory efficiency on artificial neural network and learning method based on the same

Info

Publication number: KR20180075368A
Application number: KR1020170125452A
Authority: KR
Inventors: 최호진; 고병수; 김한규; 오교중
Original assignee: 한국과학기술원
Priority date: 2016-12-26
Filing date: 2017-09-27
Publication date: 2018-07-04
Also published as: KR102061615B1

Abstract

A controlled dropout method for increasing training speed and memory efficiency may be performed in an artificial neural network model including a layer. In the controlled dropout method, the matrix elements to be activated in a matrix of the layer are selected in column units or row units according to a dropout rate, and the deactivated matrix elements are dropped out in row units or column units. The matrix is thereby transformed into a reduced-size matrix containing only the selected columns or rows of activated matrix elements, which forms a reduced artificial neural network model. The neural network model is trained by performing forward propagation with the reduced artificial neural network model formed in the previous stage, and may further be trained through back propagation using the same reduced model. The weight matrix and bias vector values calculated in the back propagation of the reduced neural network model can be written back to their original positions in the weight matrix and bias vector of the original neural network model.

Description

Dropout method for improving memory efficiency and training speed in an artificial neural network model, and learning method using the same {DROPOUT METHOD FOR IMPROVING TRAINING SPEED AND MEMORY EFFICIENCY ON ARTIFICIAL NEURAL NETWORK AND LEARNING METHOD BASED ON THE SAME}

The present invention relates to the field of artificial neural networks, and more particularly to an improved dropout method for solving the overfitting problem in an artificial neural network model and to a method of training an artificial neural network model using that dropout method.

An artificial neural network model is a statistical learning model inspired by biological neural networks, in which a network formed by multiple layers and the neurons constituting each layer acquires problem-solving ability through learning. As is well known, a simple artificial neural network model includes an input layer, an output layer, and one or more hidden layers, with every layer having multiple neurons. FIG. 1(a) illustrates a case with two hidden layers. In a neural network, information is stored in the connections between neurons (that is, nodes) in the form of weights and biases.

In the forward propagation step, the activations of the next layer can be computed from the activations of a given layer. That is, each signal applied to a node from outside is multiplied by the corresponding weight and delivered to the node, and the node sums all of these values. This sum is called the weighted sum. The value obtained by feeding the weighted sum into an activation function is the node's output. The activation function determines the operating characteristics of the node. Using this result, the weights and biases between layers can then be updated through backpropagation.
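The weighted sum and activation described above can be sketched for a single node as follows. This is a minimal illustration only: the function names and the choice of a sigmoid activation are assumptions made for the example and are not specified by the patent.

```python
import math


def sigmoid(z):
    # One common activation function; chosen here only for illustration.
    return 1.0 / (1.0 + math.exp(-z))


def node_output(inputs, weights, bias, activation=sigmoid):
    # Multiply each incoming signal by its weight, add them up together
    # with the bias, and pass the weighted sum through the activation.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)


# Two input signals whose weighted contributions cancel (0.5 - 0.5 = 0).
out = node_output([1.0, 2.0], [0.5, -0.25], 0.0)
```

Because the weighted sum in this example is exactly zero, the sigmoid returns 0.5.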

Recently, deep learning techniques based on artificial neural networks have attempted to learn complex problems and data through a high level of abstraction, using multiple hidden layers and non-linear activation. Deep learning based on such deep neural networks is a very powerful machine learning technique. However, using many hidden layers increases the number of parameters to be learned, which requires correspondingly long training time. In addition, the overfitting problem can arise, in which the network becomes excessively biased toward the training data and the learned result loses generality.

The overfitting problem is associated with the use of models or procedures that violate parsimony. It occurs when a model contains more terms than necessary or uses a more complicated approach than necessary. Especially with limited data, most of the complex relationships found among the training data are likely to be the result of sampling noise and may exist in the training set but not in real test data. This is a serious problem in neural network models and leads to high accuracy on the training set but low accuracy on the test set.

To solve this overfitting problem, the dropout technique, which randomly deactivates nodes (neurons) during training, has been developed and is in use. Dropout deactivates nodes by changing some values of the matrices used for training to zero, so many matrix elements of a matrix to which dropout has been applied become zero. FIG. 1(b) illustrates the thinned neural network obtained by applying dropout to the network shown in FIG. 1(a). The conventional dropout technique randomly selects nodes (units) and temporarily removes them from the layer.

Applying dropout produces many zero elements in each hidden layer. If there are n hidden layers, each with m nodes, and the dropout rate is 0.5 (meaning that roughly half of the elements remain), the expected number of zero elements after dropout is (1/2)mn. For example, with 10 hidden layers of 500 nodes each, 2,500 zero elements are produced. These zero elements cause unnecessary work in matrix multiplication. During the inner-product operations between matrices that are needed to train the network, each zero element is multiplied by some other real number in the matrix. Because the result of multiplying by zero is always zero, these calculations are unnecessary and waste resources.
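The expected count quoted above can be checked with a few lines of arithmetic. The function name is a hypothetical helper introduced for this sketch, and the dropout rate is taken here as the probability that a node is zeroed.

```python
def expected_zero_elements(num_hidden_layers, nodes_per_layer, dropout_rate):
    # Each of the n * m hidden nodes is zeroed with probability dropout_rate,
    # so the expected number of zero elements is dropout_rate * n * m.
    return dropout_rate * num_hidden_layers * nodes_per_layer


# The example from the text: 10 hidden layers, 500 nodes each, rate 0.5.
zeros = expected_zero_elements(10, 500, 0.5)
```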

Non-Patent Document 1: "Dropout: a simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014

Training a neural network involves storing large matrices and performing calculations between them, which requires a considerable amount of memory and computing power. After dropout is applied, the matrices contain many zero elements, and because any product with zero is zero, unnecessary calculations occur. In view of this, one object of the present invention is to provide a controlled dropout method that can reduce redundant matrix calculations by arranging the zero-valued matrix elements to be dropped in an organized form and removing them from the matrix calculations required for training an artificial neural network model.

Another object of the present invention is to provide a method of training an artificial neural network model using the controlled dropout method, which applies dropout against the overfitting problem while reducing the memory usage of the model, thereby increasing memory efficiency and improving training speed.

To achieve one object of the present invention, a controlled dropout method according to embodiments of the present invention may be performed in an artificial neural network model including a layer. The controlled dropout method may include: selecting, in column units or row units according to a dropout rate, the matrix elements to be activated in a matrix of the layer; and constructing a reduced artificial neural network model by dropping the deactivated matrix elements from the matrix in column units or row units, thereby converting the matrix into a reduced-size matrix composed only of the to-be-activated matrix elements of the selected columns or rows.

According to an embodiment, when the artificial neural network model includes a plurality of layers, the selecting step may include selecting, in column units or row units according to the dropout rate, the matrix elements to be activated in the matrix of each of the plurality of layers.

According to an embodiment, the plurality of layers may include a plurality of hidden layers between the input layer and the output layer of the artificial neural network model.

According to an embodiment, the controlled dropout method may further include a parameter reduction step of reducing the parameters associated with the plurality of layers based on the reduced-size matrix.

According to an embodiment, the parameter reduction step may include reducing the size of the weight matrix between adjacent first and second layers by keeping only those rows or columns of the weight matrix that are multiplied by the to-be-activated columns or rows of the first layer's matrix and removing the remaining rows or columns.

According to an embodiment, the parameter reduction step may include reducing the size of the bias vector between adjacent first and second layers by keeping only those columns or rows of the bias vector that correspond to the to-be-activated columns or rows of the second layer's matrix and removing the remaining rows or columns.

According to an embodiment, in the selecting step, the columns or rows of matrix elements to be activated may be determined stochastically based on the dropout rate.

According to an embodiment, when each item of data provided to the artificial neural network model is organized as a column of a matrix, the unit of the matrix elements to be activated is a row, and when each item of data is organized as a row of a matrix, the unit of the matrix elements to be activated may be a column.

To achieve another object of the present invention, a method of training an artificial neural network model using the controlled dropout method according to embodiments of the present invention may be performed in an artificial neural network model including a layer. The training method may include: selecting, in column units or row units according to a dropout rate, the matrix elements to be activated in a matrix of the layer; constructing a reduced artificial neural network model by dropping the deactivated matrix elements from the matrix in column units or row units, thereby converting the matrix into a reduced-size matrix composed only of the to-be-activated matrix elements of the selected columns or rows; and training the artificial neural network model with the reduced artificial neural network model.

According to an embodiment, the training step may include: a forward propagation step of training the neural network model by performing forward propagation with the reduced neural network model formed in the previous step; and a back propagation step of training the neural network model through back propagation with the reduced neural network model formed in the previous step.

According to an embodiment, when the artificial neural network model includes a plurality of layers, the selecting step includes selecting, in column units or row units according to the dropout rate, the matrix elements to be activated in the matrix of each of the plurality of layers, and the plurality of layers may include at least one of an input layer and a plurality of hidden layers of the artificial neural network model.

According to an embodiment, the training method may further include, before the training step, a parameter reduction step of reducing the parameters associated with the plurality of layers so that they correspond to the reduced-size matrix, and the parameters may include a weight matrix and a bias matrix.

According to an embodiment, the training method may further include updating the parameters associated with the plurality of layers, at the original positions corresponding to the reduced artificial neural network model, using the values calculated in the back propagation step.

According to an embodiment, the step of updating the parameters may include writing the weight matrix and bias vector values, calculated by performing back propagation with the reduced artificial neural network model, back to their original positions in the weight matrix and bias vector of the original artificial neural network model.
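The write-back of updated values to their original positions can be sketched as follows. This is only an illustration under stated assumptions: the helper name `write_back`, the index lists `active_in` and `active_out` (0-based positions of the kept input-side and output-side units), and the nested-list matrix layout are all introduced for this example and do not come from the patent.

```python
def write_back(full_weights, full_bias, reduced_weights, reduced_bias,
               active_in, active_out):
    # Copy each entry of the reduced weight matrix to the row/column it
    # originally occupied in the full-size weight matrix.
    for r, i in enumerate(active_in):
        for c, j in enumerate(active_out):
            full_weights[i][j] = reduced_weights[r][c]
    # Do the same for the reduced bias vector.
    for c, j in enumerate(active_out):
        full_bias[j] = reduced_bias[c]


# A 4x4 weight matrix and 4-element bias, all zeros before the update.
W = [[0.0] * 4 for _ in range(4)]
b = [0.0] * 4
write_back(W, b,
           reduced_weights=[[1.0, 2.0], [3.0, 4.0]],
           reduced_bias=[5.0, 6.0],
           active_in=[0, 2], active_out=[1, 3])
```

Entries that belong to dropped rows or columns are left untouched, which matches the idea that only the parameters that took part in the reduced iteration are updated.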

By applying the controlled dropout technique to an artificial neural network model including a plurality of layers, the present invention can address the overfitting problem without performing unnecessary matrix calculations. The conventional dropout method decides whether to activate each matrix element of a layer's matrix individually. The dropped-out matrix elements still take part in the matrix operations, which is highly inefficient. By contrast, the dropout method according to embodiments of the present invention decides which matrix elements to activate in column or row units according to the dropout rate. The elements to be deactivated can therefore be removed, reducing the dimensions of the matrix. The weight matrix elements and bias vector elements that would be computed with those matrix elements are also removed, which reduces the size of those matrices as well. In this way, dropout is performed against overfitting while the matrices being computed shrink, and this reduction lowers memory usage and improves the training speed of the artificial neural network model.

In addition, with the controlled dropout method according to embodiments of the present invention, the improvement in memory efficiency and training speed grows with the number of layers to which dropout is applied and with the number of neurons in those layers.

This new dropout method can be applied effectively across the entire artificial neural network model.

Training with the reduced artificial neural network model obtained by applying the controlled dropout method according to embodiments of the present invention can also increase training efficiency.

However, the effects of the present invention are not limited to those described above and may be extended in various ways without departing from the spirit and scope of the present invention.

FIGS. 1(a) and 1(b) illustrate, respectively, a general artificial neural network having two hidden layers and the thinned artificial neural network obtained after dropout is applied to it.
FIG. 2 is a diagram showing a comparative example of matrix forms to which the dropout method according to embodiments of the present invention is applied.
FIG. 3 is a diagram showing an example of matrices reduced by the dropout method according to embodiments of the present invention.
FIG. 4 illustrates, using a feed-forward artificial neural network, (a) a model before dropout is applied, (b) a model to which the conventional dropout method is applied, and (c) a model to which the controlled dropout method according to embodiments of the present invention is applied.
FIG. 5 is a flowchart showing a method of training an artificial neural network model to which the dropout method according to exemplary embodiments of the present invention is applied.

Hereinafter, embodiments of the present invention are described in more detail with reference to the accompanying drawings. The same or similar reference numerals are used for the same components in the drawings.

In the following, the dropout method for improving memory efficiency and training speed in an artificial neural network model according to embodiments of the present invention is described, by way of example, as applied to a fully connected network. It will be apparent that the type of artificial neural network model, the number of layers, and the number of neurons in each layer may differ. The terms neuron, matrix element, and node are used with equivalent meaning in this specification.

FIG. 2 is a diagram showing a comparative example of matrix forms to which the dropout method according to embodiments of the present invention is applied.

Referring to FIG. 2, the first comparison matrix (Y1) represents a matrix used in an artificial neural network model that does not use dropout. The first comparison matrix (Y1) is a matrix that can be applied to, or output from, a particular layer. It is drawn as a 6x6 matrix in the figure, but its size is not limited. Here, the layer may be a hidden layer and/or an input layer. The second comparison matrix (Y2) shows, by way of example, the result of applying the conventional dropout method to the first comparison matrix (Y1). The third matrix (Y3) shows a matrix to which the controlled dropout method according to exemplary embodiments of the present invention has been applied. Each square represents a matrix element: a square filled entirely in black is an ordinary (activated) matrix element, and a square with an empty (white) center is a matrix element with the value zero produced by dropout.

The figure allows the conventional dropout technique and the controlled dropout technique according to an exemplary embodiment of the present invention to be compared. The two drop matrix elements in different ways. The conventional dropout method decides stochastically, element by element, which matrix elements to activate according to the dropout rate. By contrast, the controlled dropout method according to an exemplary embodiment of the present invention decides stochastically, column by column and/or row by row, which columns and/or rows to activate according to the dropout rate.

This is explained in detail below. As the second comparison matrix (Y2) shows, the conventional dropout technique drops matrix elements (units) at random according to the dropout rate, so it is impossible to predict which matrix elements will be dropped or what shape the matrix will have after dropout. In the present invention, the dropout rate represents the probability that each row or each column constituting the layer is activated, and it can be set in advance. For example, with a dropout rate of 0.5, any 18 of the 36 matrix elements may be dropped out. A dropped-out matrix element (for example, one drawn in white) has the value 0, and the neuron corresponding to it is deactivated. As FIG. 2 shows, in the second comparison matrix (Y2) the decision for each matrix element is made randomly based on the dropout rate, so the positions of the dropped-out elements follow no regular pattern.
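The conventional element-wise behaviour described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name is hypothetical, `dropout_rate` is treated as the probability of zeroing an element (at 0.5 this coincides with the activation probability mentioned in the text), and the rescaling of surviving elements used by many dropout implementations is deliberately omitted because the text does not discuss it.

```python
import random


def elementwise_dropout(matrix, dropout_rate, rng):
    # Decide independently for every element whether it is zeroed,
    # so the zeros end up scattered with no regular pattern.
    return [[0.0 if rng.random() < dropout_rate else value for value in row]
            for row in matrix]


# A 6x6 matrix of ones standing in for Y1; Y2 is its dropped-out version.
Y1 = [[1.0] * 6 for _ in range(6)]
Y2 = elementwise_dropout(Y1, 0.5, random.Random(0))
```

The matrix keeps its full 6x6 size, so every zero still participates in later matrix products.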

By contrast, according to exemplary embodiments of the present invention, controlled dropout can be applied to the matrix elements. When each item of data provided to the artificial neural network model is organized as a column, the dropout decision can be made per row of matrix elements, based on the dropout rate. Which rows are dropped out can be selected stochastically based on the dropout rate. For example, with a dropout rate of 0.5, three of the six rows may be dropped out. Rows {2, 4, 6} might be chosen, in which case all matrix elements in those rows are dropped out.

The dropout decision has been described as being made per row, but this is only an example and the present invention is not limited to it. According to another exemplary embodiment, the controlled dropout method may make the decision per column of matrix elements. For example, when each item of data provided to the artificial neural network model is organized as a row rather than a column, the dropout decision may be made per column. The third matrix (Y3) in FIG. 2 can be seen as an example of a per-column decision: with a dropout rate of 0.5, columns {2, 4, 5} were chosen to be dropped out, so all matrix elements in those three columns of the third matrix (Y3) are dropped out.

According to an exemplary embodiment, when each item of data provided to the artificial neural network model is organized as a column of a matrix, the unit of matrix elements to be activated (that is, the unit of matrix elements to be dropped out) may be a row. Likewise, when each item of data is organized as a row of a matrix, the unit of matrix elements to be activated (that is, the unit to be dropped out) may be a column.
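The row-or-column rule and the selection of whole columns can be sketched as follows. The helper names are hypothetical, indices are 0-based, and choosing exactly `round(dropout_rate * num_units)` units is an assumption made to mirror the "3 of 6" examples in the text; the patent only says that the choice is made stochastically based on the dropout rate.

```python
import random


def drop_unit_for(data_unit):
    # If each data item is a column, whole rows are dropped, and vice versa.
    return {"column": "row", "row": "column"}[data_unit]


def choose_dropped_units(num_units, dropout_rate, rng):
    # Assumption: drop a fixed number of units, picked uniformly at random.
    count = round(dropout_rate * num_units)
    return sorted(rng.sample(range(num_units), count))


def zero_columns(matrix, dropped):
    # Zero every element of each dropped column, as drawn for Y3 in FIG. 2.
    dropped = set(dropped)
    return [[0.0 if j in dropped else value for j, value in enumerate(row)]
            for row in matrix]


dropped = choose_dropped_units(6, 0.5, random.Random(0))
# Columns {2, 4, 5} of the Y3 example, written as 0-based indices.
Y3 = zero_columns([[1.0] * 6 for _ in range(6)], [1, 3, 4])
```

Because the zeros now occupy entire columns, they can be removed outright instead of being multiplied through.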

The matrices (Y1, Y2, Y3) in FIG. 2 have been described as two-dimensional, but this is only an example and the present invention is not limited to it. The controlled dropout method according to embodiments of the present invention can be applied to matrices of any dimension.

FIG. 3 illustrates how the size of matrices in an artificial neural network is reduced by applying the controlled dropout method according to an exemplary embodiment of the present invention. It shows a simple example for a single layer of the model with a dropout rate of 0.5. The dropout rate may take values other than 0.5 when the controlled dropout method is applied.

In FIG. 3, black boxes are ordinary matrix elements and white boxes are zero elements produced by dropout. The first matrix (X1) is the output matrix of the first layer after controlled dropout according to the present invention has been applied, and the second matrix (X2) is the output matrix of the second layer after controlled dropout has been applied. For example, the first layer may be a first hidden layer and the second layer a second hidden layer. The first weight matrix (W1) is the weight matrix between the first and second layers, and the first bias vector (B1) is the bias vector between the first and second layers. The first reduced matrix (X1') is the reduced output matrix of the first layer after controlled dropout, and the second reduced matrix (X2') is the reduced output matrix of the second layer after controlled dropout. The first reduced weight matrix (W1') is the reduced weight matrix between the first and second layers, and the first reduced bias vector (B1') is the reduced bias vector between the first and second layers.

According to an exemplary embodiment, in the training phase the artificial neural network model may perform an iteration consisting of forward propagation, backpropagation, and a parameter update. The controlled dropout method according to an exemplary embodiment of the present invention may determine in advance the columns or rows to be dropped out in a layer before training through such iterations is performed. According to an exemplary embodiment, as shown in FIG. 3, the controlled dropout method may determine in advance, for example, that columns {2, 3, 6} are dropped in the first matrix (X1) of the first layer and that columns {1, 3, 5} are dropped in the second matrix (X2) of the second layer.

In this case, the controlled dropout method (or the artificial neural network model) may reduce the size of the first and second matrices (X1, X2) by removing, based on the drop information, the columns to be dropped from the first and second matrices (X1, X2). For example, when the dropout rate is 0.5, each of the first matrix (X1) and the second matrix (X2) may be reduced from a 6x6 matrix to a 6x3 matrix, as illustrated.
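The column removal described above can be sketched with NumPy indexing. This is only an illustration under stated assumptions: the 6x6 values of `X1` are made up, rows are taken to be data samples and columns to be neurons, and the 1-based column numbers of the text are converted to 0-based indices.

```python
import numpy as np

# Hypothetical 6x6 output matrix of the first layer (rows: samples, columns: neurons).
X1 = np.arange(36, dtype=float).reshape(6, 6)

# Columns {2, 3, 6} in the 1-based numbering of the text are indices {1, 2, 5} here.
dropped_cols = np.array([2, 3, 6]) - 1
kept_cols = np.setdiff1d(np.arange(X1.shape[1]), dropped_cols)

# Keep only the activated columns: the 6x6 matrix becomes a 6x3 matrix.
X1_reduced = X1[:, kept_cols]
print(X1_reduced.shape)
```

The same indexing applied to X2 with dropped columns {1, 3, 5} would give the second reduced matrix.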

Similarly, the controlled dropout method may also reduce the matrices of parameters such as the weights and biases, based on the predetermined information about the dropout targets.

A dot product operation may be performed on the first weight matrix (W1) and the first matrix (X1). In that case, the elements of the first matrix (X1) that are to be dropped are multiplied by the corresponding elements of the first weight matrix (W1). Referring to FIG. 3, for example, columns {2, 3, 6} of the first matrix (X1), which are to be dropped, would be multiplied by rows {2, 3, 6} of the first weight matrix (W1), respectively. Rows {2, 3, 6} of the first weight matrix (W1), which correspond to the dropped columns {2, 3, 6} of the first matrix (X1), are therefore meaningless and can be removed. Similarly, because columns {1, 3, 5} of the second matrix (X2) are to be dropped, columns {1, 3, 5} of the first weight matrix (W1) are also meaningless, and these three columns can be removed as well.

In addition, elements {1, 3, 5} of the first bias vector (B1) (more generally, columns {1, 3, 5} of the first bias matrix) correspond to columns {1, 3, 5} of the second matrix (X2), which are to be dropped, so elements {1, 3, 5} of the first bias vector (B1) can also be removed.

As a result, the first weight matrix (W1) may be reduced from a 6x6 matrix to a 3x3 matrix, and the first bias vector (B1) may be reduced from a size of 1x6 to a size of 1x3.
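The reduction of W1 and B1 can be checked numerically. The sketch below is not taken from the patent: it assumes the layout X2 = X1 · W1 + B1 before any activation function, uses random placeholder values, and applies no rescaling of the kept activations. It compares the reduced computation with a conventional mask-based dropout on the full matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal(size=(6, 6))  # first-layer output, rows are samples
W1 = rng.normal(size=(6, 6))  # weights between the first and second layers
B1 = rng.normal(size=(1, 6))  # bias, one entry per second-layer unit

# 0-based indices of the units that stay active.
keep1 = np.array([0, 3, 4])   # first layer drops 1-based columns {2, 3, 6}
keep2 = np.array([1, 3, 5])   # second layer drops 1-based columns {1, 3, 5}

# Reduced operands: 6x3 input, 3x3 weights, 1x3 bias.
X1r = X1[:, keep1]
W1r = W1[np.ix_(keep1, keep2)]
B1r = B1[:, keep2]
Z_reduced = X1r @ W1r + B1r

# Reference: zero the dropped columns of X1 and multiply the full matrices.
mask1 = np.zeros(6)
mask1[keep1] = 1.0
Z_full = (X1 * mask1) @ W1 + B1

# The reduced result equals the kept columns of the full result.
print(np.allclose(Z_reduced, Z_full[:, keep2]))
```

The removed rows of W1 only ever multiply zeros, and the removed columns of W1 and elements of B1 only feed outputs that are dropped anyway, which is why the 3x3 product carries the same information.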

Thus, the controlled dropout method according to an exemplary embodiment of the present invention determines in advance, before iterative training is performed, the dropout targets in the matrix of each layer of the artificial neural network on a row and/or column basis according to the dropout rate. This reduces the matrix of each layer to a smaller, well-organized matrix and also reduces the size of the associated weight matrix and bias matrix. With the matrix sizes reduced in this well-organized way, unnecessary matrix computations need not be performed; only computations between the reduced-size matrices are required. Although FIG. 3 describes a simple example to aid understanding, the controlled dropout method of the present invention is not limited thereto and may be applied to all layers.

Next, FIG. 4 is a diagram illustrating an example of an artificial neural network model using the dropout method according to embodiments of the present invention.

FIG. 4 illustrates, using a feed-forward artificial neural network, a model before dropout is applied (a), a model to which the conventional dropout method is applied (b), and a model to which the controlled dropout method according to embodiments of the present invention is applied (c).

Referring to FIG. 4, the first model (310) represents an artificial neural network model before dropout is applied; the first comparison matrix (Y1) described with reference to FIG. 2 may be an example. The second model (320) represents an artificial neural network model obtained by applying the conventional dropout method to the first model (310); the second comparison matrix (Y2) described with reference to FIG. 2 may be an example. The third model (330) represents an artificial neural network model obtained by applying the controlled dropout method according to embodiments of the present invention to the first model (310), and may use the third matrix (Y3) described with reference to FIG. 2.

If the model before dropout is applied is like the first model (310), applying the conventional dropout method may leave the deactivated neurons of each layer in place, as in the second model (320). For example, when dropout is applied to the first to third layers (L1, L2, L3), which are hidden layers, two of the four neurons in the first to third layers (L1, L2, L3) may remain in a deactivated state.

In contrast, when the controlled dropout method according to the present invention is applied to the hidden layers (L1', L2', L3') so that the matrix of each hidden layer is reduced, the neurons deactivated in each hidden layer (L1', L2', L3') are removed and a reduced model is obtained, as in the third model (330). This is because, as described with reference to FIG. 4, the controlled dropout method can reduce the matrices by determining, on a column basis, the matrix elements to be deactivated (or the matrix elements to be activated).

As shown in FIGS. 3 and 4, the sizes of the matrices constituting each layer, the weight matrix, and the bias vector may be reduced according to the dropout rate. For example, the size of the matrices may be reduced to a level of "(1 - dropout rate)". When the controlled dropout method is applied to all layers, the artificial neural network model to which it is applied can be significantly smaller than a general artificial neural network model. Accordingly, the memory used by the reduced artificial neural network model can be significantly lower, and the time required for matrix computation can also be greatly reduced as the matrices shrink.
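As a worked example of this size arithmetic, the helper below computes the reduced shapes for one pair of adjacent layers. The function name `reduced_shapes` and the rule that exactly round(n x (1 - rate)) units are kept are illustrative assumptions, not details given in the text.

```python
def reduced_shapes(batch, n_in, n_out, rate):
    """Shapes of X, W and B after dropping units on both sides of a weight matrix.

    Assumes exactly round(n * (1 - rate)) units are kept in each layer.
    """
    keep_in = int(round(n_in * (1.0 - rate)))
    keep_out = int(round(n_out * (1.0 - rate)))
    return {
        "X": (batch, keep_in),     # layer output: columns scale by (1 - rate)
        "W": (keep_in, keep_out),  # weights: rows and columns both scale
        "B": (1, keep_out),        # bias: elements scale by (1 - rate)
    }

shapes = reduced_shapes(batch=6, n_in=6, n_out=6, rate=0.5)
print(shapes)
```

For the FIG. 3 numbers this gives a 6x3 layer matrix, a 3x3 weight matrix, and a 1x3 bias vector. Note that when both adjacent layers are dropped, the number of weight elements scales by (1 - rate) squared, here 36 to 9.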

In FIG. 4 the artificial neural network model is shown as a feed-forward artificial neural network, but this is merely an example and the present invention is not limited thereto. For example, the dropout method according to embodiments of the present invention may be applied to neural network models other than feed-forward artificial neural networks.

FIG. 5 is a flowchart illustrating an artificial neural network model learning method to which the dropout method according to the exemplary embodiments of the present invention described above is applied.

The artificial neural network model learning method of FIG. 5 may be performed in an artificial neural network model. The artificial neural network model may be trained using the method of FIG. 5. Here, the artificial neural network model may include a plurality of layers (or hidden layers).

Referring to FIGS. 4 and 5, the learning method of FIG. 5 may probabilistically select, based on the dropout rate, the columns to be activated in the matrix of each layer of an artificial neural network model (for example, a general artificial neural network model) (S410). For example, when the artificial neural network model includes two hidden layers, the method of FIG. 5 may select the columns to be activated in the matrix of the first hidden layer and select the columns to be activated in the matrix of the second hidden layer. In this way, the method of FIG. 5 may select the columns to be activated in the matrix of each layer of the artificial neural network model.

In step S410, the targets to be activated may of course be selected on a row basis rather than a column basis. Also, in the learning method of FIG. 5, the columns to be deactivated in the matrix of a layer of the artificial neural network model may instead be selected probabilistically based on the dropout rate.
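One way the selection in step S410 might be realized is sketched below. The text only states that the selection is probabilistic and based on the dropout rate; drawing a fixed number of columns uniformly without replacement is an assumption made here, and `select_active_columns` is a hypothetical helper name. Drawing each column independently with probability (1 - rate) would be another possible reading.

```python
import numpy as np

def select_active_columns(n_units, dropout_rate, rng):
    """Return sorted 0-based indices of the columns that stay active.

    Assumption: exactly round(n_units * (1 - dropout_rate)) columns are kept,
    chosen uniformly at random, and at least one column always survives.
    """
    n_keep = max(1, int(round(n_units * (1.0 - dropout_rate))))
    return np.sort(rng.choice(n_units, size=n_keep, replace=False))

rng = np.random.default_rng(42)
# One independent selection per hidden layer, as in the two-layer example.
active_layer1 = select_active_columns(6, 0.5, rng)
active_layer2 = select_active_columns(6, 0.5, rng)
print(active_layer1, active_layer2)
```

Selecting the columns to deactivate instead is equivalent: the dropped set is simply the complement of the returned indices.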

The learning method of FIG. 5 may construct a reduced artificial neural network model using only the columns to be activated that were selected in step S410 (S420).

As described with reference to FIG. 3, the method of FIG. 5 may construct a reduced matrix from the columns that remain (that is, the activated columns) after excluding the columns determined to be dropout targets (that is, the deactivated columns) from the matrix of each layer. An example of a neural network model corresponding to this reduced matrix is the third model (330) of FIG. 4.

Similarly, the learning method of FIG. 5 may reduce the parameters described with reference to FIG. 3 (for example, the weight matrix and bias vector corresponding to the layer) based on the reduced matrix.

As a result, a reduced artificial neural network model such as the third model (330) described with reference to FIG. 4 can be constructed.

Thereafter, the method of FIG. 5 may perform forward propagation and backpropagation in the reduced artificial neural network model (S430, S440). For example, the method of FIG. 5 may perform forward propagation in the reduced artificial neural network model and then perform backpropagation. Here, backpropagation may propagate the error in the reverse direction. In this way, the reduced artificial neural network model (or the artificial neural network model) can derive an optimal training result.

The method of FIG. 5 may update the parameters of the layers using the values (for example, the error) calculated during backpropagation (S450). For example, the method of FIG. 5 may update the weights and biases, which are the parameters associated with the layers of the artificial neural network model, at the original positions of the reduced artificial neural network model. That is, the weight matrix and bias vector values calculated by performing backpropagation with the reduced artificial neural network model may be written back to their original positions in the weight matrix and bias vector of the original artificial neural network model. For example, the method of FIG. 5 may update the weights and biases of the general artificial neural network model (that is, the unreduced artificial neural network model).
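The write-back in step S450 can be sketched as a scatter into the full parameter arrays. The gradients below are placeholders (all ones) standing in for values that backpropagation through the reduced model would produce, and the plain gradient-descent update with learning rate `lr` is an assumption; the text does not specify the update rule.

```python
import numpy as np

# Full (unreduced) parameters, zero-initialized so the update is easy to see.
W1 = np.zeros((6, 6))
B1 = np.zeros((1, 6))

keep1 = np.array([0, 3, 4])  # active units of the first layer (0-based)
keep2 = np.array([1, 3, 5])  # active units of the second layer (0-based)

# Reduced copies used during forward and backward propagation.
W1r = W1[np.ix_(keep1, keep2)]
B1r = B1[:, keep2]

# Placeholder gradients for the reduced parameters.
grad_W1r = np.ones_like(W1r)
grad_B1r = np.ones_like(B1r)
lr = 0.1

# Update the reduced parameters, then write them back to their original positions.
W1[np.ix_(keep1, keep2)] = W1r - lr * grad_W1r
B1[:, keep2] = B1r - lr * grad_B1r

print(np.count_nonzero(W1), np.count_nonzero(B1))
```

Only the 9 weight entries and 3 bias entries that took part in the reduced model change; every other entry of the full parameters keeps its previous value for this iteration.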

The method of FIG. 5 may determine whether further training of the artificial neural network model is needed (S460), and may repeat steps S410 to S450 until training ends.
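Steps S410 to S460 can be put together in a small end-to-end sketch. Everything specific here is an assumption chosen for illustration: a single hidden layer with ReLU, a mean-squared-error loss, plain gradient descent, a fixed iteration budget as the stopping test, and no rescaling of the kept activations. The names `train_step` and `train` are hypothetical.

```python
import numpy as np

def train_step(params, X, y, keep, lr):
    """One iteration (S420 to S450) on the model reduced to hidden units `keep`."""
    W0, b0, W1, b1 = params
    # S420: build the reduced parameters (fancy indexing returns copies).
    W0r, b0r, W1r = W0[:, keep], b0[:, keep], W1[keep, :]
    # S430: forward propagation through the reduced model.
    Hr = np.maximum(X @ W0r + b0r, 0.0)
    out = Hr @ W1r + b1
    loss = float(np.mean((out - y) ** 2))
    # S440: backpropagation through the reduced model.
    d_out = 2.0 * (out - y) / out.size
    dZ = (d_out @ W1r.T) * (Hr > 0)
    # S450: write the updated values back to their original positions.
    W1[keep, :] = W1r - lr * (Hr.T @ d_out)
    b1 -= lr * d_out.sum(axis=0, keepdims=True)
    W0[:, keep] = W0r - lr * (X.T @ dZ)
    b0[:, keep] = b0r - lr * dZ.sum(axis=0, keepdims=True)
    return loss

def train(X, y, n_hidden=4, rate=0.5, lr=0.05, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    params = (
        rng.normal(scale=0.5, size=(X.shape[1], n_hidden)),
        np.zeros((1, n_hidden)),
        rng.normal(scale=0.5, size=(n_hidden, y.shape[1])),
        np.zeros((1, y.shape[1])),
    )
    n_keep = max(1, int(round(n_hidden * (1.0 - rate))))
    losses = []
    for _ in range(n_iters):  # S460: here simply a fixed iteration budget
        # S410: choose the hidden units that stay active in this iteration.
        keep = np.sort(rng.choice(n_hidden, size=n_keep, replace=False))
        losses.append(train_step(params, X, y, keep, lr))
    return params, losses

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))
y = rng.normal(size=(8, 1))
params, losses = train(X, y, n_iters=5)
print(len(losses))
```

In each iteration the matrix products involve only the kept hidden units, so with a rate of 0.5 the hidden-layer matrices are half as wide as in the full model, while the full parameter arrays are touched only when the updated values are written back.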

As described above, because the method of FIG. 5 constructs a reduced artificial neural network model, it can reduce memory usage and improve training speed. For example, by using the reduced artificial neural network model, the method of FIG. 5 can compute the various values needed at each layer during backpropagation (for example, partial derivatives) quickly and with relatively little memory. That is, the memory usage of the artificial neural network model can be reduced and its training speed improved.

The dropout method for improving memory efficiency and training speed in an artificial neural network model according to embodiments of the present invention has been described above with reference to the drawings. The description is illustrative, however, and may be modified and changed by a person of ordinary skill in the art without departing from the technical spirit of the present invention.

The dropout method for improving memory efficiency and training speed in an artificial neural network model according to embodiments of the present invention, and the artificial neural network learning method using it, can be applied to a variety of neural network models. They can also be used in many ways in deep learning, which has recently attracted attention.

310: first model 320: second model
330: third model

Claims

1. A controlled dropout method in an artificial neural network model including a layer, the method comprising:
selecting, according to a dropout rate, matrix elements to be activated in a matrix of the layer on a column basis or a row basis; and
constructing a reduced artificial neural network model by dropping the deactivated matrix elements from the matrix on a column basis or a row basis, thereby transforming the matrix into a reduced-size matrix consisting only of the matrix elements to be activated in the selected columns or rows.

2. The controlled dropout method of claim 1, wherein the selecting comprises, when the artificial neural network model includes a plurality of layers, selecting the matrix elements to be activated in the matrix of each of the plurality of layers on a column basis or a row basis according to the dropout rate.

3. The controlled dropout method of claim 2, wherein the plurality of layers include a plurality of hidden layers between an input layer and an output layer of the artificial neural network model.

4. The controlled dropout method of claim 2, further comprising a parameter reduction step of reducing parameters associated with the plurality of layers based on the reduced-size matrix.

5. The controlled dropout method of claim 4, wherein the parameter reduction step comprises reducing the size of a weight matrix between adjacent first and second layers by keeping only the rows or columns of the weight matrix that are multiplied by the columns or rows to be activated in the matrix of the first layer and removing the remaining rows or columns.

6. The controlled dropout method of claim 4, wherein the parameter reduction step comprises reducing the size of a bias vector between adjacent first and second layers by keeping only the columns or rows of the bias vector that correspond to the columns or rows to be activated in the matrix of the second layer and removing the remaining rows or columns.

7. The controlled dropout method of claim 1, wherein, in the selecting, the columns or rows of the matrix elements to be activated are determined probabilistically based on the dropout rate.

8. The controlled dropout method of claim 1, wherein, when the unit of each data item provided to the artificial neural network model is a column of a matrix, the unit of the matrix elements to be activated is a row, and when the unit of each data item is a row of a matrix, the unit of the matrix elements to be activated is a column.

9. An artificial neural network model learning method using a controlled dropout method, in an artificial neural network model including a layer, the method comprising:
selecting, according to a dropout rate, matrix elements to be activated in a matrix of the layer on a column basis or a row basis;
constructing a reduced artificial neural network model by dropping the deactivated matrix elements from the matrix on a column basis or a row basis, thereby transforming the matrix into a reduced-size matrix consisting only of the matrix elements to be activated in the selected columns or rows; and
training the artificial neural network model with the reduced artificial neural network model.

10. The artificial neural network model learning method of claim 9, wherein the training comprises: a forward propagation step of training the neural network model by performing forward propagation using the reduced neural network model formed in the previous step; and a backpropagation step of training the neural network model through backpropagation using the reduced neural network model formed in the previous step.

11. The artificial neural network model learning method of claim 9 or 10, wherein the selecting comprises, when the artificial neural network model includes a plurality of layers, selecting the matrix elements to be activated in the matrix of each of the plurality of layers on a column basis or a row basis according to the dropout rate, and wherein the plurality of layers include at least one of an input layer and a plurality of hidden layers of the artificial neural network model.

12. The artificial neural network model learning method of claim 9 or 10, further comprising, before the training, a parameter reduction step of reducing parameters associated with the plurality of layers so that they correspond to the reduced-size matrix, wherein the parameters include a weight matrix and a bias matrix.

13. The artificial neural network model learning method of claim 12, further comprising updating the parameters associated with the plurality of layers at the original positions of the reduced artificial neural network model using the values calculated in the backpropagation step.

14. The artificial neural network model learning method of claim 13, wherein updating the parameters comprises writing the weight matrix and bias vector values, calculated by performing backpropagation using the reduced artificial neural network model, back to their original positions in the weight matrix and bias vector of the original artificial neural network model.