KR20200068106A

KR20200068106A - Method for compressing deep learning neural networks and apparatus for performing the same

Info

Publication number: KR20200068106A
Application number: KR1020180150181A
Authority: KR
Inventors: 강유; 김태범; 유재민
Original assignee: 서울대학교산학협력단
Priority date: 2018-11-28
Filing date: 2018-11-28
Publication date: 2020-06-15
Also published as: KR102199285B1

Abstract

The present invention relates to a method for compressing a deep learning neural network while minimizing accuracy loss. The method comprises the steps of: decomposing at least one tensor constituting the deep learning neural network so that, for the same input, the output of the decomposed tensor approximates the output of the original tensor as closely as possible; and providing random noise as input to both the original tensor and the decomposed tensor and training the original tensor and the decomposed tensor through their outputs.

Description

Method for compressing deep learning neural networks and apparatus for performing the same

Embodiments disclosed herein relate to a method and apparatus for compressing a deep learning neural network.

As deep learning is applied in more fields and the amount of data used for deep learning grows rapidly, deep learning neural networks with increasingly complex structures are being developed. As a result, recently developed deep learning neural networks use a considerable amount of memory.

Meanwhile, as artificial intelligence and IoT technologies converge, the need to run deep learning neural networks on mobile platforms grows by the day, so deep learning neural networks must be made lightweight enough to run on mobile platforms. The need for techniques to compress deep learning neural networks is growing accordingly.

Tensor decomposition is one technique for compressing deep learning neural networks and is widely used to compress various types of convolutional neural networks (CNNs) and recurrent neural networks (RNNs). With this technique, however, the decomposed tensor is less accurate than the original tensor.

The background art described above is technical information that the inventors possessed for deriving the present invention or acquired in the course of deriving it, and is not necessarily known art disclosed to the general public before the filing of the present application.

Embodiments disclosed herein are intended to provide a method and apparatus for compressing a deep learning neural network while minimizing loss of accuracy.

As a technical means for achieving the above technical objective, according to an embodiment, a deep learning neural network compression method may include: decomposing at least one tensor constituting the deep learning neural network such that, for the same input, the output of the decomposed tensor is as close as possible to the output of the original tensor; and feeding random noise as input to both the original tensor and the decomposed tensor and training the original tensor and the decomposed tensor through their outputs.

According to another embodiment, in a computer program for performing a deep learning neural network compression method, the method may include: decomposing at least one tensor constituting the deep learning neural network such that, for the same input, the output of the decomposed tensor is as close as possible to the output of the original tensor; and feeding random noise as input to both the original tensor and the decomposed tensor and training the original tensor and the decomposed tensor through their outputs.

According to yet another embodiment, in a computer-readable recording medium on which a program for performing a deep learning neural network compression method is recorded, the method may include: decomposing at least one tensor constituting the deep learning neural network such that, for the same input, the output of the decomposed tensor is as close as possible to the output of the original tensor; and feeding random noise as input to both the original tensor and the decomposed tensor and training the original tensor and the decomposed tensor through their outputs.

According to yet another embodiment, a deep learning neural network compression apparatus includes an input/output unit for receiving a deep learning neural network to be compressed, a storage unit in which a program for compressing the deep learning neural network is stored, and a control unit that compresses the deep learning neural network by executing the program. The control unit may decompose at least one tensor constituting the deep learning neural network such that, for the same input, the outputs of the original tensor and the decomposed tensor are as close as possible, feed random noise as input to both the original tensor and the decomposed tensor, and train the original tensor and the decomposed tensor through their outputs.

According to any one of the above means, a deep learning neural network can be made lightweight efficiently while minimizing loss of accuracy.

The gap between a deep learning neural network's performance on training data and its performance at actual inference time can also be reduced.

The effects obtainable from the disclosed embodiments are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the disclosed embodiments belong.

FIG. 1 is a diagram showing the relationship between the decomposed tensor and the original tensor resulting from Tucker decomposition, one of the existing tensor decomposition techniques.
FIG. 2 is a diagram illustrating a deep learning neural network compression apparatus according to an embodiment.
FIG. 3 is a diagram showing the relationship between the decomposed tensor and the original tensor during compression in a deep learning neural network compression method according to an embodiment.
FIG. 4 is a diagram for explaining a structure for compressing one large model composed of several layers in a deep learning neural network compression method according to an embodiment.
FIG. 5 is a diagram showing a table representing a deep learning neural network compression algorithm according to an embodiment.
FIG. 6 is a flowchart illustrating a deep learning neural network compression method according to an embodiment.

Various embodiments are described in detail below with reference to the accompanying drawings. The embodiments described below may be modified and implemented in various different forms. To describe the features of the embodiments more clearly, detailed descriptions of matters well known to those of ordinary skill in the art to which the embodiments belong are omitted. Parts of the drawings unrelated to the description of the embodiments are omitted, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a component is said to be "connected" to another component, this includes not only the case where it is 'directly connected' but also the case where it is 'connected with another component in between'. In addition, when a component is said to "include" another component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them.

Embodiments are described in detail below with reference to the accompanying drawings.

This specification proposes ALCOM (Adversarial Layerwise Compression), a new technique for compressing deep learning neural networks. ALCOM compresses a deep learning neural network layer by layer (layerwise) and applies existing tensor decomposition together with knowledge distillation. Tensor decomposition and knowledge distillation are therefore described first.

(1) Tensor decomposition

Tensor decomposition is an effective method widely used to compress various types of CNNs and RNNs. The learnable parameters of most neural networks are represented as multi-dimensional arrays, or tensors, and tensor decomposition compresses a deep learning neural network by expressing large tensors as operations on small tensors. In other words, tensor decomposition expresses an N-dimensional tensor as operations on simpler tensors, and can be viewed as the process of finding high-quality low-rank approximations of large tensors.

Tucker decomposition is a representative tensor decomposition algorithm. Tucker decomposition decomposes an N-dimensional tensor $\mathcal{X}$ of shape $I_1 \times I_2 \times \cdots \times I_N$ into the form of Equation 1 below.

$$\mathcal{X} \approx \hat{\mathcal{X}}, \qquad \hat{x}_{i_1 i_2 \cdots i_N} = \sum_{r_1=1}^{R_1} \sum_{r_2=1}^{R_2} \cdots \sum_{r_N=1}^{R_N} g_{r_1 r_2 \cdots r_N} \, a^{(1)}_{i_1 r_1} a^{(2)}_{i_2 r_2} \cdots a^{(N)}_{i_N r_N} \qquad (1)$$

Equation 1 can be written more simply as Equation 2 below.

$$\hat{\mathcal{X}} = \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_N A^{(N)} \qquad (2)$$

Here, $\times_n$ is the n-mode product between a tensor and a matrix, $\mathcal{G}$ is the core tensor of shape $R_1 \times R_2 \times \cdots \times R_N$, and $A^{(i)}$ is the i-th factor matrix. (For example, $\mathcal{G} \times_1 A^{(1)}$ produces a tensor of shape $I_1 \times R_2 \times \cdots \times R_N$.)
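The n-mode product and the rebuilding of a full tensor from a core tensor and factor matrices can be sketched in NumPy as follows. This is only an illustrative sketch: the function names `mode_n_product` and `tucker_reconstruct`, the convention that the n-th factor matrix has shape (I_n, R_n), and the shapes and random values are assumptions, not taken from the original disclosure.

```python
import numpy as np

def mode_n_product(tensor, matrix, mode):
    """n-mode product of `tensor` with `matrix` along axis `mode`.

    `matrix` is assumed to have shape (I_n, R_n); the result has size I_n
    on axis `mode` where the input had size R_n.
    """
    # tensordot contracts matrix axis 1 with tensor axis `mode` and puts the
    # new axis first, so it is moved back to position `mode`.
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)

def tucker_reconstruct(core, factors):
    """Rebuild the full tensor from a core tensor and one factor per mode."""
    out = core
    for mode, factor in enumerate(factors):
        out = mode_n_product(out, factor, mode)
    return out

rng = np.random.default_rng(0)
core = rng.standard_normal((2, 3, 4))        # hypothetical R_1 x R_2 x R_3 core
factors = [rng.standard_normal((5, 2)),      # I_1 x R_1
           rng.standard_normal((6, 3)),      # I_2 x R_2
           rng.standard_normal((7, 4))]      # I_3 x R_3
full = tucker_reconstruct(core, factors)
print(full.shape)
```

Applying one factor at a time keeps every intermediate result small, which is why the mode-by-mode form is usually preferred over writing out the nested sums directly.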

The result of decomposing a tensor according to Equation 2 can be represented as in FIG. 1.

FIG. 1 shows the relationship between the decomposed tensor and the original tensor resulting from Tucker decomposition. Referring to FIG. 1, a three-dimensional tensor $\mathcal{X}$ is decomposed into one core tensor $\mathcal{G}$ and three factor matrices A, B, and C.
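To make the compression effect concrete, the parameter counts before and after such a decomposition can be compared. The sketch below assumes that the n-th factor matrix has shape I_n x R_n; the function name `tucker_param_counts` and the example shape and ranks are hypothetical and chosen only for illustration.

```python
from math import prod

def tucker_param_counts(shape, ranks):
    """Return (original, decomposed) parameter counts for a Tucker decomposition."""
    if len(shape) != len(ranks):
        raise ValueError("shape and ranks must have the same length")
    original = prod(shape)                                 # entries of the full tensor
    core = prod(ranks)                                     # entries of the core tensor
    factors = sum(i * r for i, r in zip(shape, ranks))     # entries of all factor matrices
    return original, core + factors

# Hypothetical three-dimensional tensor of shape 64 x 64 x 9 with ranks 16, 16, 9.
original, decomposed = tucker_param_counts((64, 64, 9), (16, 16, 9))
print(original, decomposed, round(original / decomposed, 2))
```

With these example numbers the core and the three factor matrices together hold roughly one eighth of the parameters of the original tensor.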

The goal of Tucker decomposition is to find the $\hat{\mathcal{X}}$ that satisfies Equation 3 below.

$$\min_{\hat{\mathcal{X}}} \lVert \mathcal{X} - \hat{\mathcal{X}} \rVert \qquad (3)$$

Here, $\lVert \cdot \rVert$ denotes the L2 norm of a tensor.

According to Equation 3, the decomposition is performed so that the difference between the decomposed tensor $\hat{\mathcal{X}}$ and the original tensor $\mathcal{X}$ is minimized. That is, Tucker decomposition finds the decomposed tensor $\hat{\mathcal{X}}$ that is closest to the original tensor $\mathcal{X}$.
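One simple way to obtain a Tucker approximation and measure the norm of the difference is the truncated higher-order SVD (HOSVD), sketched below in NumPy. HOSVD is used here only as a convenient stand-in: it is not stated in the source, and it does not necessarily reach the minimum of the objective. All function names and the example shape and ranks are assumptions.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: axis `mode` becomes the rows, the rest are flattened."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_dot(tensor, matrix, mode):
    """Multiply `matrix` (shape J x I_mode) into axis `mode` of `tensor`."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)

def truncated_hosvd(tensor, ranks):
    """Return (core, factors) of a Tucker approximation via truncated HOSVD."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding span that mode's subspace.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, factor in enumerate(factors):
        core = mode_dot(core, factor.T, mode)    # project onto each subspace
    return core, factors

def relative_error(tensor, ranks):
    """Norm of (original - approximation) divided by the norm of the original."""
    core, factors = truncated_hosvd(tensor, ranks)
    approx = core
    for mode, factor in enumerate(factors):
        approx = mode_dot(approx, factor, mode)
    return np.linalg.norm(tensor - approx) / np.linalg.norm(tensor)

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 5, 4))
err_full = relative_error(X, (6, 5, 4))   # no truncation: error is numerically zero
err_low = relative_error(X, (3, 3, 2))    # compressed: some error remains
print(err_full, err_low)
```

Lower ranks give a smaller representation at the cost of a larger difference between the two tensors, which is the trade-off the objective above controls.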

(2) Knowledge distillation

Knowledge distillation is a technique for effectively transferring the knowledge of a large neural network to a small neural network. In knowledge distillation, the large neural network is called the teacher model and the small neural network the student model; training data is given to the teacher model, and the student model is trained on the teacher's predictions. Knowledge distillation works well when the amount of training data is limited or the student model is small.

The process of performing knowledge distillation is as follows.

Suppose a large neural network $T$ and a small neural network $S$ are given. The training data is fed into $T$, and $S$ is trained on the distribution predicted by $T$ instead of the actual distribution of the training data. As a result, $S$ is trained on soft distributions rather than one-hot vectors, and can learn the relationships between labels that $T$ has already learned.

The student network $S$ can be trained by the teacher network $T$ to minimize the cross-entropy loss $\mathcal{L}_{\text{distill}}$ given by Equation 4 below.

$$\mathcal{L}_{\text{distill}} = -\sum_i p_T^{(i)} \log p_S^{(i)} \qquad (4)$$

Here, $p_T$ and $p_S$ are the probabilities predicted by $T$ and $S$, respectively, and are expressed through a temperature $\tau$ as in Equation 5.

$$p^{(i)} = \frac{\exp(z_i / \tau)}{\sum_j \exp(z_j / \tau)} \qquad (5)$$

Here, $z_i$ is the logit for the i-th label, and $\tau$ is usually set to a value greater than 1. $\tau$ therefore smooths the softmax scores and allows $S$ to learn the knowledge of $T$ more accurately.
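The temperature-scaled softmax and the resulting cross-entropy between teacher and student predictions can be sketched in plain Python as follows. The function names and the example logits are illustrative assumptions, and the loss is written in the standard distillation form, which may differ in detail from the patent's own equation.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by a temperature; higher values smooth the output."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Cross-entropy of the student's soft predictions against the teacher's."""
    p_teacher = softmax_with_temperature(teacher_logits, temperature)
    p_student = softmax_with_temperature(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

teacher_logits = [4.0, 1.0, 0.5]     # hypothetical logits for three labels
student_logits = [0.0, 2.0, 1.0]
sharp = softmax_with_temperature(teacher_logits, temperature=1.0)
smooth = softmax_with_temperature(teacher_logits, temperature=4.0)
print(sharp)
print(smooth)
print(distillation_loss(teacher_logits, student_logits, temperature=4.0))
```

With the higher temperature the largest probability shrinks and the smaller ones grow, so the student sees more of how the teacher ranks the non-target labels.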

Below, the components of a deep learning neural network compression apparatus for performing ALCOM are first described with reference to FIG. 2, and then the ALCOM algorithm is described.

FIG. 2 is a diagram illustrating a deep learning neural network compression apparatus according to an embodiment. Referring to FIG. 2, the deep learning neural network compression apparatus 100 according to an embodiment may include an input/output unit 110, a control unit 120, and a storage unit 130. The apparatus 100 may be any of various electronic devices capable of data computation, such as a PC or a laptop.

The input/output unit 110 receives data, programs, user input, and the like, and outputs the results of processing data according to the user input. According to an embodiment, the input/output unit 110 may receive a deep learning neural network from an external device or network and may receive from a user an input requesting compression of the deep learning neural network.

The control unit 120 includes at least one processor such as a CPU and controls the overall operation of the deep learning neural network compression apparatus 100. In particular, the control unit 120 may compress a deep learning neural network according to the ALCOM technique by executing a program stored in the storage unit 130. The specific process by which the control unit 120 compresses the deep learning neural network is described in detail below.

Various types of programs and data may be stored in the storage unit 130. In particular, a program for compressing a deep learning neural network by applying the ALCOM technique may be stored in the storage unit 130 and executed by the control unit 120. The deep learning neural network to be compressed may also be stored in the storage unit 130.

The control unit 120 may compress a deep learning neural network according to the ALCOM technique described below.

First, the characteristics of ALCOM, the new neural network compression technique proposed in this specification, are briefly introduced. ALCOM compresses a neural network by finding low-rank approximations of the network's weight tensors. In ALCOM, the decomposed tensors are initialized to random values, training is performed using the original tensor as the teacher model, and the decomposed tensor, as the student model, learns the knowledge the teacher model has learned. Because these operations are performed layerwise, ALCOM has the advantage that it can be run in parallel regardless of the type of neural network.

The ALCOM algorithm compresses a neural network efficiently by finding low-rank approximations of its weight tensors through adversarial training. Because ALCOM operates layerwise, it can be combined with various kinds of tensor decomposition techniques and applied to any kind of neural network. ALCOM also has the advantage that the classification accuracy of the compressed neural network is improved compared with existing tensor decomposition methods such as Tucker decomposition.

A neural network consists of many layers, each represented by a tensor. Decomposing them with ordinary tensor decomposition minimizes the error between the original tensor and the decomposed tensor, but this does not guarantee high performance, so accuracy suffers. ALCOM presents a way to solve this problem.

The process by which the control unit 120 compresses a deep learning neural network by applying ALCOM can be broadly divided into two training stages. The first is adversarial training, and the second is training by random inputs.

First, the process by which the control unit 120 decomposes the tensors of a deep learning neural network through adversarial training is described.

The control unit 120 uses a modified version of an existing tensor decomposition technique. As described above, ALCOM can be adapted from any kind of tensor decomposition technique; Tucker decomposition is used as the example below.

As described above with Equation 3, Tucker decomposition finds the decomposed tensor $\hat{\mathcal{X}}$ closest to the original tensor $\mathcal{X}$. According to ALCOM, however, the control unit 120 finds $\hat{\mathcal{X}}$ through Equation 6 below.

$$\min_{\hat{\mathcal{X}}} \mathcal{L}\left(\mathcal{Z} \ast \mathcal{X},\; \mathcal{Z} \ast \hat{\mathcal{X}}\right) \qquad (6)$$

Here, $\mathcal{Z}$ is an arbitrary tensor input, $\ast$ is an arbitrary operation between two tensors, and $\mathcal{L}$ is an arbitrary loss function that takes two tensors and outputs a scalar.

For the same input $\mathcal{Z}$, the decomposed tensor $\hat{\mathcal{X}}$ obtained according to Equation 6 produces the output closest to that of the original tensor $\mathcal{X}$. That is, the control unit 120 finds the $\hat{\mathcal{X}}$ that makes the outputs of the two tensors (the original tensor $\mathcal{X}$ and the decomposed tensor $\hat{\mathcal{X}}$) for the same input as close as possible. In this respect ALCOM differs from conventional Tucker decomposition, which makes the two tensors (the original tensor $\mathcal{X}$ and the decomposed tensor $\hat{\mathcal{X}}$) themselves close to each other.

If $\mathcal{Z}$ is an input feature and $\mathcal{X}$ is a weight tensor, the operation $\mathcal{Z} \ast \mathcal{X}$ may correspond to forward propagation through a single layer.
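The difference between making the tensors themselves close and making their outputs close can be seen on a single linear layer. The sketch below is a simplification under stated assumptions: the layer is a plain matrix product, the decomposition is a rank-limited matrix factorization rather than Tucker decomposition, and both solutions are computed in closed form with a truncated SVD instead of the training described in the text. All names, sizes, and input scales are hypothetical.

```python
import numpy as np

def truncated_svd(matrix, rank):
    """Best rank-`rank` approximation of `matrix` in the L2 (Frobenius) sense."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

rng = np.random.default_rng(0)
d_in, d_out, rank, n = 8, 6, 2, 200
W = rng.standard_normal((d_in, d_out))             # original weight matrix
scales = np.linspace(0.1, 3.0, d_in)               # input features with unequal scales
Z = rng.standard_normal((n, d_in)) * scales        # inputs fed to the layer

# Weight matching: minimize ||W - W_hat|| without looking at any input.
W_weight = truncated_svd(W, rank)
# Output matching: minimize ||Z W - Z W_hat|| for these inputs.
W_output = np.linalg.pinv(Z) @ truncated_svd(Z @ W, rank)

def output_error(W_hat):
    """L2 distance between the outputs of the original and compressed layer."""
    return np.linalg.norm(Z @ W - Z @ W_hat)

print("output error, weight matching:", output_error(W_weight))
print("output error, output matching:", output_error(W_output))
```

Both compressed matrices have the same rank, but only the second one is chosen with the layer's inputs in view, so its outputs are at least as close to the original layer's outputs on those inputs.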

To find the $\hat{\mathcal{X}}$ that satisfies Equation 6, the control unit 120 may use knowledge distillation, with the original tensor as the teacher model and the decomposed tensor as the student model. Specifically, the control unit 120 may find the $\hat{\mathcal{X}}$ that satisfies Equation 6 by giving the same input to both the original tensor and the decomposed tensor and making the resulting output distributions the same.

Because a decomposed layer has far fewer parameters than the original layer, this adversarial training can help the compressed neural network learn the components of the decomposed tensor properly. This is illustrated in FIG. 3.

However, because the knowledge of each layer is distilled rather than that of the whole model, the distillation loss of Equation 4 cannot be used directly for training, since the output of an intermediate layer does not represent a probability distribution. A different kind of loss function must therefore be used, and according to an embodiment the control unit 120 runs ALCOM using an L2 loss or an L1 loss.

Next, the process by which the control unit 120 trains the decomposed neural network using random inputs is described.

When training the decomposed tensors using training data, the training data may be fed into the original neural network, intermediate features extracted, and the extracted intermediate features used to train the decomposed tensors of the compressed neural network.

FIG. 4 is a diagram illustrating the training of a three-layer neural network using this technique. Referring to FIG. 4, the outputs generated by the original tensors $\mathcal{X}_1$ and $\mathcal{X}_2$ are used as inputs to the decomposed tensors $\hat{\mathcal{X}}_2$ and $\hat{\mathcal{X}}_3$, respectively.
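Collecting intermediate features from the original network so that each decomposed layer can be trained on the input its original counterpart actually receives might look like the sketch below. It assumes a stack of fully connected layers with ReLU activations, which the source does not specify; `collect_layer_pairs`, the layer sizes, and the random data are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def collect_layer_pairs(weights, data):
    """Run the original network and record one (input, output) pair per layer.

    The decomposed version of layer k would be trained to map pairs[k][0]
    to pairs[k][1], so the output of original layer k-1 becomes its input.
    """
    pairs = []
    hidden = data
    for weight in weights:
        out = relu(hidden @ weight)
        pairs.append((hidden, out))
        hidden = out
    return pairs

rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 5)),     # hypothetical three-layer network
           rng.standard_normal((5, 3)),
           rng.standard_normal((3, 2))]
data = rng.standard_normal((10, 4))         # stand-in for a batch of training data
pairs = collect_layer_pairs(weights, data)
for layer_input, layer_output in pairs:
    print(layer_input.shape, "->", layer_output.shape)
```

Because every pair comes from the original network, the training of one decomposed layer never depends on the state of another decomposed layer.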

However, because the intermediate features are fitted to the original model and follow distributions similar to one another, this method may lead to excessive or insufficient training of the decomposed tensors.

Therefore, according to an embodiment, before fine-tuning the decomposed tensors on the training data, the control unit 120 feeds random noise as input to both models and trains each tensor separately through their outputs. The control unit 120 may generate the noise to be used as input from a Gaussian distribution or a uniform distribution.
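A minimal sketch of this noise-driven step for one layer is given below. It makes several assumptions that are not in the source: the layer is a plain matrix product, the decomposed tensor is a pair of low-rank factors U and V, the loss is the L2 loss between the two outputs, and plain gradient descent with hand-picked hyperparameters is used. The names `make_noise`, `train_on_noise`, and `output_l2` are hypothetical.

```python
import numpy as np

def make_noise(rng, shape, kind="gaussian"):
    """Draw a random input batch from a Gaussian or a uniform distribution."""
    if kind == "gaussian":
        return rng.standard_normal(shape)
    if kind == "uniform":
        return rng.uniform(-1.0, 1.0, size=shape)
    raise ValueError(f"unknown noise kind: {kind}")

def train_on_noise(W, rank, steps=2000, batch=64, lr=0.01, kind="gaussian", seed=0):
    """Fit factors U, V so that noise @ U @ V tracks noise @ W under an L2 loss."""
    rng = np.random.default_rng(seed)
    d_in, d_out = W.shape
    U = 0.1 * rng.standard_normal((d_in, rank))     # random initialization
    V = 0.1 * rng.standard_normal((rank, d_out))
    for _ in range(steps):
        Z = make_noise(rng, (batch, d_in), kind)    # same noise goes to both layers
        residual = Z @ U @ V - Z @ W                # student output minus teacher output
        grad_out = 2.0 * residual / batch           # gradient of the mean squared L2 loss
        grad_U = Z.T @ grad_out @ V.T
        grad_V = (Z @ U).T @ grad_out
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V

def output_l2(W, U, V, Z):
    """Mean squared L2 distance between the two layers' outputs on inputs Z."""
    return np.mean(np.sum((Z @ U @ V - Z @ W) ** 2, axis=1))

W = np.random.default_rng(1).standard_normal((6, 5))     # hypothetical original layer
U0, V0 = train_on_noise(W, rank=3, steps=0)               # factors right after initialization
U, V = train_on_noise(W, rank=3)                          # factors after training on noise
Z_eval = make_noise(np.random.default_rng(2), (256, 6))   # fresh noise for evaluation
print(output_l2(W, U0, V0, Z_eval), output_l2(W, U, V, Z_eval))
```

Passing `kind="uniform"` switches the noise source; no training data is touched at any point in this step.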

In this way, the decomposed tensors can learn the general behavior of the original tensors, and the components of the decomposed tensors can be trained efficiently, neither excessively nor insufficiently.

In addition, each tensor can be trained in parallel regardless of its type or shape, and regularizing the training helps the compressed model perform well on test data.

As described above, ALCOM initializes the decomposed tensors randomly and trains them through knowledge distillation with the original tensors as teacher models. Because the training of each tensor is independent, this process can be performed in parallel, and afterwards the compressed model can be fine-tuned to minimize the classification error on the training data.
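Because each layer is handled independently, the per-layer jobs can be dispatched concurrently. The sketch below shows only that scheduling pattern, under assumptions of its own: a thread pool is one possible choice, and the per-layer job `compress_layer` is a placeholder that uses a truncated SVD instead of the distillation-based training described above. All names, layer shapes, and ranks are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def compress_layer(weight, rank):
    """Placeholder per-layer job: split a weight matrix into two low-rank factors."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]

def compress_all_layers(weights, ranks, max_workers=4):
    """Run the independent per-layer jobs concurrently, preserving layer order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(compress_layer, weights, ranks))

rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 6)),     # hypothetical layers of different shapes
           rng.standard_normal((6, 6)),
           rng.standard_normal((6, 4))]
ranks = [3, 2, 2]
compressed = compress_all_layers(weights, ranks)
for left, right in compressed:
    print(left.shape, right.shape)
```

No job reads another job's result, so the layers can be processed in any order or all at once; the fine-tuning that follows operates on the whole compressed model and is not part of this sketch.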

FIG. 5 shows a table representing the ALCOM algorithm described above.

A deep learning neural network compression method according to an embodiment is described below with reference to FIG. 6.

Referring to FIG. 6, in step 601 the tensors constituting the deep learning neural network are decomposed such that, for the same input, the outputs of the original tensor and the decomposed tensor are as close as possible. Here, knowledge distillation may be applied with the original tensor as the teacher model and the decomposed tensor as the student model. The tensor decomposition may also be performed by giving the same input to the original tensor and the decomposed tensor and making the output distributions of the two tensors the same.

In step 602, random noise is fed as input to both the original tensor and the decomposed tensor, and the original tensor and the decomposed tensor may be trained through their outputs. The random noise input may be generated from a Gaussian distribution or a uniform distribution.

In step 603, the accuracy of the compressed neural network may be improved by fine-tuning the decomposed tensors on the training data.

The term '~unit' used in the above embodiments means software or a hardware component such as a field programmable gate array (FPGA) or an ASIC, and a '~unit' performs certain roles. However, a '~unit' is not limited to software or hardware. A '~unit' may be configured to reside in an addressable storage medium or configured to run on one or more processors. Thus, as an example, a '~unit' includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.

The functionality provided within components and '~units' may be combined into a smaller number of components and '~units' or further separated into additional components and '~units'.

In addition, the components and '~units' may be implemented to execute on one or more CPUs in a device or a secure multimedia card.

The deep learning neural network compression method according to the embodiment described with reference to FIG. 6 may also be implemented in the form of a computer-readable medium that stores instructions and data executable by a computer. The instructions and data may be stored in the form of program code and, when executed by a processor, may generate a predetermined program module to perform a predetermined operation. The computer-readable medium can be any available medium that can be accessed by a computer, and includes volatile and non-volatile media as well as removable and non-removable media. The computer-readable medium may also be a computer recording medium, which may include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. For example, the computer recording medium may be a magnetic or solid-state storage medium such as an HDD or SSD, an optical recording medium such as a CD, DVD, or Blu-ray disc, or a memory included in a server accessible through a network.

The deep learning neural network compression method according to the embodiment described with reference to FIG. 6 may also be implemented as a computer program (or computer program product) including instructions executable by a computer. The computer program includes programmable machine instructions processed by a processor and may be implemented in a high-level programming language, an object-oriented programming language, an assembly language, a machine language, or the like. The computer program may also be recorded on a tangible computer-readable recording medium (for example, a memory, a hard disk, a magnetic/optical medium, or a solid-state drive (SSD)).

Accordingly, the deep learning neural network compression method according to the embodiment described with reference to FIG. 6 may be implemented by a computing device executing the computer program described above. The computing device may include at least some of a processor, a memory, a storage device, a high-speed interface connected to the memory and a high-speed expansion port, and a low-speed interface connected to a low-speed bus and the storage device. These components are connected to one another using various buses and may be mounted on a common motherboard or in another suitable manner.

Here, the processor can process instructions within the computing device. Examples of such instructions are those stored in the memory or the storage device for displaying graphical information that provides a graphical user interface (GUI) on an external input/output device, such as a display connected to the high-speed interface. In other embodiments, multiple processors and/or multiple buses may be used, as appropriate, together with multiple memories and memory types. The processor may also be implemented as a chipset composed of chips that include multiple independent analog and/or digital processors.

The memory stores information within the computing device. In one example, the memory may consist of a volatile memory unit or a set of such units. In another example, the memory may consist of a non-volatile memory unit or a set of such units. The memory may also be another form of computer-readable medium, such as a magnetic or optical disk.

The storage device can provide a large amount of storage space to the computing device. The storage device may be a computer-readable medium or a configuration including such a medium. For example, it may include devices within a storage area network (SAN) or other configurations, and may be a floppy disk device, a hard disk device, an optical disk device, a tape device, a flash memory, or another similar semiconductor memory device or device array.

The above-described embodiments are for illustration, and those of ordinary skill in the technical field to which the embodiments belong will understand that they can easily be modified into other specific forms without changing their technical idea or essential features. The above-described embodiments should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope to be protected through this specification is indicated by the claims set out below rather than by the detailed description above, and should be interpreted to include all changes or modifications derived from the meaning and scope of the claims and their equivalents.

100: deep learning neural network compression device 110: input/output unit
120: control unit 130: storage unit

Claims

A method for compressing a deep learning neural network, the method comprising:
decomposing at least one tensor constituting the deep learning neural network such that, for the same input, the output of the decomposed tensor most closely approximates the output of the original tensor; and
providing random noise as input to both the original tensor and the decomposed tensor, and training the original tensor and the decomposed tensor through their outputs.

The method of claim 1,
wherein the decomposing
is performed through a knowledge distillation technique in which the original tensor serves as a teacher model and the decomposed tensor serves as a student model.

The method of claim 2,
wherein the decomposing
comprises giving the same input to the original tensor and the decomposed tensor and making the distributions of the resulting values output by the two tensors the same.

The method of claim 1,
further comprising fine-tuning the decomposed tensor with training data.

A computer-readable recording medium on which a program for performing the method of claim 1 is recorded.

A computer program that is executed by a deep learning neural network compression device and stored in a medium in order to perform the method of claim 1.

A deep learning neural network compression device comprising:
an input/output unit for receiving a deep learning neural network to be compressed;
a storage unit in which a program for compressing the deep learning neural network is stored; and
a control unit that compresses the deep learning neural network by executing the program,
wherein the control unit
decomposes at least one tensor constituting the deep learning neural network such that, for the same input, the output of the decomposed tensor most closely approximates the output of the original tensor, provides random noise as input to both the original tensor and the decomposed tensor, and trains the original tensor and the decomposed tensor through their outputs.

The device of claim 7,
wherein the control unit
decomposes the at least one tensor through a knowledge distillation technique in which the original tensor serves as a teacher model and the decomposed tensor serves as a student model.

The device of claim 8,
wherein the control unit
gives the same input to the original tensor and the decomposed tensor and decomposes the at least one tensor such that the distributions of the resulting values output by the two tensors are the same.

The device of claim 7,
wherein the control unit
fine-tunes the decomposed tensor with training data.