KR102199285B1

KR102199285B1 - Method for compressing deep learning neural networks and apparatus for performing the same

Info

Publication number: KR102199285B1
Application number: KR1020180150181A
Authority: KR
Inventors: 강유; 김태범; 유재민
Original assignee: 서울대학교산학협력단
Priority date: 2018-11-28
Filing date: 2018-11-28
Publication date: 2021-01-06
Also published as: KR20200068106A

Abstract

A method for compressing a deep learning neural network includes: decomposing at least one tensor constituting the deep learning neural network such that, for the same input, the output of the decomposed tensor most closely approximates the output of the original tensor; and providing random noise as input to both the original tensor and the decomposed tensor, and training the original tensor and the decomposed tensor using their outputs.

Description

METHOD FOR COMPRESSING DEEP LEARNING NEURAL NETWORKS AND APPARATUS FOR PERFORMING THE SAME

Embodiments disclosed herein relate to a method and apparatus for compressing a deep learning neural network.

As deep learning is applied in more fields and the amount of data used for deep learning grows rapidly, deep learning neural networks with increasingly complex structures are being developed. As a result, recently developed deep learning neural networks use a considerable amount of memory.

Meanwhile, the need to run deep learning neural networks on mobile platforms keeps growing, for example as artificial intelligence and IoT technologies converge, so deep learning neural networks need to be made lightweight enough to run on mobile platforms. The need for techniques that compress deep learning neural networks is growing accordingly.

Tensor decomposition, one technique for compressing deep learning neural networks, is widely used to compress various types of convolutional neural networks (CNNs) and recurrent neural networks (RNNs). With this technique, however, the decomposed tensor is less accurate than the original tensor.

The background art described above is technical information that the inventors possessed for, or acquired in the course of, deriving the present invention, and is not necessarily known art disclosed to the general public before the filing of the present application.

Embodiments disclosed herein aim to provide a method and apparatus for compressing a deep learning neural network while minimizing loss of accuracy.

As a technical means for achieving the above technical object, according to one embodiment, a method for compressing a deep learning neural network may include: decomposing at least one tensor constituting the deep learning neural network such that, for the same input, the output of the decomposed tensor most closely approximates the output of the original tensor; and providing random noise as input to both the original tensor and the decomposed tensor, and training the original tensor and the decomposed tensor using their outputs.

According to another embodiment, in a computer program for performing a deep learning neural network compression method, the method may include: decomposing at least one tensor constituting the deep learning neural network such that, for the same input, the output of the decomposed tensor most closely approximates the output of the original tensor; and providing random noise as input to both the original tensor and the decomposed tensor, and training the original tensor and the decomposed tensor using their outputs.

According to yet another embodiment, in a computer-readable recording medium on which a program for performing a deep learning neural network compression method is recorded, the method may include: decomposing at least one tensor constituting the deep learning neural network such that, for the same input, the output of the decomposed tensor most closely approximates the output of the original tensor; and providing random noise as input to both the original tensor and the decomposed tensor, and training the original tensor and the decomposed tensor using their outputs.

According to yet another embodiment, an apparatus for compressing a deep learning neural network includes an input/output unit for receiving the deep learning neural network to be compressed, a storage unit storing a program for compressing the deep learning neural network, and a control unit that compresses the deep learning neural network by executing the program. The control unit may decompose at least one tensor constituting the deep learning neural network such that, for the same input, the output of the decomposed tensor most closely approximates the output of the original tensor, may provide random noise as input to both the original tensor and the decomposed tensor, and may train the original tensor and the decomposed tensor using their outputs.

According to any one of the means described above, a deep learning neural network can be made lightweight efficiently while minimizing loss of accuracy.

The gap between a deep learning neural network's performance on training data and its performance at actual inference can be reduced.

The effects obtainable from the disclosed embodiments are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the disclosed embodiments belong.

FIG. 1 is a diagram showing the relationship between the decomposed tensor and the original tensor after Tucker decomposition, one of the existing tensor decomposition techniques.
FIG. 2 is a diagram illustrating an apparatus for compressing a deep learning neural network according to an embodiment.
FIG. 3 is a diagram showing the relationship between the decomposed tensor and the original tensor during compression in a deep learning neural network compression method according to an embodiment.
FIG. 4 is a diagram illustrating the structure used to compress one large model composed of multiple layers in a deep learning neural network compression method according to an embodiment.
FIG. 5 is a diagram showing a table that presents a deep learning neural network compression algorithm according to an embodiment.
FIG. 6 is a flowchart illustrating a deep learning neural network compression method according to an embodiment.

Various embodiments are described in detail below with reference to the accompanying drawings. The embodiments described below may be modified and implemented in various different forms. To describe the features of the embodiments more clearly, detailed descriptions of matters widely known to those of ordinary skill in the art to which the embodiments belong are omitted. Parts of the drawings unrelated to the description of the embodiments are omitted, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a component is said to be "connected" to another component, this includes not only the case where it is directly connected but also the case where it is connected with another component in between. In addition, when a component is said to "include" another component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Embodiments are described in detail below with reference to the accompanying drawings.

This specification proposes ALCOM (Adversarial Layerwise Compression) as a new technique for compressing deep learning neural networks. ALCOM compresses a deep learning neural network layer by layer (layerwise) and applies existing tensor decomposition together with knowledge distillation. Tensor decomposition and knowledge distillation are therefore described first.

(1) Tensor decomposition

Tensor decomposition is an effective and widely used method for compressing various types of CNNs and RNNs. The learnable parameters of most neural networks are represented as multi-dimensional arrays, or tensors, and tensor decomposition compresses a deep learning neural network by expressing large tensors as operations on small tensors. In other words, tensor decomposition represents an N-dimensional tensor as operations on simpler tensors, and can be viewed as the process of finding high-quality low-rank approximations of large tensors.

Tucker decomposition is a representative tensor decomposition algorithm. Tucker decomposition decomposes an N-dimensional tensor $\mathcal{X}$ of shape $I_1 \times I_2 \times \cdots \times I_N$ into the form of Equation 1 below.

$$\mathcal{X} \approx \sum_{r_1=1}^{R_1} \cdots \sum_{r_N=1}^{R_N} g_{r_1 \cdots r_N}\, \mathbf{a}^{(1)}_{r_1} \circ \cdots \circ \mathbf{a}^{(N)}_{r_N} \quad \text{(Equation 1)}$$

Equation 1 can be written more simply as Equation 2 below.

$$\mathcal{X} \approx \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_N A^{(N)} \quad \text{(Equation 2)}$$

Here, $\times_n$ is the n-mode product between a tensor and a matrix, $\mathcal{G}$ is a core tensor of shape $R_1 \times R_2 \times \cdots \times R_N$ with entries $g_{r_1 \cdots r_N}$, and $A^{(i)}$ is the i-th factor matrix, whose r-th column is $\mathbf{a}^{(i)}_{r}$. (For example, $\mathcal{G} \times_1 A^{(1)}$ produces a tensor of shape $I_1 \times R_2 \times \cdots \times R_N$.)
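The n-mode product and the Tucker form of Equation 2 can be illustrated with a small NumPy sketch. The function names `mode_n_product` and `tucker_to_tensor`, the tensor sizes, and the ranks are assumptions chosen for illustration and do not come from the specification. The sketch builds a tensor from a random core and random factor matrices, then compares the parameter count of the factored form with that of the dense tensor.

```python
import numpy as np

def mode_n_product(tensor, matrix, mode):
    """n-mode product: contract axis `mode` of `tensor` with the columns of `matrix`.

    `tensor` has size R along `mode`, `matrix` has shape (I, R),
    and the result has size I along `mode`.
    """
    moved = np.moveaxis(tensor, mode, -1)      # bring the contracted axis last: (..., R)
    result = moved @ matrix.T                  # (..., R) @ (R, I) -> (..., I)
    return np.moveaxis(result, -1, mode)       # put the new axis back in place

def tucker_to_tensor(core, factors):
    """Apply one n-mode product per axis, as in the Tucker form of Equation 2."""
    out = core
    for mode, factor in enumerate(factors):
        out = mode_n_product(out, factor, mode)
    return out

rng = np.random.default_rng(0)
shape, ranks = (8, 9, 10), (2, 3, 4)           # illustrative sizes only
core = rng.standard_normal(ranks)
factors = [rng.standard_normal((i, r)) for i, r in zip(shape, ranks)]

partial = mode_n_product(core, factors[0], 0)  # only the first axis is expanded
full = tucker_to_tensor(core, factors)         # all three axes are expanded

dense_params = int(np.prod(shape))
tucker_params = core.size + sum(f.size for f in factors)
print(partial.shape, full.shape, dense_params, tucker_params)
```

With these illustrative sizes the dense tensor has 720 entries, while the core and the factor matrices together hold 107, which is the kind of saving tensor decomposition relies on.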

The result of decomposing a tensor via Equation 2 can be depicted as in FIG. 1.

FIG. 1 shows the relationship between a tensor decomposed by Tucker decomposition and the original tensor. Referring to FIG. 1, a three-dimensional tensor $\mathcal{X}$ is decomposed into one core tensor $\mathcal{G}$ and three factor matrices A, B, and C.

The goal of Tucker decomposition is to find the $\hat{\mathcal{X}}$ that satisfies Equation 3 below.

$$\hat{\mathcal{X}} = \arg\min_{\mathcal{X}'} \lVert \mathcal{X} - \mathcal{X}' \rVert \quad \text{(Equation 3)}$$

Here, $\lVert \cdot \rVert$ denotes the L2 norm of a tensor, and the minimization runs over tensors $\mathcal{X}'$ that have the Tucker form of Equation 2.

According to Equation 3, the decomposition is chosen so that the difference between the decomposed tensor $\hat{\mathcal{X}}$ and the original tensor $\mathcal{X}$ is as small as possible. In other words, Tucker decomposition finds the decomposed tensor $\hat{\mathcal{X}}$ that is closest to the original tensor $\mathcal{X}$.
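One common way to obtain an approximate solution to the objective of Equation 3 is a truncated higher-order SVD (HOSVD). This is a swapped-in, well-known technique rather than a procedure described in the specification, and in general it does not reach the exact minimizer. The helper names, tensor size, and ranks below are assumptions for illustration. The sketch measures the relative L2 error between a random tensor and its Tucker approximation at a low rank and at full rank.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: axis `mode` becomes the rows, all other axes the columns."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_n_product(tensor, matrix, mode):
    """Contract axis `mode` of `tensor` with the columns of `matrix`."""
    moved = np.moveaxis(tensor, mode, -1)
    return np.moveaxis(moved @ matrix.T, -1, mode)

def truncated_hosvd(tensor, ranks):
    """Return (core, factors) from a truncated higher-order SVD."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding span the kept subspace.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, factor in enumerate(factors):
        core = mode_n_product(core, factor.T, mode)   # project onto each subspace
    return core, factors

def reconstruct(core, factors):
    """Rebuild the approximate tensor from the core and factor matrices."""
    out = core
    for mode, factor in enumerate(factors):
        out = mode_n_product(out, factor, mode)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 7, 8))                    # illustrative tensor

def relative_error(ranks):
    core, factors = truncated_hosvd(x, ranks)
    return np.linalg.norm(x - reconstruct(core, factors)) / np.linalg.norm(x)

err_low = relative_error((2, 2, 2))    # aggressive compression
err_full = relative_error((6, 7, 8))   # no truncation
print(err_low, err_full)
```

At full rank the factor matrices are orthogonal, so the reconstruction error is zero up to rounding. The low-rank error is large here only because a random tensor has no low-rank structure to exploit.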

(2) Knowledge distillation

Knowledge distillation is a technique for effectively transferring the knowledge of a large neural network to a small neural network. In knowledge distillation, the large neural network is called the teacher model and the small neural network the student model. Training data is given to the teacher model, and the student model is trained on the teacher's predictions. Knowledge distillation works well when the amount of training data is limited or when the student model is small.

The process of performing knowledge distillation is as follows.

Suppose a large neural network $T$ and a small neural network $S$ are given. Training data is fed into $T$, and $S$ is trained on the distribution predicted by $T$ instead of the actual distribution of the training data. As a result, $S$ is trained on soft distributions rather than one-hot vectors, and can learn the relationships between labels that $T$ has already learned.

$S$ can be trained by $T$ so as to minimize the cross-entropy loss $\mathcal{L}$ given by Equation 4 below.

$$\mathcal{L} = -\sum_{i} p^{T}_{i} \log p^{S}_{i} \quad \text{(Equation 4)}$$

Here, $p^{T}$ and $p^{S}$ denote the probabilities predicted by $T$ and $S$, respectively, and are expressed with a temperature $\tau$ as in Equation 5 below.

$$p_{i} = \frac{\exp(z_{i} / \tau)}{\sum_{j} \exp(z_{j} / \tau)} \quad \text{(Equation 5)}$$

Here, $z_{i}$ is the logit for the i-th label, and $\tau$ is usually set to a value greater than 1. $\tau$ therefore smooths the softmax scores and allows $S$ to learn the knowledge of $T$ more accurately.
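The temperature-scaled softmax and the cross-entropy distillation loss described above can be sketched in a few lines of NumPy. The function names, the example logits, and the temperature values are assumptions for illustration only. The sketch shows that a temperature above 1 flattens the teacher's distribution, and computes the student's loss against the teacher's soft prediction.

```python
import numpy as np

def softmax_with_temperature(logits, tau=1.0):
    """Softmax over the last axis after dividing the logits by the temperature tau."""
    scaled = np.asarray(logits, dtype=float) / tau
    scaled = scaled - scaled.max(axis=-1, keepdims=True)   # numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, tau=1.0):
    """Cross-entropy of the student's soft prediction against the teacher's."""
    p_teacher = softmax_with_temperature(teacher_logits, tau)
    p_student = softmax_with_temperature(student_logits, tau)
    return float(-(p_teacher * np.log(p_student)).sum(axis=-1).mean())

teacher = np.array([[4.0, 1.0, 0.5]])   # hypothetical teacher logits
student = np.array([[2.0, 1.5, 0.2]])   # hypothetical student logits

sharp = softmax_with_temperature(teacher, tau=1.0)
soft = softmax_with_temperature(teacher, tau=4.0)
loss = distillation_loss(teacher, student, tau=4.0)
best = distillation_loss(teacher, teacher, tau=4.0)   # student identical to teacher
print(sharp, soft, loss, best)
```

The loss is smallest when the student reproduces the teacher's distribution exactly, which is why minimizing it pulls the student's predictions toward the teacher's.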

In the following, the components of a deep learning neural network compression apparatus for performing ALCOM are first described with reference to FIG. 2, and then the ALCOM algorithm is described.

FIG. 2 is a diagram illustrating an apparatus for compressing a deep learning neural network according to an embodiment. Referring to FIG. 2, the deep learning neural network compression apparatus 100 according to an embodiment may include an input/output unit 110, a control unit 120, and a storage unit 130. The apparatus 100 may be any of various electronic devices capable of data computation, such as a PC or a notebook computer.

The input/output unit 110 receives data, programs, user input, and the like, and outputs the result of processing data according to the user's input. According to an embodiment, the input/output unit 110 may receive a deep learning neural network from an external device or a network, and may receive from a user an input requesting compression of the deep learning neural network.

The control unit 120 includes at least one processor, such as a CPU, and controls the overall operation of the deep learning neural network compression apparatus 100. In particular, the control unit 120 may compress a deep learning neural network according to the ALCOM technique by executing a program stored in the storage unit 130. The specific process by which the control unit 120 compresses the deep learning neural network is described in detail below.

Various types of programs and data may be stored in the storage unit 130. In particular, a program for compressing a deep learning neural network by applying the ALCOM technique may be stored in the storage unit 130 and executed by the control unit 120. The deep learning neural network to be compressed may also be stored in the storage unit 130.

The control unit 120 may compress a deep learning neural network according to the ALCOM technique described below.

First, the features of ALCOM, the new neural network compression technique proposed in this specification, are briefly introduced. ALCOM compresses a neural network by finding low-rank approximations of its weight tensors. In ALCOM, the decomposed tensors are initialized to random values, training uses the original tensor as the teacher model, and the decomposed tensor, acting as the student model, learns the knowledge that the teacher model has learned. Because these operations are performed layerwise, ALCOM can run in parallel regardless of the type of neural network.

The ALCOM algorithm compresses a neural network efficiently by finding low-rank approximations of its weight tensors through adversarial training. Because ALCOM operates layer by layer, it can be combined with various tensor decomposition techniques and applied to any kind of neural network. In addition, compared with existing tensor decomposition methods such as Tucker decomposition, ALCOM improves the classification accuracy of the compressed neural network.

A neural network consists of many layers, each represented by a tensor. Ordinary tensor decomposition decomposes these tensors so as to minimize the error between the original tensor and the decomposed tensor, but this does not guarantee high performance, so accuracy suffers. ALCOM offers a way to address this problem.

The process by which the control unit 120 applies ALCOM to compress a deep learning neural network can be broadly divided into two training stages. The first is adversarial training, and the second is training by random inputs.

First, the process by which the control unit 120 decomposes the tensors of a deep learning neural network through adversarial training is described.

The control unit 120 uses a modified version of an existing tensor decomposition technique. As noted above, ALCOM can be built on any kind of tensor decomposition technique. Tucker decomposition is used as the example below.

In Tucker decomposition, as described above with Equation 3, the decomposed tensor $\hat{\mathcal{X}}$ closest to the original tensor $\mathcal{X}$ is found. According to ALCOM, however, the control unit 120 finds $\hat{\mathcal{X}}$ through Equation 6 below.

$$\hat{\mathcal{X}} = \arg\min_{\mathcal{X}'} L(\mathcal{X} \odot \mathcal{Z},\ \mathcal{X}' \odot \mathcal{Z}) \quad \text{(Equation 6)}$$

Here, $\mathcal{Z}$ is an arbitrary tensor input, $\odot$ is an arbitrary operation between two tensors, and $L$ is an arbitrary loss function that takes two tensors and outputs a scalar.

The decomposed tensor $\hat{\mathcal{X}}$ obtained from Equation 6 outputs, for the same input $\mathcal{Z}$, the value closest to the output of the original tensor $\mathcal{X}$. That is, the control unit 120 finds the $\hat{\mathcal{X}}$ that makes the outputs of the two tensors (the original tensor $\mathcal{X}$ and the decomposed tensor $\hat{\mathcal{X}}$) for the same input as close as possible. ALCOM thus differs from conventional Tucker decomposition, which makes the two tensors (the original tensor $\mathcal{X}$ and the decomposed tensor $\hat{\mathcal{X}}$) themselves approximate each other.

If $\mathcal{Z}$ is an input feature and $\mathcal{X}$ is a weight tensor, the operation $\mathcal{X} \odot \mathcal{Z}$ may correspond to forward propagation through a single layer.
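The difference between matching the tensors themselves and matching their outputs can be made concrete with a toy linear layer. In this sketch the operation between weight and input is assumed to be a bias-free matrix product and the loss is assumed to be a squared L2 distance. The function names and all numbers are hypothetical. Two approximations are equally far from the original weight, yet they differ greatly in how well they reproduce its outputs on the given inputs.

```python
import numpy as np

def layer_forward(weight, z):
    """Forward pass of a bias-free linear layer: one output row per input row."""
    return z @ weight.T

def weight_matching_loss(weight, approx):
    """Squared L2 distance between the tensors themselves."""
    return float(np.sum((weight - approx) ** 2))

def output_matching_loss(weight, approx, z):
    """Mean squared L2 distance between the two layers' outputs on inputs z."""
    diff = layer_forward(weight, z) - layer_forward(approx, z)
    return float(np.mean(np.sum(diff ** 2, axis=1)))

weight = np.eye(2)

approx_a = weight.copy()
approx_a[0, 0] += 0.5          # error on the input coordinate that is heavily used

approx_b = weight.copy()
approx_b[1, 1] += 0.5          # same-size error on a coordinate that is barely used

# Inputs that put almost all of their mass on the first coordinate.
z = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.1]])

w_a = weight_matching_loss(weight, approx_a)
w_b = weight_matching_loss(weight, approx_b)
out_a = output_matching_loss(weight, approx_a, z)
out_b = output_matching_loss(weight, approx_b, z)
print(w_a, w_b, out_a, out_b)
```

A tensor-matching criterion cannot tell the two approximations apart, while an output-matching criterion prefers the second one for this input distribution.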

To find the $\hat{\mathcal{X}}$ that satisfies Equation 6, the control unit 120 may use knowledge distillation, with the original tensor as the teacher model and the decomposed tensor as the student model. Specifically, the control unit 120 gives the same input to both the original tensor and the decomposed tensor and makes the resulting output distributions match, thereby finding the $\hat{\mathcal{X}}$ that satisfies Equation 6.

Since a decomposed layer has far fewer parameters than the original layer, this adversarial training can help the compressed neural network learn the components of the decomposed tensors properly, as illustrated in FIG. 3.

However, because knowledge is distilled from each layer rather than from the whole model, the distillation loss of Equation 4 cannot be used directly for training, since the output of an intermediate layer does not represent a probability distribution. A different kind of loss function must therefore be used, and according to one embodiment the control unit 120 runs ALCOM with an L2 loss or an L1 loss.

Next, the process by which the control unit 120 trains the decomposed neural network using random inputs is described.

When training the decomposed tensors with training data, the training data may be fed to the original neural network, intermediate features may be extracted, and the extracted intermediate features may be used to train the decomposed tensors of the compressed neural network.

FIG. 4 illustrates training a three-layer neural network with this technique. Referring to FIG. 4, the outputs produced by the original tensors $\mathcal{X}_1$ and $\mathcal{X}_2$ are used as inputs to the decomposed tensors $\hat{\mathcal{X}}_2$ and $\hat{\mathcal{X}}_3$, respectively.
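The intermediate-feature scheme of FIG. 4 can be sketched as a forward pass that records what each original layer receives and produces. The sketch assumes fully connected layers with ReLU activations, which the specification does not prescribe, and the function names and layer sizes are hypothetical. Each recorded (input, output) pair is what a decomposed layer at the same position would be trained to reproduce.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def collect_layer_inputs(weights, batch):
    """Run the original network and record the input seen by every layer."""
    inputs, h = [], batch
    for w in weights:
        inputs.append(h)          # what layer k receives
        h = relu(h @ w.T)         # what layer k passes on to layer k + 1
    return inputs, h

rng = np.random.default_rng(0)
weights = [
    rng.standard_normal((5, 4)),  # layer 1: 4 -> 5
    rng.standard_normal((6, 5)),  # layer 2: 5 -> 6
    rng.standard_normal((3, 6)),  # layer 3: 6 -> 3
]
batch = rng.standard_normal((10, 4))   # stand-in for a batch of training data

layer_inputs, output = collect_layer_inputs(weights, batch)

# Teacher outputs per layer: the targets a decomposed layer would try to match.
targets = [relu(h @ w.T) for h, w in zip(layer_inputs, weights)]
print([h.shape for h in layer_inputs], output.shape)
```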

However, because the intermediate features are fitted to the original model and follow similar distributions, this method may cause the decomposed tensors to be overtrained or undertrained.

Accordingly, in one embodiment, before fine-tuning the decomposed tensors with training data, the control unit 120 feeds random noise to both models and uses their outputs to train each tensor separately. The control unit 120 may generate the noise used as input from a Gaussian distribution or a uniform distribution.

In this way, the decomposed tensors can learn the general behavior of the original tensors, and the components of the decomposed tensors can be trained efficiently without overtraining or undertraining.
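Training a decomposed layer on random noise can be sketched for the simplest case: a linear layer whose weight matrix is replaced by a product of two smaller matrices. This two-factor form is an assumption standing in for the Tucker-decomposed tensors of the specification, and the function names, learning rate, step count, and sizes are likewise hypothetical. The student is initialized randomly and sees only Gaussian or uniform noise, with an L2 loss between the teacher's and the student's outputs.

```python
import numpy as np

def sample_noise(rng, batch, dim, kind="gaussian"):
    """Draw random inputs from a Gaussian or a uniform distribution."""
    if kind == "gaussian":
        return rng.standard_normal((batch, dim))
    if kind == "uniform":
        return rng.uniform(-1.0, 1.0, size=(batch, dim))
    raise ValueError(f"unknown noise kind: {kind}")

def l2_output_loss(w, u, v, z):
    """Mean squared distance between teacher output z W^T and student output z (UV)^T."""
    err = z @ (u @ v).T - z @ w.T
    return float(np.mean(np.sum(err ** 2, axis=1)))

def train_on_noise(w, rank, steps=3000, lr=0.02, batch=64, kind="gaussian", seed=0):
    """Fit a randomly initialized rank-`rank` student (U, V) to teacher W using noise only."""
    rng = np.random.default_rng(seed)
    out_dim, in_dim = w.shape
    u = 0.1 * rng.standard_normal((out_dim, rank))
    v = 0.1 * rng.standard_normal((rank, in_dim))
    for _ in range(steps):
        z = sample_noise(rng, batch, in_dim, kind)
        err = z @ (u @ v).T - z @ w.T              # (batch, out_dim)
        grad_p = (2.0 / batch) * err.T @ z         # gradient w.r.t. the product UV
        grad_u, grad_v = grad_p @ v.T, u.T @ grad_p
        u -= lr * grad_u
        v -= lr * grad_v
    return u, v

rng = np.random.default_rng(1)
# Hypothetical rank-3 teacher weight for a layer with 8 inputs and 6 outputs.
w = (0.5 * rng.standard_normal((6, 3))) @ (0.5 * rng.standard_normal((3, 8)))
z_eval = rng.standard_normal((256, 8))             # held-out noise for evaluation

u0, v0 = train_on_noise(w, rank=3, steps=0)        # the random initialization
u, v = train_on_noise(w, rank=3)                   # after training on noise

before = l2_output_loss(w, u0, v0, z_eval)
after = l2_output_loss(w, u, v, z_eval)
print(before, after)
```

No training data is touched in this stage. Fine-tuning on real data would follow as a separate step.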

In addition, each tensor can be trained in parallel regardless of its type or shape, and regularizing the training helps the compressed model perform well on test data.

As described above, ALCOM initializes the decomposed tensors randomly and trains them by knowledge distillation with the original tensors as the teacher model. Because the training of each tensor is independent, this process can run in parallel, and afterwards the compressed model can be fine-tuned to minimize classification error on the training data.
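Because each layer is handled independently, the per-layer work can be dispatched as separate jobs. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor` for this. The per-layer job `compress_layer` is a stand-in that computes a truncated SVD rather than the noise-driven training described above, and the layer sizes and ranks are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def compress_layer(weight, rank):
    """Stand-in per-layer job: rank-`rank` factors from a truncated SVD."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]

rng = np.random.default_rng(0)
layers = [
    rng.standard_normal((16, 12)),
    rng.standard_normal((10, 16)),
    rng.standard_normal((4, 10)),
]
ranks = [4, 4, 2]

# Each job reads only its own layer, so the jobs can be scheduled independently.
with ThreadPoolExecutor(max_workers=3) as pool:
    compressed = list(pool.map(compress_layer, layers, ranks))

for (left, right), weight in zip(compressed, layers):
    print(weight.shape, "->", left.shape, right.shape)
```

`pool.map` returns results in the order of the input layers, so the compressed factors can be put back into the model position by position.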

FIG. 5 shows a table presenting the ALCOM algorithm described above.

A deep learning neural network compression method according to an embodiment is described below with reference to FIG. 6.

Referring to FIG. 6, in step 601 a tensor constituting the deep learning neural network is decomposed such that, for the same input, the output of the decomposed tensor most closely approximates the output of the original tensor. Here, knowledge distillation may be applied with the original tensor as the teacher model and the decomposed tensor as the student model. The tensor decomposition may also be performed by giving the same input to the original tensor and the decomposed tensor and making the distributions of the resulting outputs match.

In step 602, random noise is given as input to both the original tensor and the decomposed tensor, and the original tensor and the decomposed tensor may be trained using their outputs. The random noise input may be generated from a Gaussian distribution or a uniform distribution.

In step 603, the accuracy of the compressed neural network can be improved by fine-tuning the decomposed tensors with training data.

이상의 실시예들에서 사용되는 '~부'라는 용어는 소프트웨어 또는 FPGA(field programmable gate array) 또는 ASIC 와 같은 하드웨어 구성요소를 의미하며, '~부'는 어떤 역할들을 수행한다. 그렇지만 '~부'는 소프트웨어 또는 하드웨어에 한정되는 의미는 아니다. '~부'는 어드레싱할 수 있는 저장 매체에 있도록 구성될 수도 있고 하나 또는 그 이상의 프로세서들을 재생시키도록 구성될 수도 있다. 따라서, 일 예로서 '~부'는 소프트웨어 구성요소들, 객체지향 소프트웨어 구성요소들, 클래스 구성요소들 및 태스크 구성요소들과 같은 구성요소들과, 프로세스들, 함수들, 속성들, 프로시저들, 서브루틴들, 프로그램특허 코드의 세그먼트들, 드라이버들, 펌웨어, 마이크로코드, 회로, 데이터, 데이터베이스, 데이터 구조들, 테이블들, 어레이들, 및 변수들을 포함한다.The term'~ unit' used in the above embodiments refers to software or hardware components such as field programmable gate array (FPGA) or ASIC, and the'~ unit' performs certain roles. However,'~ part' is not limited to software or hardware. The'~ unit' may be configured to be in an addressable storage medium, or may be configured to reproduce one or more processors. Thus, as an example,'~ unit' refers to components such as software components, object-oriented software components, class components and task components, processes, functions, properties, and procedures. , Subroutines, segments of program patent code, drivers, firmware, microcode, circuitry, data, database, data structures, tables, arrays, and variables.

The functionality provided by the components and '~units' may be combined into a smaller number of components and '~units', or further separated into additional components and '~units'.

In addition, the components and '~units' may be implemented to run on one or more CPUs in a device or a secure multimedia card.

The deep learning neural network compression method according to the embodiment described with reference to FIG. 6 may also be implemented in the form of a computer-readable medium that stores instructions and data executable by a computer. The instructions and data may be stored in the form of program code and, when executed by a processor, may generate a predetermined program module to perform a predetermined operation. The computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and nonvolatile media as well as removable and non-removable media. The computer-readable medium may also be a computer recording medium, which may include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. For example, the computer recording medium may be a magnetic storage medium such as an HDD or an SSD, an optical recording medium such as a CD, DVD, or Blu-ray disc, or a memory included in a server accessible through a network.

The deep learning neural network compression method according to the embodiment described with reference to FIG. 6 may also be implemented as a computer program (or computer program product) including instructions executable by a computer. The computer program includes programmable machine instructions processed by a processor, and may be implemented in a high-level programming language, an object-oriented programming language, an assembly language, or a machine language. The computer program may also be recorded on a tangible computer-readable recording medium (for example, a memory, a hard disk, a magnetic/optical medium, or a solid-state drive (SSD)).

Accordingly, the deep learning neural network compression method according to the embodiment described with reference to FIG. 6 may be implemented by a computing device executing the computer program described above. The computing device may include at least some of a processor, a memory, a storage device, a high-speed interface connected to the memory and a high-speed expansion port, and a low-speed interface connected to a low-speed bus and the storage device. These components are connected to one another using various buses, and may be mounted on a common motherboard or in another suitable manner.

Here, the processor may process instructions within the computing device, such as instructions stored in the memory or the storage device for displaying graphic information to provide a graphical user interface (GUI) on an external input/output device, for example a display connected to the high-speed interface. In another embodiment, multiple processors and/or multiple buses may be used, as appropriate, together with multiple memories and types of memory. The processor may also be implemented as a chipset composed of chips that include multiple independent analog and/or digital processors.

The memory stores information within the computing device. As one example, the memory may consist of a volatile memory unit or a set of such units. As another example, the memory may consist of a nonvolatile memory unit or a set of such units. The memory may also be another type of computer-readable medium, such as a magnetic or optical disk.

The storage device may provide a large amount of storage space to the computing device. The storage device may be a computer-readable medium or a configuration including such a medium. For example, it may include devices in a storage area network (SAN) or other configurations, and may be a floppy disk device, a hard disk device, an optical disk device, a tape device, a flash memory, or another similar semiconductor memory device or device array.

The above-described embodiments are for illustration, and those of ordinary skill in the art to which the embodiments belong will understand that they can be readily modified into other specific forms without changing their technical idea or essential features. The above-described embodiments should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise, components described as distributed may be implemented in a combined form.

The scope of protection sought through this specification is defined by the claims set out below rather than by the detailed description above, and should be interpreted as including all changes or modifications derived from the meaning and scope of the claims and their equivalents.

100: deep learning neural network compression device 110: input/output unit
120: control unit 130: storage unit

Claims

A deep learning neural network compression method performed by a deep learning neural network compression device, the method comprising:
decomposing, by a control unit of the deep learning neural network compression device, at least one tensor constituting the deep learning neural network, such that an output of the decomposed tensor most closely approximates an output of the original tensor for the same input; and
providing, by the control unit, random noise as input to both the original tensor and the decomposed tensor, and training the original tensor and the decomposed tensor using their outputs.

The method of claim 1,
wherein the decomposing is performed by the control unit through a knowledge distillation technique in which the original tensor serves as a teacher model and the decomposed tensor serves as a student model.

The method of claim 2,
wherein, in the decomposing, the control unit gives the same input to the original tensor and the decomposed tensor and causes the distributions of the resulting values from the two tensors to match.

The method of claim 1,
further comprising fine-tuning, by the control unit, the decomposed tensor using training data.

A computer-readable recording medium on which a program for performing the method of claim 1 is recorded.

A computer program executed by a deep learning neural network compression device and stored in a medium to perform the method of claim 1.

A deep learning neural network compression device, comprising:
an input/output unit for receiving a deep learning neural network to be compressed;
a storage unit storing a program for compressing the deep learning neural network; and
a control unit that compresses the deep learning neural network by executing the program,
wherein the control unit
decomposes at least one tensor constituting the deep learning neural network such that an output of the decomposed tensor most closely approximates an output of the original tensor for the same input, provides random noise as input to both the original tensor and the decomposed tensor, and trains the original tensor and the decomposed tensor using their outputs.

The device of claim 7,
wherein the control unit
decomposes the at least one tensor through a knowledge distillation technique in which the original tensor serves as a teacher model and the decomposed tensor serves as a student model.

The device of claim 8,
wherein the control unit
gives the same input to the original tensor and the decomposed tensor, and decomposes the at least one tensor such that the distributions of the resulting values from the two tensors match.

The device of claim 7,
wherein the control unit
fine-tunes the decomposed tensor using training data.