KR20200005402A

KR20200005402A - System and method for DNN based image or video coding based on tool-by-tool

Info

Publication number: KR20200005402A
Application number: KR1020180138737A
Authority: KR
Inventors: 문현철; 김재곤; 천승문; 고현철
Original assignee: Insignal Co., Ltd. ((주)인시그널); Korea Aerospace University Industry-Academic Cooperation Foundation (한국항공대학교산학협력단)
Priority date: 2018-07-05
Filing date: 2018-11-13
Publication date: 2020-01-15
Also published as: KR20200005403A

Abstract

Disclosed are a system and method for coding an image or video on a tool-by-tool basis using a deep neural network (DNN), in which DNN technology can be applied to implement all or some of the functions of an encoder and a decoder. According to one embodiment of the present invention, the system comprises three components. The first is a network training framework that trains a DNN-based coding tool, then generates and transmits the resulting trained tool neural network in a represented form. The second is an encoder that encodes an input image or video into an encoded bitstream. It applies the trained tool neural network to an inference engine in a first encoding process corresponding to the DNN-based coding tool, and applies a standardized image or video encoding tool in the second encoding processes, which are all encoding processes other than the first. The third is a decoder that decodes the encoded bitstream generated by the encoder. It applies the trained tool neural network to an inference engine in a first decoding process corresponding to the DNN-based coding tool, and applies a standardized image or video decoding tool in the second decoding processes, which are all decoding processes other than the first.

Description

System and method for DNN-based image or video coding on a tool-by-tool basis

The present invention relates to image or video coding technology. More particularly, it relates to a method and system for encoding and decoding an image or video by applying deep neural network (DNN) technology on a tool-by-tool basis.

Attempts to apply artificial intelligence (AI) across various industries have continued. Recent AI technology has improved substantially by using neural networks (NNs), which are information-processing systems that share certain characteristics with biological neural networks. As a result, its fields of application are growing rapidly.

Such a neural network (NN) is also called an artificial neural network (ANN). A neural network consists of many simple, highly interconnected processing elements that process information through their dynamic state response to external inputs. A processing element can be regarded as analogous to a neuron in the human brain: it accepts multiple inputs and computes their weighted sum. The interconnected processing elements are organized into layers.
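The description of processing elements and layers above can be sketched in a few lines. This is a generic illustration, not part of the disclosed system. The bias term and the ReLU nonlinearity are assumptions added for realism; the text itself only specifies the weighted sum.

```python
from typing import List, Sequence, Tuple

def processing_element(inputs: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """One processing element ("neuron"): the weighted sum of its inputs plus an optional bias."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def relu(v: float) -> float:
    """Nonlinearity applied to the weighted sum (ReLU is an assumption, not from the text)."""
    return max(0.0, v)

# A layer is (one weight row per processing element, one bias per processing element).
Layer = Tuple[List[List[float]], List[float]]

def layer_forward(inputs: Sequence[float], layer: Layer) -> List[float]:
    """A layer: several processing elements that all read the same inputs."""
    weight_rows, biases = layer
    return [relu(processing_element(inputs, w, b)) for w, b in zip(weight_rows, biases)]

def network_forward(inputs: Sequence[float], layers: List[Layer]) -> List[float]:
    """Feed the outputs of each layer into the next one."""
    activations = list(inputs)
    for layer in layers:
        activations = layer_forward(activations, layer)
    return activations
```

For example, a two-layer network with weights `[[1, 1], [1, -1]]` followed by `[[2, 3]]` and an output bias of 1 maps the input `[1, 2]` to `[3, 0]` after the first layer and to `[7]` after the second.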

An artificial neural network may use different architectures to specify its variables and their topological relationships. The variables of a neural network may include the weights of the connections between neurons as well as the activities of the neurons themselves. Neural network topologies include feedforward networks and backward propagation (backpropagation) neural networks. In a feedforward network, the nodes in each layer feed their outputs to the next stage. The network includes some form of "learning rule" that modifies the connection weights according to the input patterns presented to it. A backpropagation network additionally propagates errors backward to adjust the weights, and is a more advanced form of neural network than a feedforward network.
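The difference between a forward pass and backward error propagation can be shown with the smallest possible two-layer network. The network has one hidden unit with a tanh activation and one linear output unit, and it is trained on squared error. The architecture, learning rate, and data are illustrative assumptions. The point is how the output error is propagated back through `w2` and the activation to update `w1`.

```python
import math
from typing import Tuple

def forward(x: float, w1: float, w2: float) -> Tuple[float, float]:
    """Forward pass: input -> tanh hidden unit -> linear output."""
    h = math.tanh(w1 * x)
    y = w2 * h
    return h, y

def loss(x: float, target: float, w1: float, w2: float) -> float:
    _, y = forward(x, w1, w2)
    return 0.5 * (y - target) ** 2

def train_step(x: float, target: float, w1: float, w2: float, lr: float = 0.1) -> Tuple[float, float]:
    """One gradient-descent step using backpropagated errors."""
    h, y = forward(x, w1, w2)
    err = y - target                      # dL/dy for L = 0.5 * (y - target)**2
    grad_w2 = err * h                     # output error times the hidden activation
    grad_w1 = err * w2 * (1 - h * h) * x  # error propagated back through w2 and tanh'
    return w1 - lr * grad_w1, w2 - lr * grad_w2
```

Starting from `w1 = w2 = 0.5` and repeatedly calling `train_step(1.0, 0.8, w1, w2)`, the loss shrinks toward zero as the weights adjust.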

A deep neural network (DNN) is a neural network with multiple levels of interconnected nodes, which allows it to represent highly nonlinear and highly varying functions compactly. However, the computational complexity of a DNN rises rapidly as the number of layers and the number of nodes per layer grow. Efficient computational methods for training such DNNs have been developed only in recent years. DNN training has become dramatically faster, and DNNs have since been applied successfully to complex tasks such as speech recognition, image segmentation, object detection, and face recognition.
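The growth in computational complexity can be made concrete by counting the parameters and multiply-accumulate (MAC) operations of a fully connected network. The layer widths below are arbitrary examples; real coding tools more often use convolutional layers, whose counts differ.

```python
from typing import Sequence

def dense_param_count(layer_sizes: Sequence[int]) -> int:
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def dense_macs(layer_sizes: Sequence[int]) -> int:
    """Multiply-accumulate operations for one forward pass (one per weight)."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
```

A network with widths `[64, 128, 128, 1]` has 24,961 parameters and needs 24,704 MACs per inference. Widening the middle to ten hidden layers of 128 nodes raises the parameter count to 157,057.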

Video coding is another field in which DNNs are being applied. High Efficiency Video Coding (HEVC), the current next-generation video coding standard, was developed in a joint video project of two standardization organizations: the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). HEVC has been adopted and is in use as an international standard. It is known that applying DNNs to newer video coding standards such as HEVC can further improve their performance.

One such attempt is disclosed in Korean Laid-Open Patent Publication No. 10-2018-0052651, "Method and Apparatus of Neural Network Based Processing in Video Coding." In the neural-network-based processing method of that publication, a target signal is processed using a DNN. The target signal provided to the DNN input corresponds to a reconstructed residual output from a prediction process, a reconstruction process, one or more filtering processes, or a combination thereof. Output data from the DNN is provided to the encoding or decoding process. The DNN may be used to restore the pixel values of the target signal. It may also be used to predict the sign of one or more residual pixels between the target signal and the original signal. The absolute values of those residual pixels may be signaled in the video bitstream and used together with the predicted signs to reduce the residual error of the target signal.
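The sign-prediction idea summarized above can be sketched as follows. The magnitudes of the residual arrive in the bitstream, a predictor supplies the signs, and the two are combined to correct the target signal. This is only an illustration of the summary, not the cited publication's method. `mean_reverting_signs` is a hypothetical placeholder standing in for the DNN, and the 8-bit clipping is an assumption.

```python
from typing import Callable, List

SignPredictor = Callable[[List[int]], List[int]]

def mean_reverting_signs(target: List[int]) -> List[int]:
    """Hypothetical placeholder for the DNN: predicts +1 below the block mean, -1 otherwise."""
    mean = sum(target) / len(target)
    return [1 if t < mean else -1 for t in target]

def refine_target(target: List[int], abs_residual: List[int],
                  predict_signs: SignPredictor, bit_depth: int = 8) -> List[int]:
    """Add signaled residual magnitudes, with predicted signs, back onto the target signal."""
    signs = predict_signs(target)
    max_val = (1 << bit_depth) - 1
    return [min(max(t + s * a, 0), max_val) for t, s, a in zip(target, signs, abs_residual)]
```

With target samples `[100, 102, 98, 101]` and signaled magnitudes `[2, 1, 0, 3]`, the placeholder predictor yields the signs `[+1, -1, +1, -1]`, so the refined block is `[102, 101, 98, 98]`. Only the magnitudes cost bits; the signs are inferred on both sides.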

Korean Laid-Open Patent Publication No. 10-2018-0052651, "Method and Apparatus of Neural Network Based Processing in Video Coding"

That publication applies DNN technology to processing a target signal corresponding to a reconstructed residual output from a prediction process, a reconstruction process, one or more filtering processes, or a combination thereof. In other words, it only discloses applying DNN technology to one particular signal among the various signals generated during video coding. It does not disclose at all how DNN technology can be applied to implement all or some of the functions of the encoder and decoder.

Furthermore, that publication assumes one of two things: either the DNN parameters for applying the DNN technique are predefined for the video coding system, or the video coding system selects among multiple DNN parameter sets. It therefore presupposes that the encoder and decoder are already equipped with a pre-specified neural network (NN) or its DNN parameters. It does not disclose how a trained neural network for use by the inference engines of the encoder and decoder is to be delivered to them.

Accordingly, one object of the present invention is to provide a method and system for DNN-based image or video coding in which DNN technology can be concretely applied to implement all and/or some of the functions of an encoder and a decoder.

Another object of the present invention is to provide a method and system for DNN-based image or video coding that delivers a trained neural network to an encoder and a decoder so that their inference engines can use it.

To achieve these objects, a system for DNN-based image or video coding according to an embodiment of the present invention includes a network training framework, an encoder, and a decoder.

The network training framework trains a DNN-based coding tool, then generates and transmits the resulting trained tool neural network in a represented form.

The encoder encodes an input image or video into an encoded bitstream. In a first encoding process corresponding to the DNN-based coding tool, it applies the trained tool neural network to an inference engine. In the second encoding processes, which are all encoding processes other than the first, it applies a standardized image or video encoding tool.

The decoder decodes the encoded bitstream generated by the encoder. In a first decoding process corresponding to the DNN-based coding tool, it applies the trained tool neural network to an inference engine. In the second decoding processes, which are all decoding processes other than the first, it applies a standardized image or video decoding tool.
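The split between "first" processes, which run on the inference engine, and "second" processes, which run on standardized tools, can be sketched as a pipeline built per coding stage. `InferenceEngine`, the stage names, and the toy tools are hypothetical placeholders, not APIs from the source. Because encoder and decoder receive the same trained tool networks, both could build their pipelines with the same function.

```python
from typing import Callable, Dict, List, Sequence

Tool = Callable[[List[float]], List[float]]

class InferenceEngine:
    """Hypothetical stand-in for an inference engine running a received trained tool network."""

    def __init__(self, trained_tool_network: Tool):
        self.trained_tool_network = trained_tool_network

    def __call__(self, data: List[float]) -> List[float]:
        return self.trained_tool_network(data)

def build_pipeline(stages: Sequence[str], standard_tools: Dict[str, Tool],
                   trained_tools: Dict[str, Tool]) -> List[Tool]:
    """Stages with a trained tool network (first processes) run on the inference engine;
    every other stage (second processes) keeps its standardized tool."""
    return [InferenceEngine(trained_tools[s]) if s in trained_tools else standard_tools[s]
            for s in stages]

def run_pipeline(pipeline: List[Tool], data: List[float]) -> List[float]:
    for tool in pipeline:
        data = tool(data)
    return data
```

For example, with stages `["prediction", "in_loop_filter"]`, a standardized predictor, and a trained tool network supplied only for `"in_loop_filter"`, only the second stage runs on the inference engine.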

According to one aspect of this embodiment, the DNN-based coding tool may include a first coding tool that implements a coding function common to both the encoder and the decoder. The DNN-based coding tool may further include a second coding tool that implements a coding function present in only one of the encoder and the decoder.

According to another aspect, the system may further include a neural network compression unit (NN compression). This unit receives the trained tool neural network from the network training framework, compresses it, and transmits it to the encoder and the decoder. Some of the trained tool neural networks represented by the network training framework may pass through the neural network compression unit. The rest may be transmitted directly from the network training framework to the encoder and the decoder without passing through it.

According to yet another aspect, the network training framework may generate the trained tool neural network by selecting one of several DNN-based coding tools that implement the same coding function. The network training framework may make this selection based on the type of the input image or video.
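Selecting one of several DNN-based tools that implement the same coding function, based on content type, amounts to a lookup keyed by function and content type. The registry contents, tool identifiers, and the fallback rule below are hypothetical; the source does not specify how candidates are organized.

```python
from typing import Dict

# Hypothetical registry: coding function -> content type -> trained tool network identifier.
CANDIDATES: Dict[str, Dict[str, str]] = {
    "in_loop_filter": {
        "natural": "cnn_loop_filter_natural",
        "screen_content": "cnn_loop_filter_screen",
    },
    "intra_mode_decision": {
        "natural": "cnn_intra_mode_natural",
    },
}

def select_tool(function: str, content_type: str,
                candidates: Dict[str, Dict[str, str]] = CANDIDATES,
                fallback: str = "natural") -> str:
    """Pick the DNN-based tool trained for this content type, else fall back to a default."""
    options = candidates[function]
    return options.get(content_type, options[fallback])
```

The fallback keeps the system usable when no tool has been trained for a given content type. A real framework might instead fall back to the standardized tool.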

According to yet another aspect, the network training framework may represent the trained tool neural network in an interoperable format that includes the network structure and the trained weights, so that it can be exchanged with the encoder and the decoder. Besides the trained tool neural network itself, the interoperable format may include high-level information about the system configuration. The high-level information about the trained tool neural network may include information indicating the type of the DNN-based coding tool.

According to yet another aspect, in the first encoding process the encoder may choose between two options: applying the trained tool neural network to the inference engine, or applying the standardized image or video encoding tool. The encoder may base this choice on the rate-distortion (RD) cost. The information about the encoder's choice may be included in the high-level information and delivered to the decoder.
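The encoder's choice can be sketched with the Lagrangian RD cost J = D + λR that is conventionally used in HEVC encoders. The source says only "RD cost", so the exact cost function, the tie-breaking rule, and the flag name `dnn_tool_flag` are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    distortion: float  # e.g. sum of squared errors between original and reconstruction
    rate: float        # bits needed to code the block with this tool

def rd_cost(trial: Trial, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return trial.distortion + lam * trial.rate

def choose_dnn_tool(dnn: Trial, standard: Trial, lam: float) -> bool:
    """True if the trained tool network gives the lower RD cost (ties go to the standardized tool)."""
    return rd_cost(dnn, lam) < rd_cost(standard, lam)
```

A trial coded with the DNN tool at (D = 100, R = 50) beats the standardized tool at (D = 120, R = 40) when λ = 1 (150 versus 160), but loses when λ = 3 (250 versus 240). The result could then be signaled as, for example, a hypothetical `dnn_tool_flag` in the high-level information.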

A method for DNN-based image or video coding according to an embodiment of the present invention includes a training step, an encoding step, and a decoding step.

In the training step, a DNN-based coding tool is trained, and the resulting trained tool neural network is generated in a represented form and transmitted.

In the encoding step, an input image or video is encoded into an encoded bitstream. In a first encoding process corresponding to the DNN-based coding tool, the trained tool neural network is applied to the encoder's inference engine. In the second encoding processes, which are all encoding processes other than the first, the encoder's standardized image or video encoding tool is applied.

In the decoding step, the encoded bitstream generated in the encoding step is decoded. In a first decoding process corresponding to the DNN-based coding tool, the trained tool neural network is applied to the decoder's inference engine. In the second decoding processes, which are all decoding processes other than the first, the decoder's standardized image or video decoding tool is applied.

According to the embodiments described above, the network training framework generates a represented trained tool neural network, which is the result of the DNN-based tool trained in the framework, and transmits it to the encoder and the decoder. DNN-based technology can therefore be applied interoperably to individual functional units of the encoding and decoding processes.

In addition, the encoder and decoder according to the embodiments perform encoding or decoding by applying the trained tool neural network received from the network training framework to an inference engine. DNN technology can thus be applied on a tool-by-tool basis, replacing some functions of existing standardized image or video coding technologies or of standardized technologies developed in the future. In particular, when several DNN-based coding tools implement the same coding function, encoding and/or decoding can use the one with the highest coding efficiency. The encoder can also choose, tool by tool, between the DNN-based technology and the existing standardized coding technology, applying whichever gives the higher coding efficiency.

FIG. 1 illustrates the configuration of a system for DNN-based image or video coding according to the tool-by-tool method, in accordance with an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a method for DNN-based image or video coding according to an embodiment of the present invention, performed in the system of FIG. 1.
FIG. 3 illustrates the configuration of a system for DNN-based image or video coding according to the entire-codec method, in accordance with another embodiment of the present invention.
FIG. 4 is a flowchart illustrating a method for DNN-based image or video coding according to an embodiment of the present invention, performed in the system of FIG. 3.

Hereinafter, preferred embodiments and examples of the present invention are described with reference to the drawings. These embodiments and examples merely illustrate preferred configurations of the present invention, and the scope of the invention is not limited to them. In the following description, the hardware and software configurations, processing flow, manufacturing conditions, size, material, shape, and the like of the apparatus are not intended to limit the scope of the invention unless specifically stated otherwise.

As described above, deep neural networks (DNNs) have recently attracted considerable interest across many applications, and there have been attempts to apply them to the coding of images or video (hereinafter also referred to simply as "video"). One example is disclosed in the aforementioned Korean Laid-Open Patent Publication No. 10-2018-0052651, which applies DNN technology to the processing of a specific signal, namely the reconstructed residual.

Another approach, taken in the embodiments of the present invention, is to deliver a trained neural network to the encoder and decoder and apply it to their inference engines. There are two ways to do this: on a tool-by-tool basis or on an entire-codec basis. In the tool-by-tool method, some existing components or functions of the encoding/decoding process or codec are replaced by DNN-based technology, that is, by DNN-based coding tools. In the entire-codec method, by contrast, the whole encoding/decoding process of the codec is replaced from start to finish by DNN-based technology, that is, by a DNN-based encoder/decoder.

Tool-by-tool method

In the tool-by-tool method, the term "tool" refers to a module (a computer program, or a computer processor implementing it) that performs some existing component or function of a codec. The scope of the coding process that a "tool" performs within the overall coding process is not particularly limited. For example, a "tool" may be one function of the encoding/decoding block diagram commonly used in international video coding standards such as H.264 or HEVC. Examples include intra prediction coding, inter prediction coding, quantization/dequantization, entropy coding, and in-loop filtering. Alternatively, a "tool" may be a sub-process within one of those functions. Examples include prediction mode decision for intra prediction coding and quantization parameter decision for quantization.

FIG. 1 illustrates the configuration of a system for DNN-based image or video coding according to an embodiment of the present invention, namely a system based on the tool-by-tool method. Referring to FIG. 1, the DNN-based video coding system 100 includes a network training framework 110, an encoder 120, and a decoder 130. The DNN-based video coding system 100 may further include a neural network (NN) compression unit 140.

The network training framework 110 trains DNN-based coding tools and generates the represented trained tool neural networks. To do so, it first designs a set of DNN-based coding tools and then trains each of them. Here, a designed DNN-based coding tool is a network model with a given network structure.

There are two types of DNN-based coding tools:

First coding tools implement functions essential to both the encoder 120 and the decoder 130. In FIG. 1, these are tool A and tool B, labeled Type-A.

Second coding tools implement functions essential to only one of the encoder 120 and the decoder 130. In FIG. 1, these are tool C and tool D, labeled Type-B.

This distinction arises because, in image/video coding, some coding tools (the first coding tools) are required by both the encoder 120 and the decoder 130, whereas others (the second coding tools) are required by only one of them. In video coding, for example, in-loop filtering is performed in both the encoder 120 and the decoder 130 and is therefore a function of a first coding tool. Intra prediction mode decision, by contrast, is performed only in the encoder 120 and is a function of a second coding tool. Only the resulting prediction mode information is sent to the decoder 130.
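The in-loop filtering versus intra mode decision example can be sketched on a one-dimensional toy block.

- The mode decision is a Type-B tool: it runs only in the encoder, and the decoder merely reads the chosen mode index.
- The loop filter is a Type-A tool: it must run identically on both sides, so that encoder and decoder reconstructions match.

The two predictors, the 3-tap filter, and the lossless residual (no quantization) are simplifying assumptions. A trained DNN could replace either `choose_mode` or `loop_filter`.

```python
from typing import Dict, List, Tuple

def predict(mode: int, ref: List[int], n: int) -> List[int]:
    """Toy intra predictors (hypothetical): 0 = DC (mean of reference), 1 = repeat last reference sample."""
    if mode == 0:
        return [sum(ref) // len(ref)] * n
    return [ref[-1]] * n

def choose_mode(block: List[int], ref: List[int]) -> int:
    """Type-B tool (encoder only): exhaustive SAD search; a trained DNN could replace it."""
    def sad(mode: int) -> int:
        return sum(abs(b - p) for b, p in zip(block, predict(mode, ref, len(block))))
    return min((0, 1), key=sad)

def loop_filter(samples: List[int]) -> List[int]:
    """Type-A tool (encoder and decoder): 3-tap smoothing, edge samples left untouched."""
    out = list(samples)
    for i in range(1, len(samples) - 1):
        out[i] = (samples[i - 1] + 2 * samples[i] + samples[i + 1] + 2) // 4
    return out

def encode(block: List[int], ref: List[int]) -> Tuple[Dict, List[int]]:
    mode = choose_mode(block, ref)
    pred = predict(mode, ref, len(block))
    residual = [b - p for b, p in zip(block, pred)]
    recon = loop_filter([p + r for p, r in zip(pred, residual)])
    return {"mode": mode, "residual": residual}, recon  # only mode + residual are sent

def decode(bitstream: Dict, ref: List[int]) -> List[int]:
    pred = predict(bitstream["mode"], ref, len(bitstream["residual"]))
    return loop_filter([p + r for p, r in zip(pred, bitstream["residual"])])
```

For the block `[10, 12, 11, 13]` with reference `[9, 10, 11]`, the encoder picks mode 1 and transmits only that index and the residual. Both sides then produce the filtered reconstruction `[10, 11, 12, 13]`.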

Both types of DNN-based coding tools must therefore be taken into account, and the type of each DNN-based coding tool must be indicated in the high-level information of the trained tool. As noted above, in FIG. 1, tool A and tool B are needed by both the encoder 120 and the decoder 130 and are labeled Type-A. Tool C and tool D are needed by only one of them and are labeled Type-B.

The results of the DNN-based tools trained in the network training framework 110, that is, the trained neural networks, must be delivered to the encoder 120 and/or the decoder 130, or exchanged with them interoperably. This can be achieved by defining an interoperable format for representing a trained network, including its network structure and trained weights. The interoperable format must describe not only the trained neural network but also high-level information about the system configuration. For example, in the tool-by-tool method, the interoperable format must describe each tool's type, which indicates whether the tool applies to both the encoder 120 and the decoder 130 or to only one of them.

A trained neural network represented in the interoperable format must therefore include two things: the trained DNN-based tool neural network itself, and high-level information about the overall configuration of the system shown in FIG. 1.

More specifically, the high-level information related to a trained DNN-based tool neural network includes the following:

- information on the target application, in terms of the network's basic function (recognition, classification, generation, discrimination, and so on);
- information on whether the tool-by-tool method or the entire-codec method is used;
- in the tool-by-tool method, information indicating the type of the trained DNN-based coding tool;
- in the tool-by-tool method, information indicating which option the encoder chose for a particular encoding process: applying the trained tool neural network to the inference engine, or applying the standardized image or video encoding tool;
- information on the customized content type;
- basic information on the algorithm of the trained DNN-based neural network, such as an autoencoder, a convolutional neural network (CNN), a generative adversarial network (GAN), or a recurrent neural network (RNN);
- basic information on the training data and/or test data;
- information on the capabilities required of the inference engine, in terms of memory capacity and computing power;
- information on model compression.
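One way to picture such an interoperable representation is a container that bundles network structure, trained weights, and high-level information and survives a serialize/deserialize round trip. The sketch uses JSON and covers only a subset of the information items listed above. All field names form a hypothetical schema, not a standardized exchange format or the format the source defines.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List

@dataclass
class HighLevelInfo:
    target_application: str  # e.g. "generation", "classification"
    method: str              # "tool-by-tool" or "entire-codec"
    tool_type: str           # "A" (encoder and decoder) or "B" (one side only)
    algorithm: str           # e.g. "CNN", "GAN", "RNN", "autoencoder"
    content_type: str        # customized content type, e.g. "natural"
    min_memory_mb: int       # memory the inference engine needs
    compressed: bool         # whether model compression was applied

@dataclass
class TrainedToolNetwork:
    name: str
    layers: List[Dict]               # network structure
    weights: Dict[str, List[float]]  # trained weights keyed by layer name
    info: HighLevelInfo

def to_interoperable(net: TrainedToolNetwork) -> str:
    return json.dumps(asdict(net), sort_keys=True)

def from_interoperable(text: str) -> TrainedToolNetwork:
    data = json.loads(text)
    data["info"] = HighLevelInfo(**data["info"])
    return TrainedToolNetwork(**data)
```

Because the tool type travels with the weights, a receiver can tell from the container alone whether a tool belongs on both sides or on one.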

The network training framework 110 then selects one or more trained tool neural networks, that is, coded representations of the trained tool-by-tool neural networks, and transmits them to the encoder 120 and/or the decoder 130. As described above, the type of a trained tool neural network (the type of the tool) determines its destination: it may be transmitted to both the encoder 120 and the decoder 130, or to only one of them. As illustrated in FIG. 1, the Type-A tools A and B are needed by both the encoder 120 and the decoder 130 and are transmitted to both. The Type-B tools C and D are needed by the decoder 130 and are transmitted only to the decoder 130.
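The type-based delivery rule can be sketched as a routing function. Type-A tools go to both sides. Type-B tools go only to the side named as needing them. The tuple layout and the explicit `required_by` field are assumptions; the source describes the rule but not a data structure for it.

```python
from typing import List, Optional, Tuple

# (tool name, tool type "A" or "B", side needing a Type-B tool: "encoder" or "decoder")
ToolEntry = Tuple[str, str, Optional[str]]

def route_tools(tools: List[ToolEntry]) -> Tuple[List[str], List[str]]:
    """Return (tools sent to the encoder, tools sent to the decoder)."""
    to_encoder: List[str] = []
    to_decoder: List[str] = []
    for name, tool_type, required_by in tools:
        if tool_type == "A":  # needed by both encoder and decoder
            to_encoder.append(name)
            to_decoder.append(name)
        elif tool_type == "B":  # needed by only one side
            if required_by == "encoder":
                to_encoder.append(name)
            elif required_by == "decoder":
                to_decoder.append(name)
            else:
                raise ValueError(f"Type-B tool {name!r} must name the side that needs it")
        else:
            raise ValueError(f"unknown tool type {tool_type!r}")
    return to_encoder, to_decoder
```

With the FIG. 1 configuration (tools A and B as Type-A, tools C and D as Type-B needed by the decoder), the encoder receives tools A and B, and the decoder receives all four.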

All or part of the trained tool neural networks transmitted from the network training framework 110 to the encoder 120 and/or the decoder 130 may pass through the neural network compression unit 140. That is, some of the trained tool neural networks generated and described by the network training framework 110 are transmitted directly to the encoder 120 and/or the decoder 130, while the rest may be transmitted to them via the neural network compression unit 140.

The neural network compression unit 140 compresses a trained tool neural network and describes it in an interoperable format. One example of the neural network compression unit 140 is an accelerator library. Compressing a trained tool neural network with the accelerator library, i.e., the neural network compression unit 140, is optional. For example, when a trained tool neural network is compact enough to be transmitted to the encoder 120 and/or the decoder 130 without compression, or when compression would seriously degrade codec performance, the network may be transmitted directly to the encoder 120 and/or the decoder 130, bypassing the neural network compression unit 140.
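The decision of whether to route a network through the compression unit can be sketched as follows. The description names an accelerator library only as an example. Here, symmetric int8 weight quantization stands in for the compression step, and relative weight error is an assumed proxy for "seriously degrading codec performance". Both thresholds are hypothetical parameters.

```python
import math


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (a stand-in for the compression unit)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def relative_error(original, restored):
    """L2 error of the restored weights relative to the original weights."""
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, restored)))
    norm = math.sqrt(sum(a * a for a in original))
    return err / norm if norm > 0 else err


def prepare_for_transmission(weights, size_limit_bytes, max_rel_error):
    """Return ("bypass", weights) or ("compressed", (codes, scale))."""
    if 4 * len(weights) <= size_limit_bytes:  # float32 payload already compact enough
        return "bypass", weights
    codes, scale = quantize_int8(weights)
    restored = [c * scale for c in codes]
    if relative_error(weights, restored) > max_rel_error:  # compression hurts too much
        return "bypass", weights
    return "compressed", (codes, scale)
```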

According to one aspect of this embodiment, the network training framework 110 may be provisioned with a plurality of DNN-based coding tools, i.e., a plurality of tool-level neural network models, that implement the same coding function. The network training framework 110 may then select one of the available DNN-based coding tools according to a predetermined criterion and generate a trained tool neural network from it. For example, the network training framework 110 may select, based on the type of input content to be encoded, a more suitable coding tool that achieves higher performance.
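A minimal sketch of content-dependent selection among interchangeable tools, assuming each candidate comes with a per-content-type score (for instance, a coding gain measured beforehand). The `CandidateTool` record, its scores, and the tool names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateTool:
    """A DNN-based coding tool available to the training framework (hypothetical)."""
    name: str
    function: str  # coding function implemented, e.g. "in_loop_filter"
    # Hypothetical per-content-type score, e.g. a measured coding gain in percent.
    gain_by_content: dict = field(default_factory=dict)


def select_tool(candidates, function, content_type):
    """Pick the candidate implementing `function` that scores best on `content_type`."""
    matching = [c for c in candidates if c.function == function]
    if not matching:
        raise LookupError(f"no DNN-based tool implements {function!r}")
    # Tools with no score for this content type rank last.
    return max(matching, key=lambda c: c.gain_by_content.get(content_type, float("-inf")))
```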

The encoder 120 encodes an input image or video to generate and output an encoded bitstream. The video encoding process in the encoder 120 consists of a series of processes defined by a particular existing video coding standard, such as H.264 or HEVC. In the encoder 120 according to this embodiment, however, some of these processes are replaced, on a coding-tool basis, by DNN-based techniques. This naturally presupposes that the output of the DNN-based technique that replaces an existing process, i.e., of the trained DNN-based coding tool, complies with what the video coding standard specifies. As a result, even though some functions of the overall encoding process are replaced by DNN-based techniques, the encoded bitstream finally output remains the same as the output of the existing video coding standard.

To this end, the encoder 120 includes an inference engine for applying DNN-based techniques. The inference engine of the encoder 120 uses the trained DNN-based coding tool (trained neural network) described in the interoperable format.

More specifically, among all the encoding processes of the existing video coding standard, the encoder 120 of this embodiment handles the first encoding process, which corresponds to the DNN-based coding tool received from the network training framework 110, by applying the trained tool neural network to its internal inference engine to encode the input video (or the output of the preceding process). In the encoding processes other than the first encoding process (i.e., the second encoding process), it encodes the input video (or the output of the preceding process) using the method of the existing video coding standard (i.e., an existing video encoding tool).
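The split between the first and second encoding processes can be sketched as a stage loop that consults a table of received NN tools. Plain Python callables stand in for the inference engine, and the stage names and toy stage functions used in the test are placeholders, not actual H.264/HEVC processes.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[List[float]], List[float]]


def run_encoder_pipeline(
    samples: List[float],
    stage_order: List[str],
    standard_tools: Dict[str, Tool],
    nn_tools: Dict[str, Tool],
) -> Tuple[List[float], List[Tuple[str, str]]]:
    """Run the stages in order, each consuming the previous stage's output."""
    data = samples
    trace = []
    for stage in stage_order:
        if stage in nn_tools:
            # First encoding process: a trained tool NN was received for this stage,
            # so it runs on the inference engine.
            data = nn_tools[stage](data)
            trace.append((stage, "dnn"))
        else:
            # Second encoding process: the standardized tool is used unchanged.
            data = standard_tools[stage](data)
            trace.append((stage, "standard"))
    return data, trace
```

Because the NN table is keyed by stage, adding or removing a trained tool changes which stages fall into the first encoding process without touching the rest of the pipeline.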

According to one aspect of this embodiment, when performing a specific process of the encoding process, the encoder 120 may choose between the DNN-based coding tool and the coding tool of the existing video coding standard based on a predetermined criterion, for example, information indicating coding efficiency such as the rate-distortion (R-D) cost. That is, even if the encoder 120 has received from the network training framework 110 a trained DNN-based coding tool (i.e., a trained tool neural network) for a specific encoding process, it need not use that tool for the process; if coding efficiency can be further improved, it may instead encode that process with the coding tool of the existing video coding standard. The encoder 120 may include, as high-level information, which of the two tools (the DNN-based coding tool or the standard coding tool) it selected for that process.
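The selection by R-D cost can be sketched with the usual Lagrangian cost J = D + λR, where D is distortion, R is rate in bits, and λ is the Lagrange multiplier. The description names R-D cost only as one example criterion; how D and R are measured and the tie-breaking rule in favour of the standardized tool are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    distortion: float  # e.g. sum of squared errors of the reconstruction
    rate_bits: float   # bits spent when this tool is used


def rd_cost(trial: Trial, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return trial.distortion + lam * trial.rate_bits


def choose_tool(dnn: Trial, standard: Trial, lam: float):
    """Return (use_dnn_flag, cost); ties go to the standardized tool."""
    j_dnn, j_std = rd_cost(dnn, lam), rd_cost(standard, lam)
    return (True, j_dnn) if j_dnn < j_std else (False, j_std)
```

The returned flag is what the encoder would record as high-level information for that process.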

The encoded bitstream output from the encoder 120 may be transmitted to the decoder 130 or stored in a predetermined storage medium. In either case, the encoded bitstream is input to and decoded by the decoder 130, which outputs a reconstructed image or video.

The video decoding process in the decoder 130 consists of a series of processes defined by a particular existing video coding standard, such as H.264 or HEVC. In the decoder 130 according to this embodiment, however, some of these processes are replaced, on a coding-tool basis, by DNN-based techniques. This naturally presupposes that the output of the DNN-based technique that replaces an existing process, i.e., of the trained DNN-based coding tool, complies with what the video coding standard specifies. As a result, even though some functions of the overall decoding process are replaced by DNN-based techniques, the data output by each process remains the same as that produced under the existing video coding standard.

To this end, the decoder 130 includes an inference engine for applying DNN-based techniques. The inference engine of the decoder 130 uses the trained DNN-based coding tool (trained neural network) described in the interoperable format.

Among all the decoding processes of the existing video coding standard, the decoder 130 handles the first decoding process, which corresponds to the DNN-based coding tool received from the network training framework 110, by applying the trained tool neural network to its internal inference engine to decode the input bitstream (or the output of the preceding process). In the decoding processes other than the first decoding process (i.e., the second decoding process), it decodes the input bitstream (or the output of the preceding process) using the method of the existing video coding standard (i.e., an existing video decoding tool).
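On the decoder side, the choice signaled by the encoder must be read back so that each decoding process uses the same tool the encoder used. A minimal sketch, assuming a hypothetical header of one flag bit per process; the description only says the choice may be included as high-level information and does not define its syntax.

```python
from typing import Dict, List


def pack_tool_flags(stage_order: List[str], use_dnn: Dict[str, bool]) -> bytes:
    """Encoder side: one bit per stage, bit i set when stage i used the trained NN."""
    mask = 0
    for i, stage in enumerate(stage_order):
        if use_dnn.get(stage, False):
            mask |= 1 << i
    return mask.to_bytes(max(1, (len(stage_order) + 7) // 8), "big")


def unpack_tool_flags(stage_order: List[str], header: bytes) -> Dict[str, bool]:
    mask = int.from_bytes(header, "big")
    return {stage: bool((mask >> i) & 1) for i, stage in enumerate(stage_order)}


def decode_with_flags(payload, stage_order, header, standard_tools, nn_tools):
    """Decoder side: follow the signaled flags so each stage mirrors the encoder."""
    flags = unpack_tool_flags(stage_order, header)
    for stage in stage_order:
        if flags[stage]:
            if stage not in nn_tools:
                raise RuntimeError(f"bitstream needs NN tool {stage!r}, which was not received")
            payload = nn_tools[stage](payload)        # first decoding process
        else:
            payload = standard_tools[stage](payload)  # second decoding process
    return payload
```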

FIG. 2 is a flowchart illustrating a method for DNN-based image or video coding according to an embodiment of the present invention. The method shown in FIG. 2 is performed by the system shown in FIG. 1. To avoid unnecessary repetition, the method is therefore described only briefly; anything not described here is as described above with reference to FIG. 1.

Referring to FIG. 2, first, a DNN-based coding tool is trained to generate a trained tool neural network describing it, and that network is transmitted to the encoder and/or the decoder (S10). This step may be performed by the network training framework 110 of the system 100 shown in FIG. 1.
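Step S10 can be illustrated with a deliberately tiny stand-in: instead of a DNN, a one-layer affine filter is "trained" in closed form by ordinary least squares to map decoded samples back toward the originals. This is an assumption for illustration only; it shows the train-then-describe flow, not the network the description has in mind, and the dictionary layout of the model is hypothetical.

```python
def train_linear_filter(decoded, original):
    """Fit original ≈ a * decoded + b by ordinary least squares (closed form)."""
    n = len(decoded)
    if n == 0 or n != len(original):
        raise ValueError("need two non-empty sequences of equal length")
    mean_x = sum(decoded) / n
    mean_y = sum(original) / n
    var_x = sum((x - mean_x) ** 2 for x in decoded) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(decoded, original)) / n
    a = cov_xy / var_x if var_x > 0 else 1.0
    b = mean_y - a * mean_x
    # "Description" of the trained tool: structure plus trained weights.
    return {"structure": "affine", "weights": [a, b]}


def apply_filter(model, samples):
    a, b = model["weights"]
    return [a * s + b for s in samples]
```

The returned dictionary plays the role of the described network (structure plus weights); in S11 and S12 the encoder and decoder would both call `apply_filter` with the same model.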

Next, an encoding process is performed that encodes an input image or video to generate an encoded bitstream (S11). In this encoding process, the trained tool neural network is applied to the encoder's inference engine in the first encoding process, which corresponds to the DNN-based coding tool, and the encoder's standardized image or video encoding tool is applied in the second encoding process, i.e., the processes other than the first. This step may be performed by the encoder 120 of the system 100 shown in FIG. 1.

Then, a decoding process is performed that decodes the encoded bitstream generated in the encoding step (S12). In this decoding process, the trained tool neural network is applied to the decoder's inference engine in the first decoding process, which corresponds to the DNN-based coding tool, and the decoder's standardized image or video decoding tool is applied in the second decoding process, i.e., the processes other than the first. This step may be performed by the decoder 130 of the system 100 shown in FIG. 1.

Full codec-based method

In the full codec-based method, the term "codec" refers to a module (a computer program, or a computer processor implementing it) that performs the entire process of encoding or decoding an image or video. Such a codec consists of a series of processes that together carry out the whole encoding or decoding; each process, in whole or in part, or a group of several processes corresponds to a "tool" as described above. Accordingly, "codec" refers to the entire encoding block diagram or the entire decoding block diagram defined in international video coding standards such as H.264 or HEVC.

FIG. 3 shows the configuration of a system for DNN-based image or video coding according to an embodiment of the present invention, namely a system according to the full codec-based method. Referring to FIG. 3, the DNN-based video coding system 200 includes a network training framework 210, an encoder 220, and a decoder 230. The DNN-based video coding system 200 may further include a neural network (NN) compression unit 240.

The network training framework 210 trains a DNN-based encoder and a DNN-based decoder to generate, respectively, a trained encoder neural network and a trained decoder neural network that describe them. To this end, the network training framework 210 first designs a set consisting of a DNN-based encoder and a DNN-based decoder, and then trains each of them. Here, the designed DNN-based encoder and decoder each refer to a network model with a predetermined network structure.
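The description does not specify how the encoder and decoder networks are trained. One common approach in learned image compression, assumed here, is to train them jointly on a rate-distortion objective L = D + λR. The sketch below only evaluates that objective for one sample, estimating R from empirical symbol frequencies; actual training would need a differentiable rate estimate and an optimizer, which are outside its scope.

```python
import math
from collections import Counter


def empirical_rate_bits(symbols):
    """Total bits if each symbol costs -log2 of its empirical probability."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rd_loss(original, reconstructed, quantized_features, lam):
    """L = D + lambda * R, evaluated on one training sample."""
    return mse(original, reconstructed) + lam * empirical_rate_bits(quantized_features)
```

Increasing λ pushes the jointly trained pair toward feature vectors that are cheaper to code, at the expense of reconstruction quality.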

The results of the DNN-based encoder and decoder trained in the network training framework 210, i.e., the trained encoder neural network and the trained decoder neural network (hereinafter also referred to collectively as "trained neural networks"), must be delivered to the encoder 220 and the decoder 230, respectively, or exchanged in an interoperable manner. This can be achieved by defining an interoperable format for describing a trained network, including its network structure and trained weights. The interoperable format needs to describe not only the trained neural networks but also high-level information about the system configuration.
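One way to realize such an interoperable format is sketched below: the structure and the trained weights are kept separate and serialized together with the high-level information. The JSON schema and the `FORMAT_VERSION` tag are hypothetical; a real system would more likely adopt an established exchange format such as ONNX or Khronos NNEF.

```python
import json
from dataclasses import dataclass, field

FORMAT_VERSION = 1  # hypothetical version tag


@dataclass
class Layer:
    kind: str                                    # e.g. "conv2d", "relu"
    params: dict = field(default_factory=dict)   # structural hyperparameters
    weights: list = field(default_factory=list)  # flattened trained weights


def serialize(layers, high_level_info):
    """Describe a trained network (structure + weights + high-level info) as bytes."""
    doc = {
        "format_version": FORMAT_VERSION,
        "high_level_info": high_level_info,
        "structure": [{"kind": layer.kind, "params": layer.params} for layer in layers],
        "weights": [layer.weights for layer in layers],
    }
    return json.dumps(doc, sort_keys=True).encode("utf-8")


def deserialize(blob):
    """Rebuild the layer list and high-level info on the encoder or decoder side."""
    doc = json.loads(blob.decode("utf-8"))
    if doc.get("format_version") != FORMAT_VERSION:
        raise ValueError("unsupported format version")
    layers = [
        Layer(s["kind"], s["params"], w)
        for s, w in zip(doc["structure"], doc["weights"])
    ]
    return layers, doc["high_level_info"]
```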

Accordingly, a trained neural network described in the interoperable format must include not only the trained DNN-based encoder or decoder neural network but also high-level information about the overall configuration of the system shown in FIG. 3.

More specifically, the high-level information related to the trained DNN-based encoder/decoder neural networks includes the following:
- information on the target application, viewed in terms of the network's basic function (recognition, classification, generation, discrimination, and the like);
- whether the method is tool-by-tool or full-codec based;
- the customized content type;
- basic information on the algorithm of the trained DNN-based neural network, such as an autoencoder, CNN (Convolutional Neural Network), GAN (Generative Adversarial Network), or RNN (Recurrent Neural Network);
- basic information on the training data and/or test data;
- the capabilities required of the inference engine in terms of memory capacity and computing power;
- information on model compression.

The network training framework 210 then selects the trained neural networks, i.e., the coded representation of the trained encoder neural network and the coded representation of the trained decoder neural network, and transmits them to the encoder 220 and the decoder 230, respectively.

All or part of the trained neural networks transmitted from the network training framework 210 to the encoder 220 and/or the decoder 230 may pass through the neural network compression unit 240. That is, some of the trained neural networks generated and described by the network training framework 210 are transmitted directly to the encoder 220 or the decoder 230, while the rest may be transmitted to them via the neural network compression unit 240.

The neural network compression unit 240 compresses the trained encoder neural network and/or the trained decoder neural network and describes each in an interoperable format. One example of the neural network compression unit 240 is an accelerator library. Compressing the trained neural networks with the accelerator library, i.e., the neural network compression unit 240, is optional. For example, when a trained neural network is compact enough to be transmitted to the encoder 220 or the decoder 230 without compression, or when compression would seriously degrade codec performance, the network may be transmitted directly to the encoder 220 or the decoder 230, bypassing the neural network compression unit 240.

According to one aspect of this embodiment, the network training framework 210 may be provisioned with a plurality of DNN-based encoders/decoders, i.e., a plurality of encoder/decoder neural network models, that implement the same encoder/decoder. The network training framework 210 may then select one of the available DNN-based encoders/decoders according to a predetermined criterion and generate a trained encoder/decoder neural network from it. For example, the network training framework 210 may select, based on the type of input content to be encoded, a more suitable encoder/decoder that achieves higher performance.

The encoder 220 encodes an input image or video to generate and output a feature vector. In the encoder 220 according to this embodiment, the entire series of processes of the video coding standard is replaced, at the whole-codec level, by a DNN-based technique, and the encoder outputs a feature vector whose dimensionality is reduced relative to the input image. To this end, the encoder 220 includes an inference engine for applying DNN-based techniques. The inference engine of the encoder 220 uses the trained DNN-based encoder (trained neural network) described in the interoperable format.
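The dimensionality reduction performed by the encoder 220 can be illustrated with a fixed, untrained stand-in: 2x2 average pooling as the "encoder" and nearest-neighbour upsampling as the "decoder". A trained DNN would learn these mappings instead; the function names and the block size are assumptions.

```python
def encode_to_features(image, block=2):
    """Average each block x block tile into one feature (a fixed, untrained stand-in)."""
    h, w = len(image), len(image[0])
    if h % block or w % block:
        raise ValueError("image dimensions must be multiples of block")
    features = []
    for by in range(0, h, block):
        for bx in range(0, w, block):
            tile = [image[y][x] for y in range(by, by + block) for x in range(bx, bx + block)]
            features.append(sum(tile) / len(tile))
    return features


def decode_from_features(features, height, width, block=2):
    """Nearest-neighbour upsampling of the features back to height x width."""
    cols = width // block
    return [
        [features[(y // block) * cols + (x // block)] for x in range(width)]
        for y in range(height)
    ]
```

A 4x4 input yields four features, and for an image made of constant 2x2 tiles the reconstruction is exact.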

The feature vector output from the encoder 220 may be transmitted to the decoder 230 or stored in a predetermined storage medium. In either case, the feature vector is input to and decoded by the decoder 230, which outputs a reconstructed image or video. Depending on the embodiment, the feature vector output from the encoder 220 may be turned into a bitstream through quantization and entropy coding, transmitted over a predetermined medium, and then restored to a feature vector through entropy decoding and dequantization before being input to the decoder 230.
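The optional bitstream path can be sketched as uniform scalar quantization followed by a general-purpose lossless compressor. Here `zlib` (DEFLATE, which includes Huffman coding) stands in for a dedicated entropy coder, and the int16 code width and the step size are assumptions.

```python
import struct
import zlib


def features_to_bitstream(features, step):
    """Uniform scalar quantization, then DEFLATE as a stand-in entropy coder."""
    codes = [round(f / step) for f in features]
    if any(not -32768 <= c <= 32767 for c in codes):
        raise ValueError("quantized value does not fit in int16; increase step")
    raw = struct.pack(f"<I{len(codes)}h", len(codes), *codes)
    return zlib.compress(raw, 9)


def bitstream_to_features(bitstream, step):
    """Entropy decoding (inflate), then dequantization."""
    raw = zlib.decompress(bitstream)
    (n,) = struct.unpack_from("<I", raw, 0)
    codes = struct.unpack_from(f"<{n}h", raw, 4)
    return [c * step for c in codes]
```

Rounding to the nearest multiple of `step` bounds the reconstruction error of each feature by half the step.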

In the decoder 230 according to this embodiment, the entire series of decoding processes of the video coding standard is replaced, at the whole-codec level, by a DNN-based technique, and the decoder generates a reconstructed image or video from the input feature vector. To this end, the decoder 230 includes an inference engine for applying DNN-based techniques. The inference engine of the decoder 230 uses the trained DNN-based decoder (trained neural network) described in the interoperable format.

FIG. 4 is a flowchart illustrating a method for DNN-based image or video coding according to another embodiment of the present invention. The method shown in FIG. 4 is performed by the system shown in FIG. 3. To avoid unnecessary repetition, the method is therefore described only briefly; anything not described here is as described above with reference to FIG. 3.

Referring to FIG. 4, first, a DNN-based encoder and a DNN-based decoder are each trained to generate trained encoder/decoder neural networks describing them, and these networks are transmitted to the encoder and/or the decoder (S20). This step may be performed by the network training framework 210 of the system 200 shown in FIG. 3.

Next, an encoding process is performed that encodes an input image or video to generate a feature vector (S21). In this step, the trained encoder neural network is applied to the inference engine for the entire encoding process. This step may be performed by the encoder 220 of the system 200 shown in FIG. 3.

Then, a decoding process is performed that decodes the feature vector generated in the encoding step (S22). In this step, the trained decoder neural network is applied to the inference engine for the entire decoding process. This step may be performed by the decoder 230 of the system 200 shown in FIG. 3.

As noted above, the foregoing description is merely illustrative and should not be construed as limiting. The technical idea of the present invention is defined only by the invention described in the claims below, and all technical ideas within their equivalent scope should be construed as falling within the scope of the present invention. It will therefore be apparent to those skilled in the art that the embodiments described above may be modified and implemented in various forms.

Claims

A system for DNN-based image or video coding, comprising:
a network training framework that trains a DNN-based coding tool to generate a trained tool neural network describing the tool, and transmits the trained tool neural network;
an encoder that encodes an input image or video to generate an encoded bitstream by applying, during an encoding process, the trained tool neural network to an inference engine in a first encoding process corresponding to the DNN-based coding tool, and a standardized image or video encoding tool in a second encoding process other than the first encoding process; and
a decoder that decodes the encoded bitstream generated by the encoder by applying, during a decoding process, the trained tool neural network to an inference engine in a first decoding process corresponding to the DNN-based coding tool, and a standardized image or video decoding tool in a second decoding process other than the first decoding process.

The system of claim 1,
wherein the DNN-based coding tool includes a first coding tool that implements a coding function provided in common in both the encoder and the decoder.

The system of claim 2,
wherein the DNN-based coding tool further includes a second coding tool that implements a coding function provided in only one of the encoder and the decoder.

The system of claim 1,
further comprising a neural network compression unit (NN compression) that receives the trained tool neural network from the network training framework, compresses it, and transmits the compressed network to the encoder and the decoder.

The system of claim 1,
wherein the network training framework generates the trained tool neural network by selecting one of a plurality of DNN-based coding tools that implement the same coding function.

The system of claim 1,
wherein the network training framework describes the trained tool neural network in an interoperable format including a network structure and trained weights, so that it can be exchanged with the encoder and the decoder.

The system of claim 1,
wherein, in the first encoding process, the encoder can select between applying the trained tool neural network to the inference engine and applying a standardized image or video encoding tool.

A method for DNN-based image or video coding, comprising:
a training step of training a DNN-based coding tool to generate a trained tool neural network describing the tool, and transmitting the trained tool neural network;
an encoding step of encoding an input image or video to generate an encoded bitstream by applying, during an encoding process, the trained tool neural network to an inference engine of an encoder in a first encoding process corresponding to the DNN-based coding tool, and a standardized image or video encoding tool of the encoder in a second encoding process other than the first encoding process; and
a decoding step of decoding the encoded bitstream generated in the encoding step by applying, during a decoding process, the trained tool neural network to an inference engine of a decoder in a first decoding process corresponding to the DNN-based coding tool, and a standardized image or video decoding tool of the decoder in a second decoding process other than the first decoding process.