JP2021170728A

JP2021170728A - Device, image processor, imaging device, mobile body, program, and method

Info

Publication number: JP2021170728A
Application number: JP2020073149A
Authority: JP
Inventors: 数史佐藤; Kazufumi Sato
Original assignee: SZ DJI Technology Co Ltd
Current assignee: SZ DJI Technology Co Ltd
Priority date: 2020-04-15
Filing date: 2020-04-15
Publication date: 2021-10-28

Abstract

PROBLEM TO BE SOLVED: To provide an image processing device that can reduce image degradation by using trained convolutional neural networks (CNNs).
SOLUTION: In an imaging device, a control unit has a decoder that stores a plurality of trained CNNs for processing a decoded image, each associated with the amount of spatial frequency components of an encoded image whose magnitude exceeds a predetermined value. The decoder acquires coded data generated by encoding an image, obtains a plurality of spatial frequency components of the encoded image from the coded data, and generates a decoded image by decoding the coded data. It then obtains the amount of those spatial frequency components whose magnitude exceeds the predetermined value, selects the trained CNN associated with that amount, and processes the decoded image with the selected CNN.
SELECTED DRAWING: Figure 7

Description

The present invention relates to an apparatus, an image processing apparatus, an imaging apparatus, a moving body, a program, and a method.

Non-Patent Documents 1 and 2 describe a moving-image encoding device in which a machine-learning-based filter is inserted into the motion prediction loop. Non-Patent Document 3 describes the image encoding and decoding techniques of H.265 (ISO/IEC 23008-2 HEVC).
[Prior Art Documents]
[Patent Documents]
[Non-Patent Documents]
[Non-Patent Document 1] Lulu Zhou, Xiaodan Song, Jiabao Yao, Li Wang, Fangdong Chen, "JVET-I0022: Convolutional neural network filter (CNNF) for intra frame", Joint Video Exploration Team (JVET) 9th Meeting: Gwangju, Korea, January 2018.
[Non-Patent Document 2] Jiabao Yao, Xiaodan Song, Shuqing Fang, Li Wang, "AHG9: Convolutional Neural Network Filter for inter frame", ISO/IEC JTC1/SC29 WG11, JVET-K0222, July 2018.
[Non-Patent Document 3] Gary J. Sullivan, Jens-Rainer Ohm, Woo-Jin Han, and Thomas Wiegand, "Overview of the High Efficiency Video Coding (HEVC) Standard", IEEE Transactions on Circuits and Systems for Video Technology, December 2012.

An apparatus according to a first aspect of the present invention comprises a circuit configured to store a plurality of trained neural networks for processing a decoded image, each associated with an amount of spatial frequency components of an encoded image whose magnitude exceeds a predetermined value. The circuit is configured to acquire coded data generated by encoding an image. The circuit is configured to obtain a plurality of spatial frequency components of the encoded image from the coded data. The circuit is configured to generate a decoded image by decoding the coded data. The circuit is configured to obtain the amount of spatial frequency components, among the plurality of spatial frequency components obtained from the coded data, whose magnitude exceeds the predetermined value. The circuit is configured to select, from the plurality of trained neural networks, the trained neural network associated with the obtained amount of spatial frequency components. The circuit is configured to process the generated decoded image using the selected trained neural network.
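The selection step of the first aspect can be sketched in Python as follows. This is a minimal illustration, assuming that the "amount" is a simple count of coefficients whose absolute value exceeds the threshold and that each trained network is tied to a half-open range of counts. The class boundaries, the helper names (`count_significant`, `class_index`, `filter_decoded`) and the stand-in "networks" are hypothetical and not part of the disclosure.

```python
from bisect import bisect_right
from typing import Callable, Sequence

import numpy as np

# Hypothetical stand-in type for a trained network: maps an image to an image.
Network = Callable[[np.ndarray], np.ndarray]


def count_significant(coeffs: np.ndarray, threshold: float = 0.0) -> int:
    """Count spatial frequency components whose magnitude exceeds `threshold`."""
    return int(np.count_nonzero(np.abs(coeffs) > threshold))


def class_index(count: int, boundaries: Sequence[int]) -> int:
    """Map a count to a class; class i covers [boundaries[i-1], boundaries[i])."""
    return bisect_right(boundaries, count)


def filter_decoded(decoded: np.ndarray, coeffs: np.ndarray,
                   boundaries: Sequence[int], networks: Sequence[Network],
                   threshold: float = 0.0) -> np.ndarray:
    """Select the network associated with the count and apply it to the decoded image."""
    if len(networks) != len(boundaries) + 1:
        raise ValueError("need exactly one network per class")
    idx = class_index(count_significant(coeffs, threshold), boundaries)
    return networks[idx](decoded)


# Usage with three stand-in "networks" (identity, halve, add one).
nets = [lambda im: im, lambda im: im * 0.5, lambda im: im + 1.0]
coeffs = np.array([0.0, 3.0, -5.0, 0.5, 0.0])   # 3 components exceed 0
out = filter_decoded(np.ones((2, 2)), coeffs, boundaries=[2, 4], networks=nets)
```

Using sorted boundaries with `bisect_right` keeps the lookup logarithmic in the number of classes and makes the mapping from count to network explicit and easy to audit.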

The circuit may be configured to store the plurality of trained neural networks in association with the proportion of spatial frequency components of the encoded image whose magnitude exceeds the predetermined value. The circuit may be configured to obtain the proportion of spatial frequency components, among the plurality of spatial frequency components obtained from the coded data, whose magnitude exceeds the predetermined value. The circuit may be configured to select, from the plurality of trained neural networks, the trained neural network associated with the obtained proportion.
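The proportion-based variant can be sketched the same way, assuming the proportion is the number of significant coefficients divided by the total number of coefficients. The default class boundaries `(0.05, 0.2, 0.5)` and the helper names are hypothetical values chosen only for illustration.

```python
from bisect import bisect_right
from typing import Sequence

import numpy as np


def significant_ratio(coeffs: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of spatial frequency components whose magnitude exceeds `threshold`."""
    coeffs = np.asarray(coeffs)
    if coeffs.size == 0:
        return 0.0
    return np.count_nonzero(np.abs(coeffs) > threshold) / coeffs.size


def class_from_ratio(ratio: float, boundaries: Sequence[float] = (0.05, 0.2, 0.5)) -> int:
    """Map a ratio in [0, 1] to a class index (boundaries are hypothetical)."""
    return bisect_right(boundaries, ratio)


block = np.zeros((8, 8))
block[0, :4] = 1.0                    # 4 of 64 coefficients are non-zero
ratio = significant_ratio(block)      # 4 / 64 = 0.0625
cls = class_from_ratio(ratio)         # falls in [0.05, 0.2) -> class 1
```

A proportion rather than a raw count lets the same class boundaries be reused across blocks or pictures of different sizes, since a larger area naturally contains more non-zero coefficients.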

The predetermined value may be zero.

The image may be a moving-image constituent image, that is, a picture constituting a moving image. The coded data may include information indicating a plurality of quantized difference values obtained by quantizing the prediction difference values of a plurality of spatial frequency components obtained by inter prediction or intra prediction of the moving-image constituent image. The circuit may be configured to store the plurality of trained neural networks in association with the amount of quantized difference values, among the plurality of quantized difference values, whose magnitude exceeds the predetermined value. The circuit may be configured to obtain the amount of quantized difference values, among the plurality of quantized difference values obtained from the coded data, whose magnitude exceeds the predetermined value. The circuit may be configured to select, from the plurality of trained neural networks, the trained neural network associated with the obtained amount of quantized difference values. The circuit may be configured to generate a difference image based on the prediction difference values of the spatial frequency components obtained by inverse quantization of the plurality of quantized difference values obtained from the coded data, and to generate a decoded moving-image constituent image by adding an inter-predicted image or an intra-predicted image to the generated difference image. The circuit may be configured to process the generated moving-image constituent image using the selected trained neural network.
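The decoding path in this paragraph (inverse quantization, inverse transform to a difference image, addition of the prediction, then filtering with the selected network) can be sketched for a single square block. The sketch assumes a uniform quantization step `qstep`, an orthonormal floating-point DCT-II in place of a codec's integer transform, a threshold of zero, and hypothetical stand-in networks; a real HEVC-style decoder differs in all of these details.

```python
from bisect import bisect_right
from typing import Callable, Sequence

import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis C: forward is C @ x @ C.T, inverse is C.T @ X @ C."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def reconstruct_block(levels: np.ndarray, qstep: float, prediction: np.ndarray) -> np.ndarray:
    """Inverse-quantize the quantized difference values and add the prediction."""
    c = dct_matrix(levels.shape[0])
    diff_coeffs = levels * qstep                 # inverse quantization (uniform step assumed)
    difference_image = c.T @ diff_coeffs @ c     # inverse 2-D DCT
    return prediction + difference_image


def decode_and_filter(levels: np.ndarray, qstep: float, prediction: np.ndarray,
                      boundaries: Sequence[int],
                      networks: Sequence[Callable[[np.ndarray], np.ndarray]]) -> np.ndarray:
    """Reconstruct the block, then filter it with the network chosen by the level count."""
    recon = reconstruct_block(levels, qstep, prediction)
    count = int(np.count_nonzero(levels))        # |level| > 0, i.e. a threshold of zero
    return networks[bisect_right(boundaries, count)](recon)


prediction = np.full((4, 4), 100.0)
levels = np.zeros((4, 4))
levels[0, 0] = 2                                 # only the DC level is non-zero
recon = reconstruct_block(levels, 10.0, prediction)   # every pixel: 100 + 20 / 4 = 105
out = decode_and_filter(levels, 10.0, prediction, [1], [lambda x: x, lambda x: x - 1.0])
```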

The image may be a moving-image constituent image. The circuit may be configured to store the plurality of trained neural networks further in association with the picture type of the moving-image constituent image. The circuit may be configured to select, from the plurality of trained neural networks, the trained neural network associated with both the picture type of the coded data and the obtained amount of spatial frequency components.
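Keying the networks on both picture type and coefficient class amounts to a two-part lookup, sketched below. The picture types `"I"`, `"P"`, `"B"`, the naming scheme of the registry, and the boundaries are assumptions for illustration. One plausible motivation for the extra key is that intra-coded and inter-coded pictures tend to have different residual statistics, so the same count may call for a different filter.

```python
from bisect import bisect_right
from typing import Dict, Sequence, Tuple


def build_table(picture_types: Sequence[str], num_classes: int) -> Dict[Tuple[str, int], str]:
    """Hypothetical registry: (picture type, count class) -> name of a trained network."""
    return {(t, c): f"cnn_{t}_class{c}" for t in picture_types for c in range(num_classes)}


def select_network(table: Dict[Tuple[str, int], str], picture_type: str,
                   count: int, boundaries: Sequence[int]) -> str:
    """Look up the network for this picture type and significant-coefficient count."""
    key = (picture_type, bisect_right(boundaries, count))
    if key not in table:
        raise KeyError(f"no trained network for {key}")
    return table[key]


table = build_table(["I", "P", "B"], num_classes=3)
print(select_network(table, "P", 10, boundaries=[4, 16]))   # cnn_P_class1
```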

The trained neural network may be a convolutional neural network obtained by machine learning that uses training images and the coded data of those training images as training data, the learning being performed according to the amount of spatial frequency components, among the plurality of spatial frequency components obtained from the coded data in the training data, whose magnitude exceeds the predetermined value.
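The class-dependent training described here can be sketched as two steps: assign each training sample to a class using the same significant-coefficient count as at inference, then fit one model per class. To keep the sketch short and runnable, a per-class linear fit `original ≈ a * decoded + b` (via `numpy.polyfit`) stands in for CNN training; the sample layout, boundaries, and helper names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

import numpy as np

Sample = Tuple[np.ndarray, np.ndarray, np.ndarray]  # (coefficients, decoded image, original image)


def bucket_samples(samples: Iterable[Sample], boundaries: Sequence[int],
                   threshold: float = 0.0) -> Dict[int, List[Tuple[np.ndarray, np.ndarray]]]:
    """Group (decoded, original) pairs by the class of their significant-coefficient count."""
    buckets: Dict[int, List[Tuple[np.ndarray, np.ndarray]]] = defaultdict(list)
    for coeffs, decoded, original in samples:
        count = int(np.count_nonzero(np.abs(coeffs) > threshold))
        buckets[bisect_right(boundaries, count)].append((decoded, original))
    return dict(buckets)


def fit_per_class(buckets) -> Dict[int, Tuple[float, float]]:
    """Stand-in for per-class CNN training: least-squares fit of original ≈ a * decoded + b."""
    models = {}
    for cls, pairs in buckets.items():
        x = np.concatenate([decoded.ravel() for decoded, _ in pairs])
        y = np.concatenate([original.ravel() for _, original in pairs])
        a, b = np.polyfit(x, y, deg=1)
        models[cls] = (float(a), float(b))
    return models


decoded = np.arange(16, dtype=float).reshape(4, 4)
samples = [
    (np.zeros(16), decoded, 2.0 * decoded + 1.0),   # no significant coefficients -> class 0
    (np.ones(16), decoded, decoded - 3.0),          # 16 significant coefficients -> class 1
]
models = fit_per_class(bucket_samples(samples, boundaries=[4]))
```

Because training and inference share the same classification function, each model only ever sees the kind of input it was fitted on.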

An apparatus according to a second aspect of the present invention comprises a circuit configured to store a plurality of trained neural networks for processing a decoded image, each associated with an amount of spatial frequency components of an encoded image whose magnitude exceeds a predetermined value. The circuit is configured to generate coded data including information indicating the prediction difference values of a plurality of spatial frequency components by encoding an image to be encoded with an encoding process that includes inter prediction or intra prediction. The circuit is configured to output the coded data. The circuit is configured to generate a decoded image by decoding the coded data. The circuit is configured to select, from the plurality of trained neural networks, the trained neural network associated with the amount of prediction difference values, among those of the plurality of spatial frequency components, whose magnitude exceeds the predetermined value. The circuit is configured to generate a reference image used for inter prediction or intra prediction by processing the generated decoded image with the selected trained neural network.
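The encoder-side loop of the second aspect can be sketched for one block under textbook transform-codec assumptions (orthonormal floating-point DCT, uniform quantization with step `qstep`, rounding to the nearest level) rather than the exact process of the disclosure. It also assumes that the count is taken over the quantized values, as in the decoding variant described above. The point it illustrates is that the class comes from data the decoder also receives, so encoder and decoder can select the same network and keep their reference images identical. Names and boundaries are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Sequence

import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def encode_block(block: np.ndarray, prediction: np.ndarray, qstep: float) -> np.ndarray:
    """Transform and quantize the prediction difference; the levels go into the coded data."""
    c = dct_matrix(block.shape[0])
    diff_coeffs = c @ (block - prediction) @ c.T
    return np.round(diff_coeffs / qstep).astype(int)


def make_reference(levels: np.ndarray, prediction: np.ndarray, qstep: float,
                   boundaries: Sequence[int],
                   networks: Sequence[Callable[[np.ndarray], np.ndarray]]) -> np.ndarray:
    """Locally decode the block and filter it with the selected network to form a reference."""
    c = dct_matrix(levels.shape[0])
    decoded = prediction + c.T @ (levels * qstep) @ c
    cls = bisect_right(boundaries, int(np.count_nonzero(levels)))
    return networks[cls](decoded)


prediction = np.full((4, 4), 100.0)
block = prediction + 5.0                        # flat residual of 5
levels = encode_block(block, prediction, qstep=10.0)
reference = make_reference(levels, prediction, 10.0, [2], [lambda x: x, lambda x: x])
```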

An image processing apparatus according to a third aspect of the present invention comprises the apparatus according to the first aspect and the apparatus according to the second aspect.

An imaging apparatus according to a fourth aspect of the present invention comprises the apparatus described above and an image sensor that generates an image.

A moving body according to a fifth aspect of the present invention is equipped with the imaging apparatus described above and moves.

The moving body may be an unmanned aerial vehicle.

A program according to a sixth aspect of the present invention causes a computer to function as the apparatus described above. The program may be recorded on a non-transitory recording medium.

A method according to a seventh aspect of the present invention comprises a step of storing a plurality of trained neural networks for processing a decoded image, each associated with an amount of spatial frequency components of an encoded image whose magnitude exceeds a predetermined value. The method comprises a step of acquiring coded data generated by encoding an image. The method comprises a step of obtaining a plurality of spatial frequency components of the encoded image from the coded data. The method comprises a step of generating a decoded image by decoding the coded data. The method comprises a step of obtaining the amount of spatial frequency components, among the plurality of spatial frequency components obtained from the coded data, whose magnitude exceeds the predetermined value. The method comprises a step of selecting, from the plurality of trained neural networks, the trained neural network associated with the obtained amount of spatial frequency components. The method comprises a step of processing the generated decoded image using the selected trained neural network.

A method according to an eighth aspect of the present invention comprises a step of storing a plurality of trained neural networks for processing a decoded image, each associated with an amount of spatial frequency components of an encoded image whose magnitude exceeds a predetermined value. The method comprises a step of generating coded data including information indicating the prediction difference values of a plurality of spatial frequency components by encoding an image to be encoded with an encoding process that includes inter prediction or intra prediction. The method comprises a step of outputting the coded data. The method comprises a step of generating a decoded image by decoding the coded data. The method comprises a step of selecting, from the plurality of trained neural networks, the trained neural network associated with the amount of prediction difference values, among those of the plurality of spatial frequency components, whose magnitude exceeds the predetermined value. The method comprises a step of generating a reference image used for inter prediction or intra prediction by processing the generated decoded image with the selected trained neural network.

According to the above aspects of the present invention, an image can be encoded or decoded appropriately.

The above summary of the invention does not list all of the necessary features of the present invention. Sub-combinations of these feature groups may also constitute inventions.

FIG. 1 is a diagram showing an example of an external perspective view of the imaging apparatus 100 according to the present embodiment.
FIG. 2 is a diagram showing the functional blocks of the imaging apparatus 100 according to the present embodiment.
FIG. 3 shows a block diagram of a learner.
FIG. 4 shows class information for classification based on DCT coefficients.
FIG. 5 shows parameter information of the neural networks.
FIG. 6 shows the block configuration of the encoder included in the control unit 110.
FIG. 7 shows the block configuration of the decoder included in the control unit 110.
FIG. 8 shows a flowchart of the processing executed when the control unit 110 encodes a picture to be encoded.
FIG. 9 shows a flowchart of the processing executed when the control unit 110 decodes a picture to be decoded.
FIG. 10 shows an example of the reference relationships between pictures used in inter prediction.
FIG. 11 shows an example of an unmanned aerial vehicle (UAV).
FIG. 12 shows an example of a computer 1200 in which aspects of the present invention may be embodied in whole or in part.

The present invention is described below through embodiments, but the following embodiments do not limit the invention according to the claims. Not all combinations of the features described in the embodiments are necessarily essential to the solution provided by the invention. It will be apparent to those skilled in the art that various changes or improvements can be made to the following embodiments, and it is clear from the claims that forms with such changes or improvements may also be included in the technical scope of the present invention.

The claims, specification, drawings, and abstract contain material subject to copyright protection. The copyright holder has no objection to the reproduction of these documents by anyone, as long as they appear as in the Patent Office files or records. Otherwise, all copyrights are reserved.

Various embodiments of the present invention may be described with reference to flowcharts and block diagrams, in which a block may represent (1) a stage of a process in which an operation is executed or (2) a "unit" of an apparatus responsible for executing the operation. Specific stages and "units" may be implemented by dedicated circuitry, programmable circuitry, and/or a processor. Dedicated circuitry may include digital and/or analog hardware circuits, and may include integrated circuits (ICs) and/or discrete circuits. Programmable circuitry may include reconfigurable hardware circuits. Reconfigurable hardware circuits may include logical AND, logical OR, logical XOR, logical NAND, logical NOR, and other logical operations, as well as memory elements such as flip-flops and registers, field-programmable gate arrays (FPGAs), programmable logic arrays (PLAs), and the like.

A computer-readable medium may include any tangible device capable of storing instructions to be executed by a suitable device. As a result, a computer-readable medium having instructions stored therein constitutes an article of manufacture including instructions that can be executed to create means for performing the operations specified in the flowcharts or block diagrams. Examples of computer-readable media include electronic, magnetic, optical, electromagnetic, and semiconductor storage media. More specific examples include floppy® disks, diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), electrically erasable programmable read-only memory (EEPROM), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile discs (DVDs), Blu-ray® discs, memory sticks, integrated circuit cards, and the like.

Computer-readable instructions may include assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk®, JAVA®, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. Computer-readable instructions may be provided to a processor or programmable circuitry of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, either locally or via a local area network (LAN) or a wide area network (WAN) such as the Internet. The processor or programmable circuitry may execute the computer-readable instructions to create means for performing the operations specified in the flowcharts or block diagrams. Examples of processors include computer processors, processing units, microprocessors, digital signal processors, controllers, and microcontrollers.

FIG. 1 is a diagram showing an example of an external perspective view of the imaging apparatus 100 according to the present embodiment. FIG. 2 is a diagram showing the functional blocks of the imaging apparatus 100 according to the present embodiment.

The imaging apparatus 100 includes an imaging unit 102 and a lens unit 200. The imaging unit 102 includes an image sensor 120, a control unit 110, a memory 130, an instruction unit 162, and a display unit 160.

The image sensor 120 may be a CCD or CMOS sensor. The image sensor 120 receives light through the lens 210 of the lens unit 200 and outputs image data of the optical image formed through the lens 210 to the control unit 110.

The control unit 110 may be composed of a microprocessor such as a CPU or MPU, a microcontroller such as an MCU, or the like. The memory 130 may be a computer-readable recording medium and may include at least one of SRAM, DRAM, EPROM, EEPROM, and flash memory such as USB memory. The control unit 110 corresponds to the circuit. The memory 130 stores programs and other data needed for the control unit 110 to control the image sensor 120 and other components. The memory 130 may be provided inside the housing of the imaging apparatus 100, or may be removable from the housing.

The instruction unit 162 is a user interface that receives instructions for the imaging apparatus 100 from the user. The display unit 160 displays images captured by the image sensor 120 and processed by the control unit 110, various setting information of the imaging apparatus 100, and the like. The display unit 160 may be a touch panel.

The control unit 110 controls the lens unit 200 and the image sensor 120. For example, the control unit 110 controls the focus position and focal length of the lens 210. The control unit 110 controls the lens unit 200 by outputting control commands to the lens control unit 220 of the lens unit 200 based on information indicating instructions from the user.

The lens unit 200 includes one or more lenses 210, a lens drive unit 212, a lens control unit 220, and a memory 222. In the present embodiment, the one or more lenses 210 are collectively referred to as "lens 210". The lens 210 may include a focus lens and a zoom lens. Some or all of the lenses included in the lens 210 are arranged to be movable along the optical axis of the lens 210. The lens unit 200 may be an interchangeable lens that is detachably attached to the imaging unit 102.

The lens drive unit 212 moves some or all of the lens 210 along the optical axis of the lens 210. In accordance with lens control commands from the imaging unit 102, the lens control unit 220 drives the lens drive unit 212 to move the entire lens 210, or the zoom lens or focus lens included in it, along the optical axis, thereby performing at least one of a zoom operation and a focus operation. The lens control commands are, for example, zoom control commands and focus control commands.

The lens drive unit 212 may include a voice coil motor (VCM) that moves some or all of the plurality of lenses 210 in the optical axis direction. The lens drive unit 212 may include an electric motor such as a DC motor, a coreless motor, or an ultrasonic motor. The lens drive unit 212 may transmit power from the motor to some or all of the lenses 210 via mechanical members such as a cam ring and a guide shaft, thereby moving them along the optical axis.

The memory 222 stores control values for the focus lens and the zoom lens moved by the lens drive unit 212. The memory 222 may include at least one of SRAM, DRAM, EPROM, EEPROM, and flash memory such as USB memory.

The control unit 110 controls the image sensor 120, including its imaging operation, by outputting control commands to the image sensor 120 based on information indicating user instructions obtained through the instruction unit 162 or the like. The control unit 110 acquires images captured by the image sensor 120, applies image processing to them, and stores them in the memory 130.

The encoding and decoding processes executed by the control unit 110 are described below. The control unit 110 stores a plurality of trained neural networks for processing a decoded image, each associated with an amount of spatial frequency components of an encoded image whose magnitude exceeds a predetermined value. The spatial frequency components are, for example, components generated by an orthogonal transform. In the present embodiment, the spatial frequency components are DCT coefficients generated by a discrete cosine transform (sometimes called a "DCT transform") of the image. The control unit 110 may store the trained neural networks in the memory 130, which is external to the control unit 110, or in non-volatile memory inside the control unit 110. In the present embodiment, the images to be encoded and decoded are moving-image constituent images, that is, pictures constituting a moving image; however, the image may also be a still image.
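To make "spatial frequency components" concrete, the sketch below computes an orthonormal 2-D DCT-II of an 8×8 block with plain NumPy and counts the coefficients above a small tolerance. A flat block puts all of its energy into the single DC coefficient, while a checkerboard spreads it over several coefficients, so the count reflects how much detail a block carries. The block size, tolerance, and helper names are assumptions; codecs such as HEVC use integer approximations of the DCT rather than this floating-point form.

```python
import numpy as np


def dct2(block: np.ndarray) -> np.ndarray:
    """Orthonormal 2-D DCT-II of a square block."""
    n = block.shape[0]
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c @ block @ c.T


def count_significant(coeffs: np.ndarray, tol: float = 1e-9) -> int:
    """Count coefficients whose magnitude exceeds `tol` (guards against float noise)."""
    return int(np.count_nonzero(np.abs(coeffs) > tol))


flat = np.full((8, 8), 100.0)
checker = (np.indices((8, 8)).sum(axis=0) % 2) * 255.0

flat_count = count_significant(dct2(flat))        # only the DC coefficient survives
flat_dc = dct2(flat)[0, 0]                        # approximately 100 * 64 / 8 = 800
checker_count = count_significant(dct2(checker))  # DC plus high-frequency coefficients
```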

First, the process by which the control unit 110 encodes an image to be encoded is outlined. The control unit 110 encodes the image with an encoding process that includes inter prediction or intra prediction, thereby generating coded data that includes information indicating the prediction difference values of a plurality of spatial frequency components. The control unit 110 outputs the generated coded data. For example, the control unit 110 records in the memory 130 compressed image data that includes the generated coded data and compression information, such as the motion vectors used for encoding.

When generating a reference image for inter prediction or intra prediction, the control unit 110 generates a decoded image by decoding the generated coded data. The control unit 110 selects, from the plurality of trained neural networks, the trained neural network associated with the amount of prediction difference values, among those of the plurality of spatial frequency components, whose magnitude exceeds the predetermined value. The control unit 110 then processes the decoded image with the selected trained neural network to generate the reference image used for inter prediction or intra prediction.

Next, the process by which the control unit 110 decodes an image to be decoded is outlined. The control unit 110 acquires coded data generated by encoding an image. For example, the control unit 110 reads from the memory 130 compressed image data that includes the coded data of the image and the compression information. The control unit 110 obtains a plurality of spatial frequency components of the encoded image from the acquired coded data, for example by applying entropy decoding to the entropy-coded data. The control unit 110 generates a decoded image based on the plurality of spatial frequency components. In this way, the control unit 110 generates a decoded image by decoding the coded data.

The control unit 110 determines the amount of spatial frequency components, among those obtained from the coded data, whose magnitude exceeds a predetermined value. It selects, from the plurality of trained neural networks, the one associated with that amount and processes the decoded image with it. The processed decoded image is output, for example, to the display unit 160.

As an example, the coded data of a constituent picture of a moving image contains information indicating a plurality of quantized difference values, obtained by quantizing the predicted difference values of a plurality of spatial frequency components produced by inter prediction or intra prediction of that picture. The control unit 110 stores, in advance, a plurality of trained neural networks, each associated with an amount of quantized difference values whose magnitude exceeds a predetermined value. The control unit 110 determines that amount for the quantized difference values obtained from the coded data and selects the trained neural network associated with it. The control unit 110 then inverse-quantizes the quantized difference values to obtain the predicted difference values of the spatial frequency components, generates a difference image from them, and adds an inter-predicted or intra-predicted image to the difference image to produce the decoded picture. Finally, the control unit 110 processes this decoded picture with the selected trained neural network.

The "amount of spatial frequency components of the encoded image whose magnitude exceeds a predetermined value" may be the ratio of such components to all spatial frequency components of the encoded image, and the "predetermined value" may be zero (0). The embodiment described with reference to FIGS. 4 to 10 uses DCT coefficients as the spatial frequency components, zero as the predetermined value, and, as the amount, the ratio of DCT coefficients of the encoded image whose magnitude exceeds zero.

The control unit 110 may additionally associate the stored trained neural networks with the picture type of the constituent picture. In that case, it may select the trained neural network associated with both the picture type of the coded data and the acquired amount of spatial frequency components.

A "trained neural network" may be a convolutional neural network (CNN) obtained by machine learning that uses training images and their coded data as training data, with learning performed according to the amount of spatial frequency components, obtained from the coded data in the training data, whose magnitude exceeds the predetermined value. The embodiment described with reference to FIGS. 4 to 10 uses a "CNN filter", which performs image processing with a CNN.

FIG. 3 is a block diagram of the learner, and FIG. 4 shows the class information used to classify images by their DCT coefficients. The learner is a device that performs machine learning to generate the neural network parameters (CNN parameters) used by the control unit 110; the control unit 110 itself need not include the learner.

The learner generates CNN parameters by machine learning on pairs of input images and degraded images. A degraded image is the decoded image obtained by encoding the input image and decoding the result; it has also been processed by the loop filter of the decoder, described later. The image compression information used to encode the input image is also supplied to the learner. During encoding, intra prediction or inter prediction is performed for each predetermined prediction unit of the partial image in each macroblock; a difference image between the partial image and the resulting predicted image is generated; a DCT is applied to the difference image for each predetermined transform unit to obtain DCT coefficients; and these coefficients are quantized to obtain quantized DCT coefficients. The image compression information may include these quantized DCT coefficients.

The DCT coefficient counting unit 310 obtains the quantized DCT coefficients from the image compression information and counts those whose magnitude exceeds 0. For example, of the N DCT coefficients in all transform units of the entire image, it counts the number M whose magnitude exceeds 0. Here, a "DCT coefficient whose magnitude exceeds 0" means a coefficient that is either greater than 0 or less than 0, that is, any non-zero coefficient. The class determination unit 320 computes the proportion of non-zero DCT coefficients by dividing M by N. In this embodiment, this proportion is called the "non-zero DCT ratio".
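A minimal Python sketch of the M / N computation, assuming the quantized coefficients of each transform unit are already available as nested lists after entropy decoding. The function name `nonzero_dct_ratio` and the toy 2x2 block (H.265 itself has no 2x2 transform units) are illustrative assumptions, not part of the described device.

```python
from typing import Iterable, Sequence


def nonzero_dct_ratio(transform_units: Iterable[Sequence[Sequence[int]]]) -> float:
    """Return M / N for one picture.

    N is the total number of quantized DCT coefficients over all transform
    units (TUs) and M is how many of them are non-zero (positive or negative),
    i.e. whose magnitude exceeds the predetermined value 0.
    """
    total = 0    # N
    nonzero = 0  # M
    for tu in transform_units:
        for row in tu:
            for coeff in row:
                total += 1
                if abs(coeff) > 0:
                    nonzero += 1
    if total == 0:
        raise ValueError("picture contains no coefficients")
    return nonzero / total


# Hypothetical picture with one 4x4 TU and one toy 2x2 TU.
tu_a = [[12, -3, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
tu_b = [[5, 0], [0, -1]]
print(nonzero_dct_ratio([tu_a, tu_b]))  # 5 non-zero of 20 -> 0.25
```

Negative coefficients count as non-zero, matching the definition above.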

The class determination unit 320 identifies a class from the non-zero DCT ratio and the class information shown in FIG. 4. As FIG. 4 shows, the class information associates each class identifier with a range of the non-zero DCT ratio. The class determination unit 320 finds the range that contains the computed non-zero DCT ratio and takes the class identifier associated with that range.
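The range lookup can be sketched with Python's `bisect` module. The text does not give the actual range boundaries of FIG. 4, so the values in `BOUNDARIES` and the identifiers in `CLASS_IDS` below are hypothetical placeholders.

```python
import bisect

# Hypothetical class information in the style of FIG. 4 (values illustrative).
# Class k covers [BOUNDARIES[k], BOUNDARIES[k + 1]); the last class is closed at 1.0.
BOUNDARIES = [0.0, 0.02, 0.05, 0.10, 0.20]
CLASS_IDS = ["class_0", "class_1", "class_2", "class_3", "class_4"]


def determine_class(ratio: float) -> str:
    """Map a non-zero DCT ratio in [0, 1] to the class whose range contains it."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError(f"ratio out of range: {ratio}")
    # bisect_right counts the boundaries <= ratio; minus 1 gives the index of
    # the range that starts at or below the ratio.
    index = bisect.bisect_right(BOUNDARIES, ratio) - 1
    return CLASS_IDS[index]


print(determine_class(0.03))  # class_1
```

Using sorted lower bounds keeps the table compact and guarantees that every ratio in [0, 1] falls into exactly one class.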

For each class determined by the class determination unit 320, the CNN learner 330 performs machine learning on the input images and degraded images to compute the CNN parameters of the CNN filter described later. Specifically, it computes weights and offsets that minimize a predetermined loss function between the input image and the image produced by applying the CNN filter to the degraded image.
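The per-class structure of training can be illustrated with a deliberately simplified stand-in. In this sketch each class gets a single gain `w` and offset `b` instead of CNN kernels, and the loss is mean squared error minimized in closed form by ordinary least squares; both choices are assumptions for illustration, not the learner's actual method. The point it shows is that training pairs are grouped by class and each group yields its own parameters.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (class id, degraded pixels, original input pixels)
Sample = Tuple[str, List[float], List[float]]


def fit_per_class(samples: List[Sample]) -> Dict[str, Tuple[float, float]]:
    """Fit one (weight, offset) pair per class minimizing mean squared error.

    Toy stand-in for the CNN learner: w * degraded + b should approximate
    the original input image.
    """
    grouped: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for class_id, degraded, original in samples:
        grouped[class_id].extend(zip(degraded, original))

    params: Dict[str, Tuple[float, float]] = {}
    for class_id, pairs in grouped.items():
        n = len(pairs)
        mean_x = sum(x for x, _ in pairs) / n
        mean_y = sum(y for _, y in pairs) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        # If all degraded pixels are equal the gain is undetermined; keep 1.0.
        w = cov_xy / var_x if var_x > 0 else 1.0
        b = mean_y - w * mean_x
        params[class_id] = (w, b)
    return params


samples = [
    ("class_0", [10.0, 20.0, 30.0], [12.0, 22.0, 32.0]),
    ("class_1", [10.0, 20.0, 30.0], [20.0, 40.0, 60.0]),
]
params = fit_per_class(samples)
print(params)  # {'class_0': (1.0, 2.0), 'class_1': (2.0, 0.0)}
```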

FIG. 5 shows the parameter information of the neural network. The parameter information associates each class identifier with the CNN parameters computed for it by machine learning; the CNN parameters include the weights and offsets of the CNN. The control unit 110 stores the class information shown in FIG. 4 and this parameter information. As part of encoding and decoding, the control unit 110 determines a class by referring to the class information and performs image processing with the CNN parameters given for that class in the parameter information.
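A minimal sketch of looking up parameters by class identifier and applying them. `PARAMETER_INFO` is a hypothetical table holding one 3x3 kernel and one offset per class; a real CNN filter would have several layers and nonlinearities. As in most CNN frameworks, the "convolution" is computed as a cross-correlation, and pixels outside the image are taken from the nearest edge (an assumed padding choice).

```python
from typing import Dict, List, Tuple

Kernel = List[List[float]]
Image = List[List[float]]

# Hypothetical parameter information in the style of FIG. 5:
# class id -> (3x3 weights, offset).
PARAMETER_INFO: Dict[str, Tuple[Kernel, float]] = {
    "class_0": ([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]], 0.0),  # identity
    "class_1": ([[1 / 9] * 3 for _ in range(3)], 0.0),                      # 3x3 mean
}


def apply_cnn_filter(image: Image, class_id: str) -> Image:
    """Apply the single-layer filter selected by class_id (edge-clamped)."""
    weights, offset = PARAMETER_INFO[class_id]
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = offset
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    acc += weights[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = acc
    return out


print(apply_cnn_filter([[1.0, 2.0], [3.0, 4.0]], "class_0"))  # [[1.0, 2.0], [3.0, 4.0]]
```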

FIG. 6 shows the block configuration of the encoder in the control unit 110. The encoder receives, as the input image data to be encoded, a time series of pictures constituting a moving image. The reorder unit 610 determines the order in which pictures are encoded based on their picture types. For example, before encoding a B picture, which is encoded by bidirectional prediction, the reorder unit 610 reorders the pictures so that the I or P picture following that B picture is encoded first.
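The reordering rule can be sketched as follows, assuming a simple structure in which each B picture references only the nearest preceding and following I or P picture (no hierarchical B pictures). The function `coding_order` and the `(display index, type)` tuples are illustrative assumptions.

```python
from typing import List, Tuple

Picture = Tuple[int, str]  # (display index, picture type "I", "P" or "B")


def coding_order(display: List[Picture]) -> List[Picture]:
    """Move each I/P picture ahead of the B pictures that precede it in
    display order, since those B pictures use it as a backward reference."""
    out: List[Picture] = []
    pending_b: List[Picture] = []
    for pic in display:
        if pic[1] == "B":
            pending_b.append(pic)  # hold until the next anchor is coded
        else:
            out.append(pic)        # code the anchor (I or P) first
            out.extend(pending_b)  # then the B pictures waiting for it
            pending_b.clear()
    out.extend(pending_b)          # trailing B pictures with no later anchor
    return out


print(coding_order([(0, "I"), (1, "B"), (2, "B"), (3, "P")]))
# [(0, 'I'), (3, 'P'), (1, 'B'), (2, 'B')]
```

The decoder-side reorder unit 710 performs the inverse mapping, returning decoded pictures to display order.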

The orthogonal transform unit 620 computes DCT coefficients by applying a DCT to the difference image between the picture output from the reorder unit 610 and the reference image. The quantization unit 630 quantizes these DCT coefficients to produce quantized DCT coefficients, adjusting the quantization parameter according to the compression rate output by the rate control unit 660 described below. The entropy coding unit 650 entropy-codes the quantized DCT coefficients output by the quantization unit 630 to produce a coded picture, which is stored in the buffer 670. The rate control unit 660 determines the compression rate from the data amount of the coded pictures and outputs it to the quantization unit 630. The entropy coding unit 650 also entropy-codes compression information such as motion vectors and stores it in the buffer 670. The control unit 110 records the compressed image data, containing the coded pictures and compression information stored in the buffer 670, in the memory 130 or elsewhere.
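A floating-point sketch of the transform and quantization steps for one square block. It assumes an orthonormal 2-D DCT-II and plain uniform quantization `round(c / step)`; H.265 actually uses integer transform approximations and step sizes derived from the quantization parameter, so `dct_2d` and `quantize` are simplifications for illustration only.

```python
import math
from typing import List

Block = List[List[float]]


def dct_2d(block: Block) -> Block:
    """Orthonormal 2-D DCT-II of a square block (floating point)."""
    n = len(block)

    def alpha(k: int) -> float:
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def quantize(coeffs: Block, step: float) -> List[List[int]]:
    """Uniform scalar quantization: level = round(coefficient / step)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


# A flat 4x4 residual of value 4 has only a DC coefficient (16.0).
levels = quantize(dct_2d([[4.0] * 4 for _ in range(4)]), 8.0)
print(levels[0][0])  # 2
```

A larger `step` (higher compression) drives more levels to zero, which is why the non-zero DCT ratio tracks the compression rate.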

Next, the processing in the encoder's loop is described. The inverse quantization unit 641 inverse-quantizes the quantized DCT coefficients output by the quantization unit 630. The inverse orthogonal transform unit 642 applies an inverse DCT to the output of the inverse quantization unit 641 to generate a difference image. The loop filter 643 filters the image obtained by adding the reference image to this difference image; it may include, for example, a deblocking filter. The image produced by the loop filter 643 is output to the CNN filter 644.
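The reconstruction part of the loop can be sketched as follows, under the same simplifying assumptions as a floating-point orthonormal DCT and uniform quantization. The loop filter and CNN filter are not included; `reconstruct` stops at prediction plus difference image, rounded and clipped to an assumed 8-bit range.

```python
import math
from typing import List

Block = List[List[float]]


def dequantize(levels: List[List[int]], step: float) -> Block:
    """Inverse of uniform quantization: coefficient ~= level * step."""
    return [[level * step for level in row] for row in levels]


def idct_2d(coeffs: Block) -> Block:
    """Orthonormal 2-D inverse DCT (DCT-III) of a square block."""
    n = len(coeffs)

    def alpha(k: int) -> float:
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            s = 0.0
            for u in range(n):
                for v in range(n):
                    s += (alpha(u) * alpha(v) * coeffs[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[x][y] = s
    return out


def reconstruct(levels: List[List[int]], step: float,
                prediction: List[List[int]], bit_depth: int = 8) -> List[List[int]]:
    """Difference image = inverse DCT of dequantized levels; the result is
    prediction + difference, rounded and clipped to [0, 2**bit_depth - 1]."""
    residual = idct_2d(dequantize(levels, step))
    max_value = (1 << bit_depth) - 1
    return [[min(max(int(round(p + r)), 0), max_value)
             for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(prediction, residual)]


levels = [[2, 0, 0, 0]] + [[0] * 4 for _ in range(3)]
prediction = [[100] * 4 for _ in range(4)]
print(reconstruct(levels, 8.0, prediction))  # every pixel becomes 104
```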

The DCT coefficient counting unit 646 counts the DCT coefficients whose magnitude exceeds 0 in the output of the quantization unit 630 and computes the non-zero DCT ratio. The class determination unit 647 determines a class identifier from this ratio and the class information described with reference to FIG. 4, and outputs it to the CNN filter 644.

The CNN filter 644 is a filter implemented by a CNN built from the CNN parameters generated by the learner described above. It generates a reference picture by performing convolution operations with the CNN on the image output by the loop filter 643. Specifically, the CNN filter 644 looks up, in the parameter information, the CNN parameters associated with the class identifier determined by the class determination unit 647, and applies the CNN filter built from those parameters to the image processed by the loop filter 643. The memory 645 stores the resulting reference picture. The intra prediction unit 648 performs intra prediction with the reference picture stored in the memory 645 to encode the picture being encoded, producing an intra-predicted image as the reference image. The inter prediction unit 649 performs inter prediction with the reference picture stored in the memory 645 to encode other pictures, producing an inter-predicted image as the reference image; for example, it may compute motion vectors and perform motion compensation.

FIG. 7 shows the block configuration of the decoder in the control unit 110. The control unit 110 reads compressed moving-image data from the memory 130 and decodes it. The entropy decoding unit 750 entropy-decodes the compressed image data read from the memory 130 to produce quantized DCT coefficients. The inverse quantization unit 741 inverse-quantizes the quantized DCT coefficients output by the entropy decoding unit 750, and the inverse orthogonal transform unit 742 applies an inverse DCT to the output of the inverse quantization unit 741 to generate a difference image. The loop filter 743 filters the image obtained by adding the reference image to the difference image; it may include, for example, a deblocking filter, and may be identical to the loop filter 643. The image produced by the loop filter 743 is output to the CNN filter 744.

The DCT coefficient counting unit 746 counts the DCT coefficients whose magnitude exceeds 0 among the quantized DCT coefficients output by the entropy decoding unit 750 and computes the non-zero DCT ratio. The class determination unit 747 determines the class from this ratio and the class information, and outputs information indicating the class to the CNN filter 744.

The CNN filter 744 is a CNN built from the CNN parameters generated by the learner described above. It generates a decoded picture by performing convolution operations with the neural network on the image output by the loop filter 743. Specifically, the CNN filter 744 looks up, in the parameter information, the CNN parameters associated with the class identifier determined by the class determination unit 747, and applies the CNN filter built from those parameters to the image processed by the loop filter 743. The reorder unit 710 rearranges the decoded pictures into time order based on picture type and outputs decoded image data, which is used, for example, to display images on the display unit 160.

The memory 745 stores the decoded pictures generated by the CNN filter 744. The intra prediction unit 748 uses a decoded picture stored in the memory 745 as a reference picture and performs intra prediction to decode the picture being decoded, producing an intra-predicted image as the reference image. The inter prediction unit 749 uses a decoded picture stored in the memory 745 as a reference picture and performs inter prediction to decode other pictures, producing an inter-predicted image as the reference image; for example, it may determine motion vectors and perform motion compensation based on them.

FIG. 8 is a flowchart of the processing executed when the control unit 110 encodes a picture. In S810, the orthogonal transform unit 620 applies an orthogonal transform to the picture being encoded to compute DCT coefficients, and the quantization unit 630 quantizes them to produce quantized DCT coefficients.

In S820, the DCT coefficient counting unit 646 computes the non-zero DCT ratio. In S830, the inverse quantization unit 641 inverse-quantizes the quantized DCT coefficients, and the inverse orthogonal transform unit 642 applies an inverse orthogonal transform to the DCT coefficients obtained by inverse quantization to generate a difference image.

In S840, the intra prediction unit 648 or the inter prediction unit 649 generates a predicted image, and compensation is performed by adding the difference image to the predicted image. In S850, a loop filter is applied to the image produced by this addition in S840.

In S860, the DCT coefficient counting unit 646 computes the non-zero DCT ratio, and the class determination unit 647 determines a class identifier from it by referring to the class information. In S870, the CNN filter 644, built from the CNN parameters associated with that class identifier, processes the picture produced in S850 to generate a reference picture.

FIG. 9 is a flowchart of the processing executed when the control unit 110 decodes a picture. In S910, the entropy decoding unit 750 entropy-decodes the picture to be decoded to produce quantized DCT coefficients. In S920, the DCT coefficient counting unit 746 counts the coefficients generated in S910 whose magnitude exceeds 0 and computes the non-zero DCT ratio.

In S930, the inverse quantization unit 741 inverse-quantizes the quantized DCT coefficients, and the inverse orthogonal transform unit 742 applies an inverse orthogonal transform to the DCT coefficients obtained by inverse quantization to generate a difference image.

In S940, compensation is performed by adding the difference image to the predicted image generated by the intra prediction unit 748 or the inter prediction unit 749. In S950, a loop filter is applied to the image produced by this addition in S940.

In S960, the class determination unit 747 determines a class identifier from the non-zero DCT ratio computed in S920 by referring to the class information. In S970, the image to which the loop filter was applied in S950 is processed by the CNN filter 744, built from the CNN parameters associated with that class identifier, to generate a decoded picture. As described above, the decoded picture is both output as decoded image data and used as a reference picture.
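The order of the FIG. 9 steps after entropy decoding can be sketched as a small pipeline. Here `decode_picture` and its callable arguments are hypothetical stand-ins: the reconstruction, loop filter, class lookup and CNN filters are injected so that only the control flow is shown. Note that the ratio is taken from the entropy-decoded levels before inverse quantization, which is why the decoder needs no extra classification metadata.

```python
from typing import Callable, Dict, List, Sequence

Levels = List[List[int]]
Image = List[List[float]]


def decode_picture(
    tus: Sequence[Levels],
    reconstruct: Callable[[Sequence[Levels]], Image],
    loop_filter: Callable[[Image], Image],
    determine_class: Callable[[float], str],
    cnn_filters: Dict[str, Callable[[Image], Image]],
) -> Image:
    """Run S920 to S970 on the quantized levels produced by S910."""
    # S920: non-zero ratio computed directly from the entropy-decoded levels.
    total = sum(len(row) for tu in tus for row in tu)
    nonzero = sum(1 for tu in tus for row in tu for level in row if level != 0)
    ratio = nonzero / total
    # S930-S940: inverse quantization, inverse transform, add prediction.
    image = reconstruct(tus)
    # S950: conventional loop filter (for example deblocking); may be omitted.
    image = loop_filter(image)
    # S960-S970: choose the class, then apply the matching CNN filter.
    class_id = determine_class(ratio)
    return cnn_filters[class_id](image)


decoded = decode_picture(
    [[[3, 0], [0, 0]], [[0, -1], [0, 0]]],       # 2 non-zero of 8 -> 0.25
    reconstruct=lambda tus: [[1.0, 2.0]],        # toy reconstruction
    loop_filter=lambda im: im,                   # toy loop filter
    determine_class=lambda r: "high" if r >= 0.25 else "low",
    cnn_filters={"low": lambda im: im,
                 "high": lambda im: [[2 * v for v in row] for row in im]},
)
print(decoded)  # [[2.0, 4.0]]
```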

FIG. 10 shows an example of the reference relationships between pictures used in inter prediction, for the I picture 1000, P picture 1004, Stored-B picture 1002, Non-Stored-B picture 1001, and Non-Stored-B picture 1003 used in image coding schemes such as H.265. As FIG. 10 shows, the reference relationships differ by picture type, so the amount of image quality degradation may also differ by picture type. Therefore, in addition to the classification based on the non-zero DCT ratio described above, classification may also take the picture type into account. Specifically, a class identifier may be assigned to each combination of picture type and non-zero DCT ratio. The learner may then perform machine learning separately for each such combination and compute CNN parameters for each. The control unit 110 may store the CNN parameters in association with these combinations and, during encoding and decoding, build the CNN filter from the CNN parameters associated with the applicable combination of picture type and non-zero DCT ratio.
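A sketch of a combined class key, assuming three hypothetical non-zero DCT ratio bins and the four picture types of FIG. 10. The placeholder strings in `CNN_PARAMS` stand in for trained weights; the bin boundaries are not given in the text.

```python
import bisect
from typing import Dict, Tuple

# Hypothetical ratio bins: [0, 0.05) -> 0, [0.05, 0.20) -> 1, [0.20, 1] -> 2.
RATIO_BOUNDARIES = [0.0, 0.05, 0.20]
# Picture types from FIG. 10.
PICTURE_TYPES = ("I", "P", "Stored-B", "Non-Stored-B")


def class_key(picture_type: str, ratio: float) -> Tuple[str, int]:
    """Class identifier for one combination of picture type and ratio bin."""
    if picture_type not in PICTURE_TYPES:
        raise ValueError(f"unknown picture type: {picture_type}")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError(f"ratio out of range: {ratio}")
    ratio_bin = bisect.bisect_right(RATIO_BOUNDARIES, ratio) - 1
    return (picture_type, ratio_bin)


# One parameter set per combination: 4 picture types x 3 bins = 12 entries.
CNN_PARAMS: Dict[Tuple[str, int], str] = {
    (ptype, b): f"params_{ptype}_{b}"
    for ptype in PICTURE_TYPES
    for b in range(len(RATIO_BOUNDARIES))
}

print(CNN_PARAMS[class_key("Stored-B", 0.1)])  # params_Stored-B_1
```

The number of parameter sets grows as the product of the two dimensions, so each combination needs enough training pictures of its own.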

In coding schemes such as H.265, several transform unit sizes can be used within the same picture for the orthogonal transform; H.265, for example, allows transform units of 4x4, 8x8, 16x16, and 32x32. In general, coding efficiency improves when smaller transform units are chosen for image regions with fine texture or random motion, and larger transform units for regions with flat texture or global motion. Because changing the transform unit size can also change the amount of image quality degradation, this embodiment may classify pictures according to the distribution of transform sizes, either instead of or in addition to the non-zero DCT ratio. For example, the classification may be based on the proportion in which a specific transform unit size (for example, 4x4) was selected when coding each picture, either instead of or in addition to the non-zero DCT ratio.
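The text does not say whether the proportion of a transform unit size is counted per transform unit or per pixel area, so this sketch supports both. `tu_size_share` and its parameters are illustrative assumptions.

```python
from typing import Iterable, Tuple


def tu_size_share(tu_sizes: Iterable[Tuple[int, int]],
                  target: Tuple[int, int] = (4, 4),
                  by_area: bool = False) -> float:
    """Share of a picture's transform units that use the target size.

    With by_area=False every TU counts once; with by_area=True each TU is
    weighted by its pixel area (share of the picture coded with that size).
    """
    sizes = list(tu_sizes)
    if not sizes:
        raise ValueError("picture has no transform units")

    def weight(size: Tuple[int, int]) -> int:
        return size[0] * size[1] if by_area else 1

    total = sum(weight(s) for s in sizes)
    hits = sum(weight(s) for s in sizes if s == target)
    return hits / total


sizes = [(4, 4), (4, 4), (8, 8), (32, 32)]
print(tu_size_share(sizes))                # 0.5 (2 of 4 TUs)
print(tu_size_share(sizes, by_area=True))  # 32 / 1120, about 0.029
```

The two measures can differ sharply, as here, so the choice would need to match how the classes were defined during training.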

As described above, the control unit 110 performs encoding and decoding with neural network parameters selected according to the non-zero DCT ratio. In the H.265 coding scheme, for example, the orthogonal transform is applied to transform units of various sizes within a picture, such as 4x4, 8x8, 16x16, and 32x32, so the number of DCT coefficients whose magnitude exceeds 0 can vary with the transform unit size. Because this embodiment selects the neural network parameters based on the proportion of DCT coefficients whose magnitude exceeds 0, rather than on their count, the influence of block size is reduced. Moreover, since classification is based on the quantized DCT coefficients obtained by entropy-decoding the compressed moving-image data, no classification metadata needs to be transmitted from the encoding side to the decoding side. Alternatively, the orthogonal transform may be performed with several mutually different transform unit sizes, a non-zero DCT ratio computed from the DCT coefficients obtained for each size, and the classification based on the combination of these ratios. For example, classification may be based on the non-zero DCT ratio for the minimum transform unit size (for example, 4x4) together with the non-zero DCT ratio for the other transform unit sizes.
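A sketch of the per-size variant, assuming square transform units given as nested lists of quantized levels and a minimum size of 4. The example also shows why a ratio is preferred to a count: both transform units hold 4 non-zero coefficients, yet their ratios differ by a factor of four. `nonzero_ratio_by_size` is a hypothetical name.

```python
from typing import Dict, List, Sequence


def nonzero_ratio_by_size(tus: Sequence[List[List[int]]],
                          min_size: int = 4) -> Dict[str, float]:
    """Non-zero DCT ratios computed separately for minimum-size TUs and for
    all larger TUs (square TUs assumed; side length = number of rows)."""
    counts = {"min": [0, 0], "other": [0, 0]}  # [non-zero, total]
    for tu in tus:
        key = "min" if len(tu) == min_size else "other"
        for row in tu:
            for level in row:
                counts[key][1] += 1
                if level != 0:
                    counts[key][0] += 1
    return {key: (nz / total if total else 0.0)
            for key, (nz, total) in counts.items()}


tu_4x4 = [[1, 1, 0, 0], [1, 1, 0, 0], [0] * 4, [0] * 4]
tu_8x8 = [[1, 1, 1, 1, 0, 0, 0, 0]] + [[0] * 8 for _ in range(7)]
# Both TUs hold 4 non-zero coefficients, but the ratios differ.
print(nonzero_ratio_by_size([tu_4x4, tu_8x8]))  # {'min': 0.25, 'other': 0.0625}
```

The resulting pair of ratios could then index a two-dimensional class table, in the same way as the single ratio indexes the ranges of FIG. 4.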

As a variation of the above embodiment, the loop filter 643 and the loop filter 743 may be omitted.

Non-Patent Documents 1 and 2 cited above describe learning filter parameters with the quantization parameter held fixed. In moving-image compression coding, however, the amount of image quality degradation caused by coding generally differs between low and high compression rates. Commercial encoders usually vary the quantization parameter within a picture and often apply a quantization matrix, so the amount of degradation varies. Consequently, applying filter parameters learned with a fixed quantization parameter, as in Non-Patent Documents 1 and 2, to a commercial encoder may reduce coding efficiency. In contrast, this embodiment selects CNN parameters by classification based at least on the non-zero DCT ratio, so a filter suited to the compression rate can be applied within the inter-prediction or intra-prediction loop. This may improve coding efficiency.

The imaging device 100 described above may be mounted on a moving body. The imaging device 100 may be mounted on an unmanned aerial vehicle (UAV) as shown in FIG. 11. The UAV 10 may include a UAV main body 20, a gimbal 50, a plurality of imaging devices 60, and the imaging device 100. The gimbal 50 and the imaging device 100 are an example of an imaging system. The UAV 10 is an example of a moving body propelled by a propulsion unit. The concept of a moving body includes, in addition to UAVs, other flying objects such as aircraft moving in the air, vehicles moving on the ground, ships moving on water, and the like.

The UAV main body 20 includes a plurality of rotor blades, which are an example of a propulsion unit. The UAV main body 20 makes the UAV 10 fly by controlling the rotation of the rotor blades, for example using four rotor blades. The number of rotor blades is not limited to four. The UAV 10 may also be a fixed-wing aircraft without rotor blades.

The imaging device 100 is a camera that captures images of subjects within a desired imaging range. The gimbal 50 rotatably supports the imaging device 100 and is an example of a support mechanism. For example, the gimbal 50 uses an actuator to support the imaging device 100 rotatably about the pitch axis, and further uses actuators to support it rotatably about each of the roll axis and the yaw axis. The gimbal 50 may change the attitude of the imaging device 100 by rotating it about at least one of the yaw axis, the pitch axis, and the roll axis.

The plurality of imaging devices 60 are sensing cameras that capture the surroundings of the UAV 10 in order to control its flight. Two imaging devices 60 may be provided on the front of the UAV 10, that is, its nose, and two more may be provided on its bottom surface. The two front imaging devices 60 may form a pair that functions as a so-called stereo camera; the two bottom imaging devices 60 may likewise form a pair that functions as a stereo camera. Three-dimensional spatial data of the surroundings of the UAV 10 may be generated from the images captured by the plurality of imaging devices 60. The number of imaging devices 60 on the UAV 10 is not limited to four; the UAV 10 need only include at least one imaging device 60. The UAV 10 may include at least one imaging device 60 on each of its nose, tail, sides, bottom surface, and top surface. The angle of view that can be set for the imaging devices 60 may be wider than that which can be set for the imaging device 100. The imaging devices 60 may have a single-focus lens or a fisheye lens.

The remote control device 300 communicates with the UAV 10, for example wirelessly, to control it remotely. The remote control device 300 transmits to the UAV 10 instruction information indicating various commands related to its movement, such as ascending, descending, accelerating, decelerating, moving forward, moving backward, and rotating. The instruction information includes, for example, information for raising the altitude of the UAV 10, and may indicate the altitude at which the UAV 10 should be located. The UAV 10 moves so as to be located at the altitude indicated by the instruction information received from the remote control device 300. The instruction information may include an ascend command; the UAV 10 ascends while it is receiving the ascend command. Even while receiving the ascend command, the UAV 10 may restrict its ascent if its altitude has reached an upper-limit altitude.
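The ascent-limiting behaviour described above can be sketched as a per-tick update. This is a minimal illustration only: the class name, the fixed climb rate, and the tick-based update are assumptions, since the text specifies only that the UAV ascends while the ascend command is received and that ascent is restricted at an upper-limit altitude.

```python
from dataclasses import dataclass


@dataclass
class AltitudeController:
    altitude_m: float
    upper_limit_m: float          # the "upper-limit altitude"
    climb_rate_mps: float = 2.0   # hypothetical climb rate

    def step(self, ascend_command: bool, dt_s: float) -> float:
        """Advance one control tick: climb only while the ascend command is held,
        and never above the upper-limit altitude."""
        if ascend_command and self.altitude_m < self.upper_limit_m:
            self.altitude_m = min(self.altitude_m + self.climb_rate_mps * dt_s, self.upper_limit_m)
        return self.altitude_m
```

Clamping with `min` means a command received just below the limit still stops exactly at the limit rather than overshooting it.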

FIG. 12 shows an example of a computer 1200 in which aspects of the present invention may be embodied wholly or partly. A program installed on the computer 1200 can cause the computer 1200 to function as operations associated with a device according to an embodiment of the present invention, or as one or more "units" of that device. For example, a program installed on the computer 1200 can cause the computer 1200 to function as the control unit 110. Alternatively, the program can cause the computer 1200 to execute those operations or the functions of those one or more "units". The program can cause the computer 1200 to execute a process according to an embodiment of the present invention, or steps of that process. Such a program may be executed by the CPU 1212 to cause the computer 1200 to perform specific operations associated with some or all of the blocks in the flowcharts and block diagrams described herein.

The computer 1200 according to this embodiment includes a CPU 1212 and a RAM 1214, which are interconnected by a host controller 1210. The computer 1200 also includes input/output units such as a communication interface 1222, which are connected to the host controller 1210 via an input/output controller 1220. The computer 1200 further includes a ROM 1230. The CPU 1212 operates according to programs stored in the ROM 1230 and the RAM 1214, thereby controlling each unit.

The communication interface 1222 communicates with other electronic devices via a network. A hard disk drive may store programs and data used by the CPU 1212 in the computer 1200. The ROM 1230 stores a boot program or the like executed by the computer 1200 at activation, and/or programs that depend on the hardware of the computer 1200. Programs are provided via a computer-readable recording medium such as a CD-ROM, a USB memory, or an IC card, or via a network. The programs are installed in the RAM 1214 or the ROM 1230, which are also examples of computer-readable recording media, and are executed by the CPU 1212. The information processing described in these programs is read by the computer 1200 and brings about cooperation between the programs and the various types of hardware resources described above. A device or method may be constituted by realizing operations on, or processing of, information in accordance with the use of the computer 1200.

For example, when communication is performed between the computer 1200 and an external device, the CPU 1212 may execute a communication program loaded into the RAM 1214 and, based on the processing described in the communication program, instruct the communication interface 1222 to perform communication processing. Under the control of the CPU 1212, the communication interface 1222 reads transmission data stored in a transmission buffer area provided in the RAM 1214 or in a recording medium such as a USB memory and transmits the read data to the network, or writes data received from the network to a reception buffer area or the like provided on the recording medium.

The CPU 1212 may also cause all or a necessary part of a file or database stored in an external recording medium, such as a USB memory, to be read into the RAM 1214, and may execute various types of processing on the data in the RAM 1214. The CPU 1212 may then write the processed data back to the external recording medium.

Various types of information, such as programs, data, tables, and databases, may be stored in a recording medium and subjected to information processing. On data read from the RAM 1214, the CPU 1212 may execute various types of processing described throughout this disclosure and specified by the instruction sequences of programs, including various types of operations, information processing, conditional judgments, conditional branches, unconditional branches, and information search/replacement, and write the results back to the RAM 1214. The CPU 1212 may also search for information in files, databases, and the like in the recording medium. For example, when a plurality of entries, each having an attribute value of a first attribute associated with an attribute value of a second attribute, are stored in the recording medium, the CPU 1212 may search those entries for one whose first-attribute value matches a specified condition, read the second-attribute value stored in that entry, and thereby acquire the second-attribute value associated with the first attribute that satisfies the predetermined condition.
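The attribute-based search in the last example amounts to filtering entries on their first attribute and reading back the second. A minimal sketch, assuming entries are represented as Python mappings with the hypothetical keys `first` and `second`:

```python
from typing import Callable, Iterable, List, Mapping


def lookup_second_attribute(
    entries: Iterable[Mapping[str, object]],
    condition: Callable[[object], bool],
    first: str = "first",
    second: str = "second",
) -> List[object]:
    """Return the second-attribute value of every entry whose first attribute meets `condition`."""
    return [entry[second] for entry in entries if condition(entry[first])]
```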

The programs or software modules described above may be stored in a computer-readable storage medium on or near the computer 1200. A recording medium such as a hard disk or RAM provided in a server system connected to a dedicated communication network or the Internet can also be used as a computer-readable storage medium, thereby providing the programs to the computer 1200 via the network.

Although the present invention has been described above using embodiments, the technical scope of the present invention is not limited to the scope described in those embodiments. It will be apparent to those skilled in the art that various changes or improvements can be made to the above embodiments. It is clear from the claims that forms incorporating such changes or improvements can also be included in the technical scope of the present invention.

Note that the operations, procedures, steps, and stages of each process in the devices, systems, programs, and methods shown in the claims, specification, and drawings may be performed in any order, unless the order is explicitly indicated by expressions such as "before" or "prior to", or unless the output of a previous process is used in a subsequent process. Even where the operation flows in the claims, specification, and drawings are described using terms such as "first" and "next" for convenience, this does not mean that they must be performed in that order.

10 UAV
20 UAV main body
50 Gimbal
60 Imaging device
100 Imaging device
102 Imaging unit
110 Control unit
120 Image sensor
130 Memory
160 Display unit
162 Instruction unit
200 Lens unit
210 Lens
212 Lens drive unit
220 Lens control unit
222 Memory
300 Remote control device
310 DCT coefficient counting unit
320 Class determination unit
330 CNN learner
610 Reorder unit
620 Orthogonal transform unit
630 Quantization unit
641 Inverse quantization unit
642 Inverse orthogonal transform unit
643 Loop filter
644 CNN filter
645 Memory
646 DCT coefficient counting unit
647 Class determination unit
648 Intra prediction unit
649 Inter prediction unit
650 Entropy coding unit
660 Rate control unit
670 Buffer
710 Reorder unit
741 Inverse quantization unit
742 Inverse orthogonal transform unit
743 Loop filter
744 CNN filter
745 Memory
746 DCT coefficient counting unit
747 Class determination unit
748 Intra prediction unit
749 Inter prediction unit
750 Entropy decoding unit
1000 I picture
1001, 1003 Non-Stored-B picture
1002 Stored-B picture
1004 P picture
1200 Computer
1210 Host controller
1212 CPU
1214 RAM
1220 Input/output controller
1222 Communication interface
1230 ROM

Claims

1. A device comprising a circuit configured to:
store a plurality of trained neural networks for processing a decoded image, each associated with an amount of spatial frequency components, among the spatial frequency components of an encoded image, whose magnitude exceeds a predetermined value;
acquire encoded data generated by encoding an image;
acquire a plurality of spatial frequency components of the encoded image from the encoded data;
generate a decoded image by decoding the encoded data;
acquire the amount of spatial frequency components, among the plurality of spatial frequency components acquired from the encoded data, whose magnitude exceeds the predetermined value;
select, from the plurality of trained neural networks, the trained neural network associated with the acquired amount of spatial frequency components; and
process the generated decoded image using the selected trained neural network.
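The selection logic in the claim above can be sketched as follows. This is a hypothetical illustration, not the patented implementation: decoding is assumed to have happened already, the stored "trained neural networks" are stand-in callables, and the count ranges in `bounds` are invented for the example. Counts below `bounds[0]` select the first network, counts from `bounds[0]` up to (but not including) `bounds[1]` select the second, and so on.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence

Image = List[List[float]]
Network = Callable[[Image], Image]  # stand-in for a trained CNN


class NetworkBank:
    """Trained networks stored per range of the above-threshold coefficient count."""

    def __init__(self, bounds: Sequence[int], networks: Sequence[Network], threshold: float = 0.0):
        # len(bounds) boundaries define len(bounds) + 1 count ranges, one network each.
        if len(networks) != len(bounds) + 1:
            raise ValueError("need exactly one network per count range")
        self.bounds = list(bounds)
        self.networks = list(networks)
        self.threshold = threshold

    def count_above(self, coeffs: Sequence[float]) -> int:
        """Amount of spatial frequency components whose magnitude exceeds the threshold."""
        return sum(1 for c in coeffs if abs(c) > self.threshold)

    def select(self, coeffs: Sequence[float]) -> Network:
        """Pick the network whose count range contains the measured amount."""
        return self.networks[bisect_right(self.bounds, self.count_above(coeffs))]

    def process(self, decoded: Image, coeffs: Sequence[float]) -> Image:
        """Filter an already-decoded image with the selected network."""
        return self.select(coeffs)(decoded)
```

Using `bisect_right` keeps selection logarithmic in the number of classes, although with only a handful of classes a linear scan would work just as well.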

2. The device according to claim 1, wherein the circuit is configured to:
store the plurality of trained neural networks in association with a ratio of spatial frequency components, among the spatial frequency components of an encoded image, whose magnitude exceeds a predetermined value;
acquire the ratio of spatial frequency components, among the plurality of spatial frequency components acquired from the encoded data, whose magnitude exceeds the predetermined value; and
select, from the plurality of trained neural networks, the trained neural network associated with the acquired ratio of spatial frequency components.

3. The device according to claim 1 or 2, wherein the predetermined value is zero.
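The two preceding claims replace the absolute amount with a ratio and allow the predetermined value to be zero, in which case "exceeds the predetermined value" simply means "non-zero", matching the non-zero DCT ratio used in the embodiment. A sketch under those assumptions, with a hypothetical table of ratio ranges and network identifiers:

```python
from typing import List, Sequence, Tuple

# Hypothetical table: (exclusive upper bound on the ratio, network identifier).
RATIO_TABLE: List[Tuple[float, str]] = [
    (0.1, "cnn_sparse"),
    (0.4, "cnn_medium"),
    (float("inf"), "cnn_dense"),
]


def above_threshold_ratio(coeffs: Sequence[int], threshold: int = 0) -> float:
    """Share of components whose magnitude exceeds `threshold` (0 gives the non-zero ratio)."""
    if not coeffs:
        return 0.0
    return sum(1 for c in coeffs if abs(c) > threshold) / len(coeffs)


def select_network_id(coeffs: Sequence[int], table: Sequence[Tuple[float, str]] = RATIO_TABLE) -> str:
    """Return the identifier of the first table entry whose upper bound exceeds the ratio."""
    ratio = above_threshold_ratio(coeffs)
    for upper, name in table:
        if ratio < upper:
            return name
    raise LookupError("ratio not covered by the table")
```

A ratio does not depend on how many coefficients were examined, so one table can serve blocks or pictures of different sizes, whereas thresholds on an absolute count would have to scale with the area.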

4. The device according to claim 1 or 2, wherein
the image is a constituent image of a moving image,
the encoded data includes information indicating a plurality of quantized difference values obtained by quantizing predicted difference values of a plurality of spatial frequency components obtained by inter prediction or intra prediction of the constituent image, and
the circuit is configured to:
store the plurality of trained neural networks in association with an amount of quantized difference values, among a plurality of quantized difference values, whose magnitude exceeds the predetermined value;
acquire the amount of quantized difference values, among the plurality of quantized difference values acquired from the encoded data, whose magnitude exceeds the predetermined value;
select, from the plurality of trained neural networks, the trained neural network associated with the acquired amount of quantized difference values of the spatial frequency components;
generate a difference image based on the predicted difference values of the spatial frequency components obtained by inverse quantization of the plurality of quantized difference values acquired from the encoded data, and generate a decoded constituent image by adding an inter-predicted image or an intra-predicted image to the generated difference image; and
process the generated constituent image using the selected trained neural network.
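The reconstruction path in the claim above (inverse quantization, inverse transform to a difference image, then adding the prediction) can be sketched for a single N × N block. This is a simplified model under stated assumptions: a floating-point orthonormal DCT and a single uniform quantization step (`qstep`, hypothetical) stand in for the integer transforms and scaling used by real codecs such as H.265. The count of above-threshold quantized values is what would drive network selection, and the selected network would then be applied to the reconstructed block.

```python
import math
from typing import List, Sequence

Block = List[List[float]]


def idct2(coeffs: Block) -> Block:
    """Orthonormal 2-D inverse of the DCT-II for an N x N block."""
    n = len(coeffs)

    def a(k: int) -> float:
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            s = 0.0
            for u in range(n):
                for v in range(n):
                    s += (a(u) * a(v) * coeffs[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[x][y] = s
    return out


def reconstruct(levels: Sequence[Sequence[int]], qstep: float, prediction: Block) -> Block:
    """Inverse-quantize, inverse-transform to a difference image, then add the prediction."""
    residual = idct2([[lv * qstep for lv in row] for row in levels])
    return [[p + r for p, r in zip(prow, rrow)] for prow, rrow in zip(prediction, residual)]


def count_above(levels: Sequence[Sequence[int]], threshold: int = 0) -> int:
    """Amount of quantized difference values whose magnitude exceeds `threshold`."""
    return sum(1 for row in levels for lv in row if abs(lv) > threshold)
```

With only a DC level of 2 and `qstep = 4.0` in a 4 × 4 block, every residual sample is 8 × (1/4) = 2, so a flat prediction of 100 reconstructs to 102 everywhere.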

5. The device according to claim 1 or 2, wherein
the image is a constituent image of a moving image, and
the circuit is configured to:
store the plurality of trained neural networks further in association with a picture type of the constituent image; and
select, from the plurality of trained neural networks, the trained neural network associated with both the picture type of the encoded data and the acquired amount of spatial frequency components.
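Adding the picture type to the key turns the selection above into a two-dimensional lookup. In this sketch the picture types are the kinds listed in the reference signs (I, P, Stored-B, Non-Stored-B); the count thresholds and the table contents are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Picture types as named in the reference signs; the thresholds below are hypothetical.
PICTURE_TYPES = ("I", "P", "Stored-B", "Non-Stored-B")
COUNT_BOUNDS: Tuple[int, ...] = (16, 128)


def count_class(count: int, bounds: Sequence[int] = COUNT_BOUNDS) -> int:
    """Bucket an above-threshold count into classes 0..len(bounds)."""
    return sum(1 for b in bounds if count >= b)


def select_network(table: Dict[Tuple[str, int], str], picture_type: str, count: int) -> str:
    """Look up the network stored for (picture type, count class)."""
    if picture_type not in PICTURE_TYPES:
        raise ValueError(f"unknown picture type: {picture_type}")
    key = (picture_type, count_class(count))
    if key not in table:
        raise KeyError(f"no trained network stored for {key}")
    return table[key]
```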

6. The device according to claim 1 or 2, wherein each trained neural network is a convolutional neural network obtained by machine learning that uses training images and encoded data of the training images as training data and is performed according to the amount of spatial frequency components, among a plurality of spatial frequency components acquired from the encoded data included in the training data, whose magnitude exceeds the predetermined value.
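The per-class training described above can be outlined by grouping training pairs by the class of their above-threshold count and fitting one model per class. To keep the sketch short and runnable, a least-squares gain/offset fit stands in for CNN training; it is a deliberately simplified substitute, and the class thresholds in `COUNT_BOUNDS` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# One training sample: (decoded pixels, original pixels, above-threshold coefficient count).
Sample = Tuple[List[float], List[float], int]
COUNT_BOUNDS: Tuple[int, ...] = (16, 128)  # hypothetical class thresholds


def class_of(count: int) -> int:
    """Bucket an above-threshold count into classes 0..len(COUNT_BOUNDS)."""
    return sum(1 for b in COUNT_BOUNDS if count >= b)


def fit_gain_offset(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y ~ g * x + o (a stand-in for training a CNN)."""
    n = len(xs)
    if n == 0:
        raise ValueError("no samples")
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    g = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 1.0
    return g, my - g * mx


def train_per_class(samples: Sequence[Sample]) -> Dict[int, Tuple[float, float]]:
    """Group samples by count class, then fit one model per class."""
    grouped: Dict[int, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for decoded, original, count in samples:
        xs, ys = grouped[class_of(count)]
        xs.extend(decoded)
        ys.extend(original)
    return {cls: fit_gain_offset(xs, ys) for cls, (xs, ys) in grouped.items()}
```

Only classes that actually occur in the training data receive a model; a real system would need a fallback for classes with no samples.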

7. A device comprising a circuit configured to:
store a plurality of trained neural networks for processing a decoded image, each associated with an amount of spatial frequency components, among the spatial frequency components of an encoded image, whose magnitude exceeds a predetermined value;
generate encoded data including information indicating predicted difference values of a plurality of spatial frequency components by encoding an image to be encoded using an encoding process that includes inter prediction or intra prediction;
output the encoded data;
generate a decoded image by decoding the encoded data;
select, from the plurality of trained neural networks, the trained neural network associated with the amount of predicted difference values, among the predicted difference values of the plurality of spatial frequency components, whose magnitude exceeds a predetermined value; and
generate a reference image used for the inter prediction or the intra prediction by processing the generated decoded image using the selected trained neural network.
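On the encoder side, the claim above places the selected network inside the prediction loop, so the filtered picture, not the raw decoded picture, becomes the reference. The toy loop below shows that ordering under heavy simplifications, all of them assumptions: the transform is omitted (differences are quantized directly), prediction simply reuses the previous reference, and the networks are stand-in callables chosen by a single hypothetical count threshold. Because the count is derived from quantized values that are also output as coded data, a decoder could presumably repeat the same selection without extra side information.

```python
from typing import Callable, Dict, List, Sequence

Picture = List[float]  # flattened samples of one picture (simplification)
Network = Callable[[Picture], Picture]  # stand-in for a trained CNN


class ToyEncoderLoop:
    """Toy encoder loop: quantize prediction differences, locally decode,
    filter with a count-selected network, and keep the result as the reference."""

    def __init__(self, qstep: float, networks: Dict[int, Network], bound: int):
        self.qstep = qstep          # hypothetical uniform quantization step
        self.networks = networks    # key 0: count < bound, key 1: count >= bound
        self.bound = bound
        self.reference: Picture = []

    def encode(self, picture: Sequence[float]) -> List[int]:
        # Prediction: reuse the previous reference (zero motion); zeros for the first picture.
        prediction = self.reference or [0.0] * len(picture)
        # Prediction differences, quantized directly (transform omitted for brevity).
        levels = [round((p - q) / self.qstep) for p, q in zip(picture, prediction)]
        # Local decoding: inverse quantization plus prediction.
        decoded = [q + lv * self.qstep for q, lv in zip(prediction, levels)]
        # Amount of quantized differences whose magnitude exceeds zero.
        count = sum(1 for lv in levels if lv != 0)
        network = self.networks[1 if count >= self.bound else 0]
        # The filtered picture, not the raw decoded one, becomes the next reference.
        self.reference = network(decoded)
        return levels  # would be entropy-coded and output as the coded data
```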

8. An image processing device comprising:
the device according to claim 1; and
the device according to claim 7.

9. An imaging device comprising:
the device according to claim 1 or 2; and
an image sensor that generates an image.

10. A moving body that comprises the imaging device according to claim 9 and moves.

11. The moving body according to claim 10, wherein the moving body is an unmanned aerial vehicle.

12. A program for causing a computer to function as the device according to claim 1 or 2.

13. A method comprising:
a step of storing a plurality of trained neural networks for processing a decoded image, each associated with an amount of spatial frequency components, among the spatial frequency components of an encoded image, whose magnitude exceeds a predetermined value;
a step of acquiring encoded data generated by encoding an image;
a step of acquiring a plurality of spatial frequency components of the encoded image from the encoded data;
a step of generating a decoded image by decoding the encoded data;
a step of acquiring the amount of spatial frequency components, among the plurality of spatial frequency components acquired from the encoded data, whose magnitude exceeds the predetermined value;
a step of selecting, from the plurality of trained neural networks, the trained neural network associated with the acquired amount of spatial frequency components; and
a step of processing the generated decoded image using the selected trained neural network.

14. A method comprising:
a step of storing a plurality of trained neural networks for processing a decoded image, each associated with an amount of spatial frequency components, among the spatial frequency components of an encoded image, whose magnitude exceeds a predetermined value;
a step of generating encoded data including information indicating predicted difference values of a plurality of spatial frequency components by encoding an image to be encoded using an encoding process that includes inter prediction or intra prediction;
a step of outputting the encoded data;
a step of generating a decoded image by decoding the encoded data;
a step of selecting, from the plurality of trained neural networks, the trained neural network associated with the amount of predicted difference values, among the predicted difference values of the plurality of spatial frequency components, whose magnitude exceeds a predetermined value; and
a step of generating a reference image used for the inter prediction or the intra prediction by processing the generated decoded image using the selected trained neural network.