JP7465500B2

JP7465500B2 - IMAGE PROCESSING METHOD, IMAGE PROCESSING APPARATUS AND PROGRAM

Info

Publication number: JP7465500B2
Application number: JP2020088398A
Authority: JP
Inventors: 聡志鈴木; 隆一谷田; 英明木全; 逸庄野
Original assignee: THE UNIVERSITY OF ELECTRO-COMMUNICATIONS; Nippon Telegraph and Telephone Corp
Current assignee: THE UNIVERSITY OF ELECTRO-COMMUNICATIONS; Nippon Telegraph and Telephone Corp
Priority date: 2020-05-20
Filing date: 2020-05-20
Publication date: 2024-04-11
Anticipated expiration: 2040-05-20
Also published as: JP2024045408A; JP2021182345A

Description

The present invention relates to an image processing method, an image processing device, and a program.

In this description, subscripts in mathematical formulas are written using an underscore "_". For example, the letter "X" with "n" as a subscript is written as "X_n".

In recent years, the accuracy of image processing based on machine learning has improved considerably. Examples of such image processing include identifying subjects in an image (hereinafter "image identification processing"), detecting subjects in an image, and dividing an image into regions according to specific criteria. Among the machine learning techniques used for such processing, convolutional neural networks (CNNs) have shown particularly large gains in accuracy. Techniques that use these methods to automate visual inspection steps in various business operations are attracting particular attention.

When visual inspection is automated in this way, the captured images cannot be assumed to be of high quality. For example, images captured in a dark place may have reduced contrast or contain camera noise. When the subject moves at high speed, blur may occur. Furthermore, lossy compression of a captured image may introduce compression distortion. Among these distortions, noise and blur in particular are known to greatly reduce the image identification accuracy of CNNs (see Non-Patent Document 1). In addition, a prior study comparing the robustness of humans and CNNs to distortion found CNNs to be significantly less robust than humans (see Non-Patent Document 2). Therefore, when CNNs are used to automate human visual inspection, they may behave in ways that humans do not expect.

To address these problems, methods have been proposed that make CNN-based image processing robust to distortion. One such method is based on fine-tuning (Non-Patent Documents 3 and 4). In fine-tuning, the weights of a CNN already trained on a standard dataset containing almost no distortion are used as initial values, and the CNN is trained again on a dataset made up of distorted images. This is intended to increase robustness to distortion: a fine-tuned CNN is expected to be more robust against distortion than the same CNN before fine-tuning.

Non-Patent Document 1: S. Dodge, L. Karam, "Understanding How Image Quality Affects Deep Neural Networks", 2016.
Non-Patent Document 2: R. Geirhos, C. R. M. Temme, J. Rauber, H. H. Schütt, M. Bethge, F. A. Wichmann, "Generalisation in humans and deep neural networks", 2018.
Non-Patent Document 3: Y. Zhou, S. Song, N.-M. Cheung, "On classification of distorted images with deep convolutional neural networks", 2017.
Non-Patent Document 4: I. Vasiljevic, A. Chakrabarti, G. Shakhnarovich, "Examining the impact of blur on recognition by convolutional networks", 2017.
Non-Patent Document 5: K. Simonyan, A. Vedaldi, A. Zisserman, "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps", 2014.

However, fine-tuning may fail to achieve sufficient robustness, for the following reasons. Fine-tuning-based methods train the network with the error backpropagation method used in ordinary CNN training. FIG. 11 is a diagram showing recognition by forward propagation in a typical CNN and the corresponding learning by backpropagation. The nonlinear transformation here is assumed to be ReLU, which is commonly used in CNNs. In the backpropagation, 1(X_(n+1)>0) is a matrix that has 1 at the elements where the matrix X_(n+1) is greater than 0 and 0 at all other elements. Therefore, error is propagated only through the elements that did not become 0 in the ReLU processing during forward propagation. FIG. 12 is a schematic diagram showing the error propagation of FIG. 11 in simplified form. Not only ReLU but also max pooling propagates no error information from components that were not selected (Non-Patent Document 5). In other words, error information passing through a neuron that responds selectively to a specific image feature cannot be propagated to lower layers unless that neuron produces a positive output.
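The masking effect described above can be illustrated with a minimal 1-D sketch in plain Python. It is not taken from the patent: the function names are hypothetical, ReLU is assumed as the nonlinearity (as in FIG. 11), and max pooling uses non-overlapping windows of size 2. The backward functions show that an element which is 0 after ReLU, or which was not selected by max pooling, passes no error to the layer below.

```python
def relu_forward(x):
    """Forward ReLU: X_(n+1) = max(0, X_n), element-wise."""
    return [max(0.0, v) for v in x]


def relu_backward(grad_out, x_next):
    """Backward ReLU: multiply the incoming error by the mask 1(X_(n+1) > 0)."""
    return [g if v > 0 else 0.0 for g, v in zip(grad_out, x_next)]


def maxpool_forward(x, size=2):
    """1-D max pooling over non-overlapping windows; also records the winning index."""
    out, argmax = [], []
    for start in range(0, len(x), size):
        window = x[start:start + size]
        best = max(range(len(window)), key=lambda i: window[i])
        out.append(window[best])
        argmax.append(start + best)
    return out, argmax


def maxpool_backward(grad_out, argmax, input_len):
    """Route each error value only to the element that won the max; others get 0."""
    grad_in = [0.0] * input_len
    for g, idx in zip(grad_out, argmax):
        grad_in[idx] += g
    return grad_in


x_next = relu_forward([0.8, -0.3, 1.5, -2.0])  # [0.8, 0.0, 1.5, 0.0]
print(relu_backward([1.0, 1.0, 1.0, 1.0], x_next))  # [1.0, 0.0, 1.0, 0.0]

pooled, idx = maxpool_forward([0.2, 0.9, 0.4, 0.1])
print(maxpool_backward([1.0, 1.0], idx, 4))  # [0.0, 1.0, 1.0, 0.0]
```

The zeros in both backward outputs correspond to neurons that were silent (or unselected) in the forward pass; in a real CNN the same masking is applied to every feature map.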

Consider fine-tuning on distorted images in this light. Distortion may erase features of the original image (for example, texture or curvature) that the CNN was trained to respond to. In that case, neurons that responded selectively to those features no longer respond, so, as described above, no error information is propagated through them and error backpropagation learning may be insufficient. This problem is not specific to CNNs; it is common to neural networks in general, including any neural network that performs convolution.

In view of the above circumstances, an object of the present invention is to provide a technique that improves the accuracy of neural-network image processing on distorted images.

One aspect of the present invention is an image processing method including: an acquisition step of acquiring a target image, which is an image to be processed; and an image processing step of performing predetermined image processing on the acquired target image to obtain an image processing result, wherein the image processing uses image processing information obtained by associating feature information that characterizes an image without distortion with feature information that characterizes an image with distortion.

One aspect of the present invention is an image processing device including: a storage unit that stores image processing information obtained by associating feature information that characterizes an image without distortion with feature information that characterizes an image with distortion; and an image processing unit that acquires a target image, which is an image to be processed, and obtains an image processing result by performing predetermined image processing on the acquired target image using the image processing information.

One aspect of the present invention is a program for causing a computer to execute the image processing method described above.

The present invention makes it possible to improve the accuracy of neural-network image processing on distorted images.

FIG. 1 is a diagram showing an example of the functional configuration of the first embodiment of the present invention.
FIG. 2 is a diagram showing a specific example of the overall flow of processing by the control unit 13.
FIG. 3 is a flowchart showing a specific example of processing by the distorted image generating unit 131.
FIG. 4 is a flowchart showing a specific example of processing by the original image processing unit 132.
FIG. 5 is a flowchart showing a specific example of processing by the distorted image processing unit 133.
FIG. 6 is a flowchart showing a specific example of processing by the intermediate similarity loss calculation unit 134.
FIG. 7 is a flowchart showing a specific example of processing by the optimization unit 135.
FIG. 8 is a diagram showing an example of the functional configuration of the second embodiment of the present invention.
FIG. 9 is a flowchart showing a specific example of processing by the weight adjustment unit 236.
FIG. 10 is a diagram showing an example of the functional configuration of the third embodiment of the present invention.
FIG. 11 is a diagram showing recognition by forward propagation in a typical CNN and the corresponding learning by backpropagation.
FIG. 12 is a simplified schematic diagram of the error propagation shown in FIG. 11.

First, an outline of this embodiment will be described. In this embodiment, the firing of the neurons corresponding to the features of an undistorted image that contribute most to characterizing the image (hereinafter "high-contribution features") is matched. High-contribution features are, for example, features indicating large textures or boundaries. If the firing states of the neurons corresponding to such high-contribution features can be made to match between a distorted image and an undistorted image, the error between the image processing results obtained for the two images is expected to become smaller. Therefore, in the present invention, learning is performed so that the firing states of the neurons corresponding to high-contribution features match between a distorted image and an undistorted image.
In a CNN, described later, the neurons corresponding to high-contribution features are often high-order neurons. Therefore, in one embodiment that uses a CNN, the firing states of the high-order neurons are matched. When a neural network other than a CNN is used, the firing states of the neurons corresponding to the high-contribution features should be matched, rather than those of the high-order neurons.

[First Embodiment]
FIG. 1 is a diagram showing an example of the functional configuration of the first embodiment of the present invention. A learning device 10 is configured using an information processing device such as a personal computer or a server device. The learning device 10 includes a training image storage unit 11, a parameter storage unit 12, and a control unit 13, and executes a learning process using a neural network. The neural network may be, for example, one that performs convolution; a more specific example is a CNN. The following description covers an embodiment in which a CNN is used as the neural network.

The training image storage unit 11 is configured using a storage device such as a magnetic hard disk drive or a semiconductor storage device. It stores training image data, which is teacher data for so-called supervised learning. The training image data includes, for example, pairs of an image used for learning and a correct label indicating the correct answer for that image. The images included in the training image data are preferably free of distortion.

What information the correct label contains depends on the learning process being performed. For example, in a learning process for determining the type of subject appearing in the image data, the correct label indicates the correct type of that subject. In this case, the correct label may be given, for example, as a vector indicating the type of subject.

For example, in a learning process for detecting a specific object among the subjects in the image data, the correct label indicates the correct position of that object in the image data. In this case, the correct label may be given, for example, as an array indicating the position of the object.

For example, in a learning process for segmenting image data into regions, the correct label indicates which region each pixel of the image data belongs to. In this case, the correct label may be given, for example, as an array indicating the region of each pixel.
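The three kinds of correct label described above can be sketched as plain Python data structures. The encodings are assumptions for illustration only: a one-hot vector for the subject type, corner coordinates (x_min, y_min, x_max, y_max) for the object position, and a per-pixel array of integer region IDs for segmentation. The function names are hypothetical.

```python
def one_hot(class_index, num_classes):
    """Classification label: a vector with 1.0 at the correct subject type."""
    return [1.0 if i == class_index else 0.0 for i in range(num_classes)]


def box_label(x_min, y_min, x_max, y_max):
    """Detection label: an array giving the object position (corner format assumed)."""
    return [x_min, y_min, x_max, y_max]


def segmentation_label(height, width, regions):
    """Segmentation label: a per-pixel array of region IDs.

    `regions` maps a region ID to a list of (row, col) pixels; unlisted pixels get 0.
    """
    label = [[0] * width for _ in range(height)]
    for region_id, pixels in regions.items():
        for r, c in pixels:
            label[r][c] = region_id
    return label


print(one_hot(2, 4))  # [0.0, 0.0, 1.0, 0.0]
print(box_label(10, 20, 50, 80))  # [10, 20, 50, 80]
print(segmentation_label(2, 3, {1: [(0, 0), (1, 2)]}))  # [[1, 0, 0], [0, 0, 1]]
```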

The parameter storage unit 12 is configured using a storage device such as a magnetic hard disk drive or a semiconductor storage device. It stores image processing parameters representing a trained model obtained by a CNN-based learning process executed in advance. This trained model is obtained by a learning process of the same type as the one performed with the teacher data stored in the training image storage unit 11. For example, if the trained model stored in the parameter storage unit 12 determines the type of subject appearing in image data, the teacher data stored in the training image storage unit 11 is teacher data for a learning process that determines the type of subject appearing in image data.

The control unit 13 is configured using a processor such as a CPU (Central Processing Unit) and a memory. When the processor executes a learning program, the control unit 13 functions as a distorted image generating unit 131, an original image processing unit 132, a distorted image processing unit 133, an intermediate similarity loss calculation unit 134, and an optimization unit 135. All or some of the functions of the control unit 13 may be implemented in hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The learning program may be recorded on a computer-readable recording medium, for example a portable medium such as a flexible disk, a magneto-optical disk, a ROM, a CD-ROM, or a semiconductor storage device (e.g., an SSD: Solid State Drive), or a storage device such as a hard disk or semiconductor storage device built into a computer system. The learning program may also be transmitted via a telecommunications line.

First, the processing of each functional unit will be outlined. The distorted image generating unit 131 acquires training image data from the training image storage unit 11 and generates a distorted image by applying distortion processing, that is, processing that imparts some kind of distortion to an image. The distortion processing may, for example, add blur, add noise, or introduce compression distortion of the kind that occurs when a JPEG image is generated.

The original image processing unit 132 acquires training image data from the training image storage unit 11 and image processing parameters from the parameter storage unit 12, and performs image processing on the acquired training image data according to those parameters. This image processing is inference according to the image processing parameters (the trained model). The original image processing unit 132 acquires an intermediate processing result of that inference. For a neural network that performs image processing, for example, the intermediate processing result is the output of one of the intermediate layers. In a CNN, the output of an intermediate layer often represents responses to some of the features that make up the image. For example, the intermediate processing result may be the output of one of the hidden layers (the firing state of high-order neurons) rather than the output of the output layer.
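A minimal sketch of inference that returns both the final output and an intermediate processing result, using a tiny two-layer fully connected network in plain Python instead of a CNN. The parameter values, the dictionary keys (`w1`, `b1`, `w2`, `b2`), and the choice to return the hidden layer's pre-activation O (treating O > 0 as "firing", which is where ReLU lets the value through) are all assumptions for illustration.

```python
def dense(inputs, weights, biases):
    """Fully connected layer: out_j = sum_i w[j][i] * in_i + b_j."""
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward_with_intermediate(image, params):
    """Run inference and return (final_output, intermediate_result).

    The intermediate result is the hidden layer's pre-activation O; a neuron is
    treated as firing where O > 0, i.e. where ReLU passes the value through.
    """
    hidden_pre = dense(image, params["w1"], params["b1"])
    hidden = [max(0.0, v) for v in hidden_pre]          # ReLU
    output = dense(hidden, params["w2"], params["b2"])  # e.g. class scores
    return output, hidden_pre


# Hypothetical pretrained parameters, standing in for parameter storage unit 12.
params = {
    "w1": [[1.0, -1.0], [0.5, 0.5]], "b1": [0.0, -0.25],
    "w2": [[1.0, 0.0], [0.0, 1.0]], "b2": [0.0, 0.0],
}
out, o_clean = forward_with_intermediate([0.75, 0.25], params)  # original image
_, o_dist = forward_with_intermediate([0.25, 0.75], params)     # e.g. a distorted copy
print([v > 0 for v in o_clean])  # [True, True]
print([v > 0 for v in o_dist])   # [False, True]: the first neuron stopped firing
```

The same function serves both branches; only the input image differs, which mirrors how the original and distorted image processing units share one kind of processing.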

The distorted image processing unit 133 acquires the training image data from the training image storage unit 11; it may acquire only the correct labels without the image data. It also acquires the image processing parameters from the parameter storage unit 12. The distorted image processing unit 133 performs image processing on the distorted image generated by the distorted image generating unit 131. This processing is the same as that of the original image processing unit 132 except for the image being processed: the original image processing unit 132 processes an image stored in the training image storage unit 11, whereas the distorted image processing unit 133 processes the distorted image that the distorted image generating unit 131 generated from that same image. By processing the distorted image, the distorted image processing unit 133 obtains an intermediate processing result. It also calculates an image processing loss, whose minimization reduces the difference between the image processing result and the correct label.

The intermediate similarity loss calculation unit 134 acquires the intermediate processing results obtained by the original image processing unit 132 and the distorted image processing unit 133, and uses them to calculate an intermediate similarity loss. The intermediate similarity loss is used to overcome the problem that error information is not propagated during CNN training: it is a loss that drives the neuron firing patterns of the two intermediate processing results to match. One specific example is the sigmoid cross-entropy loss of the intermediate processing result O_dist of the distorted image, using as the teacher the firing pattern 1(O_clean) of the intermediate processing result O_clean obtained by processing the original image. For example, the cross-entropy L expressed by Equation 1 below may be calculated.

L = -t log y - (1-t) log(1-y) ... (Equation 1)
Here, t is an element of 1(O_clean), and y is the corresponding element of sigmoid(O_dist). The intermediate similarity loss is not limited to this example; any loss that makes the firing patterns of the intermediate processing results match may be used.
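Equation 1 can be sketched in plain Python as follows. Equation 1 defines the loss per element; summing over all elements is an assumption made here (averaging would work equally well). The sigmoid is written in a numerically stable form, and `eps` is a hypothetical guard against log(0).

```python
import math


def sigmoid(v):
    """Numerically stable logistic sigmoid."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def intermediate_similarity_loss(o_clean, o_dist, eps=1e-12):
    """Equation 1, summed over elements (the summation is an assumption).

    t = 1(O_clean > 0) is the firing pattern of the original image (the teacher);
    y = sigmoid(O_dist) is the distorted image's firing probability.
    """
    total = 0.0
    for c, d in zip(o_clean, o_dist):
        t = 1.0 if c > 0 else 0.0
        y = min(max(sigmoid(d), eps), 1.0 - eps)  # keep both logs finite
        total += -t * math.log(y) - (1.0 - t) * math.log(1.0 - y)
    return total


# The loss is small when the distorted image fires where the clean one does.
print(intermediate_similarity_loss([1.0, -1.0], [5.0, -5.0]))   # matching pattern
print(intermediate_similarity_loss([1.0, -1.0], [-5.0, 5.0]))   # opposite pattern
```

Because the gradient of this loss with respect to O_dist is sigmoid(O_dist) - t, it is nonzero even for neurons whose ReLU output is 0, which is what lets error reach silent neurons.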

The optimization unit 135 uses the image processing loss from the distorted image processing unit 133 and the intermediate similarity loss from the intermediate similarity loss calculation unit 134 to update, within an optimization framework, the image processing parameters output from the distorted image processing unit 133. If the learning process in the learning device 10 continues, the updated image processing parameters are input to the distorted image processing unit 133; when the learning process ends, they are output as post-learning parameters.

The flow of processing by the control unit 13 will be described below. FIG. 2 is a diagram showing a specific example of the overall flow of processing by the control unit 13. First, the original image processing unit 132 and the distorted image processing unit 133 acquire the image processing parameters for the predetermined image processing from the parameter storage unit 12 (step S001). The distorted image generating unit 131 acquires the training image data to be processed from the training image storage unit 11 and applies distortion processing to the acquired image (step S002). The original image processing unit 132 obtains an intermediate processing result by performing image processing on the image read from the training image storage unit 11 without distortion processing (the original image) (step S003). The distorted image processing unit 133 obtains an intermediate processing result by performing image processing on the image that underwent distortion processing (the distorted image), and also obtains an image processing loss (step S004). The intermediate similarity loss calculation unit 134 calculates the intermediate similarity loss from the intermediate processing results of the original image and the distorted image derived from the same training image data (step S005). The optimization unit 135 updates the image processing parameters using the intermediate similarity loss and the image processing loss (step S006).

If the learning process has not ended (step S007-NO), the optimization unit 135 outputs the updated image processing parameters to the distorted image processing unit 133 (step S008), which uses them in its next processing. If the learning process has ended (step S007-YES), the optimization unit 135 outputs the updated image processing parameters as post-learning parameters (step S009).
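The loop of FIG. 2 (steps S001 to S009) can be sketched with a deliberately tiny toy model in plain Python. Everything here is an illustrative assumption rather than the patent's implementation: each neuron has one scalar weight (O_i = w_i * x_i), the final output is the sum of the ReLU-activated neurons, the image processing loss is squared error, the two losses are combined as image processing loss + lam * intermediate similarity loss with a hypothetical weight `lam`, learning stops after a fixed number of steps, and plain gradient descent is used. The original image branch keeps the stored parameters `w0` fixed, while the distorted image branch uses the parameters being updated.

```python
import math


def sigmoid(v):
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def intermediate(x, w):
    """Toy intermediate processing result O: one neuron per input, O_i = w_i * x_i."""
    return [wi * xi for wi, xi in zip(w, x)]


def output(o):
    """Toy image processing result: sum of the ReLU-activated neurons."""
    return sum(max(0.0, v) for v in o)


def gradients(x_clean, x_dist, label, w_frozen, w, lam):
    """Return (task-only gradient, total gradient) with respect to w."""
    o_clean = intermediate(x_clean, w_frozen)  # original image branch (fixed params)
    o_dist = intermediate(x_dist, w)           # distorted image branch (trainable)
    err = output(o_dist) - label               # from squared-error image processing loss
    g_task, g_total = [], []
    for oc, od, xd in zip(o_clean, o_dist, x_dist):
        # Task gradient flows only through firing neurons (ReLU mask 1(O > 0)).
        gt = 2.0 * err * (1.0 if od > 0 else 0.0) * xd
        # Equation 1 gradient: (sigmoid(O_dist) - t) * dO/dw, with t = 1(O_clean > 0).
        t = 1.0 if oc > 0 else 0.0
        g_task.append(gt)
        g_total.append(gt + lam * (sigmoid(od) - t) * xd)
    return g_task, g_total


def train(pairs, w0, lam=1.0, lr=0.1, steps=200):
    """S001: start from stored params; S002-S006: update; S007: fixed step count."""
    w = list(w0)
    for step in range(steps):
        x_clean, x_dist, label = pairs[step % len(pairs)]
        _, grad = gradients(x_clean, x_dist, label, w0, w, lam)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]  # S006/S008: updated params reused
    return w                                           # S009: post-learning parameters


pair = ([1.0, 1.0], [1.0, -0.5], 2.0)  # clean input, distorted input, label
g_task, g_total = gradients(*pair, [1.0, 1.0], [1.0, 1.0], 1.0)
print(g_task[1], g_total[1])  # g_task[1] is 0.0; g_total[1] is positive
w = train([pair], [1.0, 1.0])
```

In this toy case the distortion makes the second neuron stop firing, so the image processing loss alone gives it zero gradient, while the Equation 1 term still pushes its weight until the neuron fires again on the distorted input.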

FIG. 3 is a flowchart showing a specific example of processing by the distorted image generating unit 131. First, the distorted image generating unit 131 selects a distortion d from a distortion set D that includes "no distortion", "noise", and "blur" (step S101). The selection may be random or follow a predetermined order. The distortion set D may include other distortions, such as JPEG compression distortion. If the selected distortion d is "no distortion" (step S102-no distortion), the distorted image generating unit 131 outputs the training image as is as the distorted image (step S103).
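The selection in steps S101 to S103 can be sketched as follows. The string names in `DISTORTIONS`, the two selector helpers, and the `handlers` mapping are hypothetical; the actual noise and blur operations are supplied as callables.

```python
import itertools
import random

DISTORTIONS = ("none", "noise", "blur")  # distortion set D; e.g. "jpeg" could be added


def random_selector(rng=random):
    """Select a distortion d from D at random for each training image (step S101)."""
    while True:
        yield rng.choice(DISTORTIONS)


def ordered_selector():
    """Alternatively, cycle through D in a predetermined order."""
    return itertools.cycle(DISTORTIONS)


def generate_distorted(image, d, handlers):
    """Step S102 branch: 'none' returns the image as is (S103); other d use a handler."""
    if d == "none":
        return image
    return handlers[d](image)


order = ordered_selector()
print([next(order) for _ in range(4)])  # ['none', 'noise', 'blur', 'none']
```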

If the selected distortion d is "noise" (step S102-noise), the distorted image generating unit 131 selects a parameter p that determines the noise strength (step S104). For an 8-bit image, p may be, for example, about 10 to 50, and may be chosen according to the noise strength expected for the imaging target. Next, the distorted image generating unit 131 generates a distorted image by superimposing on the training image noise drawn from a Gaussian distribution whose standard deviation is p (step S105). The type of noise superimposed may also be chosen according to the type of noise expected for the imaging target; for example, salt-and-pepper noise may be used. The distorted image generating unit 131 outputs the generated distorted image (step S106).
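Steps S104 and S105 can be sketched in plain Python on an 8-bit image stored as a 2-D list. Rounding and clipping the result to [0, 255] are assumptions not stated above, as is the `amount` parameter of the salt-and-pepper alternative; the function names are hypothetical.

```python
import random


def add_gaussian_noise(image, p, rng=random):
    """Superimpose zero-mean Gaussian noise with standard deviation p (step S105).

    Results are rounded and clipped to the 8-bit range [0, 255] (an assumption).
    """
    return [[min(255, max(0, round(v + rng.gauss(0.0, p)))) for v in row]
            for row in image]


def add_salt_and_pepper(image, amount, rng=random):
    """Alternative noise type: set roughly a fraction `amount` of pixels to 0 or 255."""
    out = []
    for row in image:
        new_row = []
        for v in row:
            if rng.random() < amount:
                v = 255 if rng.random() < 0.5 else 0
            new_row.append(v)
        out.append(new_row)
    return out


rng = random.Random(0)
p = rng.uniform(10, 50)  # noise strength for an 8-bit image (step S104)
noisy = add_gaussian_noise([[128] * 4 for _ in range(4)], p, rng)
```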

If the selected distortion d is "blur" (step S102-blur), the distorted image generating unit 131 selects the blur processing to apply (step S107). For example, it selects a parameter p that determines the blur strength and a blur kernel size k. The parameter p may be, for example, about 1 to 5, and the size k may be calculated by an equation such as k = 4*p - 1. These values may be chosen according to the blur strength expected for the imaging target. The distorted image generating unit 131 creates a k*k Gaussian filter with standard deviation p and convolves it with the training image (step S108). Depending on the type of blur expected for the imaging target, a different blur such as motion blur may be applied instead. The distorted image generating unit 131 outputs the resulting blurred image as the distorted image (step S109).
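Steps S107 and S108 can be sketched in plain Python. The sketch assumes an integer p, so that k = 4*p - 1 is odd and the kernel has a center pixel, normalizes the filter to sum to 1, and replicates edge pixels at the image border; the normalization, the border handling, and the function names are assumptions.

```python
import math


def gaussian_kernel(p):
    """k x k Gaussian filter with standard deviation p and k = 4*p - 1, summing to 1."""
    k = int(4 * p - 1)
    c = (k - 1) / 2.0  # kernel center
    raw = [[math.exp(-((i - c) ** 2 + (j - c) ** 2) / (2.0 * p * p)) for j in range(k)]
           for i in range(k)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


def convolve(image, kernel):
    """Convolve a 2-D list with a symmetric kernel, replicating edge pixels.

    For a symmetric kernel such as a Gaussian, flipping it is unnecessary.
    """
    h, w, k = len(image), len(image[0]), len(kernel)
    r = k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy = min(h - 1, max(0, y + i - r))  # clamp to the border
                    xx = min(w - 1, max(0, x + j - r))
                    acc += kernel[i][j] * image[yy][xx]
            row.append(acc)
        out.append(row)
    return out


kern = gaussian_kernel(1)  # p = 1 gives a 3 x 3 filter
blurred = convolve([[0, 0, 0], [0, 255, 0], [0, 0, 0]], kern)
```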

Figure 4 is a flowchart showing a specific example of the processing performed by the original image processing unit 132. First, the original image processing unit 132 acquires the image processing parameters from the parameter storage unit 12 (step S201). It then acquires a training image from the training image storage unit 11 (step S202) and executes image processing based on the image processing parameters on the acquired training image (step S203). The original image processing unit 132 outputs the intermediate processing result of this image processing to the intermediate similarity loss calculation unit 134 (step S204).
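The following sketch illustrates how one set of image processing parameters can yield both an intermediate processing result and a final output, for the original image and for the distorted image alike. It is a deliberately tiny stand-in for a CNN: the input is assumed to be a flattened vector, the two fully connected layers and the parameter layout `{"W1", "b1", "W2", "b2"}` are hypothetical, and taking the hidden layer's pre-activation as the intermediate result is an assumption chosen to match the firing-pattern loss discussed later.

```python
def forward(params, x):
    """Tiny two-layer network that also returns its intermediate result.

    params is a hypothetical dict {"W1", "b1", "W2", "b2"} of nested lists.
    Returns (intermediate, logits), where `intermediate` is the hidden layer's
    pre-activation response (an assumption of this sketch).
    """
    intermediate = [sum(w * xi for w, xi in zip(row, x)) + b
                    for row, b in zip(params["W1"], params["b1"])]
    hidden = [max(0.0, v) for v in intermediate]  # ReLU: a neuron "fires" when v > 0
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(params["W2"], params["b2"])]
    return intermediate, logits

params = {"W1": [[1.0, -1.0], [0.5, 0.5]], "b1": [0.0, 0.0],
          "W2": [[1.0, 1.0]], "b2": [0.0]}
# The same parameters are applied to the original and to the distorted input.
o_clean, _ = forward(params, [2.0, 1.0])
o_dist, out_dist = forward(params, [1.0, 2.0])
```

Here `o_clean` and `o_dist` play the roles of the two intermediate processing results, and `out_dist` is what the image processing loss would be computed from.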

Figure 5 is a flowchart showing a specific example of the processing performed by the distorted image processing unit 133. First, the distorted image processing unit 133 acquires the image processing parameters from the parameter storage unit 12 (step S301). It acquires the correct label from the training image storage unit 11 (step S302) and acquires the distorted image generated by the distorted image generation unit 131 (step S303).

The distorted image processing unit 133 executes image processing based on the image processing parameters on the acquired distorted image (step S304) and outputs the intermediate processing result obtained in step S304 to the intermediate similarity loss calculation unit 134 (step S305). The distorted image processing unit 133 then calculates an image processing loss whose reduction (e.g., minimization) reduces the difference between the image processing result x' and the correct label y (step S306). As the image processing loss, for example, the cross-entropy L_dist(x',y) = -Σ_q y_q log(x'_q) may be used, the mean squared error may be used, or another objective function may be used. In general, any function suited to the image processing being executed may be used. The distorted image processing unit 133 outputs the calculated image processing loss to the optimization unit 135 (step S307).
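The two example losses named for step S306 can be written directly. This is a minimal sketch: `x_prob` is assumed to already be a probability vector (a `softmax` helper is included for converting raw scores), the small `eps` guard against log(0) is an added assumption, and all function names are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(x_prob, y, eps=1e-12):
    """L_dist(x', y) = -sum_q y_q * log(x'_q), with x'_q clamped away from 0."""
    return -sum(yq * math.log(max(xq, eps)) for xq, yq in zip(x_prob, y))

def mean_squared_error(x, y):
    """Alternative image processing loss: average squared difference."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
```

With a one-hot correct label y, the cross-entropy reduces to -log of the probability assigned to the correct class, so it is 0 only when that probability is 1.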

Figure 6 is a flowchart showing a specific example of the processing performed by the intermediate similarity loss calculation unit 134. First, the intermediate similarity loss calculation unit 134 acquires the intermediate processing results from the original image processing unit 132 and the distorted image processing unit 133 (step S401). It then calculates an intermediate similarity loss that drives the intermediate processing result for the distorted image toward the intermediate processing result for the original image (step S402). The intermediate similarity loss is used to solve the problem of error information not being propagated. It imposes the constraint that the firing patterns of the neurons in the two intermediate processing results match. The intermediate similarity loss calculation unit 134 outputs the calculated intermediate similarity loss to the optimization unit 135 (step S403).

Figure 7 is a flowchart showing a specific example of the processing performed by the optimization unit 135. First, the optimization unit 135 acquires the calculated image processing loss (step S501) and the calculated intermediate similarity loss (step S502). It acquires the image processing parameters from the distorted image processing unit 133 (step S503) and updates them based on the image processing loss and the intermediate similarity loss (step S504). For example, the optimization unit 135 may update the image processing parameters using a linear combination of the image processing loss and the intermediate similarity loss weighted by a coupling weight λ. The weighting may be, for example, a ratio of about 1:0.1, and λ may be tuned manually while monitoring how the overall loss function evolves. The image processing parameters may be updated with a stochastic gradient method such as SGD or Adam, or with another optimization algorithm such as Newton's method.
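Step S504 can be illustrated with a plain SGD update on the linearly combined loss. This is a hedged sketch: it assumes the parameters are a flat list of floats, that the gradients of each loss are already available, and that the 1:0.1 example ratio corresponds to lam = 0.1 on the intermediate similarity loss. The helper names are hypothetical.

```python
def combined_loss(loss_proc, loss_sim, lam=0.1):
    """Linear combination L = L_proc + lam * L_sim (lam = 0.1 mirrors the 1:0.1 example)."""
    return loss_proc + lam * loss_sim

def sgd_step(params, grad_proc, grad_sim, lr=0.01, lam=0.1):
    """One plain SGD update on a flat parameter vector.

    Because differentiation is linear, the gradient of the combined loss is
    the same linear combination of the two individual gradients.
    """
    return [p - lr * (gp + lam * gs)
            for p, gp, gs in zip(params, grad_proc, grad_sim)]
```

Adam or Newton's method would replace only the update rule inside `sgd_step`; the way the two losses are combined stays the same.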

The optimization unit 135 determines whether to end learning at the current iteration (step S505). This determination may be based on any criterion. For example, the number of learning iterations may be fixed in advance, or a termination condition based on the loss function may be defined in advance. If learning is not to end (step S505-NO), the optimization unit 135 outputs the updated image processing parameters to the distorted image processing unit 133 (step S506). If learning is to end (step S505-YES), the optimization unit 135 outputs the updated image processing parameters as the post-learning parameters (step S507).
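The termination check of step S505 can be sketched as a loop that stops either after a fixed iteration budget or once the loss falls below a threshold. The `step_fn` interface (one update per call, returning the current loss) and the parameter names are hypothetical choices for this illustration.

```python
def train(step_fn, max_iters=1000, loss_tol=None):
    """Repeat step_fn() until a fixed iteration budget or a loss threshold is reached.

    step_fn performs one parameter update and returns the current loss
    (a hypothetical interface). Returns (iterations_run, last_loss).
    """
    loss = None
    for it in range(1, max_iters + 1):
        loss = step_fn()
        if loss_tol is not None and loss < loss_tol:
            return it, loss      # loss-based termination condition met
    return max_iters, loss       # fixed number of repetitions reached

# Example with a scripted loss sequence standing in for real training.
losses = iter([1.0, 0.5, 0.2, 0.05, 0.01])
iters_run, final_loss = train(lambda: next(losses), max_iters=5, loss_tol=0.1)
```

Both criteria mentioned above fit in one loop: leaving `loss_tol` as None gives a pure fixed-count schedule.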

Through this processing, learning is performed so that the firing patterns 1(O_clean) and 1(O_dist) match, where O_clean is the CNN intermediate processing result for an image without added distortion (the original image), O_dist is the CNN intermediate processing result for an image with added distortion (the distorted image), and 1(·) indicates for each neuron whether it fires. Specifically, learning may use the sigmoid cross-entropy between 1(O_clean) and O_dist; that is, learning proceeds so that the response O_dist, once converted by a sigmoid, becomes equal to 1(O_clean). As a result, the response to the distorted image moves closer to the response to the original image. Consequently, when image processing is performed with the learning result (post-learning parameters) obtained by this learning process, even a distorted input image yields a result closer to the estimate that would be obtained without distortion.
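The sigmoid cross-entropy between 1(O_clean) and O_dist can be sketched as follows. Assumptions, not taken from the text: a neuron is treated as firing when its response is greater than 0, the loss is averaged over neurons, and the clean branch is used only as a fixed target. The loss uses the standard numerically stable form max(z, 0) - z*t + log(1 + exp(-|z|)). Function names are hypothetical.

```python
import math

def firing_pattern(o):
    """1(O): 1.0 where a neuron fires (response > 0, an assumed threshold), else 0.0."""
    return [1.0 if v > 0 else 0.0 for v in o]

def intermediate_similarity_loss(o_clean, o_dist):
    """Mean sigmoid cross-entropy between the targets 1(O_clean) and the logits O_dist."""
    targets = firing_pattern(o_clean)
    total = 0.0
    for z, t in zip(o_dist, targets):
        # Stable form of -[t*log(sigmoid(z)) + (1 - t)*log(1 - sigmoid(z))]
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(o_dist)

agree = intermediate_similarity_loss([2.0, -3.0], [20.0, -20.0])
disagree = intermediate_similarity_loss([2.0, -3.0], [-20.0, 20.0])
```

The loss is near 0 when every distorted-image neuron fires exactly where its clean-image counterpart fires, and grows with the magnitude of each mismatched response.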

In addition, the processing described above is expected to make neurons in the hidden layers respond to features such as image textures and mixtures of multiple boundary regions. Learning is performed so that these higher-order neurons respond in the same way to the distorted image and the original image. This makes it possible to achieve robustness and accurate processing even when the complex features to which the higher-order neurons respond are distorted.

[Second embodiment]
Figure 8 is a diagram showing an example of the functional configuration of the second embodiment of the present invention. The learning device 20 is configured using an information processing device such as a personal computer or a server device. The learning device 20 includes the training image storage unit 11, the parameter storage unit 12, and a control unit 23. The control unit 23 functions as a distorted image generation unit 231, an original image processing unit 232, a distorted image processing unit 233, an intermediate similarity loss calculation unit 234, an optimization unit 235, and a weight adjustment unit 236. The distorted image generation unit 231, original image processing unit 232, distorted image processing unit 233, intermediate similarity loss calculation unit 234, and optimization unit 235 of the second embodiment perform the same processing as the distorted image generation unit 131, original image processing unit 132, distorted image processing unit 133, intermediate similarity loss calculation unit 134, and optimization unit 135 of the first embodiment, respectively.

The intermediate similarity loss used by the optimization unit 235 for optimization learning is expected to help reduce the image processing loss. However, there are cases in which it does not necessarily do so. For example, when the effect of distortion such as noise or blur is very large, it may be more reasonable to let the network learn a new response pattern than to force it to match the response pattern obtained for the undistorted image. Therefore, when severe distortion is expected, the weights of the image processing loss and the intermediate similarity loss need to be adjusted appropriately during optimization learning. This weight adjustment is performed based on, for example, the method described in the following reference.

Reference: Y. Du, W. M. Czarnecki, S. M. Jayakumar, R. Pascanu, B. Lakshminarayanan, “Adapting Auxiliary Losses Using Gradient Similarity”, 2018.

The weight adjustment unit 236 calculates the weighting between the image processing loss and the intermediate similarity loss and outputs it to the optimization unit 235 as an adjustment weight. During the optimization learning in step S504, the optimization unit 235 performs learning using the adjustment weight as the coupling weight λ.

Figure 9 is a flowchart showing a specific example of the processing performed by the weight adjustment unit 236. The weight adjustment unit 236 acquires the calculated image processing loss (step S601), the calculated intermediate similarity loss (step S602), and the image processing parameters (step S603). It obtains a gradient for each of the image processing loss and the intermediate similarity loss by backpropagating it with respect to the image processing parameters (step S604). Backpropagation is generally used to obtain gradient information for a CNN, but any other suitable gradient acquisition method may be used. The weight adjustment unit 236 calculates the similarity between the two obtained gradients (step S605); cosine similarity or another similarity measure may be used. The weight adjustment unit 236 outputs the calculated gradient similarity to the optimization unit 235 as the adjustment weight.
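Steps S604 and S605 can be sketched once the two gradients are available as flat lists. The text above only states that the gradient similarity is output as the adjustment weight; clipping negative similarities to zero, so that an intermediate similarity gradient that conflicts with the image processing gradient is ignored, is an assumption of this sketch in the spirit of the cited reference, and can be switched off. Function names are hypothetical.

```python
import math

def cosine_similarity(g1, g2, eps=1e-12):
    """Cosine of the angle between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    return dot / max(n1 * n2, eps)   # eps guards against zero-length gradients

def adjustment_weight(grad_proc, grad_sim, clip_negative=True):
    """Gradient similarity used as the coupling weight (clipping is an assumption)."""
    cos = cosine_similarity(grad_proc, grad_sim)
    return max(0.0, cos) if clip_negative else cos

def combined_gradient(grad_proc, grad_sim):
    """Gradient of L_proc + w * L_sim with w taken from the gradient similarity."""
    w = adjustment_weight(grad_proc, grad_sim)
    return [gp + w * gs for gp, gs in zip(grad_proc, grad_sim)]
```

When the two gradients point in the same direction the intermediate similarity loss contributes fully; when they point in opposite directions (for example under very strong distortion) its contribution drops to zero in this clipped variant.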

With this processing, more appropriate processing can be performed even when the effect of distortion such as noise or blur is very large, which makes it possible to improve the accuracy of image processing.

[Third embodiment]
Figure 10 is a diagram showing an example of the functional configuration of the third embodiment of the present invention. The image processing device 30 is configured using an information processing device such as a personal computer or a server device. The image processing device 30 performs image processing using the post-learning parameters generated by the learning device of the first or second embodiment. The image processing device 30 includes a target image storage unit 31, a parameter storage unit 32, and a control unit 33.

The target image storage unit 31 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The target image storage unit 31 stores data of images (target images) to be processed by the image processing device 30.

The parameter storage unit 32 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The parameter storage unit 32 stores image processing parameters (post-learning parameters) representing a trained model obtained through a learning process executed in advance by the learning device 10 or the learning device 20.

The control unit 33 is configured using a processor such as a CPU and a memory. The control unit 33 functions as an image processing unit 331 when the processor executes an image processing program. All or some of the functions of the control unit 33 may be implemented using hardware such as an ASIC, a PLD, or an FPGA. The image processing program may be recorded on a computer-readable recording medium. Examples of computer-readable recording media include portable media such as flexible disks, magneto-optical disks, ROMs, CD-ROMs, and semiconductor storage devices (e.g., SSDs), as well as storage devices such as hard disks and semiconductor storage devices built into a computer system. The image processing program may also be transmitted via a telecommunications line.

The image processing unit 331 performs image processing on a target image stored in the target image storage unit 31 based on the image processing parameters (post-learning parameters) stored in the parameter storage unit 32. The target images processed by the image processing unit 331 need not be limited to those stored in the target image storage unit 31. For example, the image processing unit 331 may process a target image sent from another information processing device together with an image processing request.

When the image processing unit 331 obtains the result of the image processing, it outputs the result to a predetermined device.

With the image processing device of the third embodiment configured in this way, image processing is performed using the post-learning parameters obtained by the learning device of the first or second embodiment, which makes it possible to perform image processing that is robust to distorted images.

The learning device 10 may be implemented using a plurality of information processing devices communicably connected via a network. In this case, the functional units of the learning device 10 may be distributed across the plurality of information processing devices. For example, the training image storage unit 11 and the parameter storage unit 12 may be implemented in an information processing device different from the one implementing the control unit 13. The functions implemented in the control unit 13 may also be distributed across a plurality of information processing devices.

Embodiments of the present invention have been described above in detail with reference to the drawings; however, the specific configuration is not limited to these embodiments and also includes designs that do not depart from the gist of the present invention.

10...Learning device, 11...Training image storage unit, 12...Parameter storage unit, 13, 23...Control unit, 131, 231...Distorted image generation unit, 132, 232...Original image processing unit, 133, 233...Distorted image processing unit, 134, 234...Intermediate similarity loss calculation unit, 135, 235...Optimization unit, 236...Weight adjustment unit

Claims

An image processing method comprising:
an acquisition step of acquiring a target image, which is an image to be processed; and
an image processing step of performing predetermined image processing on the acquired target image to obtain an image processing result,
wherein the image processing uses image processing information obtained by associating feature information characterizing an undistorted image with feature information characterizing a distorted image;
the association matches the firing state of a neuron in the feature information characterizing the undistorted image with the firing state of the corresponding neuron for the distorted image;
the image processing information includes an image processing loss, calculated from the results of performing image processing on each of the undistorted image and the distorted image using image processing parameters obtained by performing a predetermined machine learning process on a training image associated with a correct label, and an intermediate similarity loss calculated from the intermediate processing results;
the image processing parameters are updated using the image processing loss and the intermediate similarity loss; and
gradients are obtained by backpropagating each of the image processing loss and the intermediate similarity loss under the image processing parameters, and the image processing parameters are updated using the similarity between the two obtained gradients.

The image processing method according to claim 1, wherein the association is a process of matching the firing states of higher-order neurons.

An image processing device comprising:
a storage unit that stores image processing information obtained by associating feature information characterizing an undistorted image with feature information characterizing a distorted image; and
a control unit that acquires a target image, which is an image to be processed, and performs predetermined image processing on the acquired target image using the image processing information to obtain an image processing result,
wherein the association matches the firing state of a neuron in the feature information characterizing the undistorted image with the firing state of the corresponding neuron for the distorted image;
the image processing information includes an image processing loss, calculated from the results of performing image processing on each of the undistorted image and the distorted image using image processing parameters obtained by performing a predetermined machine learning process on a training image associated with a correct label, and an intermediate similarity loss calculated from the intermediate processing results;
the control unit updates the image processing parameters using the image processing loss and the intermediate similarity loss; and
the control unit obtains gradients by backpropagating each of the image processing loss and the intermediate similarity loss under the image processing parameters, and updates the image processing parameters using the similarity between the two obtained gradients.

A program for causing a computer to execute the image processing method according to claim 1 or 2.