WO2024043109A1 - Image processing method, image processing device, and program - Google Patents

Image processing method, image processing device, and program

Info

Publication number
WO2024043109A1
WO2024043109A1 (PCT/JP2023/029203)
Authority
WO
WIPO (PCT)
Prior art keywords
image
latent variable
latent
image processing
machine learning
Prior art date
Application number
PCT/JP2023/029203
Other languages
French (fr)
Japanese (ja)
Inventor
雪乃 大野
Original Assignee
キヤノン株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by キヤノン株式会社 (Canon Inc.)
Publication of WO2024043109A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning

Definitions

  • the present invention relates to an image processing method using a machine learning model.
  • An image processing method is known that edits arbitrary feature values in an image by manipulating latent variables acquired based on an image to be processed (target image) and inputting the manipulated data to a machine learning model.
  • a machine learning model is generated using a generative adversarial network (GAN).
  • Non-Patent Document 1 discloses a method of editing the target image using, among the latent variables corresponding to the target image, the latent variable located where the feature values of the images used for training the machine learning model are most densely distributed.
  • in the image processing method disclosed in Non-Patent Document 1, editing the feature values to a large extent makes harmful effects (artifacts) likely to occur in the edited image.
  • the present invention aims to generate images with fewer harmful effects using a machine learning model.
  • the image processing method of the present invention includes a step of obtaining a first latent variable based on a first image, and a step of obtaining, based on the first latent variable, a second latent variable different from the first latent variable. It further includes a step of generating a second image by inputting the second latent variable into a first machine learning model, and a step of obtaining, based on the second image, a third latent variable different from the second latent variable. It further includes a step of obtaining, based on the third latent variable, a fourth latent variable different from the third latent variable, and a step of generating a third image by inputting the fourth latent variable into the first machine learning model.
  • an image with fewer harmful effects can be generated using a machine learning model.
  • FIG. 1 is a schematic diagram showing a latent space.
  • FIG. 2 is a block diagram of an image processing system in Example 1.
  • FIG. 3 is an external view of the image processing system in Example 1.
  • FIG. 4 is a diagram showing transitions of latent variables in Example 1.
  • FIG. 5 is a flowchart regarding the estimation phase in Example 1.
  • FIG. 6 is a flowchart regarding the learning phase of the first machine learning model.
  • FIG. 7 is a diagram showing the flow of the learning phase of the first machine learning model.
  • FIG. 8 is a flowchart of generation of the first latent variable in Example 1.
  • FIG. 9 is a block diagram of an image processing system in Example 2.
  • FIG. 10 is a diagram showing transitions of latent variables in Example 2.
  • FIG. 11 is a flowchart regarding the learning phase of the second machine learning model.
  • FIG. 12 is a diagram showing the flow of the learning phase of the second machine learning model.
  • FIG. 13 is a block diagram of an image processing system in Example 3.
  • FIG. 14 is a flowchart regarding the estimation phase in Example 3.
  • an estimated image is generated by editing feature values (attributes) in an image using a machine learning model generated using a generative adversarial network (GAN).
  • a latent variable that is a multidimensional tensor is input to the GAN.
  • the space in which latent variables exist is called a latent space.
  • the process of acquiring a latent variable based on an image is referred to as embedding (inversion) into the latent space. In the latent space, there is a direction corresponding to a change in an arbitrary feature value, and a latent variable corresponding to an image whose feature value has been edited is acquired (calculated) using a vector in that direction; this process is referred to as manipulation of the latent variable.
  • a feature value is a value (amount) indicating a feature in an image.
  • the feature includes, for example, facial expression, facial orientation, age, gender, hairstyle, and the like.
  • An image can be generated by inputting the manipulated latent variables into a machine learning model.
  • the harmful effect in this embodiment is the appearance of artifacts (false structures). Artifacts are distortions that reduce the realism of an image and give a perceptually strange impression; they are, for example, stains unintentionally generated in the edited image.
  • the machine learning model (first machine learning model) according to this embodiment is generated by GAN.
  • a GAN includes a generator that generates an image based on a latent variable (noise), and a discriminator that identifies whether an image is a fake image created by the generator or a correct (real) image. The generator learns, based on the discriminator's classification results, to generate images that the discriminator misidentifies. The discriminator, in turn, learns to distinguish real images from fake images generated by the generator. For example, StyleGAN, StyleGAN2, or StyleGAN3 may be used to generate the machine learning model.
  • the GAN generator in this embodiment includes a mapping network and a synthesis network.
  • the mapping network generates mapped latent variables based on the initial latent variables by nonlinearly transforming the initial latent space into a mapped latent space. Details of the initial latent space and the mapping latent space will be described later.
  • the synthesis network generates an image based on the mapped latent variables that are duplicated in a number corresponding to the resolution of the image to be generated.
  • when generating an image with a resolution of 1024 px × 1024 px using the synthesis network, for example, a 512-dimensional mapping latent variable is duplicated into 18 copies, and the 18 mapping latent variables are input to the synthesis network to generate the image.
  • the initial latent variable is a tensor with an arbitrary number of dimensions; for example, a tensor sampled from a Gaussian distribution can be used as the initial latent variable.
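  • As an illustration of the tensor shapes just described, the following is a minimal shape-level sketch (PyTorch) of sampling an initial latent variable from a Gaussian distribution, mapping it, and duplicating the mapped latent variable into 18 copies; the MappingNetwork below is a simplified stand-in, not the actual StyleGAN implementation used in the patent.

```python
# Minimal shape-level sketch. MappingNetwork is a stand-in for the mapping network
# described above; a real synthesis network would turn w_plus into a 1024x1024 image.
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """Nonlinearly maps an initial latent variable z to a mapped latent variable w."""
    def __init__(self, dim=512, layers=4):
        super().__init__()
        blocks = []
        for _ in range(layers):
            blocks += [nn.Linear(dim, dim), nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*blocks)

    def forward(self, z):
        return self.net(z)

mapping = MappingNetwork()

z = torch.randn(1, 512)                    # initial latent variable sampled from a Gaussian
w = mapping(z)                             # mapping latent variable (512-dimensional)
w_plus = w.unsqueeze(1).repeat(1, 18, 1)   # duplicated into 18 copies -> (1, 18, 512)

print(w.shape, w_plus.shape)               # torch.Size([1, 512]) torch.Size([1, 18, 512])
```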
  • the space in which the initial latent variables exist is the initial latent space, and the correlation between the dimensions included in the initial latent space and the feature values of the images used for learning the machine learning model is low.
  • the mapping latent space is an area in which mapping latent variables obtained based on images used for learning by the first machine learning model exist. The closer a latent variable is to the center of gravity defined by a plurality of mapping latent variables in the mapping latent space, the lower the probability that an adverse effect will occur in the edited image even if the feature value is significantly edited by manipulating the latent variables.
  • intermediate latent variables existing in the intermediate latent space or extended latent variables existing in the extended latent space may be used.
  • the intermediate latent space is an extension of the mapping latent space, and the extended latent space is a further extension of the intermediate latent space.
  • the intermediate latent variables and extended latent variables include latent variables that are far from the center of gravity defined by the plurality of mapping latent variables. The farther the latent variable used to generate an image is from this center of gravity, the higher the probability that an adverse effect will occur; on the other hand, such latent variables can produce images with feature values that are rarely contained in the images used for training the first machine learning model, which allows precise editing even when sufficient training images are not available.
  • the intermediate latent variable is a latent variable different from the mapping latent variable.
  • when an extended latent variable is used, the extended latent variable, which is a 512 × 18-dimensional tensor obtained from, for example, 18 distinct 512-dimensional tensors, is input to the synthesis network.
  • FIG. 1 is a graph schematically showing a part of the latent space.
  • any two of the 18 tensors constituting the extended latent variable, which have the same dimensions as the mapping latent variable and the intermediate latent variable, are defined as the first and second expanded latent variables.
  • in FIG. 1, X is the first expanded latent variable and Y is the second expanded latent variable.
  • a latent variable (first latent variable) is acquired based on the original image (first image), and a latent variable (second latent variable) is acquired by manipulating the first latent variable.
  • an image (second image) is generated by inputting the second latent variable to the machine learning model.
  • further, a latent variable (third latent variable) is acquired based on the second image, a latent variable (fourth latent variable) is acquired by manipulating the third latent variable, and an image (third image) is generated by inputting the fourth latent variable into the machine learning model.
  • the third latent variable exists nearer to the center of gravity defined by the plurality of mapping latent variables than the second latent variable.
  • in editing an image, the manipulation of the latent variable is thus divided into multiple rounds, and the step of manipulating the latent variable, the step of generating an image based on the manipulated latent variable, and the step of embedding the generated image into the latent space are performed in order.
  • FIG. 2 is a block diagram of the image processing system 100 in this embodiment.
  • FIG. 3 is an external view of the image processing system 100.
  • the image processing system 100 includes a learning device 101, an image processing device (image estimation device) 102, a display device 103, a recording medium 104, an output device 105, and a network 106.
  • the learning device 101 and the image processing device 102 can communicate with each other via the network 106.
  • the learning device (first learning device) 101 includes a storage section 101a, an acquisition section 101b, a generation section 101c, and an updating section 101d, and determines the weight of the first machine learning model.
  • the image processing device 102 includes a storage unit 102a, an acquisition unit 102b, a conversion unit 102c, a generation unit 102d, and an estimation unit 102e, and generates an estimated image (output image) using a first machine learning model.
  • the function of generating an estimated image by the image processing device 102 can be implemented by one or more processors (processing means) such as a CPU.
  • the output image is output to at least one of the display device 103, the recording medium 104, or the output device 105.
  • the display device 103 is a liquid crystal display, a projector, or the like. The user can edit the image while checking the image being processed through the display device 103.
  • the recording medium 104 is a semiconductor memory, a hard disk, a server on a network, or the like, and stores the output image.
  • the output device 105 is a printer or the like. Note that input devices such as a mouse and a keyboard are not shown.
  • FIG. 4 is a graph schematically showing a part of the latent space similar to FIG. 1, and shows the behavior of the latent variables in this example.
  • FIG. 5 is a flowchart regarding the estimation phase.
  • the acquisition unit 102b acquires an original image (first image).
  • the first image may be an image stored in advance in the storage unit 102a.
  • the first image may be preprocessed if necessary.
  • the preprocessing performed on the first image is correction of the feature values. For example, when the first image is an image of a person's face, the positions of major organs such as the eyes, mouth, and nose of the person's face are adjusted (corrected).
  • the age of a person's face will be used as the first feature value.
  • step S102 the conversion unit 102c converts the first image into a first latent variable.
  • the first latent variable is obtained by inverse analysis using the first machine learning model.
  • by the inverse analysis using the first machine learning model described later, a first latent variable is acquired based on the first image by embedding the first image at a position where the feature values of the images used for training the first machine learning model are densely distributed. With such a configuration, it is possible to obtain a first latent variable with a low probability of causing an adverse effect.
  • the first latent variable is an intermediate latent variable.
  • In step S103, the generation unit 102d generates a second latent variable based on the first latent variable.
  • the second latent variable is obtained by manipulating the first latent variable.
  • as shown in FIG. 4, the second latent variable obtained in this way lies at a position in the latent space where the feature values of the images used for training the first machine learning model are distributed less densely than at the first latent variable.
  • the second latent variable is an intermediate latent variable.
  • the second latent variable is calculated by transitioning the first feature value in the style (intermediate) latent space.
  • extended latent variables may be used.
  • the operation of the latent variable is calculated based on the transition of the latent variable in the normal direction of the separating hyperplane.
  • the separating hyperplane is a plane that divides the intermediate latent space according to labels of the first feature value, and the regions of the intermediate latent space divided by the separating hyperplane differ with respect to the first feature value. For example, latent variables whose feature values correspond to "young" are distributed in one region of the intermediate latent space divided by the separating hyperplane, and latent variables whose feature values correspond to "old" are distributed in the other region.
  • the separation hyperplane is estimated using, for example, SVM (Support Vector Machine).
  • the normal vector direction of the separating hyperplane is the direction corresponding to the change in the first feature value. Note that the method for determining the orientation corresponding to the first feature value is not limited to this.
  • a threshold is set for the amount by which the first latent variable is manipulated as needed, and the generation unit 102d sets the amount by which the first latent variable is manipulated to be less than this threshold.
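  • The following is a hedged sketch of this manipulation, assuming intermediate latent variables labeled with the first feature value (for example "young"/"old") are available: the separating hyperplane is estimated with a linear SVM, as mentioned above, and the latent variable is shifted along the unit normal with the shift amount clipped to a threshold. The function names and the toy data are illustrative assumptions.

```python
# Sketch of steps S103/S106: estimate an edit direction with a linear SVM and
# shift a latent variable along it, clipping the amount of manipulation.
import numpy as np
from sklearn.svm import LinearSVC

def edit_direction(latents, labels):
    """Estimate the unit normal of the separating hyperplane for one feature value."""
    svm = LinearSVC(max_iter=10000).fit(latents, labels)
    n = svm.coef_[0]
    return n / np.linalg.norm(n)

def manipulate(latent, direction, amount, threshold=3.0):
    """Shift a latent variable along the edit direction, keeping |amount| below the threshold."""
    amount = float(np.clip(amount, -threshold, threshold))
    return latent + amount * direction

# toy data: 512-dimensional intermediate latent variables with binary feature labels
rng = np.random.default_rng(0)
latents = rng.normal(size=(200, 512))
labels = (latents[:, 0] > 0).astype(int)

n = edit_direction(latents, labels)
w1 = rng.normal(size=512)            # first latent variable
w2 = manipulate(w1, n, amount=2.0)   # second latent variable
```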
  • step S104 the estimation unit 102e estimates (generates) a second image based on the second latent variable.
  • a second image is estimated based on the second latent variable using the first machine learning model. Note that the information on the weights of the first machine learning model has been learned in the learning device 101 and is stored in the storage unit 102a.
  • step S105 the conversion unit 102c generates a third latent variable based on the second latent variable.
  • the third latent variable can be obtained by converting the second latent variable using a method similar to step S102.
  • the third latent variable is a latent variable different from the second latent variable, and exists nearer to the center of gravity defined by the plurality of mapping latent variables than the second latent variable in the latent space.
  • the third latent variable may be set at a position where the feature values of the images used for training the first machine learning model are more densely distributed than at the second latent variable.
  • the third latent variable is an intermediate latent variable.
  • step S106 the generation unit 102d generates a fourth latent variable based on the third latent variable.
  • the fourth latent variable can be obtained using a method similar to step S103.
  • the fourth latent variable may be acquired by editing a second feature value that is different from the first feature value.
  • as shown in FIG. 4, the fourth latent variable obtained in this way lies at a position in the latent space where the feature values of the images used for training the first machine learning model are distributed less densely than at the third latent variable.
  • the fourth latent variable is an intermediate latent variable.
  • step S107 the estimation unit 102e estimates (generates) the third image based on the fourth latent variable.
  • the third image can be acquired using a method similar to step S104.
  • the third image is an image obtained by editing the first feature value (or second feature value) with respect to the second image. Note that, if necessary, the third image may be used as a new second image, and the steps from step S104 to step S107 may be repeated one or more times to generate a new third image.
  • in this way, in editing an image, the manipulation of the latent variable is divided into multiple rounds, and the step of manipulating the latent variable, the step of generating an image based on the manipulated latent variable, and the step of embedding the generated image into the latent space are performed in order.
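  • The following is a hedged end-to-end sketch of how steps S101 to S107 (and their optional repetition) could be strung together; invert, manipulate, and generator are placeholders for the embedding, the latent manipulation, and the first machine learning model described above, not the patent's actual implementation.

```python
# Sketch of the estimation phase: a large feature edit is split into several
# manipulate -> generate -> re-embed rounds so that each manipulated latent stays
# close to the densely populated part of the latent space.
def edit_image(first_image, direction, total_amount, step_amount,
               invert, manipulate, generator):
    image = first_image
    remaining = total_amount
    latent = invert(image)                              # S102: first latent variable
    while remaining > 0:
        amount = min(step_amount, remaining)
        latent = manipulate(latent, direction, amount)  # S103/S106: edited latent variable
        image = generator(latent)                       # S104/S107: generated image
        remaining -= amount
        if remaining > 0:
            latent = invert(image)                      # S105: re-embed near the centroid
    return image                                        # final third image
```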
  • FIG. 6 is a flowchart of updating (learning) the weights of the first machine learning model. Each step in FIG. 6 is mainly performed by the acquisition unit 101b, the generation unit 101c, or the update unit 101d.
  • FIG. 7 is a diagram showing the configuration of the GAN in this example.
  • the GAN in this embodiment includes a generator 10 that generates an image and a classifier 11 that identifies the generated image.
  • the acquisition unit 101b acquires the correct image 12 from the storage unit 101a.
  • the correct image 12 is a plurality of images, and may be a captured image acquired by an imaging device or a CG (Computer Graphics) image. Further, it is preferable that the images included in the correct image 12 include a plurality of images in which feature values of the images change in stages. For example, when editing age as a feature value of a person's face, it is preferable to use an image containing the face of a person in school age or boyhood, which is between infancy and adolescence, as the correct image 12. With such a configuration, it is possible to generate a machine learning model that can edit the feature values of an image with high precision.
  • since the correct image 12 is a real image for the discriminator 11, it has a correct label corresponding to real.
  • the correct image 12 may be preprocessed if necessary.
  • the preprocessing performed on the correct image 12 is adjustment of the feature values from the correct image 12. For example, when the correct image 12 is an image of a person's face, the positions of major organs such as the eyes, mouth, nose, etc. of the person's face are adjusted (corrected).
  • the generation unit 101c generates the training latent variable 13.
  • the training latent variable 13 is a 512-dimensional tensor, and for example, any tensor sampled based on a Gaussian distribution may be used as the training latent variable 13.
  • step S203 the generation unit 101c inputs the training latent variable 13 to the generator 10 to generate the estimated image 14. Note that since the estimated image 14 is a fake image in the classifier 11, it has a correct label corresponding to the fake image.
  • Step S202 and Step S203 may be executed without performing Step S201.
  • the first machine learning model can be learned without using the correct image 12.
  • step S201, step S202, and step S203 may be performed one by one at random.
  • the number of times that step S201, step S202, and step S203 are performed is not limited to the same number of times.
  • step S204 the updating unit 101d updates the weight of the discriminator 11.
  • the classifier 11 acquires the correct image 12 or the estimated image 14 and generates an identification label.
  • the updating unit 101d updates the weight of the classifier 11 based on the error between the identification label and the correct label. If necessary, the weights of the classifier 11 may also be updated by inputting to the classifier 11 images obtained by applying geometric transformations such as inversion, translation, or rotation to the estimated image 14 and the correct image 12. With this configuration, even when the number of correct images 12 is small or the characteristics of the correct images 12 are biased, the identification accuracy of the classifier 11 can be improved.
  • step S205 the updating unit 101d updates the weight of the generator 10 based on the identification label.
  • for example, the weights are updated using the sigmoid cross entropy of the identification label; however, the loss is not limited to this.
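  • A minimal sketch of the weight updates in steps S204 and S205 using the sigmoid cross entropy of the identification label is given below (PyTorch); the discriminator is assumed to output a single logit per image, which is an assumption not stated in the text.

```python
# Sketch of one GAN training step: the discriminator is trained to label real
# images 1 and generated images 0, then the generator is trained to be labeled 1.
import torch
import torch.nn.functional as F

def train_step(generator, discriminator, opt_g, opt_d, real_images, latent_dim=512):
    b = real_images.size(0)

    # S204: update the discriminator on real and fake images
    z = torch.randn(b, latent_dim)
    fake_images = generator(z).detach()
    d_loss = (F.binary_cross_entropy_with_logits(discriminator(real_images), torch.ones(b, 1))
              + F.binary_cross_entropy_with_logits(discriminator(fake_images), torch.zeros(b, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # S205: update the generator so that its images are identified as real
    z = torch.randn(b, latent_dim)
    g_loss = F.binary_cross_entropy_with_logits(discriminator(generator(z)), torch.ones(b, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```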
  • In step S206, the updating unit 101d determines whether learning has been completed. Completion of learning can be determined based on, for example, whether the number of repetitions of weight updates has reached a predetermined number, whether the amount of change in the weights at the time of updating is smaller than a predetermined value, or whether the quality of the estimated image 14 is higher than a predetermined quality. Whether the quality of the estimated image 14 is higher than the predetermined quality can be evaluated using, for example, a metric such as the Frechet Inception Distance, which measures the distance between the distributions of the estimated images 14 and the correct images 12. If it is determined that weight learning is not completed, the process returns to step S201, and the acquisition unit 101b acquires a new correct image 12. On the other hand, if it is determined that weight learning is completed, the updating unit 101d ends the learning and stores the weight information in the storage unit 101a.
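  • For reference, the Frechet Inception Distance mentioned above can be computed from the means and covariances of feature vectors of the estimated and correct images; the sketch below assumes the Inception feature extraction has been done elsewhere.

```python
# FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^{1/2}), computed from two sets
# of feature vectors (rows are samples, columns are feature dimensions).
import numpy as np
from scipy import linalg

def frechet_distance(feat_real, feat_fake):
    mu1, mu2 = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    c1 = np.cov(feat_real, rowvar=False)
    c2 = np.cov(feat_fake, rowvar=False)
    covmean = linalg.sqrtm(c1 @ c2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(c1 + c2 - 2.0 * covmean))
```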
  • FIG. 8 is a flowchart of the generation of the first latent variable.
  • the conversion from the first image to the first latent variable is performed by inverse analysis using the first machine learning model.
  • Each step in FIG. 8 is mainly performed by the converter 102c or the generator 102d.
  • the inverse analysis using the first machine learning model in this example compares the image generated by inputting an arbitrary latent variable (first input latent variable) into the first machine learning model with the first image, and updates (optimizes) the first input latent variable using the error. Note that the information on the weights of the first machine learning model is read out in advance from the storage unit 101a and stored in the storage unit 102a. By repeatedly inputting the first input latent variable into the first machine learning model and updating it, a second input latent variable capable of generating an image similar to the first image with the first machine learning model is obtained. Whether an image is similar to the first image can be determined, for example, from the difference in each pixel value.
  • the second input latent variable having the closest distance to the center of gravity determined by the plurality of mapping latent variables is set as the first latent variable.
  • by setting the second input latent variable as the first latent variable using the method described above, it is possible to obtain a first latent variable with a low probability of generating a false structure.
  • a second input latent variable that exists in a position where the feature values of the image used for learning the first machine learning model are more densely distributed may be set as the first latent variable.
  • In step S301, a first input latent variable is set.
  • Any latent variable in the latent space can be the first input latent variable.
  • any tensor sampled based on a Gaussian distribution or the like may be used as the first input latent variable.
  • In step S302, an image (embedded image) is generated based on the first input latent variable using the first machine learning model. Note that the first machine learning model has been trained in advance, and the weight information is stored in the storage unit 102a.
  • In step S303, the first input latent variable is updated based on a loss function between the first image and the embedded image. The loss function uses, for example, the Euclidean norm of the difference between the pixel values of the first image and the embedded image, or the Euclidean norm calculated for each element of feature maps converted based on the first image and the embedded image; however, the loss function is not limited to these. For example, an error backpropagation method may be used for the update.
  • In step S304, it is determined whether the update of the first input latent variable is completed. Completion of the update can be determined based on, for example, whether the number of repetitions of updating the first input latent variable has reached a predetermined number, whether the amount of change in the first input latent variable at the time of the update is smaller than a predetermined value, or whether the value of the loss function in step S303 is smaller than a predetermined value. If it is determined that the update of the first input latent variable is not completed, the process returns to step S302, and the updated first input latent variable is input to the first machine learning model to generate a new embedded image. On the other hand, if it is determined that the update of the first input latent variable is completed, the updated first input latent variable is set as the second input latent variable and the process proceeds to step S305.
  • step S305 it is determined whether the generation of the second input latent variable is completed. Completion of generation can be determined based on whether the number of generated second input latent variables has reached a predetermined number. If it is determined that the generation of the second input latent variable is not completed, the process returns to step S301 and the first input latent variable is set. At this time, the latent variable that has not been set in step S301 up to that point is set as the first input latent variable. On the other hand, if it is determined that the generation of the second input latent variable is completed, the generation of the second input latent variable is finished and the process advances to step S306.
  • In step S306, the first latent variable is generated. The distances from the position of the center of gravity determined by the plurality of mapping latent variables to the plurality of second input latent variables are respectively calculated, and the second input latent variable with the closest distance is stored in the storage unit 102a as the first latent variable.
  • the distance from the center of gravity determined by the plurality of mapping latent variables can be calculated using, for example, the Euclidean norm.
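  • The following is a hedged sketch of the inverse analysis of FIG. 8 (steps S301 to S306), assuming a differentiable synthesis network: several candidate latent variables are optimized against the first image by backpropagating a pixel-wise Euclidean norm, and the candidate closest to the center of gravity of the mapping latent variables is kept as the first latent variable. The function and parameter names are illustrative assumptions.

```python
# Sketch of optimization-based embedding (inversion). `synthesis` maps a latent
# variable to an image; `w_centroid` is the center of gravity of the mapping latents.
import torch

def invert(first_image, synthesis, w_centroid, n_candidates=4, steps=200, lr=0.01):
    best_w, best_dist = None, float("inf")
    for _ in range(n_candidates):
        w = torch.randn(1, 512, requires_grad=True)       # S301: first input latent variable
        opt = torch.optim.Adam([w], lr=lr)
        for _ in range(steps):                            # S302-S304: update by backpropagation
            embedded = synthesis(w)                       # embedded image
            loss = torch.norm(embedded - first_image)     # Euclidean norm of pixel differences
            opt.zero_grad(); loss.backward(); opt.step()
        dist = torch.norm(w.detach() - w_centroid)        # S306: distance to the centroid
        if dist < best_dist:
            best_w, best_dist = w.detach(), dist
    return best_w                                         # first latent variable
```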
  • FIG. 9 is a block diagram of the image processing system 200 in this embodiment.
  • the image processing system 200 differs from the first embodiment in that a learning device 201 includes a first learning section (first learning means) 211 and a second learning section (second learning means) 212.
  • the image processing system 200 includes a learning device 201, an image processing device (image estimation device) 202, a display device 203, a recording medium 204, an output device 205, and a network 206.
  • the learning device 201 and the image processing device 202 can communicate with each other via the network 206.
  • the learning device 201 has a first learning section 211 and a second learning section 212.
  • the first learning unit 211 includes a storage unit 211a, an acquisition unit 211b, a generation unit 211c, and an update unit 211d, and generates a first machine learning model.
  • the second learning unit 212 includes a storage unit 212a, an acquisition unit 212b, a generation unit 212c, and an update unit 212d, and determines the weight of the second machine learning model.
  • the second machine learning model can obtain latent variables based on images.
  • the learning device 201 can implement its functions using one or more processors (learning means) such as a CPU.
  • the learning device 201 may be a server. Further, the first and second learning sections may be separate devices.
  • the image processing device 202 is the same as the image processing device 102 in the first embodiment, so the description thereof will be omitted. Further, the output image is output to at least one of the display device 203, the recording medium 204, or the output device 205.
  • the display device 203, the recording medium 204, and the output device 205 are the same as the display device 103, the recording medium 104, and the output device 105 in the first embodiment.
  • FIG. 10 is a graph schematically showing a part of the latent space similar to FIG. 1, and shows the behavior of the latent variables in this example. Note that the first machine learning model in this example is trained in the same manner as in Example 1, and the weight information is stored in the storage unit 102a.
  • the conversion unit 102c converts the first image into a first latent variable using a second machine learning model to be described later.
  • the third latent variable is a latent variable different from the second latent variable, and exists nearer to the center of gravity defined by the plurality of mapping latent variables than the second latent variable in the latent space. Note that the third latent variable may be set at a position where the feature values of the images used for training the first machine learning model are more densely distributed than at the second latent variable.
  • the first to fourth latent variables in this example are extended latent variables. Further, the first latent variable exists at a position where the feature values of the images used for training the first machine learning model are more densely distributed than at the second latent variable. Similarly, the third latent variable exists at a position where those feature values are more densely distributed than at the fourth latent variable.
  • in this way, in editing an image, the manipulation of the latent variable is divided into multiple rounds, and the step of manipulating the latent variable, the step of generating an image based on the manipulated latent variable, and the step of embedding the generated image into the latent space are performed in order.
  • FIG. 11 is a flowchart of learning weights of the second machine learning model.
  • FIG. 12 is a diagram showing the learning flow of the second machine learning model.
  • a second machine learning model that generates latent variables based on images is generated using a GAN having a generator 20 and a classifier 21.
  • the second machine learning model in this example converts an image into a latent variable near the center of gravity defined by the plurality of mapping latent variables. Alternatively, a latent variable located where the feature values of the images used for training the first machine learning model are more densely distributed may be generated. Note that the second machine learning model may be trained so that the variance of the plurality of tensors including the first and second extended latent variables becomes small when generating (estimating) latent variables. If necessary, training may also be performed so that the latent variable estimated by the second machine learning model is determined to be a mapping latent variable by the GAN discriminator. By performing the above-described training, it is possible to generate a second machine learning model that can estimate latent variables with a low probability of generating a false structure.
  • the acquisition unit 212b acquires the correct image 22.
  • the correct image 22 is a plurality of images, and may be a captured image acquired by an imaging device or a CG (Computer Graphics) image. Further, the correct image 22 may include an image used for learning the first machine learning model (correct image 12) or an image generated by the first machine learning model (estimated image 14).
  • step S502 the generation unit 212c generates (estimates) the estimated latent variable 23.
  • the generation unit 212c generates the estimated latent variable 23 by inputting the correct image 22 to the generator 20. Note that the estimated latent variable 23 has a correct label corresponding to a fake in the classifier 21.
  • step S503 the generation unit 212c acquires the correct latent variable 25.
  • a 512-dimensional tensor (corresponding to an initial latent variable) is input to the mapping network of the first machine learning model to generate a correct latent variable 25 (corresponding to a mapping latent variable).
  • the correct latent variable 25 has a correct label corresponding to real in the discriminator 21. Note that either steps S501 and S502, or step S503, may be selected at random and performed. Furthermore, the numbers of times that step S501, step S502, and step S503 are executed need not be the same.
  • step S504 the updating unit 212d updates the weight of the discriminator 21.
  • the classifier 21 obtains the estimated latent variable 23 or the correct latent variable 25 and generates an identification label.
  • the classifier 21 is updated based on the error between the identification label and the correct label.
  • step S505 the generation unit 212c generates the estimated image 24.
  • the estimated image 24 is generated by inputting the estimated latent variable 23 into the first machine learning model. Note that the first machine learning model has been trained in advance, and the weight information is stored in the storage unit 212a.
  • In step S506, the updating unit 212d updates the weight of the generator 20.
  • the loss function may be, for example, a loss function related to the identification label of the classifier 21, a loss function related to the estimated latent variable 23, a loss function related to the correct image 22 and the estimated image 24, or the like.
  • the loss function related to the identification label of the classifier 21 updates the weight based on the sigmoid cross entropy of the identification label.
  • the loss function for the estimated latent variable 23 is the variance of a plurality of tensors having a specific dimension included in the estimated latent variable 23.
  • the loss function regarding the correct image 22 and the estimated image 24 is the Euclidean norm of the difference between the pixel values of each image, the Euclidean norm calculated for each element of the feature map converted based on each image, or the like.
  • step S507 the updating unit 212d determines whether learning has been completed. Completion of learning can be determined based on whether the number of repetitions of weight updating has reached a predetermined number of times, or whether the amount of weight editing at the time of updating is smaller than a predetermined value. If it is determined that weight learning is not completed, the process returns to step S501, and the acquisition unit 212b acquires a new correct image 22. On the other hand, if it is determined that the weight learning is completed, the updating unit 212d ends the learning and stores the weight information in the storage unit 212a.
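  • A minimal sketch of the generator-20 loss in step S506 is shown below, combining the three loss terms described above (the identification label of the classifier 21, the variance of the estimated latent variable 23, and the reconstruction error between the correct image 22 and the estimated image 24); the weighting factors and the assumed tensor shapes are illustrative assumptions, not values given in the text.

```python
# Sketch of the combined encoder loss. The encoder (generator 20) maps an image to
# a (B, 18, 512) extended latent variable; `classifier` is assumed to output one
# logit per latent; `synthesis` is the trained first machine learning model.
import torch
import torch.nn.functional as F

def encoder_loss(encoder, classifier, synthesis, correct_image,
                 w_adv=1.0, w_var=0.1, w_rec=1.0):
    est_latent = encoder(correct_image)                 # estimated latent variable 23
    estimated_image = synthesis(est_latent)             # estimated image 24

    adv = F.binary_cross_entropy_with_logits(
        classifier(est_latent), torch.ones(est_latent.size(0), 1))  # identification-label loss
    var = est_latent.var(dim=1).mean()                  # variance across the 18 tensors
    rec = torch.norm(estimated_image - correct_image)   # Euclidean norm of pixel differences
    return w_adv * adv + w_var * var + w_rec * rec
```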
  • the image processing system 300 of this embodiment differs from the first embodiment in that it includes a control device 303 that requests the image processing device 302 regarding image processing for the first image.
  • FIG. 13 is a block diagram of the image processing system 300 in this embodiment.
  • the image processing system 300 includes a learning device 301, an image processing device (image estimation device) 302, and a control device 303.
  • the learning device 301 and the image processing device 302 are servers.
  • the control device 303 is, for example, a user terminal such as a personal computer or a smartphone.
  • the control device 303 is connected to the image processing device 302 via a network 304.
  • the image processing device 302 is connected to the learning device 301 via a network 305. That is, the control device 303 and the image processing device 302 as well as the image processing device 302 and the learning device 301 are configured to be able to communicate with each other.
  • the learning device 301 in the image processing system 300 has the same configuration as the learning device 101, so a description thereof will be omitted.
  • the image processing device 302 differs from the image processing device 102 in that it includes a communication section (receiving means) 302f.
  • the control device 303 includes a communication section (transmission means) 303a, a display section (display means) 303b, an input section (input means) 303c, a processing section (processing means) 303d, and a recording section 303e.
  • the communication unit 303a can transmit a request to the image processing device 302 for causing the image processing device 302 to perform processing on the first image. Additionally, output images processed by the image processing device 302 can be received.
  • the display section 303b displays various information. Various information displayed by the display unit 303b is, for example, an input image to the image processing device 302 and an output image generated by the image processing device 302.
  • the input unit 303c allows the user to input an instruction to start image processing.
  • the processing unit 303d can perform arbitrary image processing on the output image received from the image processing device 302.
  • the recording unit 303e stores the output image received from the image processing device 302.
  • the method for transmitting the first image to be processed to the image processing apparatus 302 does not matter; for example, the first image may be uploaded to the image processing apparatus 302 at the same time as step S601, or may be uploaded to the image processing apparatus 302 before step S601. Further, the first image may be an image stored on a server different from the image processing apparatus 302.
  • FIG. 14 is a flowchart regarding the estimation phase in this embodiment.
  • the operation of the control device 303 will be explained.
  • the image processing in this embodiment is started by a user's instruction to start image processing via the control device 303.
  • step S601 the communication unit 303a transmits a first image processing request to the image processing device 302.
  • the control device 303 may transmit information regarding editing, an ID for authenticating the user, and the like together with a request for processing the first image.
  • the information regarding editing includes the feature value to be edited and the degree of editing of the feature value. For example, when the user generates an image in which the age of the subject in the first image is increased by 10 years, the information regarding editing includes information such as "age", "10 years", and "older" specified by the user.
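  • As an illustration, a request carrying such editing information might look like the following; the endpoint URL and the field names are hypothetical and are not defined by the patent.

```python
# Sketch of the processing request sent in step S601 from the control device 303
# to the image processing device 302.
import json
import urllib.request

request_body = {
    "user_id": "example-user",   # ID for authenticating the user
    "feature": "age",            # feature value to be edited
    "amount": "10 years",        # degree of editing
    "direction": "older",
}
req = urllib.request.Request(
    "http://image-processing-device.example/edit",    # hypothetical endpoint
    data=json.dumps(request_body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# response = urllib.request.urlopen(req)   # the third image would be received in step S602
```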
  • step S602 the communication unit 303a receives the third image generated by the image processing device 302.
  • step S701 the communication unit 302f receives a request to process the first image transmitted from the communication unit 303a.
  • the image processing device 302 executes the processing from step S702 upon receiving the instruction to process the first image.
  • step S702 the acquisition unit 302b acquires information regarding editing and the first image.
  • the information regarding editing and the first image are transmitted from the control device 303.
  • steps S701 and step S702 may be performed simultaneously.
  • steps S702 to S708 are the same as steps S101 to S107, so their explanation will be omitted.
  • step S709 the communication unit 302f transmits the third image to the control device 303.
  • in this way, in editing an image, the manipulation of the latent variable is divided into multiple rounds, and the step of manipulating the latent variable, the step of generating an image based on the manipulated latent variable, and the step of embedding the generated image into the latent space are performed in order.
  • the control device 303 only requests processing for a specific image, and the actual image processing is performed by the image processing device 302. Therefore, by using the control device 303 as a user terminal, it is possible to reduce the processing load on the user terminal. Therefore, the user can obtain an output image with a low processing load.
  • the present invention can also be realized by supplying a program that implements one or more of the functions of the embodiments described above to a system or device via a network or a storage medium, and having one or more processors in a computer of the system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that realizes one or more of the functions.
  • the image processing device according to the present invention may be any device having the image processing function according to the present invention, and may be realized in the form of a PC.
  • according to each embodiment, it is possible to greatly edit the feature values of the original image using the first machine learning model while generating an image with fewer harmful effects.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Image Analysis (AREA)

Abstract

This image processing method includes: a step for acquiring a first latent variable on the basis of a first image; a step for acquiring a second latent variable different from the first latent variable on the basis of the first latent variable; a step for generating a second image by inputting the second latent variable into a first machine learning model; a step for acquiring a third latent variable different from the second latent variable on the basis of the second image; a step for acquiring a fourth latent variable different from the third latent variable on the basis of the third latent variable; and a step for generating a third image by inputting the fourth latent variable into the first machine learning model.

Description

Image processing method, image processing device, and program
 The present invention relates to an image processing method using a machine learning model.
 An image processing method is known that edits arbitrary feature values in an image by manipulating latent variables acquired based on an image to be processed (target image) and inputting the manipulated latent variables into a machine learning model. The machine learning model is generated using a generative adversarial network (GAN).
 Non-Patent Document 1 discloses a method of editing the target image using, among the latent variables corresponding to the target image, the latent variable located where the feature values of the images used for training the machine learning model are most densely distributed.
 However, in the image processing method disclosed in Non-Patent Document 1, editing the feature values to a large extent makes harmful effects (artifacts) likely to occur in the edited image.
 Therefore, the present invention aims to generate images with fewer harmful effects using a machine learning model.
 The image processing method of the present invention includes a step of obtaining a first latent variable based on a first image, and a step of obtaining, based on the first latent variable, a second latent variable different from the first latent variable. It further includes a step of generating a second image by inputting the second latent variable into a first machine learning model, and a step of obtaining, based on the second image, a third latent variable different from the second latent variable. It further includes a step of obtaining, based on the third latent variable, a fourth latent variable different from the third latent variable, and a step of generating a third image by inputting the fourth latent variable into the first machine learning model.
 According to the present invention, an image with fewer harmful effects can be generated using a machine learning model.
 FIG. 1 is a schematic diagram showing a latent space. FIG. 2 is a block diagram of an image processing system in Example 1. FIG. 3 is an external view of the image processing system in Example 1. FIG. 4 is a diagram showing transitions of latent variables in Example 1. FIG. 5 is a flowchart regarding the estimation phase in Example 1. FIG. 6 is a flowchart regarding the learning phase of the first machine learning model. FIG. 7 is a diagram showing the flow of the learning phase of the first machine learning model. FIG. 8 is a flowchart of generation of the first latent variable in Example 1. FIG. 9 is a block diagram of an image processing system in Example 2. FIG. 10 is a diagram showing transitions of latent variables in Example 2. FIG. 11 is a flowchart regarding the learning phase of the second machine learning model. FIG. 12 is a diagram showing the flow of the learning phase of the second machine learning model. FIG. 13 is a block diagram of an image processing system in Example 3. FIG. 14 is a flowchart regarding the estimation phase in Example 3.
 以下、本発明の実施形態について、図面を参照しながら詳細に説明する。各図において、同一の部材については同一の参照番号を付し、重複する説明は省略する。 Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In each figure, the same reference numerals are given to the same members, and duplicate explanations will be omitted.
 まず、各実施例の具体的な説明を行う前に、本発明の実施形態の要旨を説明する。本実施形態における画像処理方法では、敵対的生成ネットワーク(GAN:Generative Adversarial Networks)を用いて生成された機械学習モデルを用いて画像における特徴値(属性)を編集することで推定画像を生成する。GANには、多次元のテンソルである潜在変数が入力される。また、潜在変数が存在する空間を潜在空間と称する。また、画像に基づいて潜在変数を取得する処理を潜在空間への埋め込み(反転)と称する。潜在空間において、任意の特徴値の変化に対応する向きが存在し、その向きのベクトルを用いて特徴値が編集された画像に対応する潜在変数を取得(算出)する。ベクトルを用いて特徴値が編集された画像に対応する潜在変数を取得する処理を潜在変数の操作と称する。特徴値は、画像における特徴を示す値(量)であり、例えば人の顔の画像を編集する場合、特徴は例えば表情、顔の向き、年齢、性別、髪型などである。操作された潜在変数を機械学習モデルに入力することで画像を生成することができる。本実施形態における弊害は、アーティファクト(偽構造)が出現することである。アーティファクトは画像の写実性を低減させ知覚的に違和感を与える歪みであり、例えば、編集後の画像に意図せず発生したしみである。 First, before giving a specific explanation of each example, the gist of the embodiment of the present invention will be explained. In the image processing method in this embodiment, an estimated image is generated by editing feature values (attributes) in an image using a machine learning model generated using a generative adversarial network (GAN). A latent variable that is a multidimensional tensor is input to the GAN. Further, the space in which latent variables exist is called a latent space. Furthermore, the process of acquiring latent variables based on an image is referred to as embedding (inversion) in a latent space. In the latent space, there is a direction corresponding to a change in an arbitrary feature value, and a vector of that direction is used to obtain (calculate) a latent variable corresponding to an image whose feature values have been edited. The process of acquiring latent variables corresponding to images whose feature values have been edited using vectors is called latent variable manipulation. A feature value is a value (amount) indicating a feature in an image. For example, when editing an image of a person's face, the feature includes, for example, facial expression, facial orientation, age, gender, hairstyle, and the like. An image can be generated by inputting the manipulated latent variables into a machine learning model. A disadvantage of this embodiment is that artifacts (false structures) appear. Artifacts are distortions that reduce the realism of an image and give a perceptually strange feeling, and are, for example, stains that are unintentionally generated in an edited image.
 本実施形態に係る機械学習モデル(第1の機械学習モデル)は、GANによって生成される。GANは、潜在変数(ノイズ)に基づいて画像を生成する生成器と、生成器が作成した画像(フェイク画像)か正解画像(リアル画像)かを識別する識別器とを有する。また、生成器は、識別器が識別を間違えるような画像を生成するために、識別器における識別の結果に基づいて学習する。また、識別器はリアル画像と生成器が生成したフェイク画像とを識別できるように学習する。例えば機械学習モデルを生成するためにStyleGANやStyleGAN2、StyleGAN3を用いてもよい。 The machine learning model (first machine learning model) according to this embodiment is generated by GAN. A GAN includes a generator that generates an image based on latent variables (noise), and a classifier that identifies whether the image created by the generator is a fake image (fake image) or a correct image (real image). Further, the generator learns based on the classification result of the classifier in order to generate an image that the classifier misidentifies. The classifier also learns to distinguish between real images and fake images generated by the generator. For example, StyleGAN, StyleGAN2, and StyleGAN3 may be used to generate a machine learning model.
 本実施形態におけるGANの生成器は、マッピングネットワーク(Mapping network)と、合成ネットワーク(Synthesis network)とを有する。マッピングネットワークは、初期潜在空間を写像潜在空間に非線形変換することで、初期潜在変数に基づいて写像潜在変数を生成する。初期潜在空間及び写像潜在空間の詳細は後述する。また、合成ネットワークは、生成する画像の解像度に応じた個数に複製された写像潜在変数に基づいて画像を生成する。合成ネットワークを用いて解像度が1024px×1024pxの画像を生成するとき、例えば512次元の写像潜在変数を複製することで18個とし、18個の写像潜在変数を合成ネットワークに入力することで画像を生成する。 The GAN generator in this embodiment includes a mapping network and a synthesis network. The mapping network generates mapped latent variables based on the initial latent variables by nonlinearly transforming the initial latent space into a mapped latent space. Details of the initial latent space and the mapping latent space will be described later. Further, the synthesis network generates an image based on the mapped latent variables that are duplicated in a number corresponding to the resolution of the image to be generated. When generating an image with a resolution of 1024px x 1024px using a synthesis network, for example, by duplicating a 512-dimensional mapping latent variable, it becomes 18, and by inputting the 18 mapping latent variables to the synthesis network, the image is generated. do.
 初期潜在変数は任意の次元数を有すテンソルであるため、例えばガウス分布などに基づいてサンプリングされるテンソルを初期潜在変数として用いることができる。初期潜在変数が存在する空間が初期潜在空間であり、初期潜在空間に含まれる次元と、機械学習モデルの学習に用いた画像の特徴値との間の相関は低い。一方で、マッピングネットワークによって得られた写像潜在空間に含まれる次元と、機械学習モデルの学習した画像の特徴値との間には、高い相関がある。写像潜在空間は、第1の機械学習モデルが学習に用いた画像に基づいて得られた写像潜在変数が存在する領域である。写像潜在空間において複数の写像潜在変数によって定められる重心に近い潜在変数ほど、潜在変数の操作によって大きく特徴値を編集した場合でも編集後の画像に弊害が発生する確率が小さい。 Since the initial latent variable is a tensor with an arbitrary number of dimensions, for example, a tensor sampled based on a Gaussian distribution can be used as the initial latent variable. The space in which the initial latent variables exist is the initial latent space, and the correlation between the dimensions included in the initial latent space and the feature values of the images used for learning the machine learning model is low. On the other hand, there is a high correlation between the dimensions included in the mapping latent space obtained by the mapping network and the feature values of the image learned by the machine learning model. The mapping latent space is an area in which mapping latent variables obtained based on images used for learning by the first machine learning model exist. The closer a latent variable is to the center of gravity defined by a plurality of mapping latent variables in the mapping latent space, the lower the probability that an adverse effect will occur in the edited image even if the feature value is significantly edited by manipulating the latent variables.
 また、潜在変数に基づいて画像を生成する際、中間潜在空間に存在する中間潜在変数又は拡張潜在空間に存在する拡張潜在変数を用いてもよい。写像潜在空間を拡張したものが中間潜在空間であり、中間潜在空間さらに拡張したものが拡張潜在空間である。中間潜在変数及び拡張潜在変数には、複数の写像潜在変数によって定められる重心から離れた潜在変数が含まれる。複数の写像潜在変数によって定められる重心から離れた潜在変数に基づいて生成される画像ほど、弊害が発生する確率が高い。一方で、複数の写像潜在変数によって定められる重心から離れた潜在変数に基づいて生成される画像ほど、第1の機械学習モデルの学習に用いた画像にあまり含まれない特徴値を有する画像を生成することができる。複数の写像潜在変数によって定められる重心から離れた潜在変数を用いることで、学習画像が十分に得られない(第1の機械学習モデルが学習に用いた画像の特徴値が低密度である)場合においても精度よく画像を編集することが可能である。 Furthermore, when generating an image based on latent variables, intermediate latent variables existing in the intermediate latent space or extended latent variables existing in the extended latent space may be used. An extension of the mapping latent space is an intermediate latent space, and an extension of the intermediate latent space is an extended latent space. The intermediate latent variables and extended latent variables include latent variables that are distant from the center of gravity defined by the plurality of mapped latent variables. The farther an image is generated based on a latent variable from the center of gravity defined by a plurality of mapping latent variables, the higher the probability that an adverse effect will occur. On the other hand, the farther an image is generated based on a latent variable from the center of gravity determined by a plurality of mapped latent variables, the more the image has feature values that are less included in the image used for learning the first machine learning model. can do. When a sufficient learning image cannot be obtained by using a latent variable located far from the center of gravity determined by multiple mapping latent variables (the feature values of the image used for learning by the first machine learning model are low density) It is also possible to edit images with high precision.
 具体的には合成ネットワークを用いて解像度が1024px×1024pxの画像を生成するとき、中間潜在変数を用いる場合、例えば512次元の中間潜在変数を複製することで18個とし、18個の中間潜在変数を合成ネットワークに入力することで画像を生成する。ただし、中間潜在変数は写像潜在変数とは異なる潜在変数である。また、拡張潜在変数を用いる場合、例えば18種類の512次元のテンソルに基づいて得られた512×18次元のテンソルである拡張潜在変数を合成ネットワークに入力する。 Specifically, when using intermediate latent variables to generate an image with a resolution of 1024px x 1024px using a synthesis network, for example, by duplicating a 512-dimensional intermediate latent variable to 18, An image is generated by inputting it into a synthesis network. However, the intermediate latent variable is a latent variable different from the mapping latent variable. Further, when using an extended latent variable, the extended latent variable, which is a 512×18-dimensional tensor obtained based on, for example, 18 types of 512-dimensional tensors, is input to the synthesis network.
 The latent space will now be described with reference to FIG. 1. FIG. 1 is a graph schematically showing a part of the latent space. Of the 18 tensors (the extended latent variable), any two tensors having the same dimensionality as the mapped and intermediate latent variables are taken as the first and second extended latent variables. In FIG. 1, X is the first extended latent variable and Y is the second extended latent variable. The black dots on the line Y = X indicate mapped latent variables, and the region on the line Y = X in which mapped latent variables exist is the mapped latent space. The contour lines indicate the density of the feature values of the images learned by the first machine learning model; darker regions indicate higher density. Accordingly, the more densely the training-image feature values are distributed at the position of the latent variable used to generate an image, the lower the probability that artifacts appear in the edited image even when the feature values are edited substantially. In other words, by generating an image from a latent variable located near the centroid defined by the plurality of mapped latent variables, an image with fewer artifacts can be obtained. Furthermore, the intermediate latent space is the line Y = X shown in FIG. 1.
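 As a minimal sketch of the quantities referred to above, the centroid of the mapped latent space and the distance of a candidate latent variable from it can be computed as follows; the random array stands in for actual mapped latent variables, which are an assumption here.

    import numpy as np

    mapped_latents = np.random.randn(10000, 512)   # placeholder for real mapped latent variables
    centroid = mapped_latents.mean(axis=0)         # centroid of the mapped latent space

    def distance_to_centroid(latent: np.ndarray) -> float:
        """Euclidean distance of a 512-dimensional latent variable from the centroid."""
        return float(np.linalg.norm(latent - centroid))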
 In this embodiment, a latent variable (first latent variable) is obtained based on an original image (first image), a latent variable (second latent variable) is obtained by manipulating the first latent variable, and an image (second image) is generated by inputting the second latent variable into the machine learning model. Furthermore, a latent variable (third latent variable) is obtained based on the second image, a latent variable (fourth latent variable) is obtained by manipulating the third latent variable, and an image (third image) is generated by inputting the fourth latent variable into the machine learning model. At this point, the third latent variable is located closer to the centroid defined by the plurality of mapped latent variables than the second latent variable.
 In this way, the manipulation of latent variables during image editing is divided into a plurality of stages, and the step of manipulating a latent variable, the step of generating an image from the manipulated latent variable, and the step of embedding the generated image back into the latent space are performed in order, as sketched below. With this configuration, the first machine learning model can be used to edit the feature values of the first image substantially while generating an image with fewer artifacts.
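 The overall flow can be summarized by the following sketch. The helpers invert() and generate() are hypothetical placeholders, not defined by this document: invert() would embed an image near the centroid of the mapped latent space, and generate() would run the first machine learning model; the edit itself is a bounded shift along a feature direction.

    import numpy as np

    def invert(image):                 # placeholder: would run latent-space embedding
        return np.zeros(512, dtype=np.float32)

    def generate(latent):              # placeholder: would run the first machine learning model
        return np.zeros((1024, 1024, 3), dtype=np.float32)

    def edit_image(first_image, direction, step, num_rounds=2):
        image = first_image
        for _ in range(num_rounds):
            latent = invert(image)                 # embed the image into the latent space
            edited = latent + step * direction     # manipulate the latent variable
            image = generate(edited)               # synthesize the edited image
        return image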
 [Example 1]
 The image processing system 100 according to Example 1 will be described with reference to FIGS. 2 and 3. FIG. 2 is a block diagram of the image processing system 100 in this example. FIG. 3 is an external view of the image processing system 100.
 The image processing system 100 includes a learning device 101, an image processing device (image estimation device) 102, a display device 103, a recording medium 104, an output device 105, and a network 106. The learning device 101 and the image processing device 102 can communicate with each other via the network 106.
 The learning device (first learning device) 101 includes a storage unit 101a, an acquisition unit 101b, a generation unit 101c, and an update unit 101d, and determines the weights of the first machine learning model.
 The image processing device 102 includes a storage unit 102a, an acquisition unit 102b, a conversion unit 102c, a generation unit 102d, and an estimation unit 102e, and generates an estimated image (output image) using the first machine learning model. The generation of the estimated image by the image processing device 102 can be implemented by one or more processors (processing means) such as CPUs.
 The output image is output to at least one of the display device 103, the recording medium 104, and the output device 105. The display device 103 is, for example, a liquid crystal display or a projector. Through the display device 103, the user can perform editing work while checking the image being processed. The recording medium 104 is, for example, a semiconductor memory, a hard disk, or a server on a network, and stores the output image. The output device 105 is, for example, a printer. Input devices such as a mouse and a keyboard are not shown.
 Next, the flow of the estimation phase of this example will be described with reference to FIGS. 4 and 5. FIG. 4 is a graph schematically showing a part of the latent space, similar to FIG. 1, and shows the behavior of the latent variables in this example. FIG. 5 is a flowchart of the estimation phase.
 First, in step S101, the acquisition unit 102b acquires an original image (first image). The first image may be an image stored in advance in the storage unit 102a. The first image may be preprocessed if necessary. The preprocessing applied to the first image is a correction related to feature values. For example, when the first image is an image of a person's face, it is an adjustment (correction) of the positions of major facial features such as the eyes, mouth, and nose. Hereinafter, in this example, the age of the face is used as the first feature value.
 In step S102, the conversion unit 102c converts the first image into a first latent variable. In this example, the first latent variable is obtained by inverse analysis using the first machine learning model, although the method is not limited to this. In step S102 of this example, the first latent variable is obtained from the first image by embedding it, through inverse analysis using the first machine learning model described later, at a position where the feature values of the images used to train the first machine learning model are densely distributed. With this configuration, a first latent variable with a low probability of causing artifacts can be obtained. In this example, the first latent variable is an intermediate latent variable.
 In step S103, the generation unit 102d generates a second latent variable based on the first latent variable. The second latent variable is obtained by manipulating the first latent variable. As shown in FIG. 4, the second latent variable obtained in this way is located at a position in the latent space where the feature values of the images used to train the first machine learning model are more sparsely distributed than at the first latent variable. In this example, the second latent variable is an intermediate latent variable, but it is not limited to this. For example, when a plurality of latent variables obtained by replicating an intermediate latent variable are input to the synthesis network, latent variables generated by applying mutually different affine transformations to them (style latent variables) may be used; in that case, a latent variable obtained by shifting the first feature value within the style latent space is computed. An extended latent variable may also be used.
 In this example, the manipulation of the latent variable is calculated based on shifting the latent variable in the direction normal to a separating hyperplane. The separating hyperplane is a plane that divides the intermediate latent space according to labels of the first feature value, and the two regions of the intermediate latent space divided by the separating hyperplane have different characteristics with respect to the first feature value. For example, latent variables with feature values corresponding to "young" are distributed in one region of the intermediate latent space divided by the separating hyperplane, and latent variables with feature values corresponding to "old" are distributed in the other region. The separating hyperplane is estimated using, for example, an SVM (Support Vector Machine). The direction of the normal vector of the separating hyperplane is the direction corresponding to a change in the first feature value. The method of determining the direction corresponding to the first feature value is not limited to this.
 The larger the amount by which the first latent variable is manipulated, the higher the probability that artifacts appear in the image obtained from the manipulated latent variable. Therefore, a threshold may be set, as needed, for the amount by which the first latent variable is manipulated, and the generation unit 102d sets the manipulation amount of the first latent variable to be less than this threshold.
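 A minimal sketch of one way to realize steps S103 and the threshold above is shown here, assuming a set of intermediate latent variables with binary labels for the first feature value (e.g., 0 = "young", 1 = "old"). The labeled arrays, the use of a linear SVM, and the threshold value are illustrative assumptions.

    import numpy as np
    from sklearn.svm import LinearSVC

    latents = np.random.randn(2000, 512)               # placeholder labeled latent variables
    labels = (np.random.rand(2000) > 0.5).astype(int)  # placeholder feature-value labels

    svm = LinearSVC().fit(latents, labels)
    normal = svm.coef_[0]
    direction = normal / np.linalg.norm(normal)        # unit normal of the separating hyperplane

    def manipulate(latent, step, threshold=3.0):
        """Shift a latent variable along the feature direction, capped by the threshold."""
        step = float(np.clip(step, -threshold, threshold))
        return latent + step * direction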
 In step S104, the estimation unit 102e estimates (generates) a second image based on the second latent variable. The second image is estimated from the second latent variable using the first machine learning model. The weight information of the first machine learning model has been learned by the learning device 101 and is stored in the storage unit 102a.
 In step S105, the conversion unit 102c generates a third latent variable based on the second image. The third latent variable can be obtained by converting the second image in the same manner as in step S102. The third latent variable is a latent variable different from the second latent variable, and is located in the latent space closer to the centroid defined by the plurality of mapped latent variables than the second latent variable. The third latent variable may also be set at a position where the feature values of the images used to train the first machine learning model are more densely distributed than at the second latent variable. In this example, the third latent variable is an intermediate latent variable.
 In step S106, the generation unit 102d generates a fourth latent variable based on the third latent variable. The fourth latent variable can be obtained in the same manner as in step S103. In step S106, the fourth latent variable may also be obtained by editing a second feature value different from the first feature value. As shown in FIG. 4, the fourth latent variable obtained in this way is located at a position in the extended latent space where the feature values of the images used to train the first machine learning model are more sparsely distributed than at the third latent variable. In this example, the fourth latent variable is an intermediate latent variable.
 In step S107, the estimation unit 102e estimates (generates) a third image based on the fourth latent variable. The third image can be obtained in the same manner as in step S104. The third image is an image in which the first feature value (or the second feature value) has been edited relative to the second image. If necessary, the third image may be treated as a new second image, and steps S104 to S107 may be repeated one or more times to generate a new third image.
 In this way, the manipulation of latent variables during image editing is divided into a plurality of stages, and the step of manipulating a latent variable, the step of generating an image from the manipulated latent variable, and the step of embedding the generated image back into the latent space are performed in order. With this configuration, the first machine learning model can be used to edit the feature values of the first image substantially while generating an image with fewer artifacts. Furthermore, by using intermediate latent variables for each latent variable, the image processing method can be performed with latent variables that have a lower probability of causing artifacts than when extended latent variables are used. In addition, an image processing method can be provided that enables accurate editing even of feature values that are sparsely distributed in the mapped latent space.
 Next, the learning phase of the first machine learning model (the method of producing a trained model) will be described with reference to FIGS. 6 and 7. FIG. 6 is a flowchart of updating (learning) the weights of the first machine learning model. Each step in FIG. 6 is mainly performed by the acquisition unit 101b, the generation unit 101c, or the update unit 101d. FIG. 7 is a diagram showing the configuration of the GAN in this example. The GAN in this example includes a generator 10 that generates images and a discriminator 11 that discriminates the generated images.
 First, in step S201, the acquisition unit 101b acquires correct images 12 from the storage unit 101a. The correct images 12 are a plurality of images, and may be captured images acquired by an image pickup apparatus or CG (Computer Graphics) images. The correct images 12 preferably include a plurality of images in which the feature values change in stages. For example, when age is to be edited as a feature value of a person's face, it is preferable that the correct images 12 include images of faces at school age or boyhood, between infancy and adolescence. With this configuration, a machine learning model that can edit the feature values of images with high accuracy can be generated. Because the correct images 12 are real images for the discriminator 11, they have a correct label corresponding to "real". The correct images 12 may be preprocessed if necessary. The preprocessing applied to the correct images 12 is an adjustment related to feature values. For example, when a correct image 12 is an image of a person's face, it is an adjustment (correction) of the positions of major facial features such as the eyes, mouth, and nose.
 In step S202, the generation unit 101c generates a training latent variable 13. In this example, the training latent variable 13 is a 512-dimensional tensor; for example, an arbitrary tensor sampled from a Gaussian distribution may be used as the training latent variable 13.
 In step S203, the generation unit 101c inputs the training latent variable 13 into the generator 10 to generate an estimated image 14. Because the estimated image 14 is a fake image for the discriminator 11, it has a correct label corresponding to "fake".
 Steps S202 and S203 may also be executed without performing step S201. In this case, the first machine learning model can be trained without using the correct images 12. Alternatively, step S201 and the pair of steps S202 and S203 may each be performed at random, one or the other at a time. Furthermore, the number of times step S201 is executed and the number of times steps S202 and S203 are executed need not be the same.
 In step S204, the update unit 101d updates the weights of the discriminator 11. The discriminator 11 receives a correct image 12 or an estimated image 14 and generates a discrimination label. The update unit 101d updates the weights of the discriminator 11 based on the error between the discrimination label and the correct label. If necessary, the weights of the discriminator 11 may be updated by inputting into the discriminator 11 images obtained by applying geometric transformations such as flipping, translation, or rotation to the estimated images 14 and the correct images 12. With this configuration, the discrimination accuracy of the discriminator 11 can be improved even when the number of correct images 12 is small or their characteristics are biased.
 In step S205, the update unit 101d updates the weights of the generator 10 based on the discrimination label. In this example, the weights are updated using the sigmoid cross entropy of the discrimination label, although the method is not limited to this.
 In step S206, the update unit 101d determines whether learning is complete. Completion of learning can be determined by, for example, whether the number of weight-update iterations has reached a predetermined number, the amount of change in the weights at the time of updating, or whether the quality of the estimated images 14 is higher than a predetermined quality. Whether the quality of the estimated images 14 is higher than the predetermined quality can be computed using a metric such as the Frechet Inception Distance, which measures the distance between the distributions of the estimated images 14 and the correct images 12. If it is determined that weight learning is not complete, the process returns to step S201 and the acquisition unit 101b acquires new correct images 12. If it is determined that weight learning is complete, the update unit 101d ends the learning and stores the weight information in the storage unit 101a.
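 A minimal sketch of one adversarial update (corresponding to steps S202 to S205) is shown below. The linear modules stand in for the real generator 10 and discriminator 11, which are not reproduced here; the input batch is assumed to be flattened, and BCEWithLogitsLoss corresponds to the sigmoid cross entropy mentioned above.

    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(512, 3 * 32 * 32))   # placeholder generator 10
    D = nn.Sequential(nn.Linear(3 * 32 * 32, 1))     # placeholder discriminator 11
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()                      # sigmoid cross entropy

    def train_step(real_images):
        z = torch.randn(real_images.size(0), 512)     # training latent variable 13
        fake_images = G(z)                            # estimated image 14

        # Step S204: update the discriminator with "real"/"fake" correct labels.
        d_loss = bce(D(real_images), torch.ones(real_images.size(0), 1)) + \
                 bce(D(fake_images.detach()), torch.zeros(real_images.size(0), 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Step S205: update the generator so its outputs are judged "real".
        g_loss = bce(D(fake_images), torch.ones(real_images.size(0), 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
        return d_loss.item(), g_loss.item()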
 Next, the conversion of the first image into the first latent variable in step S102 will be described with reference to FIG. 8. FIG. 8 is a flowchart of the generation of the first latent variable. In this example, the conversion from the first image to the first latent variable is performed by inverse analysis using the first machine learning model. Each step in FIG. 8 is mainly performed by the conversion unit 102c or the generation unit 102d.
 In the inverse analysis using the first machine learning model in this example, the image generated by inputting an arbitrary latent variable (first input latent variable) into the first machine learning model is compared with the first image, and the first input latent variable is updated (optimized) using the error. The weight information of the first machine learning model is read in advance from the storage unit 101a and stored in the storage unit 102a. By repeating the input of the first input latent variable into the first machine learning model and the updating of the first input latent variable, a second input latent variable is generated from which an image similar to the first image can be obtained using the first machine learning model. Whether an image is similar to the first image can be determined, for example, from the differences in pixel values. In this example, among the plurality of generated second input latent variables, the second input latent variable closest to the centroid defined by the plurality of mapped latent variables is taken as the first latent variable. By setting the second input latent variable as the first latent variable in the manner described above, a first latent variable with a low probability of generating false structures can be obtained. Alternatively, a second input latent variable located at a position where the feature values of the images used to train the first machine learning model are more densely distributed may be set as the first latent variable.
 In step S301, a first input latent variable is set. Any latent variable in the latent space can be used as the first input latent variable. For example, an arbitrary tensor sampled from a Gaussian distribution may be used as the first input latent variable.
 In step S302, an estimated image 14 is generated based on the first input latent variable using the first machine learning model. The first machine learning model has been trained in advance, and the weight information is stored in the storage unit 102a.
 In step S303, the first input latent variable is updated based on the error between the first image and the estimated image 14. The loss function used here is the Euclidean norm of the difference between the pixel values of the first image and the embedded image, or the Euclidean norm computed for each element of feature maps converted from the first image and the embedded image. However, the loss function is not limited to these. The update may be performed using, for example, backpropagation.
 In step S304, it is determined whether the update of the first input latent variable is complete. Completion of the update can be determined by, for example, whether the number of update iterations of the first input latent variable has reached a predetermined number, whether the amount of change in the first input latent variable at the time of updating is smaller than a predetermined value, or whether the value of the loss function computed in step S303 is smaller than a predetermined value. If it is determined that the update of the first input latent variable is not complete, the process returns to step S302, and the updated first input latent variable is applied to the first machine learning model to generate a new embedded image. If it is determined that the update of the first input latent variable is complete, the updated first input latent variable is taken as a second input latent variable and the process proceeds to step S305.
 In step S305, it is determined whether the generation of second input latent variables is complete. Completion of generation can be determined by, for example, whether the number of generated second input latent variables has reached a predetermined number. If it is determined that the generation of second input latent variables is not complete, the process returns to step S301 and a first input latent variable is set; at this time, a latent variable that has not been set in any previous iteration of step S301 is set as the first input latent variable. If it is determined that the generation of second input latent variables is complete, the generation of second input latent variables ends and the process proceeds to step S306.
 In step S306, the first latent variable is generated. The distances from the position of the centroid defined by the plurality of mapped latent variables to the plurality of second input latent variables are computed, and the second input latent variable with the smallest distance is stored in the storage unit 102a as the first latent variable. The distance from the position of the centroid defined by the plurality of mapped latent variables can be computed using, for example, the Euclidean norm.
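 A minimal sketch of the inversion loop of FIG. 8 is given below, assuming a hypothetical differentiable generator G(w) and a target image of matching shape. The pixel-wise Euclidean norm stands in for the loss of step S303, and the candidate closest to the centroid is selected as in step S306; the iteration counts and learning rate are illustrative assumptions.

    import torch

    def invert(G, target, centroid, num_candidates=4, num_iters=200, lr=0.01):
        candidates = []
        for _ in range(num_candidates):                    # steps S301 to S305
            w = torch.randn(512, requires_grad=True)       # first input latent variable
            opt = torch.optim.Adam([w], lr=lr)
            for _ in range(num_iters):                     # steps S302 to S304
                loss = torch.norm(G(w) - target)           # pixel-value Euclidean norm
                opt.zero_grad(); loss.backward(); opt.step()
            candidates.append(w.detach())                  # second input latent variable
        # Step S306: pick the candidate closest to the centroid of the mapped latents.
        return min(candidates, key=lambda w: torch.norm(w - centroid).item())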
 [Example 2]
 Next, the image processing system 200 according to Example 2 will be described with reference to FIG. 9. FIG. 9 is a block diagram of the image processing system 200 in this example. The image processing system 200 differs from Example 1 in that the learning device 201 includes a first learning unit (first learning means) 211 and a second learning unit (second learning means) 212.
 The image processing system 200 includes a learning device 201, an image processing device (image estimation device) 202, a display device 203, a recording medium 204, an output device 205, and a network 206. The learning device 201 and the image processing device 202 can communicate with each other via the network 206.
 The learning device 201 includes the first learning unit 211 and the second learning unit 212. The first learning unit 211 includes a storage unit 211a, an acquisition unit 211b, a generation unit 211c, and an update unit 211d, and generates the first machine learning model. The first learning unit 211 corresponds to the learning device 101 in Example 1. The second learning unit 212 includes a storage unit 212a, an acquisition unit 212b, a generation unit 212c, and an update unit 212d, and determines the weights of a second machine learning model. The second machine learning model can obtain a latent variable based on an image. The functions of the learning device 201 can be implemented by one or more processors (learning means) such as CPUs. The learning device 201 may be a server. The first and second learning units may also be separate devices.
 The image processing device 202 is the same as the image processing device 102 in Example 1, so its description is omitted. The output image is output to at least one of the display device 203, the recording medium 204, and the output device 205. The display device 203, the recording medium 204, and the output device 205 are the same as the display device 103, the recording medium 104, and the output device 105 in Example 1.
 Next, the flow of the estimation phase of this example will be described with reference to FIGS. 5 and 10. FIG. 10 is a graph schematically showing a part of the latent space, similar to FIG. 1, and shows the behavior of the latent variables in this example. The first machine learning model in this example is trained in the same manner as in Example 1, and the weight information is stored in the storage unit 102a.
 This example differs from Example 1 in that, in step S102, the conversion unit 102c converts the first image into the first latent variable using a second machine learning model described later. Also, in this example, the third latent variable is a latent variable different from the second latent variable, and is located in the latent space closer to the centroid defined by the plurality of mapped latent variables than the second latent variable. The third latent variable may also be set at a position where the feature values of the images used to train the first machine learning model are more densely distributed than at the second latent variable.
 As shown in FIG. 10, the first to fourth latent variables in this example are extended latent variables. The first latent variable is located at a position where the feature values of the images used to train the first machine learning model are more densely distributed than at the second latent variable. Similarly, the third latent variable is located at a position where those feature values are more densely distributed than at the fourth latent variable.
 In this way, the manipulation of latent variables during image editing is divided into a plurality of stages, and the step of manipulating a latent variable, the step of generating an image from the manipulated latent variable, and the step of embedding the generated image back into the latent space are performed in order. With this configuration, the first machine learning model can be used to edit the feature values of the first image substantially while generating an image with fewer artifacts. Furthermore, by using extended latent variables for each latent variable, images having feature values different from those of the images used to train the first machine learning model can be generated with higher accuracy than when intermediate latent variables are used.
 Next, the learning of the weights of the second machine learning model will be described with reference to FIGS. 11 and 12. Each step in FIG. 11 is mainly performed by the acquisition unit 212b, the generation unit 212c, or the update unit 212d. FIG. 11 is a flowchart of learning the weights of the second machine learning model. FIG. 12 is a diagram showing the learning flow of the second machine learning model. In this example, a GAN having a generator 20 and a discriminator 21 is used to generate the second machine learning model, which generates latent variables based on images.
 The second machine learning model in this example converts an image into a latent variable near the centroid defined by the plurality of mapped latent variables. A second input latent variable located at a position where the feature values of the images used to train the first machine learning model are more densely distributed may also be set as the first latent variable. The second machine learning model may also be trained so that, when generating (estimating) a latent variable, the variance of the plurality of tensors including the first and second extended latent variables becomes small. If necessary, training may be performed so that the latent variable estimated by the second machine learning model is judged to be a mapped latent variable by the GAN discriminator. By performing training as described above, a second machine learning model capable of estimating latent variables with a low probability of generating false structures can be generated.
 First, in step S501, the acquisition unit 212b acquires correct images 22. The correct images 22 are a plurality of images, and may be captured images acquired by an image pickup apparatus or CG (Computer Graphics) images. The correct images 22 may also include the images used to train the first machine learning model (correct images 12) or images generated by the first machine learning model (estimated images 14).
 In step S502, the generation unit 212c generates (estimates) an estimated latent variable 23. The generation unit 212c generates the estimated latent variable 23 by inputting a correct image 22 into the generator 20. The estimated latent variable 23 has a correct label corresponding to "fake" for the discriminator 21.
 In step S503, the generation unit 212c acquires a correct latent variable 25. In this example, a 512-dimensional tensor (corresponding to an initial latent variable) is input into the mapping network of the first machine learning model to generate the correct latent variable 25 (corresponding to a mapped latent variable). The correct latent variable 25 has a correct label corresponding to "real" for the discriminator 21. The pair of steps S501 and S502 and step S503 may each be performed at random, one or the other at a time. Furthermore, the number of times steps S501 and S502 are executed and the number of times step S503 is executed need not be the same.
 In step S504, the update unit 212d updates the weights of the discriminator 21. The discriminator 21 receives the estimated latent variable 23 or the correct latent variable 25 and generates a discrimination label. The discriminator 21 is updated based on the error between the discrimination label and the correct label.
 In step S505, the generation unit 212c generates an estimated image 24. The estimated image 24 is generated by inputting the estimated latent variable 23 into the first machine learning model. The first machine learning model has been trained in advance, and the weight information is stored in the storage unit 212a.
 In step S506, the update unit 212d updates the weights of the generator 20. In this example, the loss functions that can be used include, for example, a loss function related to the discrimination label of the discriminator 21, a loss function related to the estimated latent variable 23, and a loss function related to the correct image 22 and the estimated image 24. The loss function related to the discrimination label of the discriminator 21 updates the weights based on the sigmoid cross entropy of the discrimination label. The loss function related to the estimated latent variable 23 is the variance of the plurality of tensors having a specific dimensionality included in the estimated latent variable 23. The loss function related to the correct image 22 and the estimated image 24 is, for example, the Euclidean norm of the difference between the pixel values of the images, or the Euclidean norm computed for each element of feature maps converted from the images.
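 A minimal sketch of how these three loss terms might be combined is shown below. The modules encoder (generator 20), latent_disc (discriminator 21), and the frozen first machine learning model g1 are hypothetical stand-ins with the interfaces indicated in the comments, and the weighting factors are illustrative assumptions.

    import torch
    import torch.nn as nn

    bce = nn.BCEWithLogitsLoss()   # sigmoid cross entropy for the discrimination label

    def generator20_loss(encoder, latent_disc, g1, correct_image):
        w_plus = encoder(correct_image)            # estimated latent variable 23, shape (B, 18, 512)

        # Adversarial term: the estimated latent should be judged a mapped latent ("real").
        adv = bce(latent_disc(w_plus), torch.ones(w_plus.size(0), 1))

        # Variance term: keep the 18 per-layer tensors of each latent close together.
        var = w_plus.var(dim=1).mean()

        # Reconstruction term: pixel-wise Euclidean norm between images 22 and 24.
        rec = torch.norm(g1(w_plus) - correct_image)

        return adv + 0.1 * var + 1.0 * rec         # illustrative weighting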
 In step S507, the update unit 212d determines whether learning is complete. Completion of learning can be determined by, for example, whether the number of weight-update iterations has reached a predetermined number or whether the amount of change in the weights at the time of updating is smaller than a predetermined value. If it is determined that weight learning is not complete, the process returns to step S501 and the acquisition unit 212b acquires new correct images 22. If it is determined that weight learning is complete, the update unit 212d ends the learning and stores the weight information in the storage unit 212a.
 [Example 3]
 Next, the image processing system 300 according to Example 3 will be described with reference to FIGS. 13 and 14. The image processing system 300 of this example differs from Example 1 in that it includes a control device 303 that issues, to the image processing device 302, a request regarding image processing of the first image.
 FIG. 13 is a block diagram of the image processing system 300 in this example. The image processing system 300 includes a learning device 301, an image processing device (image estimation device) 302, and a control device 303. In this example, the learning device 301 and the image processing device 302 are servers. The control device 303 is a user terminal such as a personal computer or a smartphone. The control device 303 is connected to the image processing device 302 via a network 304, and the image processing device 302 is connected to the learning device 301 via a network 305. That is, the control device 303 and the image processing device 302, as well as the image processing device 302 and the learning device 301, are configured to be able to communicate with each other.
 The learning device 301 in the image processing system 300 has the same configuration as the learning device 101, so its description is omitted.
 The image processing device 302 differs from the image processing device 102 in that it includes a communication unit (receiving means) 302f.
 The control device 303 includes a communication unit (transmitting means) 303a, a display unit (display means) 303b, an input unit (input means) 303c, a processing unit (processing means) 303d, and a recording unit 303e. The communication unit 303a can transmit to the image processing device 302 a request for causing the image processing device 302 to execute processing on the first image, and can receive the output image processed by the image processing device 302. The display unit 303b displays various kinds of information, such as the input image to the image processing device 302 and the output image generated by the image processing device 302. The input unit 303c accepts inputs from the user, such as an instruction to start image processing. The processing unit 303d can apply arbitrary image processing to the output image received from the image processing device 302. The recording unit 303e stores the output image received from the image processing device 302.
 The method of transmitting the first image, which is the processing target, to the image processing device 302 is not limited; for example, the first image may be uploaded to the image processing device 302 at the same time as step S601 or before step S601. The first image may also be an image stored on a server different from the image processing device 302.
 Next, the generation of the output image (third image) in this example will be described. FIG. 14 is a flowchart of the estimation phase in this example.
 The operation of the control device 303 will now be described. In this example, image processing is started when the user issues an instruction to start image processing via the control device 303.
 In step S601 (first transmission step), the communication unit 303a transmits a request for processing of the first image to the image processing device 302. In step S601, the control device 303 may also transmit, together with the request for processing of the first image, information about the editing, an ID for authenticating the user, and the like. The information about the editing includes the feature value to be edited and the degree to which that feature value is to be edited. For example, when the user wants to generate an image in which the age of the subject of the first image is increased by ten years, the information about the editing includes information specified by the user such as "age", "10 years", and "older".
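 As a minimal sketch, the editing information sent in step S601 might be expressed as follows. The field names and values below are purely illustrative assumptions; the document does not define a concrete request format.

    # Hypothetical editing-information payload sent with the processing request.
    edit_request = {
        "user_id": "user-0001",          # ID used to authenticate the user
        "image_id": "first_image.png",   # first image, uploaded separately
        "edit": {
            "feature": "age",            # feature value to be edited
            "amount": 10,                # degree of editing (e.g., +10 years)
            "direction": "older",
        },
    }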
 In step S602 (first reception step), the communication unit 303a receives the third image generated by the image processing device 302.
 Next, the operation of the image processing device 302 will be described.
 First, in step S701 (second reception step), the communication unit 302f receives the request for processing of the first image transmitted from the communication unit 303a. Upon receiving the instruction for processing of the first image, the image processing device 302 executes the processing from step S702 onward.
 In step S702, the acquisition unit 302b acquires the information about the editing and the first image. In this example, the information about the editing and the first image are transmitted from the control device 303. The processing of steps S701 and S702 may be performed simultaneously. Steps S702 to S708 are the same as steps S101 to S107, so their description is omitted.
 In step S709 (second transmission step), the communication unit 302f transmits the third image to the control device 303.
 In this way, the manipulation of latent variables during image editing is divided into a plurality of stages, and the step of manipulating a latent variable, the step of generating an image from the manipulated latent variable, and the step of embedding the generated image back into the latent space are performed in order. With this configuration, the first machine learning model can be used to edit the feature values of the first image substantially while generating an image with fewer artifacts. Furthermore, in this example, the control device 303 only requests processing of a specific image, and the actual image processing is performed by the image processing device 302. Therefore, if the control device 303 is a user terminal, the processing load on the user terminal can be reduced, and the user can obtain the output image with a low processing load.
 (Other embodiments)
 The present invention can also be realized by a process in which a program that implements one or more functions of the above-described examples is supplied to a system or an apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions. The image processing device of the present invention may be any device having the image processing function of the present invention, and may be realized in the form of a PC.
 According to each example, the first machine learning model can be used to edit the feature values of an original image substantially while generating an image with fewer artifacts.
 Although preferred embodiments and examples of the present invention have been described above, the present invention is not limited to these embodiments and examples, and various combinations, modifications, and changes are possible within the scope of the gist of the invention. Accordingly, the following claims are appended to make public the scope of the present invention.
 This application claims priority based on Japanese Patent Application No. 2022-135296 filed on August 26, 2022, the entire contents of which are incorporated herein by reference.

Claims (15)

  1.  An image processing method comprising the steps of:
     obtaining a first latent variable based on a first image;
     obtaining a second latent variable different from the first latent variable based on the first latent variable;
     generating a second image by inputting the second latent variable into a first machine learning model;
     obtaining a third latent variable different from the second latent variable based on the second image;
     obtaining a fourth latent variable different from the third latent variable based on the third latent variable; and
     generating a third image by inputting the fourth latent variable into the first machine learning model.
  2.  The image processing method according to claim 1, wherein at least one of the first latent variable and the third latent variable is obtained by inverse analysis using the first machine learning model.
  3.  The image processing method according to claim 1, wherein at least one of the first latent variable and the third latent variable is obtained using a second machine learning model.
  4.  The image processing method according to claim 3, wherein the second machine learning model generates a latent variable based on an input image.
  5.  The image processing method according to any one of claims 1 to 4, wherein the subject of the first image is a human face.
  6.  The image processing method according to any one of claims 1 to 5, wherein the step of generating a new third image is repeated one or more times by using the third image as a new second image.
  7.  The image processing method according to any one of claims 1 to 6, wherein, in a latent space that is a distribution of a plurality of latent variables obtained based on the images used for training the first machine learning model, the third latent variable is located closer than the second latent variable to a centroid defined by the plurality of latent variables.
  8.  The image processing method according to any one of claims 1 to 7, wherein the second latent variable is located at a position where feature values of the images used for training the first machine learning model are more sparsely distributed than at the first latent variable.
  9.  前記第4の潜在変数は、前記第3の潜在変数よりも前記第1の機械学習モデルの学習に用いられた画像の特徴値が低密度に分布する位置に存在することを特徴とする請求項1乃至8の何れか一項に記載の画像処理方法。 The fourth latent variable is located at a position where the feature values of the image used for learning the first machine learning model are distributed at a lower density than the third latent variable. 9. The image processing method according to any one of 1 to 8.
  10.  A program that causes a computer to execute the image processing method according to any one of claims 1 to 9.
  11.  A storage medium storing the program according to claim 10.
  12.  An image processing device comprising processing means capable of executing the image processing method according to any one of claims 1 to 9.
  13.  An image processing system comprising the image processing device according to claim 12 and a control device capable of communicating with the image processing device,
     wherein the control device includes means for transmitting, to the image processing device, a request relating to execution of processing on the first image.
  14.  A method of manufacturing a trained model, comprising:
     obtaining a training latent variable;
     generating, in a generator, an estimated image based on the training latent variable;
     identifying, in a discriminator, whether an input image is the estimated image; and
     training the generator based on a result of the identification by the discriminator.
  15.  A learning device comprising first learning means capable of executing the method of manufacturing a trained model according to claim 14.
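
By way of illustration, the inverse analysis of claim 2 and the encoder of claims 3 and 4 can both be sketched in a few lines. The following is a minimal sketch, assuming a pre-trained, differentiable PyTorch generator `generator` (standing in for the first machine learning model) that maps a latent tensor to an image tensor, a target image tensor `target`, and a starting latent `w_init`; the names and the pixel-wise loss are illustrative choices, not ones prescribed by the disclosure.

```python
import torch
import torch.nn.functional as F

def invert_image(generator, target, w_init, steps=500, lr=0.05):
    """Obtain a latent variable for `target` by optimization-based inverse
    analysis: start from `w_init` and minimize a pixel-wise reconstruction
    loss between the generated image and the target image."""
    w = w_init.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        reconstruction = generator(w)        # the first machine learning model
        loss = F.mse_loss(reconstruction, target)
        loss.backward()
        optimizer.step()
    return w.detach()

# Encoder variant (claims 3 and 4): a second machine learning model that maps
# an image directly to a latent variable, e.g.  w = encoder(target)
```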
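Claims 1 and 6 describe an edit-and-reproject cycle: a latent is edited, an image is generated from the edited latent, and a fresh latent is obtained from that image before the next edit. A minimal sketch under the same assumptions, where `invert` is any image-to-latent routine (for example an encoder, or `invert_image` above with its extra arguments bound) and `direction`/`alpha` are a hypothetical editing direction and strength:

```python
def iterative_edit(generator, invert, first_image, direction, alpha, repeats=0):
    """Edit-and-reproject cycle sketched from claims 1 and 6. Each pass edits
    the current latent, generates an image from the edited latent, and
    obtains a new latent from that image for the next pass."""
    w = invert(first_image)                  # first latent variable
    w_edited = w + alpha * direction         # second latent variable (edited)
    image = generator(w_edited)              # second image
    for _ in range(1 + repeats):             # repeats > 0 corresponds to claim 6
        w = invert(image)                    # third latent variable (re-projection)
        w_edited = w + alpha * direction     # fourth latent variable (edited again)
        image = generator(w_edited)          # third image (treated as a new second image)
    return image
```

Splitting one large edit into several smaller edits, each followed by re-projection, keeps the intermediate latents close to the region covered by the training data, which is consistent with the distance relation stated in claim 7.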
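Claims 7 to 9 characterize these latents geometrically: re-projection moves the latent back toward the centroid of the latents obtained from the training images, while each edit moves it to a lower-density position. A small helper for the centroid comparison, assuming `training_latents` is a hypothetical (N, dim) tensor of latents precomputed from the training images:

```python
import torch

def distance_to_centroid(w, training_latents):
    """Distance from a latent variable to the centroid defined by the latents
    obtained from the training images (the reference point of claim 7).
    `training_latents` is assumed to have shape (N, dim)."""
    centroid = training_latents.mean(dim=0)
    return torch.norm(w - centroid)

# Relations described by the claims (illustrative, not enforced by this code):
#   claim 7  : distance_to_centroid(w3, training_latents)
#              < distance_to_centroid(w2, training_latents)
#   claims 8-9: each edit (w1 -> w2, w3 -> w4) moves the latent to a position
#              where the training feature values are less densely distributed
```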
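Claim 14 corresponds to an ordinary adversarial training procedure: sample a training latent variable, generate an estimated image with the generator, let the discriminator judge whether its input is an estimated image, and update the generator from that judgment. A minimal sketch with placeholder networks, assuming the discriminator returns one logit per image; the non-saturating BCE loss shown here is one common choice and not necessarily the loss used in the disclosure:

```python
import torch
import torch.nn.functional as F

def train_step(generator, discriminator, real_images, opt_g, opt_d, latent_dim):
    """One adversarial update in the spirit of claim 14: training latents are
    sampled, estimated images are generated, the discriminator judges real
    versus estimated, and both networks are updated."""
    batch = real_images.size(0)
    ones = torch.ones(batch, 1)
    zeros = torch.zeros(batch, 1)

    z = torch.randn(batch, latent_dim)       # training latent variables
    fake = generator(z)                      # estimated images

    # Discriminator step: identify whether the input is an estimated image.
    d_loss = (F.binary_cross_entropy_with_logits(discriminator(real_images), ones)
              + F.binary_cross_entropy_with_logits(discriminator(fake.detach()), zeros))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: learn from the discriminator's judgment.
    g_loss = F.binary_cross_entropy_with_logits(discriminator(fake), ones)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```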
PCT/JP2023/029203 2022-08-26 2023-08-10 Image processing method, image processing device, and program WO2024043109A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022135296A JP2024031629A (en) 2022-08-26 2022-08-26 Image processing method, image processing apparatus, and program
JP2022-135296 2022-08-26

Publications (1)

Publication Number Publication Date
WO2024043109A1 true WO2024043109A1 (en) 2024-02-29

Family

ID=90013182

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/029203 WO2024043109A1 (en) 2022-08-26 2023-08-10 Image processing method, image processing device, and program

Country Status (2)

Country Link
JP (1) JP2024031629A (en)
WO (1) WO2024043109A1 (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021082118A (en) * 2019-11-21 2021-05-27 キヤノン株式会社 Learning method, program, learning device, and method for manufacturing learned weight
US20220121876A1 (en) * 2020-10-16 2022-04-21 Adobe Inc. Non-linear latent filter techniques for image editing
CN113297933A (en) * 2021-05-11 2021-08-24 广州虎牙科技有限公司 Image generation method and related device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ALALUF YUVAL; PATASHNIK OR; COHEN-OR DANIEL: "ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement", 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), IEEE, 10 October 2021 (2021-10-10), pages 6691 - 6700, XP034093009, DOI: 10.1109/ICCV48922.2021.00664 *
WEI TIANYI, CHEN DONGDONG, ZHOU WENBO, LIAO JING, ZHANG WEIMING, YUAN LU, HUA GANG, YU NENGHAI: "E2Style: Improve the Efficiency and Effectiveness of StyleGAN Inversion", IEEE TRANSACTIONS ON IMAGE PROCESSING, IEEE, USA, vol. 31, 1 January 2022 (2022-01-01), USA, pages 3267 - 3280, XP093143329, ISSN: 1057-7149, DOI: 10.1109/TIP.2022.3167305 *

Also Published As

Publication number Publication date
JP2024031629A (en) 2024-03-07

Similar Documents

Publication Publication Date Title
JP7373554B2 (en) Cross-domain image transformation
WO2021254499A1 (en) Editing model generation method and apparatus, face image editing method and apparatus, device, and medium
CN110084193B (en) Data processing method, apparatus, and medium for face image generation
Suetens et al. Statistically deformable face models for cranio-facial reconstruction
CN114925748B (en) Model training and modal information prediction method, related device, equipment and medium
CN113039816B (en) Information processing device, information processing method, and information processing program
CN111226258A (en) Signal conversion system and signal conversion method
KR20210147507A (en) Image generation system and image generation method using the system
CN111524216A (en) Method and device for generating three-dimensional face data
JP2020181240A (en) Data generation device, data generation method and program
CN117292041B (en) Semantic perception multi-view three-dimensional human body reconstruction method, device and medium
CN110546687A (en) Image processing device and two-dimensional image generation program
WO2024043109A1 (en) Image processing method, image processing device, and program
WO2021171384A1 (en) Clustering device, clustering method, and clustering program
JP7148078B2 (en) Attribute estimation device, attribute estimation method, attribute estimator learning device, and program
JP2019082847A (en) Data estimation device, date estimation method, and program
JP7437918B2 (en) Information processing device, information processing method, and program
CN112837318A (en) Method for generating ultrasound image generation model, method for synthesizing ultrasound image generation model, medium, and terminal
EP4111420A1 (en) Face mesh deformation with detailed wrinkles
US20200175376A1 (en) Learning Method, Learning Device, Program, and Recording Medium
CN113936320B (en) Face image quality evaluation method, electronic device and storage medium
KR102678473B1 (en) Automatic Caricature Generating Method and Apparatus
JP2004062721A (en) Image identifying device
JP2024071870A (en) Learning device, learning method, and program
JP2024040662A (en) Learning data generation device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23857222

Country of ref document: EP

Kind code of ref document: A1