WO2023053419A1 - Processing device and processing method - Google Patents

Processing device and processing method

Info

Publication number
WO2023053419A1
WO2023053419A1 (PCT/JP2021/036325)
Authority
WO
WIPO (PCT)
Prior art keywords
domain
image
unit
training
inference
Prior art date
Application number
PCT/JP2021/036325
Other languages
French (fr)
Japanese (ja)
Inventor
琢 佐々木
嘉典 松尾
啓太 三上
玲那 星野
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社
Priority to PCT/JP2021/036325
Publication of WO2023053419A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis

Definitions

  • The present invention relates to a processing apparatus and a processing method.
  • There is a technique for improving accuracy by compensating for changes in environmental conditions with a model that exploits natural fluctuations in the data (see Non-Patent Document 1).
  • However, in the technique of Non-Patent Document 1, when environmental conditions change too greatly, such as between day and night, sufficient interpolation cannot be performed, and the accuracy of image matching or estimation cannot be ensured.
  • The present invention has been made in view of the above, and an object thereof is to provide a processing apparatus and a processing method capable of executing image processing according to an appropriate determination of the environmental conditions of a target image, thereby improving the accuracy of image matching or estimation.
  • In order to solve the above problem, a processing apparatus includes an input unit that receives input of an image to be determined, and a determination unit that determines to which of a plurality of domains, each defined by environmental conditions, the image belongs.
  • According to the present invention, by appropriately determining the environmental conditions of the target image, it is possible to perform image processing according to the determination and improve the accuracy of image matching or estimation.
  • FIG. 1 is a diagram for explaining the outline of the first embodiment.
  • FIG. 2 is a diagram for explaining the outline of the first embodiment.
  • FIG. 3 is a diagram for explaining the outline of the first embodiment.
  • FIG. 4 is a diagram schematically showing an example of the configuration of the inference apparatus according to Embodiment 1.
  • FIG. 5 is a flowchart showing the processing procedure of registration processing of feature amounts of a query image according to Embodiment 1.
  • FIG. 6 is a flowchart of the processing procedure of inference processing according to the first embodiment.
  • FIG. 7 is a diagram schematically showing an example of another configuration of the inference device according to the modification of Embodiment 1.
  • FIG. 8 is a flowchart of the procedure of inference processing according to the modification of Embodiment 1.
  • FIG. 9 is a diagram schematically showing an example of the configuration of the training device according to Embodiment 2.
  • FIG. 10 is a flowchart illustrating the processing procedure of training image acquisition processing according to the second embodiment.
  • FIG. 11 is a flowchart illustrating another processing procedure of training image acquisition processing according to the second embodiment.
  • FIG. 12 is a flowchart of the training process procedure according to the second embodiment.
  • FIG. 13 is a diagram schematically showing an example of another configuration of the training device according to the modification of Embodiment 2.
  • FIG. 14 is a diagram illustrating an example of a computer that implements the training device and the inference device by executing a program.
  • FIG. 1 is a diagram for explaining the outline of the first embodiment.
  • In the first embodiment, image matching will be described as an example. An image in which the matching target appears will be referred to as a query image, and an image to be checked for whether or not the matching target appears will be referred to as a gallery image.
  • The inference apparatus according to the first embodiment has trained models corresponding to a plurality of domains, and performs matching by switching the model to be used according to the domain of the gallery image. Domains are defined by environmental conditions. In the examples of FIGS. 1 to 3, there are daytime and nighttime domains, and a model M1 corresponding to daytime and a model M2 corresponding to nighttime will be described as an example.
  • First, the inference device registers the feature amounts of the query image.
  • Specifically, the inference device transforms the query image into images of the daytime and nighttime domains.
  • The inference apparatus extracts a feature amount from the daytime-domain query image using the model M1 corresponding to the daytime domain (see arrow Y11 in FIG. 1), and registers the extracted feature amount as the feature amount of the query image corresponding to the daytime domain.
  • Likewise, the inference device extracts a feature amount from the nighttime-domain query image using the model M2 corresponding to the nighttime domain (see arrow Y12 in FIG. 1).
  • The inference device registers this extracted feature amount as the feature amount of the query image corresponding to the nighttime domain (see arrow Y12-1 in FIG. 1).
  • Then, the inference device makes an inference for the gallery image.
  • First, the inference device determines the domain to which the gallery image belongs. In the example of FIG. 1, the domain of the gallery image is determined to be nighttime. The inference device therefore selects the model M2 corresponding to nighttime, the domain of the gallery image, from among the models M1 and M2 ((1) in FIG. 1), and extracts the feature amount of the gallery image using the selected model M2 ((2) in FIG. 1).
  • The inference device then refers to the feature amount of the query image corresponding to the nighttime domain, and calculates the distance between the feature amount of the gallery image and the feature amount of the referenced query image.
  • The inference device compares the calculated distance with a matching threshold to check whether or not the matching target appears in the gallery image ((3) in FIG. 1). Note that a matching threshold is set for each domain, and the inference apparatus uses the matching threshold set for the domain of the gallery image during matching.
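  • The flow (1) to (3) above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the `models`, `registered`, and `thresholds` mappings, the `extract` method, and the choice of Euclidean distance are all assumed names and conventions.

```python
import numpy as np

def match_gallery_image(gallery_img, domain, models, registered, thresholds):
    """Return True if the matching target appears in the gallery image."""
    model = models[domain]                  # (1) select the model for the gallery domain
    g_feat = model.extract(gallery_img)     # (2) extract the gallery feature amount
    q_feat = registered[domain]             # query feature registered for the same domain
    dist = np.linalg.norm(g_feat - q_feat)  # distance in feature space
    return dist <= thresholds[domain]       # (3) compare with the per-domain threshold
```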
  • FIG. 2 illustrates a case where a plurality of query images are received ((1) in FIG. 2).
  • For example, the inference device receives a query image Q1 of a person in work clothes in the daytime domain and a query image Q2 of a person in a suit in the nighttime domain.
  • The inference apparatus transforms the domains of the query images in preparation for gallery images of both the daytime and nighttime domains ((2) in FIG. 2).
  • Specifically, the inference device converts the query image Q1 into a nighttime-domain query image Q12 (see arrow Y11).
  • The inference device also converts the query image Q2 into a daytime-domain query image Q21 (see arrow Y12).
  • The inference apparatus extracts feature amounts from the daytime-domain query images Q1 and Q21 using the model M1 ((3) in FIG. 2), and registers the extracted feature amounts as the feature amounts M1-1 and M2-1 of the query images corresponding to the daytime domain ((4) in FIG. 2).
  • Here, the feature amount M1-1 corresponds to the person in work clothes in the daytime domain, and the feature amount M2-1 corresponds to the person in a suit in the daytime domain.
  • Similarly, the inference device extracts feature amounts from the nighttime-domain query images Q12 and Q2 using the model M2 ((3) in FIG. 2), and registers the extracted feature amounts as the feature amounts M1-2 and M2-2 of the query images corresponding to the nighttime domain ((4) in FIG. 2).
  • The feature amount M1-2 corresponds to the person in work clothes in the nighttime domain, and the feature amount M2-2 corresponds to the person in a suit in the nighttime domain.
  • FIG. 3 illustrates a case where a nighttime image I1 and a daytime image I2 are captured by the monitoring cameras C1 and C2.
  • Gallery images G1 to G4, in which persons A to D appear, are clipped from the images I1 and I2 ((1) in FIG. 3).
  • Note that the person-clipping task may be executed by another device provided between the monitoring cameras C1 and C2 and the inference device, or may be executed by the inference device itself.
  • The inference device determines the domains of the gallery images G1 to G4.
  • The inference device determines that the domain of the gallery images G1 and G2 is nighttime.
  • The inference device determines that the domain of the gallery image G4 is daytime.
  • The inference device determines that the gallery image G3 belongs to the nighttime domain because, although it was captured at a time when the sun was out, it shows a dark place under the shade of trees (see arrow Y31).
  • The inference device then divides the gallery images G1 to G4 according to the daytime and nighttime domains ((2) in FIG. 3).
  • For the gallery images G1 to G3, the inference device selects the model M2 corresponding to the nighttime domain, and extracts the feature amounts of these gallery images using the selected model M2 ((3) in FIG. 3).
  • For the gallery image G4, the inference device selects the model M1 corresponding to the daytime domain, and extracts the feature amount of the gallery image using the selected model M1 ((3) in FIG. 3).
  • The inference device compares the feature amounts of the gallery images G1 to G3 with the feature amounts of the pre-registered query images of the nighttime domain, and performs matching.
  • That is, the inference device compares each feature amount of persons A to C, whose domain is nighttime, with the feature amounts of the person in work clothes and the person in a suit of the query images corresponding to the nighttime domain, and performs matching ((4) in FIG. 3).
  • The inference device compares the feature amount of the gallery image G4 with the feature amounts of the pre-registered query images of the daytime domain for matching.
  • That is, the inference device compares the feature amount of person D, whose domain is daytime, with the feature amounts of the person in work clothes and the person in a suit of the query images corresponding to the daytime domain, and performs matching ((4) in FIG. 3).
  • As described above, the inference apparatus transforms the query image into images of all domains corresponding to the trained models, extracts a feature amount from the query image of each domain using the model corresponding to that domain, and registers the feature amount of each query image in association with its domain.
  • This enables the inference device to compare, at the time of matching, the feature amount of the query image in the same domain as the gallery image with the feature amount of the gallery image. Therefore, the inference device can reduce the accuracy degradation caused by a domain difference between the query image and the gallery image.
  • Furthermore, since the inference device determines the domain of the gallery image and uses the model corresponding to that domain, it can perform appropriate feature extraction processing according to the determination and improve the accuracy of image matching.
  • FIG. 4 is a diagram schematically showing an example of the configuration of the inference apparatus according to Embodiment 1.
  • The inference device 10 has an input/output unit 11, a storage unit 12, and a control unit 13.
  • The input/output unit 11 receives input of information and outputs information.
  • The input/output unit 11 is, for example, a communication interface that transmits and receives various information to and from other devices connected via a network or the like.
  • The input/output unit 11 communicates with other devices and the control unit 13 (described later) via an electric communication line such as a LAN (Local Area Network) or the Internet.
  • The input/output unit 11 also includes devices such as a mouse and a keyboard that receive input of various instruction information for the inference apparatus 10 in response to the user's input operations.
  • Further, the input/output unit 11 is implemented by, for example, a liquid crystal display, and displays and outputs screens whose display is controlled by the inference device 10.
  • The storage unit 12 is realized by a semiconductor memory device such as a RAM (Random Access Memory) or flash memory, and stores processing programs that operate the inference device 10 and data used during execution of the processing programs.
  • The storage unit 12 has query feature amount data 121, a model group 122, and an inference result 123 indicating the result of inference by the inference unit 135 (described later).
  • The query feature amount data 121 holds the feature amounts of the query image transformed into each domain.
  • For example, the query feature amount data 121 includes a first domain feature amount, which is the feature amount of the query image transformed into the first domain, and a second domain feature amount, which is the feature amount of the query image transformed into the second domain.
  • The model group 122 has a plurality of models used by the inference unit 135 (described later). Each model is a trained feature extraction model, configured by, for example, a neural network (NN). A model is provided for each domain: for example, the first domain model corresponds to the first domain, and the second domain model corresponds to the second domain.
  • The control unit 13 controls the inference device 10 as a whole.
  • The control unit 13 is, for example, an electronic circuit such as a CPU (Central Processing Unit) or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array).
  • The control unit 13 also has an internal memory for storing programs defining various processing procedures and control data, and executes each process using the internal memory. Further, the control unit 13 functions as various processing units by running various programs.
  • The control unit 13 includes an image input unit 131 (input unit), a domain determination unit 132 (determination unit), a domain conversion unit 133 (conversion unit), a model selection unit 134 (selection unit), an inference unit 135, and a first registration unit 136.
  • The image input unit 131 receives input of query images to be determined and converted.
  • The image input unit 131 also receives input of a gallery image to be determined.
  • The gallery image is the inference image that is the inference target of the inference unit 135.
  • The domain determination unit 132 determines to which of a plurality of domains the image to be determined belongs, based on elements that establish the environment in which the subject is imaged.
  • Such an element is, for example, at least one of the time at which the image was captured, the brightness of the image, and the color and saturation of the image.
  • The domain determination unit 132 determines to which domain the query image and the gallery image each belong.
  • Based on elements that establish the environment in which the subject is imaged, the domain conversion unit 133 converts the image to be converted into an image of another domain different from the domain to which the image belongs.
  • Here, the relevant element is, for example, at least one of the brightness of the image and its color and saturation.
  • The domain conversion unit 133 converts the query image into images of the other domains different from the domain to which the query image belongs. If there are multiple such domains, the domain conversion unit 133 converts the query image for all of them. For example, when the query image belongs to the first domain and the second and third domains are also defined, the domain conversion unit 133 converts the query image into an image of the second domain and an image of the third domain.
  • The model selection unit 134 selects, from among the models in the model group 122, the model corresponding to the domain of the determination target image determined by the domain determination unit 132.
  • Specifically, the model selection unit 134 selects, from among the models of the model group 122, the models corresponding to the respective domains of the query image and of the query images converted by the domain conversion unit 133. For example, when the domain determination unit 132 determines that the query image belongs to the first domain, the model selection unit 134 selects the model for the first domain for the query image, and selects the model for the second domain for the query image converted into the second domain.
  • The model selection unit 134 also selects, from among the models in the model group 122, the model corresponding to the domain to which the gallery image belongs, based on the determination result of the domain determination unit 132. For example, when the domain determination unit 132 determines that the gallery image belongs to the second domain, the model selection unit 134 selects the model for the second domain for the gallery image.
  • The inference unit 135 makes inferences using the model selected by the model selection unit 134.
  • The inference unit 135 has a feature amount extraction unit 1351 and a matching unit 1352.
  • The feature amount extraction unit 1351 uses the model selected by the model selection unit 134 to extract the feature amount of the image to be processed.
  • The feature amount extraction unit 1351 may perform generally known NN forward propagation and the like.
  • In the registration stage, the feature amount extraction unit 1351 uses the models selected by the model selection unit 134 for each domain to extract the feature amounts of the query image and of the query images whose domains have been converted by the domain conversion unit 133. For example, when the model selection unit 134 selects the first domain model 1231 for the query image, the feature amount extraction unit 1351 uses the first domain model 1231 to extract the feature amount of the query image. Further, when the model selection unit 134 selects the second domain model 1232 for the query image transformed into the second domain, the feature amount extraction unit 1351 uses the second domain model 1232 to extract the feature amount of that query image.
  • In the inference stage for the gallery image, the feature amount extraction unit 1351 uses the model selected by the model selection unit 134 to extract the feature amount of the gallery image. For example, when the model selection unit 134 selects the second domain model 1232 for the gallery image, the feature amount extraction unit 1351 uses the second domain model 1232 to extract the feature amount of the gallery image.
  • The matching unit 1352 calculates the distance between the feature amount of the gallery image and the feature amount of the query image. At this time, the matching unit 1352 refers, as the feature amount of the query image, to the feature amount corresponding to the domain to which the gallery image belongs in the query feature amount data 121. For example, when the domain determination unit 132 determines that the gallery image belongs to the second domain, the matching unit 1352 refers to the second domain feature amount in the query feature amount data 121.
  • The matching unit 1352 compares the calculated distance with a matching threshold and checks whether or not the subject in the gallery image is the matching target.
  • A matching threshold is set for each domain, and the matching unit 1352 uses the threshold set for the domain to which the gallery image belongs. If the calculated distance is equal to or less than the matching threshold, the matching unit 1352 determines that the subject in the gallery image is the matching target of the query image. On the other hand, if the calculated distance is greater than the matching threshold, the matching unit 1352 determines that the subject in the gallery image is not the matching target of the query image.
  • The first registration unit 136 registers each feature amount of the query image as the feature amount of the query image of the corresponding domain.
  • Specifically, the first registration unit 136 registers the feature amount of the first-domain query image in the query feature amount data 121 as the first domain feature amount.
  • The first registration unit 136 also registers the feature amount of the second-domain query image in the query feature amount data 121 as the second domain feature amount.
  • The domain determination unit 132 determines the domain of the image to be determined by, for example, comparing the time at which the image was captured with predetermined time zones for morning, noon, and night.
  • For example, the time zones for the domains are set in advance so that 6:00 to 11:00 is morning, 11:00 to 18:00 is noon, and 18:00 to 6:00 is night.
  • In this case, the domain determination unit 132 checks meta information on the shooting time of the image to be determined, and determines the domain of the image according to its shooting time.
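  • A minimal sketch of this time-based determination, assuming the 6:00/11:00/18:00 boundaries above (the function name is ours):

```python
from datetime import time

def domain_from_time(capture_time: time) -> str:
    """Map a capture time to the morning/noon/night domain."""
    if time(6, 0) <= capture_time < time(11, 0):
        return "morning"
    if time(11, 0) <= capture_time < time(18, 0):
        return "noon"
    return "night"  # 18:00 to 6:00, wrapping around midnight
```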
  • Alternatively, the domain determination unit 132 may determine the domain of the image to be determined by comparing the average brightness of all pixels of the image with preset threshold values.
  • Let (r_ij, g_ij, b_ij) be the RGB values of the pixel at coordinates (i, j) of the image to be determined. The luminance l_ij of this pixel is given by Equation (1): l_ij = α·r_ij + β·g_ij + γ·b_ij, where α, β, and γ are preset parameters.
  • The domain determination unit 132 calculates the average luminance over all pixels of the image to be determined, written as Equation (2): L = (1/N) Σ_{i,j} l_ij, where N is the total number of pixels.
  • The domain determination unit 132 compares the average luminance L of Equation (2) with preset thresholds to determine whether the image to be determined belongs to the morning, daytime, or night domain. For example, when L is less than a first threshold, the domain determination unit 132 determines that the domain of the image is night. When L is equal to or greater than the first threshold and less than a second threshold (greater than the first), it determines that the domain is morning. When L is equal to or greater than the second threshold, it determines that the domain is daytime.
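  • A sketch of this luminance-based determination follows. The weights α, β, γ and the two thresholds are preset parameters; the concrete weight values below (BT.601 luma coefficients) are only an illustrative assumption.

```python
import numpy as np

ALPHA, BETA, GAMMA = 0.299, 0.587, 0.114  # assumed values for the preset parameters

def domain_from_luminance(img_rgb: np.ndarray, thr1: float, thr2: float) -> str:
    """img_rgb: H x W x 3 RGB array; thr1 < thr2."""
    r, g, b = img_rgb[..., 0], img_rgb[..., 1], img_rgb[..., 2]
    l = ALPHA * r + BETA * g + GAMMA * b  # Equation (1): per-pixel luminance
    L = float(l.mean())                   # Equation (2): average over all pixels
    if L < thr1:
        return "night"
    if L < thr2:
        return "morning"
    return "daytime"
```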
  • Alternatively, a red domain, a blue domain, and a yellow domain may be set as the domains.
  • In this case, the domain determination unit 132 calculates the averages over all pixels of the R, G, and B luminances of the image to be determined, and determines the domain of the image according to which average is the maximum.
  • Equation (3) represents the average of the R luminance over all pixels, Equation (4) the average of the G luminance, and Equation (5) the average of the B luminance.
  • The domain determination unit 132 calculates the all-pixel averages of the R, G, and B luminances of Equations (3) to (5) for the image to be determined, and determines the domain using Equation (6), which selects the maximum among these averages.
  • The domain determination unit 132 determines that the domain of the image to be determined is red when the result of Equation (6) is R, and blue when the result is B.
  • Otherwise, the domain determination unit 132 determines that the domain of the image to be determined is yellow. Note that yellow is treated as the sum of red and green.
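  • A sketch of this color-based determination (Equations (3) to (6)); treating a G maximum as yellow follows the note that yellow is the sum of red and green, but the exact form of Equation (6) is an assumption:

```python
import numpy as np

def domain_from_color(img_rgb: np.ndarray) -> str:
    means = {
        "R": float(img_rgb[..., 0].mean()),  # Equation (3)
        "G": float(img_rgb[..., 1].mean()),  # Equation (4)
        "B": float(img_rgb[..., 2].mean()),  # Equation (5)
    }
    dominant = max(means, key=means.get)     # Equation (6): pick the maximum average
    if dominant == "R":
        return "red"
    if dominant == "B":
        return "blue"
    return "yellow"  # G maximal: yellow is treated as the sum of red and green
```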
  • The domain determination unit 132 may also determine the domain to which the image to be determined belongs by combining two or more of: the time at which the image was captured, the brightness of the image, and the color and saturation of the image.
  • For example, the domain determination unit 132 performs determination by combining domain determination based on time with domain determination based on color and saturation.
  • In this case, when the time at which the image to be determined was captured falls in the daytime zone, the domain determination unit 132 determines that the domain of the image is "daytime".
  • Further, when the time at which the image was captured is from 6:00 p.m. onward and the color determination yields blue, the domain determination unit 132 determines that the domain of the image is "night/blue".
  • The domain conversion unit 133 converts the query image into images of all domains different from the domain to which the query image belongs, so that a query image feature amount can be registered for the same domain as any gallery image.
  • For example, the domain conversion unit 133 converts the query image based on its brightness. In the case of brightness-related domains such as morning, noon, and night, the domain conversion unit 133 uniformly multiplies the RGB values of each pixel of the query image to be converted by a coefficient, so that the average luminance after multiplication matches the average luminance of the destination domain.
  • Let (r_ij, g_ij, b_ij) be the RGB values of the pixel at coordinates (i, j) of the image before conversion, and (r_ij', g_ij', b_ij') the RGB values of the same pixel after conversion.
  • The domain conversion unit 133 converts, for example, the R value using Equation (10): r_ij' = r_ij · L' / E_{k,l}[l_kl]. Here, L' is the average luminance of the destination domain and is set in advance, and E_{k,l}[l_kl] is the average luminance over all pixels of the image before conversion. Both i and k represent horizontal pixel coordinates and both j and l represent vertical pixel coordinates; k and l are distinguished from i and j because the expected value (average) is taken over the whole image.
  • G and B can be converted by replacing r_ij in Equation (10) with g_ij or b_ij.
  • The luminance l_ij of the pixel at coordinates (i, j) is given by Equation (11), of the same form as Equation (1): l_ij = α·r_ij + β·g_ij + γ·b_ij, where α, β, and γ are preset parameters.
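  • A sketch of the conversion of Equation (10): all RGB values are scaled by a single coefficient so that the average luminance of the converted image equals the preset target L'. The luma weights are again assumed values.

```python
import numpy as np

ALPHA, BETA, GAMMA = 0.299, 0.587, 0.114  # assumed values for the preset parameters

def convert_luminance(img_rgb: np.ndarray, target_L: float) -> np.ndarray:
    """Scale an RGB image so its average luminance matches target_L (L')."""
    r, g, b = img_rgb[..., 0], img_rgb[..., 1], img_rgb[..., 2]
    l = ALPHA * r + BETA * g + GAMMA * b            # Equation (11): per-pixel luminance
    scale = target_L / float(l.mean())              # L' / E[l_kl]
    return np.clip(img_rgb.astype(float) * scale, 0, 255)  # Equation (10), per channel
```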
  • The domain conversion unit 133 may also convert the query image based on its color and saturation. For example, in the case of domains related to color shifts such as red, blue, and yellow caused by the influence of a large display or the like, the domain conversion unit 133 uniformly multiplies the RGB values of each pixel of the query image at hand by a coefficient, so that the RGB values after multiplication match the average RGB values of the destination domain.
  • Again, let (r_ij, g_ij, b_ij) be the RGB values of the pixel at coordinates (i, j) before conversion, and (r_ij', g_ij', b_ij') the RGB values after conversion.
  • The domain conversion unit 133 converts, for example, the R value using Equation (12): r_ij' = r_ij · R' / E_{k,l}[r_kl], where R' is the average R pixel value of the destination domain and is preset.
  • G and B can be converted by replacing r_ij in Equation (12) with g_ij or b_ij, and R' with G' or B'.
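  • A sketch of the conversion of Equation (12), applied per channel so that each channel's average matches the preset averages (R', G', B') of the destination domain:

```python
import numpy as np

def convert_color(img_rgb: np.ndarray, target_means) -> np.ndarray:
    """target_means: (R', G', B') of the destination domain."""
    out = img_rgb.astype(float).copy()
    for c in range(3):                        # R, G, B channels
        out[..., c] *= target_means[c] / float(out[..., c].mean())  # Equation (12)
    return np.clip(out, 0, 255)
```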
  • Alternatively, the domain conversion unit 133 may use a converter to convert the image to be converted into an image of another domain.
  • This converter transforms an image into an image of another domain different from the domain to which the image belongs, and has been trained in advance using a plurality of images belonging to each domain.
  • For example, the domain conversion unit 133 uses a GAN (Generative Adversarial Network) as the converter.
  • In this case, the inference device 10 prepares a sufficient number of images of each domain in advance and trains the GAN. The domain conversion unit 133 then converts the query image into an image of each domain by inputting it into the GAN.
  • In the first embodiment, it is the query image whose domain is transformed into each domain in order to match domains with the gallery image. This is because the query image can more easily tolerate an increase in conversion processing when real-time processing is desired.
  • That is, query registration is assumed to occur at most about once every several seconds, and a query image only needs to be registered once, so the frequency of domain conversion can be kept low.
  • In contrast, the gallery image is not subjected to domain conversion. If both the query image and the gallery image were domain-converted, matching accuracy would be expected to degrade compared with converting only one of them. Therefore, in the first embodiment, matching accuracy is maintained by performing domain conversion for all domains only on the query image and by switching the model used for feature extraction.
  • FIG. 5 is a flow chart showing a processing procedure of registration processing of a feature amount of a query image according to Embodiment 1.
  • When the image input unit 131 receives input of a query image (step S11), the domain determination unit 132 determines to which domain the query image belongs (step S12).
  • The domain conversion unit 133 converts the query image into an image of another domain different from the domain to which the query image belongs (step S13).
  • The model selection unit 134 selects, from among the models in the model group 122, the model corresponding to the domain of the converted query image (step S14).
  • The feature amount extraction unit 1351 uses the model selected by the model selection unit 134 to extract the feature amount of the domain-converted query image (step S15). The first registration unit 136 then registers this feature amount in the query feature amount data 121 as the feature amount of the query image of the converted domain (step S16).
  • The domain conversion unit 133 determines whether or not there is a domain to convert to next (step S17). When the query image has been converted into all domains corresponding to all models, the domain conversion unit 133 determines that there is no next domain (step S17: No) and ends the process. Otherwise, the domain conversion unit 133 determines that there is a next domain to convert to (step S17: Yes), returns to step S13, and performs domain conversion of the query image for an unconverted domain.
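  • A sketch of this registration loop (steps S12 to S17), reusing the hypothetical per-domain stores from the earlier matching sketch; the own-domain registration shown in FIG. 1 is included as well:

```python
def register_query(query_img, all_domains, determine_domain, convert,
                   models, registered):
    """Register the query feature amount for every domain."""
    own = determine_domain(query_img)                     # S12: determine the domain
    registered[own] = models[own].extract(query_img)      # register in its own domain
    for dom in all_domains:                               # S13-S17: loop over the rest
        if dom == own:
            continue
        converted = convert(query_img, dom)               # S13: domain conversion
        registered[dom] = models[dom].extract(converted)  # S14-S16: extract and register
```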
  • FIG. 6 is a flowchart of a processing procedure of inference processing according to the first embodiment.
  • When the image input unit 131 receives input of a gallery image (step S21), the domain determination unit 132 determines to which domain the gallery image belongs (step S22).
  • The model selection unit 134 selects, from among the models in the model group 122, the model corresponding to the domain of the gallery image determined in step S22 (step S23).
  • The feature amount extraction unit 1351 uses the model selected by the model selection unit 134 to extract the feature amount of the gallery image (step S24).
  • The matching unit 1352 refers, in the query feature amount data 121, to the feature amount of the query image corresponding to the domain to which the gallery image was determined to belong in step S22 (step S25).
  • The matching unit 1352 calculates the distance between the feature amount of the gallery image and the feature amount of the referenced query image, compares the calculated distance with the matching threshold, and determines whether the subject in the gallery image is the matching target (step S26).
  • The matching unit 1352 then outputs the matching result (step S27), and the process ends.
  • As described above, the inference apparatus 10 determines the domain of the gallery image and uses the model corresponding to that domain, so it can perform appropriate feature extraction processing according to the determination and improve the accuracy of image matching.
  • The inference device 10 also converts the query image into images of all domains corresponding to the trained models, and extracts and registers features from the query image of each domain using the model corresponding to that domain. The inference device 10 can thus prepare query images of all domains, and at matching time can compare the feature amount of the query image in the same domain as the gallery image with the feature amount of the gallery image. This reduces the accuracy degradation caused by a domain difference between the query image and the gallery image.
  • FIG. 7 is a diagram schematically showing an example of another configuration of the inference device according to the modification of Embodiment 1.
  • As shown in FIG. 7, the inference device 10A according to the modification of the first embodiment has a control unit 13A instead of the control unit 13 shown in FIG. 4. The control unit 13A has the same functions as the control unit 13.
  • The control unit 13A has an inference unit 135A that includes a classification unit 1352A.
  • The classification unit 1352A uses a trained classification model to calculate logits from the feature amount (feature vector) of the inference image extracted by the feature amount extraction unit 1351, and determines the class of the subject of the inference image. The classification unit 1352A may register and output the class classification result as the inference result 124A.
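  • A minimal sketch of this classification step, assuming a linear classification head (the weights W, bias b, and class list are placeholders, not part of the patent):

```python
import numpy as np

def classify(feature: np.ndarray, W: np.ndarray, b: np.ndarray, classes: list):
    logits = W @ feature + b                # logits from the extracted feature vector
    return classes[int(np.argmax(logits))]  # class of the subject of the inference image
```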
  • FIG. 8 is a flowchart showing the procedure of inference processing according to the modification of the first embodiment.
  • The inference device 10A accepts input of an inference image (step S31) and performs the same processing as steps S22 to S24 shown in FIG. 6. Specifically, the domain determination unit 132 determines the domain of the inference image (step S32).
  • The model selection unit 134 selects, from the model group 122, the model corresponding to the domain determined in step S32 (step S33).
  • The feature amount extraction unit 1351 uses the model selected by the model selection unit 134 to extract the feature amount of the inference image (step S34).
  • The classification unit 1352A calculates logits from the feature amount of the inference image, performs class classification to determine the class of the subject of the inference image (step S35), and outputs the classification result (step S36).
  • In this way, the domain of the inference image is determined and the model corresponding to that domain is used, so appropriate feature extraction according to the determination can be executed and the accuracy of class classification can be improved.
  • Next, Embodiment 2 will be described.
  • In Embodiment 2, a training device that trains the models corresponding to the respective domains used by the inference devices 10 and 10A will be described.
  • As described above, trained models are provided for each domain. Therefore, in the training apparatus according to Embodiment 2, a training image group is prepared for each domain, and a model is trained for each domain using the training images of that domain.
  • The training apparatus according to the second embodiment prepares training images for each domain from an existing dataset.
  • FIG. 9 is a diagram schematically showing an example of a configuration of a training device according to Embodiment 2.
  • The training device 20 has an input/output unit 21, a storage unit 22, and a control unit 23.
  • The input/output unit 21 receives input of information and outputs information.
  • The input/output unit 21 is, for example, a communication interface that transmits and receives various information to and from other devices connected via a network or the like.
  • The input/output unit 21 communicates with other devices and the control unit 23 (described later) via an electric communication line such as a LAN or the Internet.
  • The input/output unit 21 also includes devices such as a mouse and a keyboard that receive input of various instruction information for the training apparatus 20 in response to the user's input operations.
  • Further, the input/output unit 21 is realized by, for example, a liquid crystal display, and displays and outputs screens whose display is controlled by the training device 20.
  • The storage unit 22 is implemented by a semiconductor memory device such as a RAM or flash memory, and stores processing programs for operating the training device 20, data used during execution of the processing programs, and the like.
  • The storage unit 22 has a dataset 221, training images 222, and a model group 223.
  • The dataset 221 is, for example, a public dataset such as the MSMT public dataset.
  • The MSMT public dataset contains a wide range of images taken from morning to night.
  • The training images 222 include a training image group corresponding to each domain.
  • Specifically, the training images 222 include a first domain image group 2221, which is the training image group for the model corresponding to the first domain, and a second domain image group 2222, which is the training image group for the model corresponding to the second domain.
  • The model group 223 has a plurality of trained models used by the inference unit 135.
  • The model group 223 has a first domain model 2231 corresponding to the first domain and a second domain model 2232 corresponding to the second domain.
  • The control unit 23 controls the training device 20 as a whole.
  • The control unit 23 is, for example, an electronic circuit such as a CPU, or an integrated circuit such as an ASIC or FPGA.
  • The control unit 23 also has an internal memory for storing programs defining various processing procedures and control data, and executes each process using the internal memory. Further, the control unit 23 functions as various processing units by running various programs.
  • The control unit 23 has a training image acquisition unit 231 and a training unit 232.
  • The training image acquisition unit 231 includes a dataset acquisition unit 2311 (input unit), a domain determination unit 2312 (determination unit), a domain conversion unit 2313 (conversion unit), and a second registration unit 2314 (registration unit).
  • The dataset acquisition unit 2311 acquires a dataset such as the MSMT public dataset as training images.
  • In addition to the public dataset, the dataset acquisition unit 2311 may also acquire actual data captured by a camera or the like that captures the inference images.
  • The domain determination unit 2312 has the same function as the domain determination unit 132.
  • The domain determination unit 2312 determines the domain to which a training image belongs, based on elements that establish the environment in which the subject is imaged.
  • Such an element is, for example, at least one of the time at which the image was captured, the brightness of the image, and the color and saturation of the image.
  • The domain determination unit 2312 determines the domain to which each image included in the dataset belongs.
  • The domain conversion unit 2313 converts a training image into an image of another domain different from the domain to which the training image belongs, based on elements that establish the environment in which the subject is imaged.
  • Here, the relevant element is, for example, at least one of the brightness of the training image and its color and saturation.
  • The second registration unit 2314 registers the determined training image as a training image for the model corresponding to the domain determined by the domain determination unit 2312. For example, when the domain determination unit 2312 determines that a certain training image belongs to the first domain, the second registration unit 2314 registers that training image as an image of the first domain image group 2221. When a certain training image is converted into an image of the second domain by the domain conversion unit 2313, the second registration unit 2314 registers the converted training image as an image of the second domain image group 2222.
  • The training unit 232 selects, from among the training images of each domain registered by the second registration unit 2314, the training image group corresponding to the domain of the model to be trained, and performs model training using the selected training image group.
  • For example, the training unit 232 selects images of the first domain image group 2221 as training images and trains the model corresponding to the first domain.
  • The training unit 232 may use a known mechanism such as back propagation to repeatedly update the parameters of each model, configured as a neural network, until a predetermined end condition is reached.
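  • A sketch of per-domain training with back propagation in PyTorch. The data loaders, model architecture, loss, and hyperparameters are placeholders; the text only specifies that one model is trained per domain on that domain's image group until an end condition is reached.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader

def train_domain_model(model: nn.Module, loader: DataLoader, epochs: int = 10) -> nn.Module:
    criterion = nn.CrossEntropyLoss()   # e.g., an identity-classification loss
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    for _ in range(epochs):             # fixed epoch count as the assumed end condition
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()             # back propagation
            optimizer.step()            # parameter update
    return model

# One model per domain, each trained on its own image group:
# models = {dom: train_domain_model(make_model(), loaders[dom]) for dom in loaders}
```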
  • Similar to the domain determination unit 132, the domain determination unit 2312 determines the domain to which a training image belongs using time, brightness, or color and saturation.
  • For example, the domain determination unit 2312 determines the domain of the image to be determined by comparing the time at which the image was captured with the predetermined time zones for morning, noon, and night.
  • Alternatively, the domain determination unit 2312 determines that the domain is morning when the average luminance L of all pixels of the image to be determined is equal to or greater than a threshold, and that the domain is night when the average value L is less than the threshold.
  • Alternatively, the domain determination unit 2312 determines, in the same manner as the domain determination unit 132, whether the image belongs to the red domain, the blue domain, or the yellow domain.
  • The domain determination unit 2312 may also determine the domain to which the image belongs by combining two or more of the capture time, the brightness, and the color and saturation of the image to be determined.
  • The domain conversion unit 2313 converts the images of the dataset into images of other domains to generate training images of each domain.
  • The domain conversion unit 2313 performs domain conversion based on, for example, luminance or color and saturation. In this case, it is assumed that the tendency of the data belonging to each domain is known for each domain in terms of brightness and color.
  • For example, assume that the average luminance L over all pixels of each image (written as Equation (13)) in the morning, noon, and night domains generally follows a normal distribution, and that its mean and variance are known for each domain.
  • In this case, the domain conversion unit 2313 performs domain conversion so that the mean and variance of the average luminance L of the images of the training dataset at hand match the mean and variance of each domain.
  • Specifically, the domain conversion unit 2313 converts the pixel values of each k-th image of the available training dataset (Equation (14)) as shown in Equation (15), which standardizes the values with the mean and variance of the dataset at hand and rescales them to the mean and variance of the destination domain. In Equation (15), V is a variance-covariance matrix and E represents the operation of obtaining an expected value vector.
  • Alternatively, the domain conversion unit 2313 performs domain conversion so that the means and variances of the RGB values of the training dataset at hand (R, G, and B shown in Equations (16) to (18)) match the means and variances of each domain.
  • In this case, the domain conversion unit 2313 converts each pixel value r_ij^k of the k-th image of the available training dataset as shown in Equation (19). In Equation (19), V is the variance and E represents the operation of obtaining the expected value.
  • G and B can be converted by replacing R_k in Equation (19) with G_k or B_k, and r_ij^k with g_ij^k or b_ij^k.
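  • A sketch of this distribution matching. The exact matrix form of Equations (15) and (19) is not reproduced in the text, so a per-channel scalar standardize-and-rescale version is assumed here:

```python
import numpy as np

def match_distribution(values: np.ndarray, target_mean: float,
                       target_var: float) -> np.ndarray:
    """Standardize with the data's own mean/variance (E, V), rescale to the target's."""
    mu, var = values.mean(), values.var()
    converted = (values - mu) * np.sqrt(target_var / var) + target_mean
    return np.clip(converted, 0, 255)

# For luminance matching (Equation (15)), apply it to the stacked luminance values
# of the dataset at hand; for RGB matching (Equation (19)), apply it to each of the
# R, G, and B channels with that channel's target mean and variance.
```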
  • FIG. 10 is a flowchart illustrating a processing procedure of training image acquisition processing according to the second embodiment.
  • First, the dataset acquisition unit 2311 acquires a dataset (step S41).
  • The domain determination unit 2312 refers to an arbitrarily selected determination target image from the dataset acquired in step S41 (step S42), and determines the domain to which this image belongs (step S43).
  • The second registration unit 2314 registers the determination target image as a training image for the model corresponding to the domain determined in step S43 (step S44).
  • The domain determination unit 2312 determines whether or not a next image to be determined exists in the dataset (step S45). When the domains of all images in the dataset have been determined, the domain determination unit 2312 determines that there is no next image to determine (step S45: No) and ends the process. Otherwise, the domain determination unit 2312 determines that there is a next image to determine (step S45: Yes), returns to step S42, and performs domain determination on the next image.
  • FIG. 11 is a flowchart showing another processing procedure of the training image acquisition process according to the second embodiment.
  • The training device 20 converts images of the dataset into images of other domains to generate training images of each domain when, for example, a sufficient number of training images cannot be secured for a domain.
  • When the training device 20 receives input of a domain for which a sufficient number of training images cannot be secured, as the conversion target domain (step S51), the dataset acquisition unit 2311 acquires a dataset (step S52).
  • The domain determination unit 2312 refers to an arbitrarily selected image to be converted from the dataset acquired in step S52 (step S53), and determines the domain to which this image belongs (step S54).
  • The domain conversion unit 2313 converts this image into an image of the conversion target domain (step S55).
  • The second registration unit 2314 registers the image whose domain was converted in step S55 as a training image for the model corresponding to the conversion target domain (step S56).
  • The training device 20 determines whether or not there is an image to be converted next (step S57). If a sufficient number of training images has been secured for the conversion target domain, the training device 20 determines that there is no next image to convert (step S57: No) and ends the process. If the number of training images for the conversion target domain is not yet sufficient, the training device 20 determines that there is a next image to convert (step S57: Yes), returns to step S53, and converts the next image into the conversion target domain.
  • FIG. 12 is a flowchart of a training process procedure according to the second embodiment.
  • The training unit 232 selects the training image group corresponding to the domain to be trained (step S62).
  • The training unit 232 executes model training using the selected training image group (step S63).
  • As described above, the training device 20 determines the domain of each image in the dataset and prepares training images for each domain, and trains a model for each domain using the training image group belonging to that domain. Therefore, according to the training device 20, the model corresponding to each domain can be trained appropriately, and the inference accuracy of the models can be improved.
  • Further, when a sufficient number of training images cannot be secured for a domain, the training device 20 converts images of the dataset into images of the desired domain to generate training images for that domain. The training device 20 can therefore secure a sufficient number of training images for each domain, execute model training appropriately for any domain, and improve the inference accuracy of the models.
  • As a modification of Embodiment 2, a control unit 23A may be provided in which the domain determination unit 2312 shown in FIG. 9 is replaced with a domain determination unit 2312A (see FIG. 13).
  • The domain determination unit 2312A sets a luminance threshold and a margin width in advance, and determines that an image belongs to the morning domain when its average luminance is equal to or greater than "threshold - margin width", and that it belongs to the night domain when its average luminance is less than "threshold + margin width". An image whose average luminance falls within the margin around the threshold thus belongs to both domains.
  • With a simple division, the number of training images near the threshold tends to be small, and accuracy may decrease when the model of each domain is applied to inference images near the threshold.
  • By dividing with a margin, the number of training images near the threshold can be increased, and the deterioration of the model accuracy of each domain can be reduced.
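  • A sketch of the margin-based split, under the reading above that images whose average luminance falls inside the margin band are assigned to both domains:

```python
def split_with_margin(images_with_L, threshold: float, margin: float):
    """images_with_L: iterable of (image, average_luminance) pairs."""
    morning, night = [], []
    for img, L in images_with_L:
        if L >= threshold - margin:   # at or above "threshold - margin width"
            morning.append(img)
        if L < threshold + margin:    # below "threshold + margin width"
            night.append(img)
    return morning, night             # the band around the threshold lands in both
```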
  • Evaluation experiments A0, A1, and B were performed on the inference devices 10 and 10A and the training devices 20 and 20A of the first and second embodiments.
  • rank-k is the average over all queries of "the probability that at least one image of the queried person appears among the top k images when the gallery is sorted by distance to the query (closest first)". rank-k takes a value between 0 and 1, and the higher the value, the better the accuracy.
  • mAP is the average over all queries of "the average over k of the precision (the proportion of the top k gallery images for a given query that show the queried person)". mAP takes a value between 0 and 1, and the higher the value, the better the accuracy.
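  • A sketch of these two metrics, assuming a distance matrix `dists` (queries x gallery) and a boolean matrix `matches` marking gallery images of the queried person:

```python
import numpy as np

def rank_k(dists: np.ndarray, matches: np.ndarray, k: int) -> float:
    order = np.argsort(dists, axis=1)  # sort gallery by distance, closest first
    hits = [matches[q, order[q, :k]].any() for q in range(dists.shape[0])]
    return float(np.mean(hits))        # fraction of queries with a hit in the top k

def mean_average_precision(dists: np.ndarray, matches: np.ndarray) -> float:
    aps = []
    for q in range(dists.shape[0]):
        ranked = matches[q, np.argsort(dists[q])]
        if not ranked.any():
            continue                                    # no true match for this query
        cum_hits = np.cumsum(ranked)
        precision = cum_hits / (np.arange(ranked.size) + 1)
        aps.append(float((precision * ranked).sum() / ranked.sum()))
    return float(np.mean(aps))
```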
  • Evaluation experiment A0 will be described. In evaluation experiment A0, it was evaluated whether there is a difference in accuracy with and without model switching when person matching is performed outdoors.
  • The MSMT dataset was used in evaluation experiment A0.
  • The MSMT dataset consists of a training image group and an inference image group, and includes a wide range of images taken from morning to night.
  • In evaluation experiment A0, each image of the MSMT dataset was subjected to domain determination, and the training images divided into domains according to the determination were used to train the model corresponding to each domain.
  • At inference time, the domain of the gallery image was determined, and inference was performed by switching the model according to the domain of the gallery image.
  • In evaluation experiment A0, the case where the query image and the gallery image have the same domain was evaluated.
  • In evaluation experiment A0, the MSMT dataset was divided into training images of three domains (morning, noon, and night) on the basis of luminance, adopting thresholds that make the number of images belonging to each domain in the training image group equal. Division of the training images with a margin, shown in the modification of the second embodiment, was also evaluated. During inference, the domain of the inference image was determined based on luminance.
  • First, boundary values of luminance that divide the domains are determined. Specifically, using two luminance boundary values θ1 > θ2 and a margin ratio ε, the domain to which images whose average luminance L over all pixels (Equation (2)) is greater than θ1 × (1 − ε) belong is called, for convenience, the morning domain. The domain to which images whose L is smaller than θ1 × (1 + ε) and larger than θ2 × (1 − ε) belong is called the daytime domain. The domain to which images whose L is smaller than θ2 × (1 + ε) belong is called the night domain.
  • Here, the mean and standard deviation of luminance in the MSMT training image group are 76.1 and 27.7, respectively.
  • At training time, the training devices 20 and 20A divide the training image group by domain.
  • This divides the MSMT dataset into training image groups for the morning, noon, and night domains.
  • The training devices 20 and 20A train the model corresponding to each domain using the training image group of that domain.
  • At inference time, the inference device 10 performs domain determination and divides the inference image group by domain. This yields query images and gallery images for the morning, noon, and night domains.
  • Each model in the inference device 10 is then evaluated.
  • The morning, noon, or night model is applied to the corresponding morning, noon, or night query image group and the morning, noon, or night gallery image group, and the feature vector distances are calculated for each domain to compute the rank-k and mAP values.
  • As a baseline, the MSMT dataset is used as-is to train a general-purpose model. At inference time, the inference image group is likewise divided by domain, yielding query images and gallery images for the morning, noon, and night domains.
  • Table 1 shows the results of the evaluation experiment A0.
  • As shown in Table 1, dataset division with a margin by the domain determination unit 2312A of the training device 20A (experimental data A0-13) was more effective than simple dataset division by the domain determination unit 2312 of the training device 20 (experimental data A0-10).
  • Evaluation experiment A1 will be described. In evaluation experiment A1, it was evaluated whether, when the domains of the query image and the gallery image differ, there is a difference in accuracy depending on whether domain conversion of the query image is performed, that is, whether the feature amounts of the query image are registered for each domain.
  • The MSMT dataset was used in evaluation experiment A1.
  • In evaluation experiment A1, each image in the MSMT dataset was subjected to domain determination, and the training images divided into domains according to the determination were used to train the model corresponding to each domain.
  • At inference time, the domain of the gallery image was determined, and inference was performed by switching the model according to the domain of the gallery image.
  • In evaluation experiment A1, the case where the query image and the gallery image have different domains was evaluated.
  • The training images were set in the same way as in evaluation experiment A0.
  • That is, the training images are divided into three domains (morning, noon, and night).
  • This divides the MSMT dataset into training image groups for the morning, noon, and night domains.
  • The training image group of each domain is used to train the model corresponding to that domain.
  • At inference time, the inference device 10 performs domain determination and divides the inference image group by domain. This yields query images and gallery images for the morning, noon, and night domains.
  • The inference device 10 employs luminance-based domain conversion of the query images.
  • The average luminance L' of each destination domain is taken to be the average luminance of the image group belonging to that domain in the training image group.
  • each model in the inference device 10 is evaluated.
  • When evaluating the model for each domain, the dedicated morning/day/night model is applied to the query image groups obtained by domain-converting the day+night/night+morning/morning+day query images into morning/day/night, and to the morning/day/night gallery image groups; by calculating the feature-vector distance for each domain, the rank-1 and mAP values are calculated.
  • the MSMT dataset is used as is to train a general model. Then, at the time of inference, the inference image group is divided for each domain. This results in query images and gallery images for the morning, noon, and night domains.
  • The general-purpose model is applied to the query image groups obtained by domain-converting the day+night/night+morning/morning+day query images into morning/day/night and to the morning/day/night gallery image groups; by calculating the feature-vector distance, the rank-k and mAP values are calculated.
  • The general-purpose model is applied as-is to the day+night/night+morning/morning+day query image groups and to the morning/day/night gallery image groups; by calculating the feature-vector distance for each domain, the rank-1 and mAP values are calculated.
  • Table 2 shows the results of the evaluation experiment A1.
  • Next, evaluation experiment B will be described.
  • In evaluation experiment B, it was evaluated which of the per-domain training images obtained by division and by conversion is more suitable as training images for the model when applied to real data instead of a public dataset.
  • The divided training images are training images of each domain prepared by dataset division performed by the domain determination unit 2312 of the training device 20 in the processing procedure shown in FIG. 10.
  • The converted training images are training images of each domain prepared by generating training images through domain conversion (see FIG. 11) performed by the domain conversion unit 2313 in the processing procedure shown in FIG. 11.
  • As training images, a group of training images prepared by dividing the MSMT dataset by domain and a group of training images generated for each domain by subjecting the images of the MSMT dataset to domain conversion were prepared.
  • a real data-like dataset was used for the inference image.
  • The real-data-like dataset includes a wide range of images taken from morning to night, as well as difficulties unique to real data that are not included in the public dataset, such as the effects of the color and saturation (tint) of a large display. Since it is difficult to obtain the actual data itself, in this experiment the inference image group of Market 1501, domain-converted by converting the luminance of the public-dataset images, was used as a substitute.
  • domains are divided based on time and color.
  • For time, the daytime or nighttime label already assigned to each image is used. Colors are classified into red, blue, and yellow as described above. Note that in this evaluation experiment B, the domain is the same between the query image and the gallery image.
  • the training device 20 divides the training image groups into domains. This divides the MSMT dataset into training images for the day, night red, night blue, and night yellow domains. The training device 20 trains a model corresponding to each domain using the training image group of each domain.
  • The inference device 10 performs domain determination and divides the inference image group by domain. This results in query and gallery images for the day, night-red, night-blue, and night-yellow domains. Subsequently, each model in the inference device 10 is evaluated. By applying the day/night-red/night-blue/night-yellow dedicated model to the day/night-red/night-blue/night-yellow query images and the day/night-red/night-blue/night-yellow gallery images, and calculating the feature-vector distance for each domain, the rank-1 and mAP values are calculated.
  • The training device 20 converts the training image groups into each domain. This converts the MSMT dataset into the day, night-red, night-blue, and night-yellow domains. Conversion based on luminance is adopted, and a transformation was performed to fit the mean and variance of the RGB values of each domain of the real-data images. The training device 20 then trains a model corresponding to each domain using the training image group of that domain. The inference device 10 evaluates each model in the same way as when using the divided training images. The models to be evaluated include the models corresponding to each domain as well as the general-purpose model.
  • Table 3 shows the results of evaluation experiment B.
  • For both the general-purpose model and the models trained for each domain, training with the training image groups generated by domain conversion gave more accurate results than training with the training images obtained by division. It was therefore found that per-domain training images obtained by conversion are preferable to per-domain training images obtained by division.
  • For this reason, it is preferable that the training device 20 performs domain conversion to generate training images.
  • In the above embodiments, domains based on the time zones of morning, noon, and night were described, but domains are not limited to these.
  • the domain may be due to differences in weather or lighting (light source).
  • Weather domains include, for example, sunny, cloudy, rainy, and snowy. Domains based on the position of the sun due to changes in seasons and time zones include front light and backlight.
  • A domain may also be set according to a person's posture; in this case, examples include standing upright, sitting on a chair, and doing a handstand.
  • Each component of the inference devices 10, 10A and the training devices 20, 20A is functionally conceptual and does not necessarily need to be physically configured as illustrated. That is, the specific form of distribution and integration of the functions of the inference devices 10, 10A and the training devices 20, 20A is not limited to the illustrated one, and all or part of them can be functionally or physically distributed or integrated in arbitrary units according to various loads and usage conditions.
  • All or any part of the processing performed in the inference devices 10, 10A and the training devices 20, 20A may be implemented by a CPU, a GPU (Graphics Processing Unit), and a program analyzed and executed by the CPU and GPU. Each process performed in the inference devices 10, 10A and the training devices 20, 20A may also be realized as hardware by wired logic.
  • FIG. 14 is a diagram showing an example of a computer that implements the inference devices 10 and 10A and the training devices 20 and 20A by executing programs.
  • the computer 1000 has a memory 1010 and a CPU 1020, for example.
  • The computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.
  • the memory 1010 includes a ROM 1011 and a RAM 1012.
  • the ROM 1011 stores a boot program such as BIOS (Basic Input Output System).
  • The hard disk drive interface 1030 is connected to a hard disk drive 1090.
  • The disk drive interface 1040 is connected to a disk drive 1100.
  • A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100.
  • Serial port interface 1050 is connected to mouse 1110 and keyboard 1120, for example.
  • Video adapter 1060 is connected to display 1130, for example.
  • The hard disk drive 1090 stores, for example, an OS (Operating System) 1091, application programs 1092, program modules 1093, and program data 1094. That is, a program that defines each process of the inference devices 10, 10A and the training devices 20, 20A is implemented as a program module 1093 in which code executable by the computer 1000 is written. The program modules 1093 are stored, for example, on the hard disk drive 1090.
  • the hard disk drive 1090 stores a program module 1093 for executing processing similar to the functional configurations of the inference devices 10 and 10A and the training devices 20 and 20A.
  • the hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
  • the setting data used in the processing of the above-described embodiment is stored as program data 1094 in the memory 1010 or the hard disk drive 1090, for example. Then, the CPU 1020 reads out the program module 1093 and the program data 1094 stored in the memory 1010 and the hard disk drive 1090 to the RAM 1012 as necessary and executes them.
  • the program modules 1093 and program data 1094 are not limited to being stored in the hard disk drive 1090, but may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program modules 1093 and program data 1094 may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.). Program modules 1093 and program data 1094 may then be read by CPU 1020 through network interface 1070 from other computers.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An inference device (10) has: an image input unit (131) that receives the input of an image to be assessed; and a domain assessing unit (132) that assesses, on the basis of an element that enables the environment in which a subject is imaged to be established, to which domain among a plurality of domains respectively defined by environmental conditions the image to be assessed belongs.

Description

Processing device and processing method
The present invention relates to a processing apparatus and a processing method.
For the purpose of early detection and tracking of lost children, crime prevention, and the like, matching and estimation of persons are performed on surveillance camera video. In recent years, improved accuracy has been demanded for image matching and estimation. For example, when matching a person, the same person may be presumed to be a different person because of the difference between morning and night, or a different person may be mistaken for the person in question at night. In this way, the accuracy of image matching and estimation may be degraded by environmental conditions such as time of day, weather, and lighting.
Here, there is a technique for improving accuracy by compensating for changes in environmental conditions with a model that uses natural fluctuations in the data (see Non-Patent Document 1).
However, with the technique described in Non-Patent Document 1, when the change in environmental conditions is too large, as between day and night, sufficient compensation cannot be performed, and there is the problem that the accuracy of image matching or estimation cannot be ensured.
The present invention has been made in view of the above, and its object is to provide a processing device and a processing method that, by appropriately determining the environmental conditions of a target image, enable execution of image processing according to the determination and improve the accuracy of image matching or estimation.
In order to solve the above-described problems and achieve the object, a processing device according to the present invention includes an input unit that receives an input of an image to be determined, and a determination unit that determines, based on elements that establish the environment in which a subject is imaged, to which of a plurality of domains each defined by environmental conditions the image to be determined belongs.
According to the present invention, by appropriately determining the environmental conditions of a target image, image processing according to the determination can be executed and the accuracy of image matching or estimation can be improved.
FIG. 1 is a diagram for explaining the outline of the first embodiment.
FIG. 2 is a diagram for explaining the outline of the first embodiment.
FIG. 3 is a diagram for explaining the outline of the first embodiment.
FIG. 4 is a diagram schematically showing an example of the configuration of the inference device according to Embodiment 1.
FIG. 5 is a flowchart showing the processing procedure of the registration processing of the feature amounts of a query image according to Embodiment 1.
FIG. 6 is a flowchart showing the processing procedure of the inference processing according to Embodiment 1.
FIG. 7 is a diagram schematically showing an example of another configuration of the inference device according to a modification of Embodiment 1.
FIG. 8 is a flowchart showing the processing procedure of the inference processing according to the modification of Embodiment 1.
FIG. 9 is a diagram schematically showing an example of the configuration of the training device according to Embodiment 2.
FIG. 10 is a flowchart showing the processing procedure of the training image acquisition processing according to Embodiment 2.
FIG. 11 is a flowchart showing another processing procedure of the training image acquisition processing according to Embodiment 2.
FIG. 12 is a flowchart showing the processing procedure of the training processing according to Embodiment 2.
FIG. 13 is a diagram schematically showing an example of another configuration of the training device according to a modification of Embodiment 2.
FIG. 14 is a diagram showing an example of a computer that implements the training device and the inference device by executing a program.
An embodiment of the present invention will be described in detail below with reference to the drawings. Note that the present invention is not limited by this embodiment. In the description of the drawings, the same parts are denoted by the same reference numerals.
[Embodiment 1]
FIGS. 1 to 3 are diagrams for explaining the outline of the first embodiment. In FIG. 1, image matching is described as an example. An image in which the matching target appears is referred to as a query image, and an image checked for whether the matching target appears in it is referred to as a gallery image.
The inference device according to Embodiment 1 has trained models each corresponding to one of a plurality of domains, and performs matching while switching the model to be used according to the domain of the gallery image. A domain is defined by environmental conditions. In the examples of FIGS. 1 to 3, a case is described in which there are day and night domains, with a model M1 corresponding to day and a model M2 corresponding to night.
When a query image is input, the inference device registers the feature amounts of the query image. First, the inference device converts the query image into images of the day and night domains. The inference device then extracts a feature amount from the day-domain query image using the model M1 corresponding to the day domain (see arrow Y11 in FIG. 1) and registers the extracted feature amount as the feature amount of the query image corresponding to the day domain. Similarly, the inference device extracts a feature amount from the night-domain query image using the model M2 corresponding to the night domain (see arrow Y12 in FIG. 1) and registers it as the feature amount of the query image corresponding to the night domain (see arrow Y12-1 in FIG. 1).
Next, when a gallery image is input, the inference device performs inference on the gallery image. First, the inference device determines the domain to which the gallery image belongs. In the example of FIG. 1, the domain of the gallery image is determined to be night. The inference device then selects, from the models M1 and M2, the model M2 corresponding to night, which is the domain of the gallery image ((1) in FIG. 1), and extracts the feature amount of the gallery image using the selected model M2 ((2) in FIG. 1).
Since the domain of the gallery image is night, the inference device refers to the feature amount of the query image corresponding to the night domain and calculates the distance between the feature amount of the gallery image and the referenced feature amount of the query image. The inference device compares the calculated distance with a matching threshold to check whether the matching target appears in the gallery image ((3) in FIG. 1). Note that a matching threshold is set for each domain, and the inference device uses, at the time of matching, the matching threshold set for the domain of the gallery image.
Specifically, the registration of the feature amounts of query images will be described with reference to FIG. 2. FIG. 2 illustrates a case where a plurality of query images are received ((1) in FIG. 2). In the case of FIG. 2, the inference device receives a query image Q1 of a person in work clothes in the day domain and a query image Q2 of a person in a suit in the night domain. The inference device converts the domains of the query images in preparation for gallery images of both the day and night domains ((2) in FIG. 2). Specifically, the inference device converts the query image Q1 into a query image Q12 of the night domain (see arrow Y11) and converts the query image Q2 into a query image Q21 of the day domain (see arrow Y12).
The inference device then extracts feature amounts from the day-domain query images Q1 and Q21 using the model M1 ((3) in FIG. 2) and registers the extracted feature amounts as the feature amounts M1-1 and M2-1 of the query images corresponding to the day domain ((4) in FIG. 2). The feature amount M1-1 corresponds to the person in work clothes in the day domain, and the feature amount M2-1 corresponds to the person in a suit in the day domain.
The inference device extracts feature amounts from the night-domain query images Q12 and Q2 using the model M2 ((3) in FIG. 2) and registers the extracted feature amounts as the feature amounts M1-2 and M2-2 of the query images corresponding to the night domain ((4) in FIG. 2). The feature amount M1-2 corresponds to the person in work clothes in the night domain, and the feature amount M2-2 corresponds to the person in a suit in the night domain.
Next, inference on gallery images will be described with reference to FIG. 3. FIG. 3 illustrates a case where a nighttime image I1 and a daytime image I2 are captured by surveillance cameras C1 and C2. First, in a person clipping task, gallery images G1 to G4 showing persons A to D are clipped from the images I1 and I2 ((A) in FIG. 3). The person clipping task may be executed by another device provided between the surveillance cameras C1, C2 and the inference device, or by the inference device itself.
The inference device then determines the domains of the gallery images G1 to G4. The inference device determines that the domain of the gallery images G1 and G2 is night and that the domain of the gallery image G4 is day. For the gallery image G3, although it was captured in a time zone when the sun was out, the inference device determines that it belongs to the night domain because it was captured in a dark place under the shade of trees (see arrow Y31). The inference device then divides the gallery images G1 to G4 according to the day and night domains ((2) in FIG. 3).
For the gallery images G1 to G3, the inference device selects the model M2 corresponding to the night domain and extracts their feature amounts using the selected model M2 ((3) in FIG. 3). For the gallery image G4, the inference device selects the model M1 corresponding to the day domain and extracts its feature amounts using the selected model M1 ((3) in FIG. 3).
The inference device compares the feature amounts of the gallery images G1 to G3 with the pre-registered feature amounts of the night-domain query images. That is, the inference device compares the feature amounts of persons A to C, whose domain is night, with the feature amounts of the person in work clothes and the person in a suit of the query images corresponding to the night domain ((4) in FIG. 3).
Likewise, the inference device compares the feature amount of the gallery image G4 with the pre-registered feature amounts of the day-domain query images. That is, the inference device compares the feature amount of person D, whose domain is day, with the feature amounts of the person in work clothes and the person in a suit of the query images corresponding to the day domain ((4) in FIG. 3).
As described above, the inference device according to Embodiment 1 converts the domain of a query image into images of all the domains corresponding to the trained models, extracts feature amounts from the query image of each domain using the model corresponding to that domain, and registers the feature amount of each query image in association with its domain.
This allows the inference device to compare, at the time of matching, the feature amount of the gallery image with the feature amount of a query image of the same domain as the gallery image. Therefore, the inference device can reduce the accuracy degradation caused by the query image and the gallery image belonging to different domains.
Since the inference device determines the domain of the gallery image and uses the model corresponding to that domain, it can execute appropriate feature extraction processing according to the determination and improve the accuracy of image matching.
[Inference device]
Next, the inference device according to Embodiment 1 will be described. FIG. 4 is a diagram schematically showing an example of the configuration of the inference device according to Embodiment 1. As shown in FIG. 4, the inference device 10 has an input/output unit 11, a storage unit 12, and a control unit 13.
The input/output unit 11 receives input of information and outputs information. The input/output unit 11 is, for example, a communication interface that transmits and receives various kinds of information to and from other devices connected via a network or the like, and performs communication between those devices and the control unit 13 (described later) over an electric communication line such as a LAN (Local Area Network) or the Internet. The input/output unit 11 also includes devices such as a mouse and a keyboard that receive input of various kinds of instruction information for the inference device 10 in response to a user's input operation, and a display such as a liquid crystal display that displays and outputs screens whose display is controlled by the inference device 10.
The storage unit 12 is realized by a semiconductor memory device such as a RAM (Random Access Memory) or a flash memory, and stores a processing program for operating the inference device 10, data used during execution of the processing program, and the like. The storage unit 12 holds query feature amount data 121, a model group 122, and an inference result 123 indicating the result of inference by an inference unit 135 (described later).
The query feature amount data 121 holds the feature amounts of query images converted into each domain. For example, the query feature amount data 121 holds a first-domain feature amount, which is the feature amount of a query image converted into the first domain, and a second-domain feature amount, which is the feature amount of a query image converted into the second domain.
The model group 122 holds a plurality of models used by the inference unit 135 (described later). Each model is a trained feature extraction model, configured, for example, by a neural network (NN). A model is provided for each domain; for example, the first-domain model corresponds to the first domain, and the second-domain model corresponds to the second domain.
The control unit 13 controls the entire inference device 10. The control unit 13 is, for example, an electronic circuit such as a CPU (Central Processing Unit) or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The control unit 13 has an internal memory for storing programs defining various processing procedures and control data, and executes each process using the internal memory. The control unit 13 functions as various processing units by running various programs, and has an image input unit 131 (input unit), a domain determination unit 132 (determination unit), a domain conversion unit 133 (conversion unit), a model selection unit 134 (selection unit), an inference unit 135, and a first registration unit 136.
The image input unit 131 receives input of a query image, which is a target of determination and conversion, and input of a gallery image, which is a target of determination. The gallery image is an inference image that is the inference target of the inference unit 135.
The domain determination unit 132 determines, based on elements that establish the environment in which a subject is imaged, to which of a plurality of domains an image to be determined belongs. The elements that establish the environment in which the subject is imaged are, for example, at least one of the time at which the image was captured, the luminance of the image, and the color and saturation of the image. The domain determination unit 132 determines to which domain the query image and the gallery image belong.
When an image to be converted belongs to one of the plurality of domains, the domain conversion unit 133 converts the image, based on elements that establish the environment in which the subject is imaged, into an image of another domain different from the domain to which the image belongs. The elements that establish the environment in which the subject is imaged are, for example, at least one of the luminance of the image and the color and saturation of the image.
At the stage of registering the feature amounts of a query image, the domain conversion unit 133 converts the query image into images of other domains different from the domain to which the query image belongs. If there are a plurality of such other domains, the domain conversion unit 133 converts the query image for all of them. For example, if the query image belongs to the first domain and a second domain and a third domain are defined in addition to the first domain, the domain conversion unit 133 converts the query image into an image of the second domain and into an image of the third domain.
The model selection unit 134 selects, from among the models of the model group 122, the model corresponding to the domain of the image to be determined, as determined by the domain determination unit 132.
At the stage of registering the feature amounts of a query image, the model selection unit 134 selects, from among the models of the model group 122, the model corresponding to the domain of the query image and the models corresponding to the domains of the query images converted by the domain conversion unit 133. For example, when the domain determination unit 132 determines that the query image belongs to the first domain, the model selection unit 134 selects the first-domain model for the query image, and selects the second-domain model for the query image converted into the second domain.
At the inference stage for a gallery image, the model selection unit 134 selects, based on the determination result of the domain determination unit 132, the model of the model group 122 corresponding to the domain to which the gallery image belongs. For example, when the domain determination unit 132 determines that the gallery image belongs to the second domain, the model selection unit 134 selects the second-domain model for the gallery image.
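A minimal sketch of this per-domain model selection follows; the placeholder extractor functions stand in for the trained models of the model group 122 and are illustrative assumptions.

```python
def first_domain_model(image):   # placeholder for the trained first-domain model
    return [sum(image)]          # dummy "feature vector"

def second_domain_model(image):  # placeholder for the trained second-domain model
    return [max(image)]

MODEL_GROUP = {"first": first_domain_model, "second": second_domain_model}

def select_model(domain):
    """Return the feature-extraction model corresponding to the domain
    determined for the image (cf. model selection unit 134)."""
    return MODEL_GROUP[domain]

features = select_model("second")([3, 1, 2])  # -> [3]
```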
The inference unit 135 performs inference using the model selected by the model selection unit 134, and has a feature amount extraction unit 1351 and a matching unit 1352.
The feature amount extraction unit 1351 extracts the feature amounts of the image to be processed using the model selected by the model selection unit 134, for example by performing the generally known forward propagation of an NN.
At the stage of registering the feature amounts of a query image, the feature amount extraction unit 1351 uses the models selected for each domain by the model selection unit 134 to extract the feature amounts of the query image and of the query images whose domains have been converted by the domain conversion unit 133. For example, when the model selection unit 134 selects the first-domain model 1231 for the query image, the feature amount extraction unit 1351 extracts the feature amounts of the query image using the first-domain model 1231. When the model selection unit 134 selects the second-domain model 1232 for the query image converted into the second domain, the feature amount extraction unit 1351 extracts the feature amounts of that converted query image using the second-domain model 1232.
At the inference stage for a gallery image, the feature amount extraction unit 1351 extracts the feature amounts of the gallery image using the model selected by the model selection unit 134. For example, when the model selection unit 134 selects the second-domain model 1232 for the gallery image, the feature amount extraction unit 1351 extracts the feature amounts of the gallery image using the second-domain model 1232.
The matching unit 1352 calculates the distance between the feature amount of the gallery image and the feature amount of the query image. At this time, the matching unit 1352 refers, as the feature amount of the query image, to the feature amount in the query feature amount data 121 that corresponds to the domain to which the gallery image belongs. For example, when the domain determination unit 132 determines that the gallery image belongs to the second domain, the matching unit 1352 refers to the second-domain feature amount in the query feature amount data 121.
The matching unit 1352 then compares the calculated distance with a matching threshold to check whether the subject of the gallery image is the matching target. A matching threshold is set for each domain, and the matching unit 1352 uses the threshold set for the domain to which the gallery image belongs. If the calculated distance is equal to or less than the matching threshold, the matching unit 1352 determines that the subject of the gallery image is the matching target of the query image; if the calculated distance is greater than the matching threshold, it determines that the subject of the gallery image is not the matching target of the query image.
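The matching just described can be sketched as follows, assuming Euclidean distance between feature vectors; the data-structure layout and names are illustrative, not the patent's implementation.

```python
import numpy as np

def match(gallery_feat, query_feats_by_domain, thresholds_by_domain, domain):
    """Compare a gallery feature against the query features registered for the
    same domain, using the matching threshold set for that domain
    (cf. matching unit 1352)."""
    threshold = thresholds_by_domain[domain]
    results = []
    for query_id, q_feat in query_feats_by_domain[domain].items():
        dist = float(np.linalg.norm(np.asarray(gallery_feat) - np.asarray(q_feat)))
        results.append((query_id, dist, dist <= threshold))  # True: matching target
    return results
```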
The first registration unit 136 registers each feature amount of the query image as the feature amount of the query image of the corresponding domain. The first registration unit 136 registers the feature amount of the first-domain query image in the query feature amount data 121 as the first-domain feature amount, and registers the feature amount of the second-domain query image in the query feature amount data 121 as the second-domain feature amount.
[Domain determination unit]
Next, the determination processing of the domain determination unit 132 will be described. In the case of domains related to time zones such as morning, noon, and night, the domain determination unit 132 determines the domain of the image to be determined by comparing the time at which the image was captured with predetermined morning, noon, and night time zones.
For example, the time zone of each domain is set in advance such that 6:00 to 11:00 is morning, 11:00 to 18:00 is noon, and 18:00 to 6:00 the next day is night. The domain determination unit 132 checks meta-information on the capture time of the image to be determined and determines the domain of the image according to its capture time.
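A small sketch of this time-based determination, using the example time zones above:

```python
from datetime import time

def domain_by_time(capture_time: time) -> str:
    """Map a capture time to a domain: morning 6:00-11:00,
    noon 11:00-18:00, night 18:00-6:00 the next day."""
    if time(6) <= capture_time < time(11):
        return "morning"
    if time(11) <= capture_time < time(18):
        return "noon"
    return "night"

print(domain_by_time(time(19, 30)))  # -> night
```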
In the case of domains related to brightness such as morning, noon, and night, the domain determination unit 132 may determine the domain of the image to be determined by comparing the average luminance over all pixels of the image with preset thresholds.
Specifically, let $(r_{ij}, g_{ij}, b_{ij})$ be the RGB values of the pixel at coordinates $(i, j)$ of the image to be determined. In this case, the luminance of the pixel at coordinates $(i, j)$ is given by Equation (1), where $\alpha$, $\beta$, and $\gamma$ are preset parameters.
$$l_{ij} = \alpha\, r_{ij} + \beta\, g_{ij} + \gamma\, b_{ij} \quad (1)$$
The domain determination unit 132 calculates the average luminance over all pixels of the image to be determined (with the image size taken as $H \times W$ pixels), written as Equation (2).
$$L = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} l_{ij} \quad (2)$$
The domain determination unit 132 compares the average luminance $L$ of Equation (2) with preset thresholds to determine whether the image to be determined belongs to the morning, noon, or night domain. For example, when the average value $L$ is less than a first threshold, the domain determination unit 132 determines that the domain of the image is night; when $L$ is equal to or greater than the first threshold and less than a second threshold (greater than the first threshold), it determines that the domain is morning; and when $L$ is equal to or greater than the second threshold, it determines that the domain is noon.
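A minimal sketch of this luminance-based determination, following Equations (1) and (2). The default weights for $\alpha$, $\beta$, $\gamma$ are the common ITU-R BT.601 values, used here only as an example of the preset parameters.

```python
import numpy as np

def domain_by_luminance(image, th1, th2, alpha=0.299, beta=0.587, gamma=0.114):
    """Determine the domain from the average luminance L of Equations (1)-(2).
    image: H x W x 3 RGB array; th1 < th2 are the first and second thresholds."""
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    l = alpha * r + beta * g + gamma * b   # Equation (1), per pixel
    L = float(l.mean())                    # Equation (2), average over all pixels
    if L < th1:
        return "night"
    if L < th2:
        return "morning"
    return "noon"
```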
Next, a case will be described in which the color and saturation (tint) of the background portion of a person image changes from moment to moment, for example among red, blue, and yellow. This occurs, for example, in an outdoor environment such as a park where a large display installed in the environment shows vivid video, so that the background portion of a person in that environment appears in the image with changing tints such as red, blue, and yellow.
In this case, for example, a red domain, a blue domain, and a yellow domain are set as domains. The domain determination unit 132 calculates the average over all pixels of the R, G, and B luminances of the image to be determined, and determines the domain of the image according to which average value is the largest.
Specifically, let $(r_{ij}, g_{ij}, b_{ij})$ be the R, G, and B luminances of the pixel at coordinates $(i, j)$ of the image to be determined.
The averages over all pixels of the R, G, and B luminances are written as Equations (3) to (5): Equation (3) is the average of the R luminance over all pixels, Equation (4) that of the G luminance, and Equation (5) that of the B luminance.
$$R = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} r_{ij} \quad (3)$$
$$G = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} g_{ij} \quad (4)$$
$$B = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} b_{ij} \quad (5)$$
The domain determination unit 132 calculates, for the image to be determined, the averages over all pixels of the R, G, and B luminances of Equations (3) to (5), and determines the domain using Equation (6).
$$\operatorname*{arg\,max} \left\{ R,\; B,\; \frac{R+G}{2} \right\} \quad (6)$$
When Equation (6) gives $R$, the domain determination unit 132 determines that the domain of the image to be determined is red; when it gives $B$, that the domain is blue; and when it gives $(R+G)/2$, that the domain is yellow. Note that yellow is treated as the sum of red and green.
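This tint-based determination can be sketched as follows, implementing Equations (3) to (6); the function name is illustrative.

```python
import numpy as np

def domain_by_tint(image):
    """Determine the red/blue/yellow domain from the per-channel average
    luminances R, G, B of Equations (3)-(5), taking yellow as (R + G) / 2
    as in Equation (6). image: H x W x 3 RGB array."""
    R, G, B = (float(image[..., c].mean()) for c in range(3))
    scores = {"red": R, "blue": B, "yellow": (R + G) / 2.0}
    return max(scores, key=scores.get)   # argmax over the three candidates
```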
The domain determination unit 132 may also determine the domain to which the image to be determined belongs by combining two or more of the time at which the image was captured, the luminance of the image, and the color and saturation of the image.
For example, consider a case where images are unaffected by the tint of the video on a large display during the daytime but are easily affected by it at night. In this case, the domain determination unit 132 combines the domain determination based on time with the domain determination based on color and saturation.
For example, when the image to be determined was captured between 6:00 and 18:00, the domain determination unit 132 determines that its domain is "daytime". When the image was captured between 18:00 and 6:00 the next day and Equation (7) holds, the domain determination unit 132 determines that its domain is "night, red".
$$\max \left\{ R,\; B,\; \frac{R+G}{2} \right\} = R \quad (7)$$
When the image to be determined was captured between 18:00 and 6:00 the next day and Equation (8) holds, the domain determination unit 132 determines that its domain is "night, blue".
$$\max \left\{ R,\; B,\; \frac{R+G}{2} \right\} = B \quad (8)$$
When the image to be determined was captured between 18:00 and 6:00 the next day and Equation (9) holds, the domain determination unit 132 determines that its domain is "night, yellow".
$$\max \left\{ R,\; B,\; \frac{R+G}{2} \right\} = \frac{R+G}{2} \quad (9)$$
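The combined time-and-tint determination can then be sketched compactly, reusing domain_by_tint from the earlier sketch; the hour boundaries follow the example above.

```python
def domain_by_time_and_tint(capture_hour, image):
    """Combine time-based and tint-based determination: daytime from 6:00 to
    18:00; otherwise night, subdivided by the dominant tint (Equations (7)-(9))."""
    if 6 <= capture_hour < 18:
        return "daytime"
    return "night, " + domain_by_tint(image)
```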
[Domain conversion unit]
Next, the domain conversion unit 133 will be described. The domain conversion unit 133 converts the query image into images of all the domains different from the domain to which the query image belongs, so that a feature amount of the query image is registered for the same domain as that of the gallery image.
The domain conversion unit 133 converts the query image based on its luminance. For example, in the case of domains related to brightness such as morning, noon, and night, the domain conversion unit 133 uniformly multiplies the RGB values of each pixel of the query image to be converted by a coefficient such that the luminance after multiplication matches the average luminance of the conversion destination domain.
Let $(r_{ij}, g_{ij}, b_{ij})$ be the RGB values before conversion and $(r'_{ij}, g'_{ij}, b'_{ij})$ the RGB values after conversion of the pixel at coordinates $(i, j)$ of the image to be converted. The domain conversion unit 133 converts R, for example, using Equation (10); G and B can be converted by replacing $r_{ij}$ in Equation (10) with $g_{ij}$ or $b_{ij}$.
$$r'_{ij} = r_{ij} \times \frac{L'}{\mathbb{E}_{k,l}\left[\, l_{kl} \,\right]} \quad (10)$$
In Equation (10), $L'$ is the average luminance of the conversion destination domain and is set in advance, and $l_{kl}$ is the luminance of the pixel at coordinates $(k, l)$, given by Equation (11), where $\alpha$, $\beta$, and $\gamma$ are preset parameters. Here, $i$ and $k$ both represent horizontal coordinates of image pixels, and $j$ and $l$ both represent vertical coordinates; $k$ and $l$ are distinguished from $i$ and $j$ so that the expected value (average) can be taken over the whole image.
$$l_{kl} = \alpha\, r_{kl} + \beta\, g_{kl} + \gamma\, b_{kl} \quad (11)$$
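A minimal sketch of this luminance-based conversion follows. The default $\alpha$, $\beta$, $\gamma$ weights are example preset parameters, and the clipping to [0, 255] is a practical assumption added here, not stated in the text.

```python
import numpy as np

def convert_luminance(image, L_target, alpha=0.299, beta=0.587, gamma=0.114):
    """Scale all RGB values uniformly so that the average luminance of the
    converted image matches L' of the destination domain (Equations (10)-(11))."""
    img = image.astype(np.float64)
    l = alpha * img[..., 0] + beta * img[..., 1] + gamma * img[..., 2]
    scale = L_target / l.mean()              # coefficient L' / E[l_kl]
    return np.clip(img * scale, 0, 255).astype(np.uint8)
```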
The domain conversion unit 133 also converts the query image based on its color and saturation (tint). For example, in the case of domains related to tint changes such as red, blue, and yellow caused by a large display or the like, the RGB values of each pixel of the query image at hand are uniformly multiplied by a coefficient such that the RGB values after multiplication match the average RGB values of the conversion destination domain.
Let $(r_{ij}, g_{ij}, b_{ij})$ be the RGB values before conversion and $(r'_{ij}, g'_{ij}, b'_{ij})$ the RGB values after conversion of the pixel at coordinates $(i, j)$ of the image to be converted. The domain conversion unit 133 converts R, for example, using Equation (12), where $R'$ is the average pixel value of the conversion destination domain and is set in advance. G and B can be converted by replacing $r_{ij}$ in Equation (12) with $g_{ij}$ or $b_{ij}$ and $R'$ with $G'$ or $B'$.
$$r'_{ij} = r_{ij} \times \frac{R'}{\mathbb{E}_{k,l}\left[\, r_{kl} \,\right]} \quad (12)$$
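Per-channel tint conversion can be sketched the same way; again, clipping is a practical assumption added here.

```python
import numpy as np

def convert_tint(image, rgb_target):
    """Scale each channel so that its average matches the destination domain's
    average RGB values (R', G', B') as in Equation (12)."""
    img = image.astype(np.float64)
    means = img.reshape(-1, 3).mean(axis=0)   # current (R, G, B) averages
    scaled = img * (np.asarray(rgb_target, dtype=np.float64) / means)
    return np.clip(scaled, 0, 255).astype(np.uint8)
```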
The domain conversion unit 133 may also convert the image to be converted into an image of another domain using a converter. The converter converts an image into an image of another domain different from the domain to which the image belongs, and has been trained on the conversion of a plurality of images belonging to each domain. For example, the domain conversion unit 133 uses a GAN (Generative Adversarial Networks) as the converter. In this case, the inference device 10 prepares a sufficient number of images of each domain in advance and trains the GAN. The domain conversion unit 133 then converts the query image into an image of each domain by inputting it into the GAN.
Note that the inference device according to Embodiment 1 converts the domain of the query image into each domain in order to match the domain of the gallery image. This is because, for query images, the increase in the amount of conversion processing is easier to tolerate when real-time processing is desired.
For example, consider converting the domain of gallery images to match the domain of the query image. If the frame rate is 5 FPS, there are five frames per second, and there are several gallery images per frame. Domain conversion would have to be applied to all of those gallery images, making real-time processing difficult to achieve. In contrast, query registration is assumed to occur at most about once every several seconds, and a query image only needs to be registered once, so the frequency of domain conversion can be greatly reduced.
In Embodiment 1, domain conversion is not performed on gallery images. If both the query image and the gallery image were domain-converted, matching accuracy would be expected to deteriorate compared with converting only one of them. For this reason, in Embodiment 1, domain conversion to all domains is performed only on the query image, and matching accuracy is maintained by switching the model used for feature extraction.
[クエリ登録処理]
 次に、推論装置10によるクエリ画像の特徴量の登録処理について説明する。図5は、実施の形態1に係るクエリ画像の特徴量の登録処理の処理手順を示すフローチャートである。
[Query registration process]
Next, the registration processing of the feature amount of the query image by the inference device 10 will be described. FIG. 5 is a flow chart showing a processing procedure of registration processing of a feature amount of a query image according to Embodiment 1. FIG.
 推論装置10では、画像入力部131が、クエリ画像の入力を受け付けると(ステップS11)、ドメイン判定部132が、クエリ画像がいずれのドメインに属するかを判定する(ステップS12)。 In the inference device 10, when the image input unit 131 receives the input of the query image (step S11), the domain determination unit 132 determines to which domain the query image belongs (step S12).
Then, the domain conversion unit 133 converts the query image into an image of another domain different from the domain to which the query image belongs (step S13). The model selection unit 134 selects, from among the models of the model group 122, the model corresponding to the domain of the converted query image (step S14).
The feature amount extraction unit 1351 extracts the feature amount of the domain-converted query image using the model selected by the model selection unit 134 (step S15). Then, the first registration unit 136 registers the feature amount of the query image in the query feature amount data 121 as the feature amount of the query image of the converted domain (step S16).
The domain conversion unit 133 determines whether there is a next domain to be converted (step S17). When the query image has been converted into all domains corresponding to all models, the domain conversion unit 133 determines that there is no next domain to be converted (step S17: No) and ends the processing. When the query image has not yet been converted into all domains corresponding to all models, the domain conversion unit 133 determines that there is a next domain to be converted (step S17: Yes), returns to step S13, and executes the domain conversion processing of the query image for an unconverted domain.
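A minimal sketch of this registration flow follows, assuming hypothetical helpers `judge_domain`, `convert_domain`, and `extract_features` and a dict of per-domain models; the step numbers in the comments refer to FIG. 5.

```python
# A minimal sketch of the query registration loop (steps S11 to S17),
# with all helper names assumed for illustration.
def register_query(query_image, models, query_feature_db,
                   judge_domain, convert_domain, extract_features):
    source_domain = judge_domain(query_image)                  # step S12
    for target_domain in models:                               # loop of step S17
        image = (query_image if target_domain == source_domain
                 else convert_domain(query_image, target_domain))  # step S13
        model = models[target_domain]                          # step S14
        features = extract_features(model, image)              # step S15
        # Register the features as those of the query in the target domain.
        query_feature_db[target_domain] = features             # step S16
```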
[Inference processing]
Next, inference processing by the inference device 10 will be described. FIG. 6 is a flowchart showing a processing procedure of the inference processing according to Embodiment 1.
In the inference device 10, when the image input unit 131 receives the input of a gallery image (step S21), the domain determination unit 132 determines to which domain the gallery image belongs (step S22).
The model selection unit 134 selects, from among the models of the model group 122, the model corresponding to the domain of the gallery image determined in step S22 (step S23). The feature amount extraction unit 1351 extracts the feature amount of the gallery image using the model selected by the model selection unit 134 (step S24).
Subsequently, the matching unit 1352 refers to, among the query feature amount data 121, the feature amount of the query image corresponding to the domain to which the gallery image belongs, as determined in step S22 (step S25). The matching unit 1352 calculates the distance between the feature amount of the gallery image and the referenced feature amount of the query image, compares the calculated distance with a matching threshold, and thereby determines whether the subject of the gallery image is the matching target (step S26). The matching unit 1352 then outputs the matching result (step S27) and ends the processing.
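A minimal sketch of steps S25 to S27 follows, assuming Euclidean distance between feature vectors and a pre-set threshold; the text does not fix the distance measure, so this is only one possible choice.

```python
import numpy as np

# A minimal sketch of the matching step, assuming L2 distance and a
# pre-registered per-domain query feature database.
def match_gallery(gallery_features: np.ndarray,
                  query_feature_db: dict,
                  gallery_domain: str,
                  threshold: float) -> bool:
    query_features = query_feature_db[gallery_domain]              # step S25
    distance = np.linalg.norm(gallery_features - query_features)   # step S26
    is_match = bool(distance <= threshold)
    return is_match                                                # step S27
```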
[Effects of Embodiment 1]
In this way, since the inference device 10 determines the domain of the gallery image and uses the model corresponding to that domain, it can execute feature amount extraction processing appropriate to the determination and improve the accuracy of image matching.
In addition, the inference device 10 converts the domain of the query image into images of all the domains corresponding to the trained models, extracts feature amounts from the query image of each domain using the model corresponding to that domain, and registers them in advance. Thus, by converting the query image into images of all the domains corresponding to the trained models, the inference device 10 can prepare query images of all the domains. Therefore, at matching time, the inference device 10 can compare the feature amount of the gallery image with the feature amount of the query image of the same domain as the gallery image, which reduces the accuracy degradation caused by the query image and the gallery image belonging to different domains.
[Modification of Embodiment 1]
The inference device according to Embodiment 1 may perform class classification. FIG. 7 is a diagram schematically showing an example of another configuration of the inference device according to the modification of Embodiment 1.
As shown in FIG. 7, the inference device 10A according to the modification of Embodiment 1 has a control unit 13A instead of the control unit 13 shown in FIG. 4. The control unit 13A has the same functions as the control unit 13.
The control unit 13A has an inference unit 135A that includes a classification unit 1352A. The classification unit 1352A uses a trained classification model that performs class classification to compute logits from the feature amount (feature vector) of the inference image extracted by the feature amount extraction unit 1351, and determines the class of the subject of the inference image. The classification unit 1352A may register the class classification result as the inference result 124A and may also output it.
FIG. 8 is a flowchart showing a processing procedure of inference processing according to the modification of Embodiment 1. As shown in FIG. 8, when the inference device 10A receives the input of an inference image (step S31), it performs the same processing as steps S22 to S24 shown in FIG. 6. Specifically, the domain determination unit 132 determines the domain of the inference image (step S32). The model selection unit 134 selects the model corresponding to the domain determined in step S32 from the model group 122 (step S33). The feature amount extraction unit 1351 extracts the feature amount of the inference image using the model selected by the model selection unit 134 (step S34).
Then, the classification unit 1352A computes logits from the feature amount of the inference image, performs class classification to determine the class of the subject of the inference image (step S35), and outputs the classification result (step S36).
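A minimal sketch of the logit-based classification follows, assuming the trained classification model reduces to a linear layer with weights W and bias b; the actual model structure is not specified in the text.

```python
import numpy as np

# A minimal sketch of step S35, assuming a linear classification head
# (W and b are assumed parameters of the trained classification model).
def classify(features: np.ndarray, W: np.ndarray, b: np.ndarray) -> int:
    logits = W @ features + b                  # logit computation (step S35)
    probs = np.exp(logits - logits.max())      # numerically stable softmax
    probs /= probs.sum()
    return int(np.argmax(probs))               # predicted class of the subject
```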
As in this inference device 10A, even when class classification is performed, determining the domain of the inference image and using the model corresponding to that domain makes it possible to execute feature amount extraction processing appropriate to the determination and to improve the accuracy of class classification.
[Embodiment 2]
Next, Embodiment 2 will be described. Embodiment 2 describes a training device that trains the models corresponding to the respective domains used by the inference devices 10 and 10A.
In the inference devices 10 and 10A, a trained model is provided for each domain. Therefore, the training device according to Embodiment 2 prepares training images for each domain and trains a model for each domain using the training images of that domain. The training device according to Embodiment 2 prepares the training images for each domain from an existing dataset.
[Training device]
The training device according to Embodiment 2 will be described. FIG. 9 is a diagram schematically showing an example of the configuration of the training device according to Embodiment 2. As shown in FIG. 9, the training device 20 has an input/output unit 21, a storage unit 22, and a control unit 23.
The input/output unit 21 receives input of information and outputs information. The input/output unit 21 is, for example, a communication interface that transmits and receives various kinds of information to and from other devices connected via a network or the like. The input/output unit 21 performs communication between other devices and the control unit 23 (described later) via a telecommunication line such as a LAN or the Internet. The input/output unit 21 is also a device such as a mouse or a keyboard that receives input of various kinds of instruction information to the training device 20 in response to input operations by a user. The input/output unit 21 is further realized by, for example, a liquid crystal display, and displays and outputs a screen whose display is controlled by the training device 20.
The storage unit 22 is realized by a semiconductor memory element such as a RAM or a flash memory, and stores a processing program for operating the training device 20, data used during execution of the processing program, and the like. The storage unit 22 has a dataset 221, training images 222, and a model group 223.
The dataset 221 is, for example, a public dataset such as the MSMT public dataset. The MSMT public dataset contains a wide range of images captured from morning to night.
The training images 222 include a group of training images corresponding to each domain. For example, the training images 222 include a first-domain image group 2221, which is a group of training images for the model corresponding to the first domain, and a second-domain image group 2222, which is a group of training images for the model corresponding to the second domain.
The model group 223 has a plurality of trained models used by the inference unit 135. For example, the model group 223 has a first-domain model 2231 corresponding to the first domain and a second-domain model 2232 corresponding to the second domain.
The control unit 23 controls the entire training device 20. The control unit 23 is, for example, an electronic circuit such as a CPU, or an integrated circuit such as an ASIC or FPGA. The control unit 23 also has an internal memory for storing programs defining various processing procedures and control data, and executes each kind of processing using the internal memory. In addition, the control unit 23 functions as various processing units by running various programs. The control unit 23 has a training image acquisition unit 231 and a training unit 232.
The training image acquisition unit 231 has a dataset acquisition unit 2311 (input unit), a domain determination unit 2312 (determination unit), a domain conversion unit 2313 (conversion unit), and a second registration unit 2314 (registration unit, second registration unit).
The dataset acquisition unit 2311 acquires, as training images, a dataset such as the MSMT public dataset. In addition to public datasets, the dataset acquisition unit 2311 may also acquire real data actually captured by a camera or the like that captures inference images.
The domain determination unit 2312 has the same functions as the domain determination unit 132. The domain determination unit 2312 determines the domain to which a training image belongs based on elements that establish the environment in which the subject is imaged. The elements that establish the environment in which the subject is imaged are, for example, at least one of the time at which the image to be determined was captured, the luminance of the image to be determined, and the color and saturation of the image to be determined. The domain determination unit 2312 determines, for each image included in the dataset, the domain to which the image belongs.
The domain conversion unit 2313 converts a training image into an image of another domain different from the domain to which the training image belongs, based on elements that establish the environment in which the subject is imaged. The elements that establish the environment in which the subject is imaged are, for example, at least one of the luminance of the training image and the color and saturation of the image to be determined.
The second registration unit 2314 registers the training image to be determined as a training image for the model corresponding to the domain determined by the domain determination unit 2312. For example, when the domain determination unit 2312 determines that a certain training image belongs to the first domain, the second registration unit 2314 registers this training image as an image of the first-domain image group 2221. Also, when a certain training image is converted into an image of the second domain by the domain conversion unit 2313, the second registration unit 2314 registers the converted training image as an image of the second-domain image group 2222.
The training unit 232 selects, from among the training images of each domain registered by the second registration unit 2314, the training image group corresponding to the domain of the model to be trained, and executes training of the model using the selected training image group. When the training target is the model corresponding to the first domain, the training unit 232 selects the images of the first-domain image group 2221 as the training images and trains the model corresponding to the first domain. The training unit 232 may use a known mechanism such as back propagation and repeat updating the parameters of each model composed of a neural network until a predetermined termination condition is reached.
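A minimal sketch of per-domain training with back propagation follows, assuming each model is a torch.nn.Module, a per-domain DataLoader, and cross-entropy loss; the loss function, optimizer, and termination condition used here are illustrative assumptions, not part of the text.

```python
import torch

# A minimal sketch of step S63: training one domain's model until a
# (here, fixed-epoch) termination condition is reached.
def train_domain_model(model, domain_loader, epochs=10, lr=1e-3):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):                    # assumed termination condition
        for images, labels in domain_loader:   # images of one domain only
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()                    # back propagation
            optimizer.step()                   # parameter update
```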
[Domain determination unit]
Next, the domain determination unit 2312 will be described. Like the domain determination unit 132, the domain determination unit 2312 determines the domain to which a training image belongs using time, luminance, color, and saturation.
Specifically, in the case of domains related to time zones such as morning, noon, and night, the domain determination unit 132 determines the domain of the image to be determined by comparing the time at which the image was captured with predetermined morning, noon, and night time zones.
When a certain dataset is to be divided into morning-domain and night-domain datasets based on luminance, a luminance threshold is set in advance. The domain determination unit 2312 then determines that the domain is morning when the average value L of the luminance of all pixels of the image to be determined is equal to or greater than the threshold, and determines that the domain is night when the average value L is less than the threshold.
When a certain dataset is to be divided into domain-specific datasets based on the color and saturation (tint) of the background portion of person images, the domain determination unit 2312 calculates the average over all pixels of each of the R, G, and B luminance values of the image to be determined, and determines, according to which average is the largest, whether the domain to which the image to be determined belongs is, for example, the red domain, the blue domain, or the yellow domain.
The domain determination unit 2312 may also determine the domain to which the image to be determined belongs by combining two or more of the time at which the image to be determined was captured, the luminance of the image to be determined, and the color and saturation of the image to be determined.
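A minimal sketch of luminance-based and tint-based domain determination follows, assuming an (H, W, 3) RGB array; the BT.601 luminance weights and the mapping of a G-dominant image to the yellow domain are assumptions made for illustration only.

```python
import numpy as np

# A minimal sketch of luminance-based determination; the luminance
# weights (BT.601) are an assumption, the text only says "luminance".
def judge_domain_by_luminance(image: np.ndarray, threshold: float) -> str:
    lum = 0.299 * image[..., 0] + 0.587 * image[..., 1] + 0.114 * image[..., 2]
    return "morning" if lum.mean() >= threshold else "night"

# A minimal sketch of tint-based determination by the largest channel mean;
# assigning a G-dominant image to the yellow domain is an assumption.
def judge_domain_by_tint(image: np.ndarray) -> str:
    r, g, b = image.reshape(-1, 3).mean(axis=0)   # per-channel means
    if r >= g and r >= b:
        return "red"
    if b >= r and b >= g:
        return "blue"
    return "yellow"
```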
[Domain conversion unit]
When a sufficient number of training images cannot be secured for each domain, the domain conversion unit 2313 converts the images of the dataset into images of other domains to generate training images for each domain.
The domain conversion unit 2313 performs the domain conversion on the basis of luminance, color, saturation (tint), and the like. In this case, it is assumed that the tendency of the data belonging to each domain, in terms of luminance and tint, is known for each domain.
For example, consider a case where the domains are defined as morning, noon, and night by luminance, where the average luminance value L over all pixels of each image (written as Equation (13)) in the morning, noon, and night domains is known to approximately follow a normal distribution, and where the mean and variance of that distribution are known.
[Equation (13): definition of the average luminance value L over all pixels of an image]
The domain conversion unit 2313 performs the domain conversion so that the mean and variance of the average luminance value L of the images of the training dataset at hand match the mean and variance of each domain.
Specifically, for the k-th person image of the available training dataset, write x_k := (R_k, G_k, B_k). Let u be the mean vector of the target domain of the conversion, and let C be its variance-covariance matrix.
Then, the domain conversion unit 2313 transforms each pixel value of the k-th image of the available training dataset (Equation (14)) as shown in Equation (15).
[Equation (14): the pixel values of the k-th image]
[Equation (15): the transformation applied to each pixel value]
In Equation (15), V denotes the variance-covariance matrix, and E denotes the operation of taking the expected value vector.
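Since Equation (15) itself is not reproduced here, the following is a reconstruction under the assumption that it is the standard whitening-and-recoloring map that makes the per-pixel RGB mean and covariance of the source images match the target domain's mean vector u and variance-covariance matrix C.

```python
import numpy as np
from scipy.linalg import sqrtm

# A minimal sketch under the whitening-and-recoloring assumption; the
# exact form of Equation (15) is not given in this text.
def recolor_to_domain(pixels: np.ndarray, u: np.ndarray, C: np.ndarray) -> np.ndarray:
    """pixels: (N, 3) RGB values pooled from the source training images."""
    mean = pixels.mean(axis=0)                        # E[x] of the source set
    V = np.cov(pixels, rowvar=False)                  # source covariance matrix
    A = np.real(sqrtm(C)) @ np.linalg.inv(np.real(sqrtm(V)))
    return (pixels - mean) @ A.T + u                  # matched to target u, C
```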
For example, consider a case where the domains are defined as daytime, nighttime (red), nighttime (blue), and nighttime (yellow) by color and saturation (tint), where the RGB values of each image in the daytime, nighttime (red), nighttime (blue), and nighttime (yellow) domains (see Equation (16) for the R value, Equation (17) for the G value, and Equation (18) for the B value) are known to approximately follow normal distributions, and where their means and variances are known.
[Equation (16): the R values of an image]
[Equation (17): the G values of an image]
[Equation (18): the B values of an image]
In this case, the domain conversion unit 2313 performs the domain conversion so that the means and variances of the RGB values of the training dataset at hand (R, G, and B shown in Equations (16) to (18)) match the means and variances of each domain.
Let μ be the mean of the R values of the target domain of the conversion, and let σ be their standard deviation. The domain conversion unit 2313 transforms each pixel value of the k-th image of the available training dataset as shown in Equation (19).
[Equation (19): the transformation applied to each R pixel value]
In Equation (19), V denotes the variance, and E denotes the operation of taking the expected value. Note that G and B can be transformed by replacing R_k in Equation (19) with G_k or B_k, and r_ij^k with g_ij^k or b_ij^k.
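Similarly, a minimal sketch of the Equation (19) style per-channel transform follows, reconstructed under the assumption that each R value is standardized by the source-set mean and variance and rescaled to the target domain's μ and σ.

```python
import numpy as np

# A minimal sketch under the per-channel standardize-and-rescale
# assumption; apply the same function to the G or B values analogously.
def transform_channel(r: np.ndarray, mu: float, sigma: float) -> np.ndarray:
    """r: R values pooled from the source training images."""
    return (r - r.mean()) / np.sqrt(r.var()) * sigma + mu
```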
[Training image acquisition processing]
Next, training image acquisition processing by the training device 20 will be described. FIG. 10 is a flowchart showing a processing procedure of the training image acquisition processing according to Embodiment 2.
As shown in FIG. 10, in the training device 20, the dataset acquisition unit 2311 acquires a dataset (step S41). Then, the domain determination unit 2312 refers to an arbitrarily selected image to be determined from the dataset acquired in step S41 (step S42) and determines the domain to which this image belongs (step S43). The second registration unit 2314 registers the image to be determined as a training image for the model corresponding to the domain determined in step S43 (step S44).
Then, the domain determination unit 2312 determines whether there is a next image to be determined in the dataset (step S45). When the domains of all the images of the dataset have been determined, the domain determination unit 2312 determines that there is no next image to be determined (step S45: No) and ends the processing. When the domains of all the images of the dataset have not yet been determined, the domain determination unit 2312 determines that there is a next image to be determined (step S45: Yes), returns to step S42, and executes the domain determination for the next image.
FIG. 11 is a flowchart showing another processing procedure of the training image acquisition processing according to Embodiment 2. When a sufficient number of training images cannot be secured for each domain, for example, the training device 20 converts the images of the dataset into images of other domains to generate training images for each domain.
Specifically, as shown in FIG. 11, when the training device 20 receives, as the conversion target domain, the input of a domain for which a sufficient number of training images cannot be secured (step S51), the dataset acquisition unit 2311 acquires a dataset (step S52).
Then, the domain determination unit 2312 refers to an arbitrarily selected image to be converted from the dataset acquired in step S52 (step S53) and determines the domain to which this image belongs (step S54).
Then, when this image does not belong to the conversion target domain, the domain conversion unit 2313 converts this image into an image of the conversion target domain (step S55). The second registration unit 2314 registers the image whose domain was converted in step S55 as a training image for the model corresponding to the conversion target domain (step S56).
Then, the training device 20 determines whether there is a next image to be converted (step S57). When a sufficient number of training images has been secured for the conversion target domain, the training device 20 determines that there is no next image to be converted (step S57: No) and ends the processing. When a sufficient number of training images has not been secured for the conversion target domain, the training device 20 determines that there is a next image to be converted (step S57: Yes), returns to step S53, and executes the conversion processing into the conversion target domain for the next image.
[Training processing]
Next, training processing by the training device 20 will be described. FIG. 12 is a flowchart showing a processing procedure of the training processing according to Embodiment 2.
As shown in FIG. 12, when the training device 20 receives the input of a training target domain (step S61), the training unit 232 selects the training image group corresponding to the training target domain (step S62). The training unit 232 executes training of the model using the selected training image group (step S63).
[Effects of Embodiment 2]
In this way, the training device 20 determines the domain of each image of the dataset and prepares a training image group for each domain. The training device 20 then trains a model for each domain using the training image group belonging to that domain. Therefore, according to the training device 20, the model corresponding to each domain can be trained appropriately, and the inference accuracy of the models can be improved.
Also, when a sufficient number of training images cannot be secured for each domain, the training device 20 converts the images of the dataset into images of the domain to be secured to generate training images for each domain. The training device 20 can thus secure a sufficient number of training images for each domain. Therefore, according to the training device 20, model training can be executed appropriately for any domain, and the inference accuracy of the models can be improved.
[Modification of Embodiment 2]
As in the training device 20A shown in FIG. 13, a control unit 23A may be provided in which the domain determination unit 2312 shown in FIG. 9 is replaced with a domain determination unit 2312A.
When dividing by luminance, for example, the domain determination unit 2312A sets a luminance threshold and a margin width in advance, determines that an image belongs to the morning domain when its average luminance is equal to or greater than (threshold - margin width), and determines that it belongs to the night domain when its average luminance is below (threshold + margin width).
In the determination by the domain determination unit 2312, the number of training images near the threshold tends to be small, and accuracy may degrade when the model of each domain is applied to inference images near the threshold. In contrast, with the determination by the domain determination unit 2312A, the number of training images near the threshold can be increased, and the degradation in accuracy of the model of each domain can also be reduced.
[Evaluation experiments]
Evaluation experiments A0, A1, and B were performed on the inference devices 10 and 10A and the training devices 20 and 20A according to Embodiments 1 and 2.
The evaluation indices common to evaluation experiments A0, A1, and B will be described. In evaluation experiments A0, A1, and B, evaluation was performed using rank-k and mAP (mean Average Precision). rank-k is the average over all queries of "the probability that, when the gallery is sorted in ascending order of distance to a given query, at least one image of the same person appears among the top k images". rank-k takes a value between 0 and 1, and the larger the value, the better the accuracy.
mAP is the average over all queries of "the average over k of the precision (the proportion of images of the same person among the top k gallery images for a given query)". mAP takes a value between 0 and 1, and the larger the value, the better the accuracy.
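A minimal sketch of both metrics as defined above follows, assuming that for each query a distance vector over all gallery items and a boolean array marking the same-person gallery items are given.

```python
import numpy as np

# rank-k: fraction of queries whose top-k nearest gallery items
# contain at least one same-person image.
def rank_k(distances, positives, k):
    hits = []
    for d, pos in zip(distances, positives):
        order = np.argsort(d)                    # gallery sorted by distance
        hits.append(bool(pos[order][:k].any()))  # any true match in top k?
    return float(np.mean(hits))

# mAP: mean over queries of the average precision at the ranks
# where same-person images appear.
def mean_average_precision(distances, positives):
    aps = []
    for d, pos in zip(distances, positives):
        sorted_pos = pos[np.argsort(d)]
        ranks = np.flatnonzero(sorted_pos) + 1   # 1-based ranks of true matches
        precision_at_hits = np.arange(1, len(ranks) + 1) / ranks
        aps.append(precision_at_hits.mean() if len(ranks) else 0.0)
    return float(np.mean(aps))
```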
[Evaluation experiment A0]
Evaluation experiment A0 will be described. Evaluation experiment A0 evaluated whether there is a difference in accuracy depending on whether models are switched when person matching is performed outdoors.
The MSMT dataset was used in evaluation experiment A0. The MSMT dataset consists of a training image group and an inference image group, and contains a wide range of images captured from morning to night.
At training time, following the processing procedure shown in FIG. 10, the domain of each image of the MSMT dataset was determined, and the models corresponding to the respective domains were trained using training images divided by domain according to the determined domains. At inference time, the domain of the gallery image was determined, and inference was performed while switching models according to the domain of the gallery image. Evaluation experiment A0 evaluated the case where the query image and the gallery image belong to the same domain.
In evaluation experiment A0, as an example, the MSMT dataset was divided into training images of three domains (for example, morning, noon, and night) on the basis of luminance. Evaluation experiment A0 adopts thresholds such that the numbers of images belonging to the respective domains in the training image group are equal. Regarding the division of the training images, the case of dividing with the margin described in the modification of Embodiment 2 was also evaluated. At inference time, the domain of the inference image was determined based on luminance.
The procedure for model training and evaluation is as follows. First, the luminance boundary values that separate the domains are determined. Specifically, the domain to which images whose average luminance value L over all pixels, shown in Equation (2), is larger than θ2·(1−α) belong is called the morning domain for convenience. The domain to which images whose average luminance value L is smaller than θ2·(1+α) and larger than θ1·(1−α) belong is called the day domain for convenience. The domain to which images whose average luminance value L is smaller than θ1·(1+α) belong is called the night domain for convenience.
For the thresholds, θ1 = 61.7 and θ2 = 85.6 were used, and for the margin, each of the patterns α ∈ {0.00, 0.04, 0.08, 0.12, 0.16} was tried. In the MSMT training image group, the mean and standard deviation of the luminance are 76.1 and 27.7, respectively.
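A minimal sketch of this three-way split with margin follows, using the band boundaries as reconstructed above; images whose average luminance falls in an overlap region are registered in both adjacent domains.

```python
# A minimal sketch of the margin-based split, with the band-to-domain
# assignment following the reconstructed description above.
def split_with_margin(images_with_L, theta1=61.7, theta2=85.6, alpha=0.12):
    morning, day, night = [], [], []
    for image, L in images_with_L:                # L: mean luminance of image
        if L > theta2 * (1 - alpha):
            morning.append(image)
        if theta1 * (1 - alpha) < L < theta2 * (1 + alpha):
            day.append(image)
        if L < theta1 * (1 + alpha):
            night.append(image)
    return morning, day, night
```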
Then, the training devices 20 and 20A divide the training image group by domain. The MSMT dataset is thereby divided into training image groups of the morning, noon, and night domains. The training devices 20 and 20A then train the model corresponding to each domain using the training image group of that domain. At inference time, the inference device 10 performs domain determination and divides the inference image group by domain. This produces query image groups and gallery image groups of the morning, noon, and night domains.
Subsequently, each model in the inference device 10 is evaluated. The morning, noon, or night model is applied to the corresponding morning, noon, or night query image group and morning, noon, or night gallery image group, and the distances between feature vectors are calculated for each domain to compute the rank-k and mAP values.
Next, an outline of the procedure for training and evaluating the general-purpose model will be given. In the training of the general-purpose model, the MSMT dataset is used as-is to train the general-purpose model. At inference time, the inference image group is divided by domain. This produces query image groups and gallery image groups of the morning, noon, and night domains.
Subsequently, the general-purpose model is evaluated. The general-purpose model is applied to the morning/noon/night query image groups and the morning/noon/night gallery image groups, and the distances between feature vectors are calculated for each domain to compute the rank-k and mAP values.
The results of evaluation experiment A0 are shown in Table 1.
[Table 1: rank-k and mAP results of evaluation experiment A0]
As shown in Table 1, the models trained for each domain according to Embodiments 1 and 2 (for example, experimental data A0-13) achieved higher inference accuracy than the general-purpose model (experimental data A0-00). Thus, switching models according to the domain of the gallery image as in Embodiment 1 yielded effective results.
Here, dataset division with a margin by the domain determination unit 2312A of the training device 20A (experimental data A0-13) was more effective than simple dataset division by the domain determination unit 2312 of the training device 20 (experimental data A0-10). In the case of dataset division with a margin, accuracy peaks around α = 0.12, so in this example α = 0.12 should be set.
[Evaluation experiment A1]
Evaluation experiment A1 will be described. Evaluation experiment A1 evaluated whether, when the query image and the gallery image belong to different domains, there is a difference in accuracy depending on whether domain conversion of the query image is performed, that is, whether the feature amounts of the query image are registered for each domain.
The MSMT dataset was used in evaluation experiment A1. At training time, the domain of each image of the MSMT dataset was determined, and the models corresponding to the respective domains were trained using training images divided by domain according to the determined domains. At inference time, the domain of the gallery image was determined, and inference was performed while switching models according to the domain of the gallery image. Evaluation experiment A1 evaluated the case where the query image and the gallery image belong to different domains.
In evaluation experiment A1, the training images were given the same settings as in evaluation experiment A0. The margin was fixed at α = 0.12. The training image group is divided into three domains (for example, morning, noon, and night). The MSMT dataset is thereby divided into training image groups of the morning, noon, and night domains. The model corresponding to each domain is then trained using the training image group of that domain.
At inference time, the inference device 10 performs domain determination and divides the inference image group by domain. This produces query image groups and gallery image groups of the morning, noon, and night domains.
The inference device 10 adopts luminance-based domain conversion of the query image. The average luminance L' of the conversion target domain is set to the average luminance of the image group belonging to that domain in the training image group. Subsequently, each model in the inference device 10 is evaluated. The morning, noon, or night model is applied to the corresponding morning, noon, or night query image group and morning, noon, or night gallery image group, and the distances between feature vectors are calculated for each domain to compute the rank-k and mAP values.
When the query image is not converted at inference time, in evaluating each domain's model, the dedicated morning/noon/night model is applied to the noon+night/night+morning/morning+noon query image groups as-is and to the morning/noon/night gallery image groups, and the distances between feature vectors are calculated for each domain to compute the rank-1 and mAP values.
Next, an outline of the procedure for training and evaluating the general-purpose model will be given. In the training of the general-purpose model, the MSMT dataset is used as-is to train the general-purpose model. At inference time, the inference image group is divided by domain. This produces query image groups and gallery image groups of the morning, noon, and night domains.
Subsequently, the general-purpose model is evaluated. The general-purpose model is applied to image groups obtained by domain-converting the noon+night/night+morning/morning+noon query image groups to morning/noon/night, and to the morning/noon/night gallery image groups, and the distances between feature vectors are calculated for each domain to compute the rank-k and mAP values. When domain conversion of the query is not performed, the general-purpose model is applied to the noon+night/night+morning/morning+noon query image groups as-is and to the morning/noon/night gallery image groups, and the distances between feature vectors are calculated for each domain to compute the rank-1 and mAP values.
The results of evaluation experiment A1 are shown in Table 2.
[Table 2: rank-k and mAP results of evaluation experiment A1]
As shown in Table 2, both for the general-purpose model and for the models trained for each domain according to Embodiments 1 and 2, higher inference accuracy was obtained when matching was performed using the feature amounts of the domain-converted query image. Also, even when domain conversion of the query image was not performed, switching models according to the domain of the gallery image resulted in higher inference accuracy.
Thus, evaluation experiment A1 showed that when the query image and the gallery image belong to different domains, performing domain conversion of the query image in the inference device 10 is effective.
[Evaluation experiment B]
Next, evaluation experiment B will be described. Evaluation experiment B evaluated which of the per-domain training images obtained by division or by conversion is more suitable as the training images for the models when the method is applied to real data rather than a public dataset.
The training images by division are the training images of each domain prepared by the dataset division by the domain determination unit 2312 of the training device 20, shown in the processing procedure of FIG. 10. The training images by conversion are the training images of each domain prepared by generating training images through the domain conversion by the domain conversion unit 2313, shown in the processing procedure of FIG. 11.
As training images, a training image group prepared by determining the domains of the MSMT dataset and dividing it by domain, and a training image group generated for each domain by domain-converting the images of the MSMT dataset, were prepared.
A real-data-like dataset was used for the inference images. The real-data-like dataset contains a wide range of images captured from morning to night and also includes difficulties peculiar to real data that are not present in public datasets, such as the influence of color and saturation (tint) from large displays. Since it is difficult to obtain real data itself, in this experiment the inference image group of Market1501, domain-converted by transforming the luminance of the images of the public dataset, was used as a substitute.
At inference time, the domains are divided based on time and tint. For time, a daytime or nighttime label has already been assigned to each image, so those labels are used. For tint, the domains are divided into red, blue, and yellow as described above. In this evaluation experiment B, the query image and the gallery image belong to the same domain.
The procedure for training and evaluating the models using the training images by division will be described. First, the training device 20 divides the training image group by domain. The MSMT dataset is thereby divided into training image groups of the daytime, nighttime-red, nighttime-blue, and nighttime-yellow domains. The training device 20 trains the model corresponding to each domain using the training image group of that domain.
At inference time, the inference device 10 performs domain determination and divides the inference image group by domain. This produces query image groups and gallery image groups of the daytime, nighttime-red, nighttime-blue, and nighttime-yellow domains. Subsequently, each model in the inference device 10 is evaluated. The dedicated daytime/nighttime-red/nighttime-blue/nighttime-yellow model is applied to the daytime/nighttime-red/nighttime-blue/nighttime-yellow query image groups and the daytime/nighttime-red/nighttime-blue/nighttime-yellow gallery image groups, and the distances between feature vectors are calculated for each domain to compute the rank-1 and mAP values.
The procedure for training and evaluating the models using the training images by conversion will be described. First, the training device 20 converts the training image group into each domain. The MSMT dataset is thereby converted into the daytime, nighttime-red, nighttime-blue, and nighttime-yellow domains. Luminance-based conversion is adopted as the conversion, performed so as to match the per-domain means and variances of the RGB values of the real data images. The training device 20 then trains the model corresponding to each domain using the training image group of that domain. The inference device 10 evaluates each model in the same way as when the training images by division are used. The models to be evaluated include the general-purpose model in addition to the models corresponding to the respective domains.
The results of evaluation experiment B are shown in Table 3.
[Table 3: rank-1 and mAP results of evaluation experiment B]
As shown in Table 3, both for the general-purpose model and for the models trained for each domain, training using the training image groups generated by domain conversion resulted in higher accuracy than using the training images by division. Hence, the result was obtained that, as training images for the models, the per-domain training images by conversion are more suitable than the per-domain training images by division.
Furthermore, switching models according to the domain of the gallery image was more effective than the general-purpose model. When the method is applied to real data, the dataset may contain few images or may not cover the domains encountered at inference time, so it is desirable for the training device 20 to perform domain conversion to generate training images.
In the embodiments, domains based on the morning, noon, and night time zones were described as examples, but domains are not limited to these. For example, domains may be based on differences in weather or lighting (light sources). Weather-based domains include, for example, sunny, cloudy, rainy, and snowy. Domains based on the position of the sun, which changes with the season and the time of day, include front-lit and backlit. Domains based on a person's posture may also be set; in this case they include standing upright, sitting on a chair or the like, and doing a handstand.
[System configuration of the embodiments]
Each component of the inference devices 10 and 10A and the training devices 20 and 20A is functionally conceptual and does not necessarily need to be physically configured as illustrated. That is, the specific form of distribution and integration of the functions of the inference devices 10 and 10A and the training devices 20 and 20A is not limited to the illustrated one, and all or part of them can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.
All or any part of the processing performed in the inference devices 10 and 10A and the training devices 20 and 20A may be realized by a CPU, a GPU (Graphics Processing Unit), and a program analyzed and executed by the CPU or GPU. Each kind of processing performed in the inference devices 10 and 10A and the training devices 20 and 20A may also be realized as hardware by wired logic.
Of the processing described in the embodiments, all or part of the processing described as being performed automatically can also be performed manually. Alternatively, all or part of the processing described as being performed manually can be performed automatically by known methods. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters described above and shown in the drawings can be changed as appropriate unless otherwise specified.
[Program]
FIG. 14 is a diagram showing an example of a computer that realizes the inference devices 10 and 10A and the training devices 20 and 20A by executing a program. The computer 1000 has, for example, a memory 1010 and a CPU 1020. The computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.
The memory 1010 includes a ROM 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100. A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.
The hard disk drive 1090 stores, for example, an OS (Operating System) 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program defining each processing of the inference devices 10 and 10A and the training devices 20 and 20A is implemented as the program module 1093 in which code executable by the computer 1000 is written. The program module 1093 is stored, for example, in the hard disk drive 1090. For example, the program module 1093 for executing processing similar to the functional configurations of the inference devices 10 and 10A and the training devices 20 and 20A is stored in the hard disk drive 1090. The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
The setting data used in the processing of the embodiments described above is stored as program data 1094, for example, in the memory 1010 or the hard disk drive 1090. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as necessary and executes them.
The program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090; they may be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.) and read by the CPU 1020 via the network interface 1070.
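For illustration only, the step in which the CPU 1020 reads setting data (cf. program data 1094) into memory before executing a process might be sketched as follows; the file name and the JSON encoding are assumptions of this sketch, since the embodiment does not prescribe how the setting data is stored.

# Illustrative only: load the setting data used by the inference and
# training processes. The file name and JSON encoding are assumptions;
# the embodiment fixes neither.
import json
from pathlib import Path


def load_settings(path: str = "settings.json") -> dict:
    """Read setting data from storage into memory before execution."""
    return json.loads(Path(path).read_text(encoding="utf-8"))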
Although an embodiment to which the invention made by the present inventors is applied has been described above, the present invention is not limited by the description and drawings that form part of this disclosure. That is, other embodiments, examples, operation techniques, and the like made by those skilled in the art on the basis of this embodiment are all included within the scope of the present invention.
10, 10A Inference device
11, 21 Input/output unit
12, 22 Storage unit
13, 23 Control unit
20, 20A Training device
121 Query feature data
122, 223 Model group
123, 123A Inference result
131 Image input unit
132 Domain determination unit
133 Domain conversion unit
134 Model selection unit
135, 135A Inference unit
136 First registration unit
221 Data set
222 Training image
231 Training image acquisition unit
232 Training unit
1351 Feature extraction unit
1352 Matching unit
1352A Classification unit
2311 Data set acquisition unit
2312 Domain determination unit
2313 Domain conversion unit
2314 Second registration unit

Claims (7)

  1.  A processing apparatus comprising:
      an input unit that receives an input of an image to be determined; and
      a determination unit that determines, on the basis of elements that establish the environment in which a subject is imaged, to which of a plurality of domains each defined by environmental conditions the image to be determined belongs.
  2.  The processing apparatus according to claim 1, wherein the determination unit determines to which of the plurality of domains the image to be determined belongs on the basis of at least one of a time at which the image to be determined was captured, a luminance of the image to be determined, and a color and saturation of the image to be determined.
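For illustration only, the determination of claim 2 might be sketched in Python as follows; the two domain labels ("day"/"night"), the thresholds, and the daytime-hours rule are assumptions of this sketch and are not fixed by the claim.

# A minimal sketch of the determination unit of claim 2, assuming two
# domains ("day"/"night") and illustrative thresholds; the claim fixes
# neither the domain set nor the decision rule.
from datetime import datetime

import numpy as np
from PIL import Image


def determine_domain(path: str, captured_at: datetime) -> str:
    """Assign an image to a domain from capture time, luminance, and saturation."""
    hsv = np.asarray(Image.open(path).convert("HSV"), dtype=np.float32)
    mean_saturation = hsv[..., 1].mean() / 255.0  # mean of the S channel
    mean_luminance = hsv[..., 2].mean() / 255.0   # mean of the V channel

    # Illustrative rule: a bright, colorful image captured in daytime
    # hours falls in the "day" domain; everything else is "night".
    is_daytime_hour = 6 <= captured_at.hour < 18
    if mean_luminance > 0.45 and mean_saturation > 0.2 and is_daytime_hour:
        return "day"
    return "night"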
  3.  The processing apparatus according to claim 1 or 2, wherein
      the input unit receives an input of an inference image as the image to be determined, and
      the determination unit determines a domain to which the inference image belongs,
      the processing apparatus further comprising:
      a selection unit that selects, on the basis of the determination by the determination unit, a model corresponding to the domain to which the image to be determined belongs from among a plurality of models respectively corresponding to the plurality of domains; and
      an inference unit that performs inference on the inference image using the model selected by the selection unit.
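For illustration only, the selection unit of claim 3 can be thought of as a lookup from a domain label to the corresponding model; the dictionary representation below is an assumption of this sketch, not part of the claim. The inference unit then simply applies the selected model to the inference image.

# A minimal sketch of the selection unit of claim 3. The model group is
# assumed to be a dict keyed by domain label; the models themselves are
# any callables mapping an image array to an output.
from typing import Callable, Dict

import numpy as np

Model = Callable[[np.ndarray], np.ndarray]


class ModelSelector:
    def __init__(self, model_group: Dict[str, Model]):
        # One model per domain, e.g. {"day": day_model, "night": night_model}.
        self.model_group = model_group

    def select(self, domain: str) -> Model:
        """Return the model corresponding to the determined domain."""
        return self.model_group[domain]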
  4.  The processing apparatus according to claim 3, wherein
      the plurality of models are models that extract features of an image,
      a matching threshold is set for each domain, and
      the inference unit comprises:
      an extraction unit that extracts a feature of the inference image using the model selected by the selection unit; and
      a matching unit that calculates a distance between the feature of the inference image and a feature of an image in which a matching target appears, compares the calculated distance with the matching threshold set for the domain to which the inference image belongs, and determines whether the subject of the inference image is the matching target.
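For illustration only, the matching unit of claim 4 might compare a feature distance against a per-domain threshold as sketched below; the Euclidean metric and the threshold values are assumptions of this sketch. Setting a separate threshold per domain allows, for example, a more tolerant criterion in a noisier night domain.

# A minimal sketch of the matching unit of claim 4. The Euclidean metric
# and the per-domain threshold values are assumptions of this sketch.
import numpy as np

MATCH_THRESHOLDS = {"day": 0.8, "night": 1.1}  # illustrative values


def is_match(query_feature: np.ndarray, target_feature: np.ndarray,
             domain: str) -> bool:
    """Judge whether the inference image's subject is the matching target."""
    distance = float(np.linalg.norm(query_feature - target_feature))
    # Compare against the threshold set for the image's domain.
    return distance <= MATCH_THRESHOLDS[domain]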
  5.  The processing apparatus according to claim 1 or 2, wherein
      the input unit receives an input of a training image as the image to be determined, and
      the determination unit determines a domain to which the training image belongs,
      the processing apparatus further comprising a registration unit that registers the training image as a training image for a model corresponding to the domain determined by the determination unit.
  6.  The processing apparatus according to claim 5, further comprising a training unit that selects, from among the training images of the domains registered by the registration unit, the training images corresponding to the domain of a model to be trained, and trains the model using the selected training images.
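For illustration only, the registration unit of claim 5 and the training unit of claim 6 might be sketched together as follows; the in-memory registry and the train callback are assumptions of this sketch.

# A minimal sketch of the registration unit (claim 5) and training unit
# (claim 6). The in-memory registry and the `train` callback are
# assumptions of this sketch.
from collections import defaultdict
from typing import Callable, Dict, List


class TrainingImageRegistry:
    def __init__(self) -> None:
        self._by_domain: Dict[str, List[str]] = defaultdict(list)

    def register(self, image_path: str, domain: str) -> None:
        """Registration unit: file the image under its determined domain."""
        self._by_domain[domain].append(image_path)

    def train_domain_model(self, domain: str,
                           train: Callable[[List[str]], object]) -> object:
        """Training unit: train the domain's model on that domain's images only."""
        return train(self._by_domain[domain])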
  7.  A processing method executed by a processing apparatus, the processing method comprising:
      a step of receiving an input of an image to be determined; and
      a step of determining, on the basis of elements that establish the environment in which a subject is imaged, to which of a plurality of domains each defined by environmental conditions the image to be determined belongs.
PCT/JP2021/036325 2021-09-30 2021-09-30 Processing device and processing method WO2023053419A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/036325 WO2023053419A1 (en) 2021-09-30 2021-09-30 Processing device and processing method


Publications (1)

Publication Number Publication Date
WO2023053419A1 (en)

Family

ID=85782113

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/036325 WO2023053419A1 (en) 2021-09-30 2021-09-30 Processing device and processing method

Country Status (1)

Country Link
WO (1) WO2023053419A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020170486A1 (en) * 2019-02-18 2020-08-27 三菱電機株式会社 Image processing device, image processing method, and image processing program
WO2020202591A1 (en) * 2019-03-29 2020-10-08 日本電気株式会社 Model generation device, model adjustment device, model generation method, model adjustment method, and recording medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SAWADA, AZUSA: "Rough Domain Adaptation through Model Selection for Neural Networks", IEICE TECHNICAL REPORT, vol. 119, no. 193, August 2019 (2019-08-01), pages 109 - 113 *


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 21959453
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE