WO2023053419A1 - Processing device and processing method - Google Patents

Processing device and processing method

Info

Publication number
WO2023053419A1
WO2023053419A1 (PCT/JP2021/036325)
Authority
WO
WIPO (PCT)
Prior art keywords
domain
image
unit
training
inference
Prior art date
Application number
PCT/JP2021/036325
Other languages
French (fr)
Japanese (ja)
Inventor
琢 佐々木
嘉典 松尾
啓太 三上
玲那 星野
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社
Priority to PCT/JP2021/036325
Publication of WO2023053419A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis

Definitions

  • The present invention relates to a processing apparatus and a processing method.
  • There is a technique for improving accuracy by compensating for changes in environmental conditions with a model that exploits natural fluctuations in the data (see Non-Patent Document 1).
  • However, in the technique of Non-Patent Document 1, when environmental conditions change too greatly, such as between day and night, sufficient interpolation cannot be performed, and the accuracy of image matching or estimation cannot be ensured.
  • The present invention has been made in view of the above, and an object thereof is to provide a processing apparatus and a processing method capable of executing image processing according to an appropriate determination of the environmental conditions of a target image, thereby improving the accuracy of image matching or estimation.
  • In order to solve the above problem, a processing apparatus includes an input unit that receives input of an image to be determined, and a determination unit that determines to which of a plurality of domains, each defined by environmental conditions, the image belongs.
  • According to the present invention, by appropriately determining the environmental conditions of the target image, it is possible to perform image processing according to the determination and improve the accuracy of image matching or estimation.
  • FIG. 1 is a diagram for explaining the outline of the first embodiment.
  • FIG. 2 is a diagram for explaining the outline of the first embodiment.
  • FIG. 3 is a diagram for explaining the outline of the first embodiment.
  • FIG. 4 is a diagram schematically showing an example of the configuration of the inference apparatus according to Embodiment 1.
  • FIG. 5 is a flowchart showing the processing procedure of registration processing of feature amounts of a query image according to Embodiment 1.
  • FIG. 6 is a flowchart of the processing procedure of inference processing according to the first embodiment.
  • FIG. 7 is a diagram schematically showing an example of another configuration of the inference device according to the modification of Embodiment 1.
  • FIG. 8 is a flowchart of the procedure of inference processing according to the modification of Embodiment 1.
  • FIG. 9 is a diagram schematically showing an example of the configuration of the training device according to Embodiment 2.
  • FIG. 10 is a flowchart illustrating the processing procedure of training image acquisition processing according to the second embodiment.
  • FIG. 11 is a flowchart illustrating another processing procedure of training image acquisition processing according to the second embodiment.
  • FIG. 12 is a flowchart of the training process procedure according to the second embodiment.
  • FIG. 13 is a diagram schematically showing an example of another configuration of the training device according to the modification of Embodiment 2.
  • FIG. 14 is a diagram illustrating an example of a computer that implements the training device and the inference device by executing a program.
  • FIG. 1 is a diagram for explaining the outline of the first embodiment.
  • In the first embodiment, image matching will be described as an example. An image in which the matching target appears will be referred to as a query image, and an image to be checked for whether or not the matching target appears will be referred to as a gallery image.
  • The inference apparatus according to the first embodiment has trained models corresponding to a plurality of domains, and performs matching by switching the model to be used according to the domain of the gallery image. Domains are defined by environmental conditions. In the examples of FIGS. 1 to 3, there are daytime and nighttime domains, and a model M1 corresponding to daytime and a model M2 corresponding to nighttime will be described as an example.
  • First, the inference device registers the feature amounts of the query image.
  • Specifically, the inference device transforms the query image into images of the daytime and nighttime domains.
  • The inference apparatus extracts a feature amount from the daytime-domain query image using the model M1 corresponding to the daytime domain (see arrow Y11 in FIG. 1), and registers the extracted feature amount as the feature amount of the query image corresponding to the daytime domain.
  • Likewise, the inference device extracts a feature amount from the nighttime-domain query image using the model M2 corresponding to the nighttime domain (see arrow Y12 in FIG. 1).
  • The inference device registers this extracted feature amount as the feature amount of the query image corresponding to the nighttime domain (see arrow Y12-1 in FIG. 1).
  • Then, the inference device makes an inference for the gallery image.
  • First, the inference device determines the domain to which the gallery image belongs. In the example of FIG. 1, the domain of the gallery image is determined to be nighttime. The inference device therefore selects the model M2 corresponding to nighttime, the domain of the gallery image, from among the models M1 and M2 ((1) in FIG. 1), and extracts the feature amount of the gallery image using the selected model M2 ((2) in FIG. 1).
  • The inference device then refers to the feature amount of the query image corresponding to the nighttime domain, and calculates the distance between the feature amount of the gallery image and the feature amount of the referenced query image.
  • The inference device compares the calculated distance with a matching threshold to check whether or not the matching target appears in the gallery image ((3) in FIG. 1). Note that a matching threshold is set for each domain, and the inference apparatus uses the matching threshold set for the domain of the gallery image during matching.
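  • The flow (1) to (3) above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the `models`, `registered`, and `thresholds` mappings, the `extract` method, and the choice of Euclidean distance are all assumed names and conventions.

```python
import numpy as np

def match_gallery_image(gallery_img, domain, models, registered, thresholds):
    """Return True if the matching target appears in the gallery image."""
    model = models[domain]                  # (1) select the model for the gallery domain
    g_feat = model.extract(gallery_img)     # (2) extract the gallery feature amount
    q_feat = registered[domain]             # query feature registered for the same domain
    dist = np.linalg.norm(g_feat - q_feat)  # distance in feature space
    return dist <= thresholds[domain]       # (3) compare with the per-domain threshold
```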
  • FIG. 2 illustrates a case where a plurality of query images are received ((1) in FIG. 2).
  • For example, the inference device receives a query image Q1 of a person in work clothes in the daytime domain and a query image Q2 of a person in a suit in the nighttime domain.
  • The inference apparatus transforms the domains of the query images in preparation for gallery images of both the daytime and nighttime domains ((2) in FIG. 2).
  • Specifically, the inference device converts the query image Q1 into a nighttime-domain query image Q12 (see arrow Y11).
  • The inference device also converts the query image Q2 into a daytime-domain query image Q21 (see arrow Y12).
  • The inference apparatus extracts feature amounts from the daytime-domain query images Q1 and Q21 using the model M1 ((3) in FIG. 2), and registers the extracted feature amounts as the feature amounts M1-1 and M2-1 of the query images corresponding to the daytime domain ((4) in FIG. 2).
  • Here, the feature amount M1-1 corresponds to the person in work clothes in the daytime domain, and the feature amount M2-1 corresponds to the person in a suit in the daytime domain.
  • Similarly, the inference device extracts feature amounts from the nighttime-domain query images Q12 and Q2 using the model M2 ((3) in FIG. 2), and registers the extracted feature amounts as the feature amounts M1-2 and M2-2 of the query images corresponding to the nighttime domain ((4) in FIG. 2).
  • The feature amount M1-2 corresponds to the person in work clothes in the nighttime domain, and the feature amount M2-2 corresponds to the person in a suit in the nighttime domain.
  • FIG. 3 illustrates a case where a nighttime image I1 and a daytime image I2 are captured by the monitoring cameras C1 and C2.
  • Gallery images G1 to G4, in which persons A to D appear, are clipped from the images I1 and I2 ((1) in FIG. 3).
  • Note that the person-clipping task may be executed by another device provided between the monitoring cameras C1 and C2 and the inference device, or may be executed by the inference device itself.
  • The inference device determines the domains of the gallery images G1 to G4.
  • The inference device determines that the domain of the gallery images G1 and G2 is nighttime.
  • The inference device determines that the domain of the gallery image G4 is daytime.
  • The inference device determines that the gallery image G3 belongs to the nighttime domain because, although it was captured at a time when the sun was out, it shows a dark place under the shade of trees (see arrow Y31).
  • The inference device then divides the gallery images G1 to G4 according to the daytime and nighttime domains ((2) in FIG. 3).
  • For the gallery images G1 to G3, the inference device selects the model M2 corresponding to the nighttime domain, and extracts the feature amounts of these gallery images using the selected model M2 ((3) in FIG. 3).
  • For the gallery image G4, the inference device selects the model M1 corresponding to the daytime domain, and extracts the feature amount of the gallery image using the selected model M1 ((3) in FIG. 3).
  • The inference device compares the feature amounts of the gallery images G1 to G3 with the feature amounts of the pre-registered query images of the nighttime domain, and performs matching.
  • That is, the inference device compares each feature amount of persons A to C, whose domain is nighttime, with the feature amounts of the person in work clothes and the person in a suit of the query images corresponding to the nighttime domain, and performs matching ((4) in FIG. 3).
  • The inference device compares the feature amount of the gallery image G4 with the feature amounts of the pre-registered query images of the daytime domain for matching.
  • That is, the inference device compares the feature amount of person D, whose domain is daytime, with the feature amounts of the person in work clothes and the person in a suit of the query images corresponding to the daytime domain, and performs matching ((4) in FIG. 3).
  • As described above, the inference apparatus transforms the query image into images of all domains corresponding to the trained models, extracts a feature amount from the query image of each domain using the model corresponding to that domain, and registers the feature amount of each query image in association with its domain.
  • This enables the inference device to compare, at the time of matching, the feature amount of the query image in the same domain as the gallery image with the feature amount of the gallery image. Therefore, the inference device can reduce the accuracy degradation caused by a domain difference between the query image and the gallery image.
  • Furthermore, since the inference device determines the domain of the gallery image and uses the model corresponding to that domain, it can perform appropriate feature extraction processing according to the determination and improve the accuracy of image matching.
  • FIG. 4 is a diagram schematically showing an example of the configuration of the inference apparatus according to Embodiment 1.
  • The inference device 10 has an input/output unit 11, a storage unit 12, and a control unit 13.
  • The input/output unit 11 receives input of information and outputs information.
  • The input/output unit 11 is, for example, a communication interface that transmits and receives various information to and from other devices connected via a network or the like.
  • The input/output unit 11 communicates with other devices and the control unit 13 (described later) via an electric communication line such as a LAN (Local Area Network) or the Internet.
  • The input/output unit 11 also includes devices such as a mouse and a keyboard that receive input of various instruction information for the inference apparatus 10 in response to the user's input operations.
  • Further, the input/output unit 11 is implemented by, for example, a liquid crystal display, and displays and outputs screens whose display is controlled by the inference device 10.
  • The storage unit 12 is realized by a semiconductor memory device such as a RAM (Random Access Memory) or flash memory, and stores processing programs that operate the inference device 10 and data used during execution of the processing programs.
  • The storage unit 12 has query feature amount data 121, a model group 122, and an inference result 123 indicating the result of inference by the inference unit 135 (described later).
  • The query feature amount data 121 holds the feature amounts of the query image transformed into each domain.
  • For example, the query feature amount data 121 includes a first domain feature amount, which is the feature amount of the query image transformed into the first domain, and a second domain feature amount, which is the feature amount of the query image transformed into the second domain.
  • The model group 122 has a plurality of models used by the inference unit 135 (described later). Each model is a trained feature extraction model, configured by, for example, a neural network (NN). A model is provided for each domain: for example, the first domain model corresponds to the first domain, and the second domain model corresponds to the second domain.
  • The control unit 13 controls the inference device 10 as a whole.
  • The control unit 13 is, for example, an electronic circuit such as a CPU (Central Processing Unit) or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array).
  • The control unit 13 also has an internal memory for storing programs defining various processing procedures and control data, and executes each process using the internal memory. Further, the control unit 13 functions as various processing units by running various programs.
  • The control unit 13 includes an image input unit 131 (input unit), a domain determination unit 132 (determination unit), a domain conversion unit 133 (conversion unit), a model selection unit 134 (selection unit), an inference unit 135, and a first registration unit 136.
  • The image input unit 131 receives input of query images to be determined and converted.
  • The image input unit 131 also receives input of a gallery image to be determined.
  • The gallery image is the inference image that is the inference target of the inference unit 135.
  • The domain determination unit 132 determines to which of a plurality of domains the image to be determined belongs, based on elements that establish the environment in which the subject is imaged.
  • Such an element is, for example, at least one of the time at which the image was captured, the brightness of the image, and the color and saturation of the image.
  • The domain determination unit 132 determines to which domain the query image and the gallery image each belong.
  • Based on elements that establish the environment in which the subject is imaged, the domain conversion unit 133 converts the image to be converted into an image of another domain different from the domain to which the image belongs.
  • Here, the relevant element is, for example, at least one of the brightness of the image and its color and saturation.
  • The domain conversion unit 133 converts the query image into images of the other domains different from the domain to which the query image belongs. If there are multiple such domains, the domain conversion unit 133 converts the query image for all of them. For example, when the query image belongs to the first domain and the second and third domains are also defined, the domain conversion unit 133 converts the query image into an image of the second domain and an image of the third domain.
  • The model selection unit 134 selects, from among the models in the model group 122, the model corresponding to the domain of the determination target image determined by the domain determination unit 132.
  • Specifically, the model selection unit 134 selects, from among the models of the model group 122, the models corresponding to the respective domains of the query image and of the query images converted by the domain conversion unit 133. For example, when the domain determination unit 132 determines that the query image belongs to the first domain, the model selection unit 134 selects the model for the first domain for the query image, and selects the model for the second domain for the query image converted into the second domain.
  • The model selection unit 134 also selects, from among the models in the model group 122, the model corresponding to the domain to which the gallery image belongs, based on the determination result of the domain determination unit 132. For example, when the domain determination unit 132 determines that the gallery image belongs to the second domain, the model selection unit 134 selects the model for the second domain for the gallery image.
  • The inference unit 135 makes inferences using the model selected by the model selection unit 134.
  • The inference unit 135 has a feature amount extraction unit 1351 and a matching unit 1352.
  • The feature amount extraction unit 1351 uses the model selected by the model selection unit 134 to extract the feature amount of the image to be processed.
  • The feature amount extraction unit 1351 may perform generally known NN forward propagation and the like.
  • In the registration stage, the feature amount extraction unit 1351 uses the models selected by the model selection unit 134 for each domain to extract the feature amounts of the query image and of the query images whose domains have been converted by the domain conversion unit 133. For example, when the model selection unit 134 selects the first domain model 1231 for the query image, the feature amount extraction unit 1351 uses the first domain model 1231 to extract the feature amount of the query image. Further, when the model selection unit 134 selects the second domain model 1232 for the query image transformed into the second domain, the feature amount extraction unit 1351 uses the second domain model 1232 to extract the feature amount of that query image.
  • In the inference stage for the gallery image, the feature amount extraction unit 1351 uses the model selected by the model selection unit 134 to extract the feature amount of the gallery image. For example, when the model selection unit 134 selects the second domain model 1232 for the gallery image, the feature amount extraction unit 1351 uses the second domain model 1232 to extract the feature amount of the gallery image.
  • The matching unit 1352 calculates the distance between the feature amount of the gallery image and the feature amount of the query image. At this time, the matching unit 1352 refers, as the feature amount of the query image, to the feature amount corresponding to the domain to which the gallery image belongs in the query feature amount data 121. For example, when the domain determination unit 132 determines that the gallery image belongs to the second domain, the matching unit 1352 refers to the second domain feature amount in the query feature amount data 121.
  • The matching unit 1352 compares the calculated distance with a matching threshold and checks whether or not the subject in the gallery image is the matching target.
  • A matching threshold is set for each domain, and the matching unit 1352 uses the threshold set for the domain to which the gallery image belongs. If the calculated distance is equal to or less than the matching threshold, the matching unit 1352 determines that the subject in the gallery image is the matching target of the query image. On the other hand, if the calculated distance is greater than the matching threshold, the matching unit 1352 determines that the subject in the gallery image is not the matching target of the query image.
  • The first registration unit 136 registers each feature amount of the query image as the feature amount of the query image of the corresponding domain.
  • Specifically, the first registration unit 136 registers the feature amount of the first-domain query image in the query feature amount data 121 as the first domain feature amount.
  • The first registration unit 136 also registers the feature amount of the second-domain query image in the query feature amount data 121 as the second domain feature amount.
  • The domain determination unit 132 determines the domain of the image to be determined by, for example, comparing the time at which the image was captured with predetermined time zones for morning, noon, and night.
  • For example, the time zones for the domains are set in advance so that 6:00 to 11:00 is morning, 11:00 to 18:00 is noon, and 18:00 to 6:00 is night.
  • In this case, the domain determination unit 132 checks meta information on the shooting time of the image to be determined, and determines the domain of the image according to its shooting time.
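  • A minimal sketch of this time-based determination, assuming the 6:00/11:00/18:00 boundaries above (the function name is ours):

```python
from datetime import time

def domain_from_time(capture_time: time) -> str:
    """Map a capture time to the morning/noon/night domain."""
    if time(6, 0) <= capture_time < time(11, 0):
        return "morning"
    if time(11, 0) <= capture_time < time(18, 0):
        return "noon"
    return "night"  # 18:00 to 6:00, wrapping around midnight
```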
  • Alternatively, the domain determination unit 132 may determine the domain of the image to be determined by comparing the average brightness of all pixels of the image with preset threshold values.
  • Let (r_ij, g_ij, b_ij) be the RGB values of the pixel at coordinates (i, j) of the image to be determined. The luminance l_ij of this pixel is given by Equation (1): l_ij = α·r_ij + β·g_ij + γ·b_ij, where α, β, and γ are preset parameters.
  • The domain determination unit 132 calculates the average luminance over all pixels of the image to be determined, written as Equation (2): L = (1/N) Σ_{i,j} l_ij, where N is the total number of pixels.
  • The domain determination unit 132 compares the average luminance L of Equation (2) with preset thresholds to determine whether the image to be determined belongs to the morning, daytime, or night domain. For example, when L is less than a first threshold, the domain determination unit 132 determines that the domain of the image is night. When L is equal to or greater than the first threshold and less than a second threshold (greater than the first), it determines that the domain is morning. When L is equal to or greater than the second threshold, it determines that the domain is daytime.
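  • A sketch of this luminance-based determination follows. The weights α, β, γ and the two thresholds are preset parameters; the concrete weight values below (BT.601 luma coefficients) are only an illustrative assumption.

```python
import numpy as np

ALPHA, BETA, GAMMA = 0.299, 0.587, 0.114  # assumed values for the preset parameters

def domain_from_luminance(img_rgb: np.ndarray, thr1: float, thr2: float) -> str:
    """img_rgb: H x W x 3 RGB array; thr1 < thr2."""
    r, g, b = img_rgb[..., 0], img_rgb[..., 1], img_rgb[..., 2]
    l = ALPHA * r + BETA * g + GAMMA * b  # Equation (1): per-pixel luminance
    L = float(l.mean())                   # Equation (2): average over all pixels
    if L < thr1:
        return "night"
    if L < thr2:
        return "morning"
    return "daytime"
```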
  • Alternatively, a red domain, a blue domain, and a yellow domain may be set as the domains.
  • In this case, the domain determination unit 132 calculates the averages over all pixels of the R, G, and B luminances of the image to be determined, and determines the domain of the image according to which average is the maximum.
  • Equation (3) represents the average of the R luminance over all pixels, Equation (4) the average of the G luminance, and Equation (5) the average of the B luminance.
  • The domain determination unit 132 calculates the all-pixel averages of the R, G, and B luminances of Equations (3) to (5) for the image to be determined, and determines the domain using Equation (6), which selects the maximum among these averages.
  • The domain determination unit 132 determines that the domain of the image to be determined is red when the result of Equation (6) is R, and blue when the result is B.
  • Otherwise, the domain determination unit 132 determines that the domain of the image to be determined is yellow. Note that yellow is treated as the sum of red and green.
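  • A sketch of this color-based determination (Equations (3) to (6)); treating a G maximum as yellow follows the note that yellow is the sum of red and green, but the exact form of Equation (6) is an assumption:

```python
import numpy as np

def domain_from_color(img_rgb: np.ndarray) -> str:
    means = {
        "R": float(img_rgb[..., 0].mean()),  # Equation (3)
        "G": float(img_rgb[..., 1].mean()),  # Equation (4)
        "B": float(img_rgb[..., 2].mean()),  # Equation (5)
    }
    dominant = max(means, key=means.get)     # Equation (6): pick the maximum average
    if dominant == "R":
        return "red"
    if dominant == "B":
        return "blue"
    return "yellow"  # G maximal: yellow is treated as the sum of red and green
```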
  • The domain determination unit 132 may also determine the domain to which the image to be determined belongs by combining two or more of: the time at which the image was captured, the brightness of the image, and the color and saturation of the image.
  • For example, the domain determination unit 132 performs determination by combining domain determination based on time with domain determination based on color and saturation.
  • In this case, when the time at which the image to be determined was captured falls in the daytime zone, the domain determination unit 132 determines that the domain of the image is "daytime".
  • Further, when the time at which the image was captured is from 6:00 p.m. onward and the color determination yields blue, the domain determination unit 132 determines that the domain of the image is "night/blue".
  • The domain conversion unit 133 converts the query image into images of all domains different from the domain to which the query image belongs, so that a query image feature amount can be registered for the same domain as any gallery image.
  • For example, the domain conversion unit 133 converts the query image based on its brightness. In the case of brightness-related domains such as morning, noon, and night, the domain conversion unit 133 uniformly multiplies the RGB values of each pixel of the query image to be converted by a coefficient, so that the average luminance after multiplication matches the average luminance of the destination domain.
  • Let (r_ij, g_ij, b_ij) be the RGB values of the pixel at coordinates (i, j) of the image before conversion, and (r_ij', g_ij', b_ij') the RGB values of the same pixel after conversion.
  • The domain conversion unit 133 converts, for example, the R value using Equation (10): r_ij' = r_ij · L' / E_{k,l}[l_kl]. Here, L' is the average luminance of the destination domain and is set in advance, and E_{k,l}[l_kl] is the average luminance over all pixels of the image before conversion. Both i and k represent horizontal pixel coordinates and both j and l represent vertical pixel coordinates; k and l are distinguished from i and j because the expected value (average) is taken over the whole image.
  • G and B can be converted by replacing r_ij in Equation (10) with g_ij or b_ij.
  • The luminance l_ij of the pixel at coordinates (i, j) is given by Equation (11), of the same form as Equation (1): l_ij = α·r_ij + β·g_ij + γ·b_ij, where α, β, and γ are preset parameters.
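  • A sketch of the conversion of Equation (10): all RGB values are scaled by a single coefficient so that the average luminance of the converted image equals the preset target L'. The luma weights are again assumed values.

```python
import numpy as np

ALPHA, BETA, GAMMA = 0.299, 0.587, 0.114  # assumed values for the preset parameters

def convert_luminance(img_rgb: np.ndarray, target_L: float) -> np.ndarray:
    """Scale an RGB image so its average luminance matches target_L (L')."""
    r, g, b = img_rgb[..., 0], img_rgb[..., 1], img_rgb[..., 2]
    l = ALPHA * r + BETA * g + GAMMA * b            # Equation (11): per-pixel luminance
    scale = target_L / float(l.mean())              # L' / E[l_kl]
    return np.clip(img_rgb.astype(float) * scale, 0, 255)  # Equation (10), per channel
```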
  • The domain conversion unit 133 may also convert the query image based on its color and saturation. For example, in the case of domains related to color shifts such as red, blue, and yellow caused by the influence of a large display or the like, the domain conversion unit 133 uniformly multiplies the RGB values of each pixel of the query image at hand by a coefficient, so that the RGB values after multiplication match the average RGB values of the destination domain.
  • Again, let (r_ij, g_ij, b_ij) be the RGB values of the pixel at coordinates (i, j) before conversion, and (r_ij', g_ij', b_ij') the RGB values after conversion.
  • The domain conversion unit 133 converts, for example, the R value using Equation (12): r_ij' = r_ij · R' / E_{k,l}[r_kl], where R' is the average R pixel value of the destination domain and is preset.
  • G and B can be converted by replacing r_ij in Equation (12) with g_ij or b_ij, and R' with G' or B'.
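  • A sketch of the conversion of Equation (12), applied per channel so that each channel's average matches the preset averages (R', G', B') of the destination domain:

```python
import numpy as np

def convert_color(img_rgb: np.ndarray, target_means) -> np.ndarray:
    """target_means: (R', G', B') of the destination domain."""
    out = img_rgb.astype(float).copy()
    for c in range(3):                        # R, G, B channels
        out[..., c] *= target_means[c] / float(out[..., c].mean())  # Equation (12)
    return np.clip(out, 0, 255)
```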
  • Alternatively, the domain conversion unit 133 may use a converter to convert the image to be converted into an image of another domain.
  • This converter transforms an image into an image of another domain different from the domain to which the image belongs, and has been trained in advance using a plurality of images belonging to each domain.
  • For example, the domain conversion unit 133 uses a GAN (Generative Adversarial Network) as the converter.
  • In this case, the inference device 10 prepares a sufficient number of images of each domain in advance and trains the GAN. The domain conversion unit 133 then converts the query image into an image of each domain by inputting it into the GAN.
  • In the first embodiment, it is the query image whose domain is transformed into each domain in order to match domains with the gallery image. This is because the query image can more easily tolerate an increase in conversion processing when real-time processing is desired.
  • That is, query registration is assumed to occur at most about once every several seconds, and a query image only needs to be registered once, so the frequency of domain conversion can be kept low.
  • In contrast, the gallery image is not subjected to domain conversion. If both the query image and the gallery image were domain-converted, matching accuracy would be expected to degrade compared with converting only one of them. Therefore, in the first embodiment, matching accuracy is maintained by performing domain conversion for all domains only on the query image and by switching the model used for feature extraction.
  • FIG. 5 is a flow chart showing a processing procedure of registration processing of a feature amount of a query image according to Embodiment 1.
  • When the image input unit 131 receives input of a query image (step S11), the domain determination unit 132 determines to which domain the query image belongs (step S12).
  • The domain conversion unit 133 converts the query image into an image of another domain different from the domain to which the query image belongs (step S13).
  • The model selection unit 134 selects, from among the models in the model group 122, the model corresponding to the domain of the converted query image (step S14).
  • The feature amount extraction unit 1351 uses the model selected by the model selection unit 134 to extract the feature amount of the domain-converted query image (step S15). The first registration unit 136 then registers this feature amount in the query feature amount data 121 as the feature amount of the query image of the converted domain (step S16).
  • The domain conversion unit 133 determines whether or not there is a domain to convert to next (step S17). When the query image has been converted into all domains corresponding to all models, the domain conversion unit 133 determines that there is no next domain (step S17: No) and ends the process. Otherwise, the domain conversion unit 133 determines that there is a next domain to convert to (step S17: Yes), returns to step S13, and performs domain conversion of the query image for an unconverted domain.
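  • A sketch of this registration loop (steps S12 to S17), reusing the hypothetical per-domain stores from the earlier matching sketch; the own-domain registration shown in FIG. 1 is included as well:

```python
def register_query(query_img, all_domains, determine_domain, convert,
                   models, registered):
    """Register the query feature amount for every domain."""
    own = determine_domain(query_img)                     # S12: determine the domain
    registered[own] = models[own].extract(query_img)      # register in its own domain
    for dom in all_domains:                               # S13-S17: loop over the rest
        if dom == own:
            continue
        converted = convert(query_img, dom)               # S13: domain conversion
        registered[dom] = models[dom].extract(converted)  # S14-S16: extract and register
```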
  • FIG. 6 is a flowchart of a processing procedure of inference processing according to the first embodiment.
  • When the image input unit 131 receives input of a gallery image (step S21), the domain determination unit 132 determines to which domain the gallery image belongs (step S22).
  • The model selection unit 134 selects, from among the models in the model group 122, the model corresponding to the domain of the gallery image determined in step S22 (step S23).
  • The feature amount extraction unit 1351 uses the model selected by the model selection unit 134 to extract the feature amount of the gallery image (step S24).
  • The matching unit 1352 refers, in the query feature amount data 121, to the feature amount of the query image corresponding to the domain to which the gallery image was determined to belong in step S22 (step S25).
  • The matching unit 1352 calculates the distance between the feature amount of the gallery image and the feature amount of the referenced query image, compares the calculated distance with the matching threshold, and determines whether the subject in the gallery image is the matching target (step S26).
  • The matching unit 1352 then outputs the matching result (step S27), and the process ends.
  • As described above, the inference apparatus 10 determines the domain of the gallery image and uses the model corresponding to that domain, so it can perform appropriate feature extraction processing according to the determination and improve the accuracy of image matching.
  • The inference device 10 also converts the query image into images of all domains corresponding to the trained models, and extracts and registers features from the query image of each domain using the model corresponding to that domain. The inference device 10 can thus prepare query images of all domains, and at matching time can compare the feature amount of the query image in the same domain as the gallery image with the feature amount of the gallery image. This reduces the accuracy degradation caused by a domain difference between the query image and the gallery image.
  • FIG. 7 is a diagram schematically showing an example of another configuration of the inference device according to the modification of Embodiment 1.
  • As shown in FIG. 7, the inference device 10A according to the modification of the first embodiment has a control unit 13A instead of the control unit 13 shown in FIG. 4. The control unit 13A has the same functions as the control unit 13.
  • The control unit 13A has an inference unit 135A that includes a classification unit 1352A.
  • The classification unit 1352A uses a trained classification model to calculate logits from the feature amount (feature vector) of the inference image extracted by the feature amount extraction unit 1351, and determines the class of the subject of the inference image. The classification unit 1352A may register and output the class classification result as the inference result 124A.
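  • A minimal sketch of this classification step, assuming a linear classification head (the weights W, bias b, and class list are placeholders, not part of the patent):

```python
import numpy as np

def classify(feature: np.ndarray, W: np.ndarray, b: np.ndarray, classes: list):
    logits = W @ feature + b                # logits from the extracted feature vector
    return classes[int(np.argmax(logits))]  # class of the subject of the inference image
```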
  • FIG. 8 is a flowchart showing the procedure of inference processing according to the modification of the first embodiment.
  • The inference device 10A accepts input of an inference image (step S31) and performs the same processing as steps S22 to S24 shown in FIG. 6. Specifically, the domain determination unit 132 determines the domain of the inference image (step S32).
  • The model selection unit 134 selects, from the model group 122, the model corresponding to the domain determined in step S32 (step S33).
  • The feature amount extraction unit 1351 uses the model selected by the model selection unit 134 to extract the feature amount of the inference image (step S34).
  • The classification unit 1352A calculates logits from the feature amount of the inference image, performs class classification to determine the class of the subject of the inference image (step S35), and outputs the classification result (step S36).
  • In this way, the domain of the inference image is determined and the model corresponding to that domain is used, so appropriate feature extraction according to the determination can be executed and the accuracy of class classification can be improved.
  • Next, Embodiment 2 will be described.
  • In Embodiment 2, a training device that trains the models corresponding to the respective domains used by the inference devices 10 and 10A will be described.
  • As described above, trained models are provided for each domain. Therefore, in the training apparatus according to Embodiment 2, a training image group is prepared for each domain, and a model is trained for each domain using the training images of that domain.
  • The training apparatus according to the second embodiment prepares training images for each domain from an existing dataset.
  • FIG. 9 is a diagram schematically showing an example of a configuration of a training device according to Embodiment 2.
  • The training device 20 has an input/output unit 21, a storage unit 22, and a control unit 23.
  • The input/output unit 21 receives input of information and outputs information.
  • The input/output unit 21 is, for example, a communication interface that transmits and receives various information to and from other devices connected via a network or the like.
  • The input/output unit 21 communicates with other devices and the control unit 23 (described later) via an electric communication line such as a LAN or the Internet.
  • The input/output unit 21 also includes devices such as a mouse and a keyboard that receive input of various instruction information for the training apparatus 20 in response to the user's input operations.
  • Further, the input/output unit 21 is realized by, for example, a liquid crystal display, and displays and outputs screens whose display is controlled by the training device 20.
  • The storage unit 22 is implemented by a semiconductor memory device such as a RAM or flash memory, and stores processing programs for operating the training device 20, data used during execution of the processing programs, and the like.
  • The storage unit 22 has a dataset 221, training images 222, and a model group 223.
  • The dataset 221 is, for example, a public dataset such as the MSMT public dataset.
  • The MSMT public dataset contains a wide range of images taken from morning to night.
  • The training images 222 include a training image group corresponding to each domain.
  • Specifically, the training images 222 include a first domain image group 2221, which is the training image group for the model corresponding to the first domain, and a second domain image group 2222, which is the training image group for the model corresponding to the second domain.
  • The model group 223 has a plurality of trained models used by the inference unit 135.
  • The model group 223 has a first domain model 2231 corresponding to the first domain and a second domain model 2232 corresponding to the second domain.
  • The control unit 23 controls the training device 20 as a whole.
  • The control unit 23 is, for example, an electronic circuit such as a CPU, or an integrated circuit such as an ASIC or FPGA.
  • The control unit 23 also has an internal memory for storing programs defining various processing procedures and control data, and executes each process using the internal memory. Further, the control unit 23 functions as various processing units by running various programs.
  • The control unit 23 has a training image acquisition unit 231 and a training unit 232.
  • The training image acquisition unit 231 includes a dataset acquisition unit 2311 (input unit), a domain determination unit 2312 (determination unit), a domain conversion unit 2313 (conversion unit), and a second registration unit 2314 (registration unit).
  • The dataset acquisition unit 2311 acquires a dataset such as the MSMT public dataset as training images.
  • In addition to the public dataset, the dataset acquisition unit 2311 may also acquire actual data captured by a camera or the like that captures the inference images.
  • The domain determination unit 2312 has the same function as the domain determination unit 132.
  • The domain determination unit 2312 determines the domain to which a training image belongs, based on elements that establish the environment in which the subject is imaged.
  • Such an element is, for example, at least one of the time at which the image was captured, the brightness of the image, and the color and saturation of the image.
  • The domain determination unit 2312 determines the domain to which each image included in the dataset belongs.
  • The domain conversion unit 2313 converts a training image into an image of another domain different from the domain to which the training image belongs, based on elements that establish the environment in which the subject is imaged.
  • Here, the relevant element is, for example, at least one of the brightness of the training image and its color and saturation.
  • The second registration unit 2314 registers the determined training image as a training image for the model corresponding to the domain determined by the domain determination unit 2312. For example, when the domain determination unit 2312 determines that a certain training image belongs to the first domain, the second registration unit 2314 registers that training image as an image of the first domain image group 2221. When a certain training image is converted into an image of the second domain by the domain conversion unit 2313, the second registration unit 2314 registers the converted training image as an image of the second domain image group 2222.
  • The training unit 232 selects, from among the training images of each domain registered by the second registration unit 2314, the training image group corresponding to the domain of the model to be trained, and performs model training using the selected training image group.
  • For example, the training unit 232 selects images of the first domain image group 2221 as training images and trains the model corresponding to the first domain.
  • The training unit 232 may use a known mechanism such as back propagation to repeatedly update the parameters of each model, configured as a neural network, until a predetermined end condition is reached.
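  • A sketch of per-domain training with back propagation in PyTorch. The data loaders, model architecture, loss, and hyperparameters are placeholders; the text only specifies that one model is trained per domain on that domain's image group until an end condition is reached.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader

def train_domain_model(model: nn.Module, loader: DataLoader, epochs: int = 10) -> nn.Module:
    criterion = nn.CrossEntropyLoss()   # e.g., an identity-classification loss
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    for _ in range(epochs):             # fixed epoch count as the assumed end condition
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()             # back propagation
            optimizer.step()            # parameter update
    return model

# One model per domain, each trained on its own image group:
# models = {dom: train_domain_model(make_model(), loaders[dom]) for dom in loaders}
```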
  • Similar to the domain determination unit 132, the domain determination unit 2312 determines the domain to which a training image belongs using time, brightness, or color and saturation.
  • For example, the domain determination unit 2312 determines the domain of the image to be determined by comparing the time at which the image was captured with the predetermined time zones for morning, noon, and night.
  • Alternatively, the domain determination unit 2312 determines that the domain is morning when the average luminance L of all pixels of the image to be determined is equal to or greater than a threshold, and that the domain is night when the average value L is less than the threshold.
  • Alternatively, the domain determination unit 2312 determines, in the same manner as the domain determination unit 132, whether the image belongs to the red domain, the blue domain, or the yellow domain.
  • The domain determination unit 2312 may also determine the domain to which the image belongs by combining two or more of the capture time, the brightness, and the color and saturation of the image to be determined.
  • The domain conversion unit 2313 converts the images of the dataset into images of other domains to generate training images of each domain.
  • The domain conversion unit 2313 performs domain conversion based on, for example, luminance or color and saturation. In this case, it is assumed that the tendency of the data belonging to each domain is known for each domain in terms of brightness and color.
  • For example, assume that the average luminance L over all pixels of each image (written as Equation (13)) in the morning, noon, and night domains generally follows a normal distribution, and that its mean and variance are known for each domain.
  • In this case, the domain conversion unit 2313 performs domain conversion so that the mean and variance of the average luminance L of the images of the training dataset at hand match the mean and variance of each domain.
  • Specifically, the domain conversion unit 2313 converts the pixel values of each k-th image of the available training dataset (Equation (14)) as shown in Equation (15), which standardizes the values with the mean and variance of the dataset at hand and rescales them to the mean and variance of the destination domain. In Equation (15), V is a variance-covariance matrix and E represents the operation of obtaining an expected value vector.
  • Alternatively, the domain conversion unit 2313 performs domain conversion so that the means and variances of the RGB values of the training dataset at hand (R, G, and B shown in Equations (16) to (18)) match the means and variances of each domain.
  • In this case, the domain conversion unit 2313 converts each pixel value r_ij^k of the k-th image of the available training dataset as shown in Equation (19). In Equation (19), V is the variance and E represents the operation of obtaining the expected value.
  • G and B can be converted by replacing R_k in Equation (19) with G_k or B_k, and r_ij^k with g_ij^k or b_ij^k.
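  • A sketch of this distribution matching. The exact matrix form of Equations (15) and (19) is not reproduced in the text, so a per-channel scalar standardize-and-rescale version is assumed here:

```python
import numpy as np

def match_distribution(values: np.ndarray, target_mean: float,
                       target_var: float) -> np.ndarray:
    """Standardize with the data's own mean/variance (E, V), rescale to the target's."""
    mu, var = values.mean(), values.var()
    converted = (values - mu) * np.sqrt(target_var / var) + target_mean
    return np.clip(converted, 0, 255)

# For luminance matching (Equation (15)), apply it to the stacked luminance values
# of the dataset at hand; for RGB matching (Equation (19)), apply it to each of the
# R, G, and B channels with that channel's target mean and variance.
```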
  • FIG. 10 is a flowchart illustrating a processing procedure of training image acquisition processing according to the second embodiment.
  • First, the dataset acquisition unit 2311 acquires a dataset (step S41).
  • The domain determination unit 2312 refers to an arbitrarily selected determination target image from the dataset acquired in step S41 (step S42), and determines the domain to which this image belongs (step S43).
  • The second registration unit 2314 registers the determination target image as a training image for the model corresponding to the domain determined in step S43 (step S44).
  • The domain determination unit 2312 determines whether or not a next image to be determined exists in the dataset (step S45). When the domains of all images in the dataset have been determined, the domain determination unit 2312 determines that there is no next image to determine (step S45: No) and ends the process. Otherwise, the domain determination unit 2312 determines that there is a next image to determine (step S45: Yes), returns to step S42, and performs domain determination on the next image.
  • FIG. 11 is a flowchart showing another processing procedure of the training image acquisition process according to the second embodiment.
  • The training device 20 converts images of the dataset into images of other domains to generate training images of each domain when, for example, a sufficient number of training images cannot be secured for a domain.
  • When the training device 20 receives input of a domain for which a sufficient number of training images cannot be secured, as the conversion target domain (step S51), the dataset acquisition unit 2311 acquires a dataset (step S52).
  • The domain determination unit 2312 refers to an arbitrarily selected image to be converted from the dataset acquired in step S52 (step S53), and determines the domain to which this image belongs (step S54).
  • The domain conversion unit 2313 converts this image into an image of the conversion target domain (step S55).
  • The second registration unit 2314 registers the image whose domain was converted in step S55 as a training image for the model corresponding to the conversion target domain (step S56).
  • The training device 20 determines whether or not there is an image to be converted next (step S57). If a sufficient number of training images has been secured for the conversion target domain, the training device 20 determines that there is no next image to convert (step S57: No) and ends the process. If the number of training images for the conversion target domain is not yet sufficient, the training device 20 determines that there is a next image to convert (step S57: Yes), returns to step S53, and converts the next image into the conversion target domain.
  • FIG. 12 is a flowchart of a training process procedure according to the second embodiment.
  • The training unit 232 selects the training image group corresponding to the domain to be trained (step S62).
  • The training unit 232 executes model training using the selected training image group (step S63).
  • As described above, the training device 20 determines the domain of each image in the dataset and prepares training images for each domain, and trains a model for each domain using the training image group belonging to that domain. Therefore, according to the training device 20, the model corresponding to each domain can be trained appropriately, and the inference accuracy of the models can be improved.
  • Further, when a sufficient number of training images cannot be secured for a domain, the training device 20 converts images of the dataset into images of the desired domain to generate training images for that domain. The training device 20 can therefore secure a sufficient number of training images for each domain, execute model training appropriately for any domain, and improve the inference accuracy of the models.
  • As a modification of Embodiment 2, a control unit 23A may be provided in which the domain determination unit 2312 shown in FIG. 9 is replaced with a domain determination unit 2312A (see FIG. 13).
  • The domain determination unit 2312A sets a luminance threshold and a margin width in advance, and determines that an image belongs to the morning domain when its average luminance is equal to or greater than "threshold - margin width", and that it belongs to the night domain when its average luminance is less than "threshold + margin width". An image whose average luminance falls within the margin around the threshold thus belongs to both domains.
  • With a simple division, the number of training images near the threshold tends to be small, and accuracy may decrease when the model of each domain is applied to inference images near the threshold.
  • By dividing with a margin, the number of training images near the threshold can be increased, and the deterioration of the model accuracy of each domain can be reduced.
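  • A sketch of the margin-based split, under the reading above that images whose average luminance falls inside the margin band are assigned to both domains:

```python
def split_with_margin(images_with_L, threshold: float, margin: float):
    """images_with_L: iterable of (image, average_luminance) pairs."""
    morning, night = [], []
    for img, L in images_with_L:
        if L >= threshold - margin:   # at or above "threshold - margin width"
            morning.append(img)
        if L < threshold + margin:    # below "threshold + margin width"
            night.append(img)
    return morning, night             # the band around the threshold lands in both
```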
  • Evaluation experiments A0, A1, and B were performed on the inference devices 10 and 10A and the training devices 20 and 20A of the first and second embodiments.
  • rank-k is the average over all queries of "the probability that at least one image of the queried person appears among the top k images when the gallery is sorted by distance to the query (closest first)". rank-k takes a value between 0 and 1, and the higher the value, the better the accuracy.
  • mAP is the average over all queries of "the average over k of the precision (the proportion of the top k gallery images for a given query that show the queried person)". mAP takes a value between 0 and 1, and the higher the value, the better the accuracy.
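  • A sketch of these two metrics, assuming a distance matrix `dists` (queries x gallery) and a boolean matrix `matches` marking gallery images of the queried person:

```python
import numpy as np

def rank_k(dists: np.ndarray, matches: np.ndarray, k: int) -> float:
    order = np.argsort(dists, axis=1)  # sort gallery by distance, closest first
    hits = [matches[q, order[q, :k]].any() for q in range(dists.shape[0])]
    return float(np.mean(hits))        # fraction of queries with a hit in the top k

def mean_average_precision(dists: np.ndarray, matches: np.ndarray) -> float:
    aps = []
    for q in range(dists.shape[0]):
        ranked = matches[q, np.argsort(dists[q])]
        if not ranked.any():
            continue                                    # no true match for this query
        cum_hits = np.cumsum(ranked)
        precision = cum_hits / (np.arange(ranked.size) + 1)
        aps.append(float((precision * ranked).sum() / ranked.sum()))
    return float(np.mean(aps))
```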
  • Evaluation experiment A0 will be described. In evaluation experiment A0, it was evaluated whether there is a difference in accuracy with and without model switching when person matching is performed outdoors.
  • The MSMT dataset was used in evaluation experiment A0.
  • The MSMT dataset consists of a training image group and an inference image group, and includes a wide range of images taken from morning to night.
  • In evaluation experiment A0, each image of the MSMT dataset was subjected to domain determination, and the training images divided into domains according to the determination were used to train the model corresponding to each domain.
  • At inference time, the domain of the gallery image was determined, and inference was performed by switching the model according to the domain of the gallery image.
  • In evaluation experiment A0, the case where the query image and the gallery image have the same domain was evaluated.
  • In evaluation experiment A0, the MSMT dataset was divided into training images of three domains (morning, noon, and night) on the basis of luminance, adopting thresholds that make the number of images belonging to each domain in the training image group equal. Division of the training images with a margin, shown in the modification of the second embodiment, was also evaluated. During inference, the domain of the inference image was determined based on luminance.
  • First, boundary values of luminance that divide the domains are determined. Specifically, using two luminance boundary values θ1 > θ2 and a margin ratio ε, the domain to which images whose average luminance L over all pixels (Equation (2)) is greater than θ1 × (1 − ε) belong is called, for convenience, the morning domain. The domain to which images whose L is smaller than θ1 × (1 + ε) and larger than θ2 × (1 − ε) belong is called the daytime domain. The domain to which images whose L is smaller than θ2 × (1 + ε) belong is called the night domain.
  • Here, the mean and standard deviation of luminance in the MSMT training image group are 76.1 and 27.7, respectively.
  • At training time, the training devices 20 and 20A divide the training image group by domain.
  • This divides the MSMT dataset into training image groups for the morning, noon, and night domains.
  • The training devices 20 and 20A train the model corresponding to each domain using the training image group of that domain.
  • At inference time, the inference device 10 performs domain determination and divides the inference image group by domain. This yields query images and gallery images for the morning, noon, and night domains.
  • Each model in the inference device 10 is then evaluated.
  • The morning, noon, or night model is applied to the corresponding morning, noon, or night query image group and the morning, noon, or night gallery image group, and the feature vector distances are calculated for each domain to compute the rank-k and mAP values.
  • As a baseline, the MSMT dataset is used as-is to train a general-purpose model. At inference time, the inference image group is likewise divided by domain, yielding query images and gallery images for the morning, noon, and night domains.
  • Table 1 shows the results of the evaluation experiment A0.
  • As shown in Table 1, dataset division with a margin by the domain determination unit 2312A of the training device 20A (experimental data A0-13) was more effective than simple dataset division by the domain determination unit 2312 of the training device 20 (experimental data A0-10).
  • Evaluation experiment A1 will be described. In evaluation experiment A1, it was evaluated whether, when the domains of the query image and the gallery image differ, there is a difference in accuracy depending on whether domain conversion of the query image is performed, that is, whether the feature amounts of the query image are registered for each domain.
  • The MSMT dataset was used in evaluation experiment A1.
  • In evaluation experiment A1, each image in the MSMT dataset was subjected to domain determination, and the training images divided into domains according to the determination were used to train the model corresponding to each domain.
  • At inference time, the domain of the gallery image was determined, and inference was performed by switching the model according to the domain of the gallery image.
  • In evaluation experiment A1, the case where the query image and the gallery image have different domains was evaluated.
  • The training images were set in the same way as in evaluation experiment A0.
  • That is, the training images are divided into three domains (morning, noon, and night).
  • This divides the MSMT dataset into training image groups for the morning, noon, and night domains.
  • The training image group of each domain is used to train the model corresponding to that domain.
  • At inference time, the inference device 10 performs domain determination and divides the inference image group by domain. This yields query images and gallery images for the morning, noon, and night domains.
  • The inference device 10 employs luminance-based domain conversion of the query images.
  • The average luminance L' of each destination domain is taken to be the average luminance of the image group belonging to that domain in the training image group.
  • each model in the inference device 10 is evaluated.
  • When evaluating the model for each domain, the dedicated morning/day/night model is applied to the query image groups obtained by domain-converting the day+night/night+morning/morning+day query images into morning/day/night, and to the morning/day/night gallery image groups; by calculating the feature-vector distance for each domain, the rank-1 and mAP values are calculated.
  • the MSMT dataset is used as is to train a general model. Then, at the time of inference, the inference image group is divided for each domain. This results in query images and gallery images for the morning, noon, and night domains.
  • The general-purpose model is applied to the query image groups obtained by domain-converting the day+night/night+morning/morning+day query images into morning/day/night and to the morning/day/night gallery image groups; by calculating the feature-vector distance, the rank-k and mAP values are calculated.
  • The general-purpose model is applied as-is to the day+night/night+morning/morning+day query image groups and to the morning/day/night gallery image groups; by calculating the feature-vector distance for each domain, the rank-1 and mAP values are calculated.
  • Table 2 shows the results of the evaluation experiment A1.
  • Next, evaluation experiment B will be described.
  • In evaluation experiment B, it was evaluated which of the per-domain training images obtained by division and by conversion is more suitable as training images for the model when applied to real data instead of a public dataset.
  • The divided training images are training images of each domain prepared by dataset division performed by the domain determination unit 2312 of the training device 20 in the processing procedure shown in FIG. 10.
  • The converted training images are training images of each domain prepared by generating training images through domain conversion (see FIG. 11) performed by the domain conversion unit 2313 in the processing procedure shown in FIG. 11.
  • As training images, a group of training images prepared by dividing the MSMT dataset by domain and a group of training images generated for each domain by subjecting the images of the MSMT dataset to domain conversion were prepared.
  • a real data-like dataset was used for the inference image.
  • The real-data-like dataset includes a wide range of images taken from morning to night, as well as difficulties unique to real data that are not included in the public dataset, such as the effects of the color and saturation (tint) of a large display. Since it is difficult to obtain the actual data itself, in this experiment the inference image group of Market 1501, domain-converted by converting the luminance of the public-dataset images, was used as a substitute.
  • domains are divided based on time and color.
  • For time, the daytime or nighttime label already assigned to each image is used. Colors are classified into red, blue, and yellow as described above. Note that in this evaluation experiment B, the domain is the same between the query image and the gallery image.
  • the training device 20 divides the training image groups into domains. This divides the MSMT dataset into training images for the day, night red, night blue, and night yellow domains. The training device 20 trains a model corresponding to each domain using the training image group of each domain.
  • The inference device 10 performs domain determination and divides the inference image group by domain. This results in query and gallery images for the day, night-red, night-blue, and night-yellow domains. Subsequently, each model in the inference device 10 is evaluated. By applying the day/night-red/night-blue/night-yellow dedicated model to the day/night-red/night-blue/night-yellow query images and the day/night-red/night-blue/night-yellow gallery images, and calculating the feature-vector distance for each domain, the rank-1 and mAP values are calculated.
  • The training device 20 converts the training image groups into each domain. This converts the MSMT dataset into the day, night-red, night-blue, and night-yellow domains. Conversion based on luminance is adopted, and a transformation was performed to fit the mean and variance of the RGB values of each domain of the real-data images. The training device 20 then trains a model corresponding to each domain using the training image group of that domain. The inference device 10 evaluates each model in the same way as when using the divided training images. The models to be evaluated include the models corresponding to each domain as well as the general-purpose model.
  • Table 3 shows the results of evaluation experiment B.
  • For both the general-purpose model and the models trained for each domain, training with the training image groups generated by domain conversion gave more accurate results than training with the training images obtained by division. It was therefore found that per-domain training images obtained by conversion are preferable to per-domain training images obtained by division.
  • For this reason, it is preferable that the training device 20 performs domain conversion to generate training images.
  • In the above embodiments, domains based on the time zones of morning, noon, and night were described, but domains are not limited to these.
  • the domain may be due to differences in weather or lighting (light source).
  • Weather domains include, for example, sunny, cloudy, rainy, and snowy. Domains based on the position of the sun due to changes in seasons and time zones include front light and backlight.
  • A domain may also be set according to a person's posture; in this case, examples include standing upright, sitting on a chair, and doing a handstand.
  • Each component of the inference devices 10, 10A and the training devices 20, 20A is functionally conceptual and does not necessarily need to be physically configured as illustrated. That is, the specific form of distribution and integration of the functions of the inference devices 10, 10A and the training devices 20, 20A is not limited to the illustrated one, and all or part of them can be functionally or physically distributed or integrated in arbitrary units according to various loads and usage conditions.
  • All or any part of the processing performed in the inference devices 10, 10A and the training devices 20, 20A may be implemented by a CPU, a GPU (Graphics Processing Unit), and a program analyzed and executed by the CPU and GPU. Each process performed in the inference devices 10, 10A and the training devices 20, 20A may also be realized as hardware by wired logic.
  • FIG. 14 is a diagram showing an example of a computer that implements the inference devices 10 and 10A and the training devices 20 and 20A by executing programs.
  • the computer 1000 has a memory 1010 and a CPU 1020, for example.
  • The computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.
  • the memory 1010 includes a ROM 1011 and a RAM 1012.
  • the ROM 1011 stores a boot program such as BIOS (Basic Input Output System).
  • The hard disk drive interface 1030 is connected to a hard disk drive 1090.
  • The disk drive interface 1040 is connected to a disk drive 1100.
  • A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100.
  • Serial port interface 1050 is connected to mouse 1110 and keyboard 1120, for example.
  • Video adapter 1060 is connected to display 1130, for example.
  • The hard disk drive 1090 stores, for example, an OS (Operating System) 1091, application programs 1092, program modules 1093, and program data 1094. That is, a program that defines each process of the inference devices 10, 10A and the training devices 20, 20A is implemented as a program module 1093 in which code executable by the computer 1000 is written. The program modules 1093 are stored, for example, on the hard disk drive 1090.
  • the hard disk drive 1090 stores a program module 1093 for executing processing similar to the functional configurations of the inference devices 10 and 10A and the training devices 20 and 20A.
  • the hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
  • the setting data used in the processing of the above-described embodiment is stored as program data 1094 in the memory 1010 or the hard disk drive 1090, for example. Then, the CPU 1020 reads out the program module 1093 and the program data 1094 stored in the memory 1010 and the hard disk drive 1090 to the RAM 1012 as necessary and executes them.
  • the program modules 1093 and program data 1094 are not limited to being stored in the hard disk drive 1090, but may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program modules 1093 and program data 1094 may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.). Program modules 1093 and program data 1094 may then be read by CPU 1020 through network interface 1070 from other computers.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An inference device (10) has: an image input unit (131) that receives the input of an image to be assessed; and a domain assessing unit (132) that assesses, on the basis of an element that enables the environment in which a subject is imaged to be established, to which domain among a plurality of domains respectively defined by environmental conditions the image to be assessed belongs.

Description

Processing device and processing method
The present invention relates to a processing apparatus and a processing method.
For the purpose of early detection and tracking of lost children, crime prevention, and the like, matching and estimation of persons are performed on surveillance camera video. In recent years, improved accuracy has been demanded for image matching and estimation. For example, when matching a person, the same person may be presumed to be a different person because of the difference between morning and night, or a different person may be mistaken for the person in question at night. In this way, the accuracy of image matching and estimation may be degraded by environmental conditions such as time of day, weather, and lighting.
Here, there is a technique for improving accuracy by compensating for changes in environmental conditions with a model that uses natural fluctuations in the data (see Non-Patent Document 1).
However, with the technique described in Non-Patent Document 1, when the change in environmental conditions is too large, as between day and night, sufficient compensation cannot be performed, and there is the problem that the accuracy of image matching or estimation cannot be ensured.
The present invention has been made in view of the above, and its object is to provide a processing device and a processing method that, by appropriately determining the environmental conditions of a target image, enable execution of image processing according to the determination and improve the accuracy of image matching or estimation.
In order to solve the above-described problems and achieve the object, a processing device according to the present invention includes an input unit that receives an input of an image to be determined, and a determination unit that determines, based on elements that establish the environment in which a subject is imaged, to which of a plurality of domains each defined by environmental conditions the image to be determined belongs.
According to the present invention, by appropriately determining the environmental conditions of a target image, image processing according to the determination can be executed and the accuracy of image matching or estimation can be improved.
FIG. 1 is a diagram for explaining the outline of the first embodiment.
FIG. 2 is a diagram for explaining the outline of the first embodiment.
FIG. 3 is a diagram for explaining the outline of the first embodiment.
FIG. 4 is a diagram schematically showing an example of the configuration of the inference device according to Embodiment 1.
FIG. 5 is a flowchart showing the processing procedure of the registration processing of the feature amounts of a query image according to Embodiment 1.
FIG. 6 is a flowchart showing the processing procedure of the inference processing according to Embodiment 1.
FIG. 7 is a diagram schematically showing an example of another configuration of the inference device according to a modification of Embodiment 1.
FIG. 8 is a flowchart showing the processing procedure of the inference processing according to the modification of Embodiment 1.
FIG. 9 is a diagram schematically showing an example of the configuration of the training device according to Embodiment 2.
FIG. 10 is a flowchart showing the processing procedure of the training image acquisition processing according to Embodiment 2.
FIG. 11 is a flowchart showing another processing procedure of the training image acquisition processing according to Embodiment 2.
FIG. 12 is a flowchart showing the processing procedure of the training processing according to Embodiment 2.
FIG. 13 is a diagram schematically showing an example of another configuration of the training device according to a modification of Embodiment 2.
FIG. 14 is a diagram showing an example of a computer that implements the training device and the inference device by executing a program.
An embodiment of the present invention will be described in detail below with reference to the drawings. Note that the present invention is not limited by this embodiment. In the description of the drawings, the same parts are denoted by the same reference numerals.
[Embodiment 1]
FIGS. 1 to 3 are diagrams for explaining the outline of the first embodiment. In FIG. 1, image matching is described as an example. An image in which the matching target appears is referred to as a query image, and an image checked for whether the matching target appears in it is referred to as a gallery image.
The inference device according to Embodiment 1 has trained models each corresponding to one of a plurality of domains, and performs matching while switching the model to be used according to the domain of the gallery image. A domain is defined by environmental conditions. In the examples of FIGS. 1 to 3, a case is described in which there are day and night domains, with a model M1 corresponding to day and a model M2 corresponding to night.
When a query image is input, the inference device registers the feature amounts of the query image. First, the inference device converts the query image into images of the day and night domains. The inference device then extracts a feature amount from the day-domain query image using the model M1 corresponding to the day domain (see arrow Y11 in FIG. 1) and registers the extracted feature amount as the feature amount of the query image corresponding to the day domain. Similarly, the inference device extracts a feature amount from the night-domain query image using the model M2 corresponding to the night domain (see arrow Y12 in FIG. 1) and registers it as the feature amount of the query image corresponding to the night domain (see arrow Y12-1 in FIG. 1).
Next, when a gallery image is input, the inference device performs inference on the gallery image. First, the inference device determines the domain to which the gallery image belongs. In the example of FIG. 1, the domain of the gallery image is determined to be night. The inference device then selects, from the models M1 and M2, the model M2 corresponding to night, which is the domain of the gallery image ((1) in FIG. 1), and extracts the feature amount of the gallery image using the selected model M2 ((2) in FIG. 1).
Since the domain of the gallery image is night, the inference device refers to the feature amount of the query image corresponding to the night domain and calculates the distance between the feature amount of the gallery image and the referenced feature amount of the query image. The inference device compares the calculated distance with a matching threshold to check whether the matching target appears in the gallery image ((3) in FIG. 1). Note that a matching threshold is set for each domain, and the inference device uses, at the time of matching, the matching threshold set for the domain of the gallery image.
Specifically, the registration of the feature amounts of query images will be described with reference to FIG. 2. FIG. 2 illustrates a case where a plurality of query images are received ((1) in FIG. 2). In the case of FIG. 2, the inference device receives a query image Q1 of a person in work clothes in the day domain and a query image Q2 of a person in a suit in the night domain. The inference device converts the domains of the query images in preparation for gallery images of both the day and night domains ((2) in FIG. 2). Specifically, the inference device converts the query image Q1 into a query image Q12 of the night domain (see arrow Y11) and converts the query image Q2 into a query image Q21 of the day domain (see arrow Y12).
The inference device then extracts feature amounts from the day-domain query images Q1 and Q21 using the model M1 ((3) in FIG. 2) and registers the extracted feature amounts as the feature amounts M1-1 and M2-1 of the query images corresponding to the day domain ((4) in FIG. 2). The feature amount M1-1 corresponds to the person in work clothes in the day domain, and the feature amount M2-1 corresponds to the person in a suit in the day domain.
The inference device extracts feature amounts from the night-domain query images Q12 and Q2 using the model M2 ((3) in FIG. 2) and registers the extracted feature amounts as the feature amounts M1-2 and M2-2 of the query images corresponding to the night domain ((4) in FIG. 2). The feature amount M1-2 corresponds to the person in work clothes in the night domain, and the feature amount M2-2 corresponds to the person in a suit in the night domain.
Next, inference on gallery images will be described with reference to FIG. 3. FIG. 3 illustrates a case where a nighttime image I1 and a daytime image I2 are captured by surveillance cameras C1 and C2. First, in a person clipping task, gallery images G1 to G4 showing persons A to D are clipped from the images I1 and I2 ((A) in FIG. 3). The person clipping task may be executed by another device provided between the surveillance cameras C1, C2 and the inference device, or by the inference device itself.
The inference device then determines the domains of the gallery images G1 to G4. The inference device determines that the domain of the gallery images G1 and G2 is night and that the domain of the gallery image G4 is day. For the gallery image G3, although it was captured in a time zone when the sun was out, the inference device determines that it belongs to the night domain because it was captured in a dark place under the shade of trees (see arrow Y31). The inference device then divides the gallery images G1 to G4 according to the day and night domains ((2) in FIG. 3).
For the gallery images G1 to G3, the inference device selects the model M2 corresponding to the night domain and extracts their feature amounts using the selected model M2 ((3) in FIG. 3). For the gallery image G4, the inference device selects the model M1 corresponding to the day domain and extracts its feature amounts using the selected model M1 ((3) in FIG. 3).
The inference device compares the feature amounts of the gallery images G1 to G3 with the pre-registered feature amounts of the night-domain query images. That is, the inference device compares the feature amounts of persons A to C, whose domain is night, with the feature amounts of the person in work clothes and the person in a suit of the query images corresponding to the night domain ((4) in FIG. 3).
Likewise, the inference device compares the feature amount of the gallery image G4 with the pre-registered feature amounts of the day-domain query images. That is, the inference device compares the feature amount of person D, whose domain is day, with the feature amounts of the person in work clothes and the person in a suit of the query images corresponding to the day domain ((4) in FIG. 3).
As described above, the inference device according to Embodiment 1 converts the domain of a query image into images of all the domains corresponding to the trained models, extracts feature amounts from the query image of each domain using the model corresponding to that domain, and registers the feature amount of each query image in association with its domain.
This allows the inference device to compare, at the time of matching, the feature amount of the gallery image with the feature amount of a query image of the same domain as the gallery image. Therefore, the inference device can reduce the accuracy degradation caused by the query image and the gallery image belonging to different domains.
Since the inference device determines the domain of the gallery image and uses the model corresponding to that domain, it can execute appropriate feature extraction processing according to the determination and improve the accuracy of image matching.
[Inference device]
Next, the inference device according to Embodiment 1 will be described. FIG. 4 is a diagram schematically showing an example of the configuration of the inference device according to Embodiment 1. As shown in FIG. 4, the inference device 10 has an input/output unit 11, a storage unit 12, and a control unit 13.
The input/output unit 11 receives input of information and outputs information. The input/output unit 11 is, for example, a communication interface that transmits and receives various kinds of information to and from other devices connected via a network or the like, and performs communication between those devices and the control unit 13 (described later) over an electric communication line such as a LAN (Local Area Network) or the Internet. The input/output unit 11 also includes devices such as a mouse and a keyboard that receive input of various kinds of instruction information for the inference device 10 in response to a user's input operation, and a display such as a liquid crystal display that displays and outputs screens whose display is controlled by the inference device 10.
The storage unit 12 is realized by a semiconductor memory device such as a RAM (Random Access Memory) or a flash memory, and stores a processing program for operating the inference device 10, data used during execution of the processing program, and the like. The storage unit 12 holds query feature amount data 121, a model group 122, and an inference result 123 indicating the result of inference by an inference unit 135 (described later).
The query feature amount data 121 holds the feature amounts of query images converted into each domain. For example, the query feature amount data 121 holds a first-domain feature amount, which is the feature amount of a query image converted into the first domain, and a second-domain feature amount, which is the feature amount of a query image converted into the second domain.
The model group 122 holds a plurality of models used by the inference unit 135 (described later). Each model is a trained feature extraction model, configured, for example, by a neural network (NN). A model is provided for each domain; for example, the first-domain model corresponds to the first domain, and the second-domain model corresponds to the second domain.
The control unit 13 controls the entire inference device 10. The control unit 13 is, for example, an electronic circuit such as a CPU (Central Processing Unit) or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The control unit 13 has an internal memory for storing programs defining various processing procedures and control data, and executes each process using the internal memory. The control unit 13 functions as various processing units by running various programs, and has an image input unit 131 (input unit), a domain determination unit 132 (determination unit), a domain conversion unit 133 (conversion unit), a model selection unit 134 (selection unit), an inference unit 135, and a first registration unit 136.
The image input unit 131 receives input of a query image, which is a target of determination and conversion, and input of a gallery image, which is a target of determination. The gallery image is an inference image that is the inference target of the inference unit 135.
The domain determination unit 132 determines, based on elements that establish the environment in which a subject is imaged, to which of a plurality of domains an image to be determined belongs. The elements that establish the environment in which the subject is imaged are, for example, at least one of the time at which the image was captured, the luminance of the image, and the color and saturation of the image. The domain determination unit 132 determines to which domain the query image and the gallery image belong.
When an image to be converted belongs to one of the plurality of domains, the domain conversion unit 133 converts the image, based on elements that establish the environment in which the subject is imaged, into an image of another domain different from the domain to which the image belongs. The elements that establish the environment in which the subject is imaged are, for example, at least one of the luminance of the image and the color and saturation of the image.
At the stage of registering the feature amounts of a query image, the domain conversion unit 133 converts the query image into images of other domains different from the domain to which the query image belongs. If there are a plurality of such other domains, the domain conversion unit 133 converts the query image for all of them. For example, if the query image belongs to the first domain and a second domain and a third domain are defined in addition to the first domain, the domain conversion unit 133 converts the query image into an image of the second domain and into an image of the third domain.
The model selection unit 134 selects, from among the models of the model group 122, the model corresponding to the domain of the image to be determined, as determined by the domain determination unit 132.
At the stage of registering the feature amounts of a query image, the model selection unit 134 selects, from among the models of the model group 122, the model corresponding to the domain of the query image and the models corresponding to the domains of the query images converted by the domain conversion unit 133. For example, when the domain determination unit 132 determines that the query image belongs to the first domain, the model selection unit 134 selects the first-domain model for the query image, and selects the second-domain model for the query image converted into the second domain.
At the inference stage for a gallery image, the model selection unit 134 selects, based on the determination result of the domain determination unit 132, the model of the model group 122 corresponding to the domain to which the gallery image belongs. For example, when the domain determination unit 132 determines that the gallery image belongs to the second domain, the model selection unit 134 selects the second-domain model for the gallery image.
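A minimal sketch of this per-domain model selection follows; the placeholder extractor functions stand in for the trained models of the model group 122 and are illustrative assumptions.

```python
def first_domain_model(image):   # placeholder for the trained first-domain model
    return [sum(image)]          # dummy "feature vector"

def second_domain_model(image):  # placeholder for the trained second-domain model
    return [max(image)]

MODEL_GROUP = {"first": first_domain_model, "second": second_domain_model}

def select_model(domain):
    """Return the feature-extraction model corresponding to the domain
    determined for the image (cf. model selection unit 134)."""
    return MODEL_GROUP[domain]

features = select_model("second")([3, 1, 2])  # -> [3]
```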
The inference unit 135 performs inference using the model selected by the model selection unit 134, and has a feature amount extraction unit 1351 and a matching unit 1352.
The feature amount extraction unit 1351 extracts the feature amounts of the image to be processed using the model selected by the model selection unit 134, for example by performing the generally known forward propagation of an NN.
At the stage of registering the feature amounts of a query image, the feature amount extraction unit 1351 uses the models selected for each domain by the model selection unit 134 to extract the feature amounts of the query image and of the query images whose domains have been converted by the domain conversion unit 133. For example, when the model selection unit 134 selects the first-domain model 1231 for the query image, the feature amount extraction unit 1351 extracts the feature amounts of the query image using the first-domain model 1231. When the model selection unit 134 selects the second-domain model 1232 for the query image converted into the second domain, the feature amount extraction unit 1351 extracts the feature amounts of that converted query image using the second-domain model 1232.
At the inference stage for a gallery image, the feature amount extraction unit 1351 extracts the feature amounts of the gallery image using the model selected by the model selection unit 134. For example, when the model selection unit 134 selects the second-domain model 1232 for the gallery image, the feature amount extraction unit 1351 extracts the feature amounts of the gallery image using the second-domain model 1232.
The matching unit 1352 calculates the distance between the feature amount of the gallery image and the feature amount of the query image. At this time, the matching unit 1352 refers, as the feature amount of the query image, to the feature amount in the query feature amount data 121 that corresponds to the domain to which the gallery image belongs. For example, when the domain determination unit 132 determines that the gallery image belongs to the second domain, the matching unit 1352 refers to the second-domain feature amount in the query feature amount data 121.
The matching unit 1352 then compares the calculated distance with a matching threshold to check whether the subject of the gallery image is the matching target. A matching threshold is set for each domain, and the matching unit 1352 uses the threshold set for the domain to which the gallery image belongs. If the calculated distance is equal to or less than the matching threshold, the matching unit 1352 determines that the subject of the gallery image is the matching target of the query image; if the calculated distance is greater than the matching threshold, it determines that the subject of the gallery image is not the matching target of the query image.
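The matching just described can be sketched as follows, assuming Euclidean distance between feature vectors; the data-structure layout and names are illustrative, not the patent's implementation.

```python
import numpy as np

def match(gallery_feat, query_feats_by_domain, thresholds_by_domain, domain):
    """Compare a gallery feature against the query features registered for the
    same domain, using the matching threshold set for that domain
    (cf. matching unit 1352)."""
    threshold = thresholds_by_domain[domain]
    results = []
    for query_id, q_feat in query_feats_by_domain[domain].items():
        dist = float(np.linalg.norm(np.asarray(gallery_feat) - np.asarray(q_feat)))
        results.append((query_id, dist, dist <= threshold))  # True: matching target
    return results
```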
The first registration unit 136 registers each feature amount of the query image as the feature amount of the query image of the corresponding domain. The first registration unit 136 registers the feature amount of the first-domain query image in the query feature amount data 121 as the first-domain feature amount, and registers the feature amount of the second-domain query image in the query feature amount data 121 as the second-domain feature amount.
[Domain determination unit]
Next, the determination processing of the domain determination unit 132 will be described. In the case of domains related to time zones such as morning, noon, and night, the domain determination unit 132 determines the domain of the image to be determined by comparing the time at which the image was captured with predetermined morning, noon, and night time zones.
For example, the time zone of each domain is set in advance such that 6:00 to 11:00 is morning, 11:00 to 18:00 is noon, and 18:00 to 6:00 the next day is night. The domain determination unit 132 checks meta-information on the capture time of the image to be determined and determines the domain of the image according to its capture time.
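A small sketch of this time-based determination, using the example time zones above:

```python
from datetime import time

def domain_by_time(capture_time: time) -> str:
    """Map a capture time to a domain: morning 6:00-11:00,
    noon 11:00-18:00, night 18:00-6:00 the next day."""
    if time(6) <= capture_time < time(11):
        return "morning"
    if time(11) <= capture_time < time(18):
        return "noon"
    return "night"

print(domain_by_time(time(19, 30)))  # -> night
```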
In the case of domains related to brightness such as morning, noon, and night, the domain determination unit 132 may determine the domain of the image to be determined by comparing the average luminance over all pixels of the image with preset thresholds.
Specifically, let $(r_{ij}, g_{ij}, b_{ij})$ be the RGB values of the pixel at coordinates $(i, j)$ of the image to be determined. In this case, the luminance of the pixel at coordinates $(i, j)$ is given by Equation (1), where $\alpha$, $\beta$, and $\gamma$ are preset parameters.
$$l_{ij} = \alpha\, r_{ij} + \beta\, g_{ij} + \gamma\, b_{ij} \quad (1)$$
The domain determination unit 132 calculates the average luminance over all pixels of the image to be determined (with the image size taken as $H \times W$ pixels), written as Equation (2).
$$L = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} l_{ij} \quad (2)$$
The domain determination unit 132 compares the average luminance $L$ of Equation (2) with preset thresholds to determine whether the image to be determined belongs to the morning, noon, or night domain. For example, when the average value $L$ is less than a first threshold, the domain determination unit 132 determines that the domain of the image is night; when $L$ is equal to or greater than the first threshold and less than a second threshold (greater than the first threshold), it determines that the domain is morning; and when $L$ is equal to or greater than the second threshold, it determines that the domain is noon.
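A minimal sketch of this luminance-based determination, following Equations (1) and (2). The default weights for $\alpha$, $\beta$, $\gamma$ are the common ITU-R BT.601 values, used here only as an example of the preset parameters.

```python
import numpy as np

def domain_by_luminance(image, th1, th2, alpha=0.299, beta=0.587, gamma=0.114):
    """Determine the domain from the average luminance L of Equations (1)-(2).
    image: H x W x 3 RGB array; th1 < th2 are the first and second thresholds."""
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    l = alpha * r + beta * g + gamma * b   # Equation (1), per pixel
    L = float(l.mean())                    # Equation (2), average over all pixels
    if L < th1:
        return "night"
    if L < th2:
        return "morning"
    return "noon"
```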
Next, a case will be described in which the color and saturation (tint) of the background portion of a person image changes from moment to moment, for example among red, blue, and yellow. This occurs, for example, in an outdoor environment such as a park where a large display installed in the environment shows vivid video, so that the background portion of a person in that environment appears in the image with changing tints such as red, blue, and yellow.
In this case, for example, a red domain, a blue domain, and a yellow domain are set as domains. The domain determination unit 132 calculates the average over all pixels of the R, G, and B luminances of the image to be determined, and determines the domain of the image according to which average value is the largest.
Specifically, let $(r_{ij}, g_{ij}, b_{ij})$ be the R, G, and B luminances of the pixel at coordinates $(i, j)$ of the image to be determined.
The averages over all pixels of the R, G, and B luminances are written as Equations (3) to (5): Equation (3) is the average of the R luminance over all pixels, Equation (4) that of the G luminance, and Equation (5) that of the B luminance.
$$R = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} r_{ij} \quad (3)$$
$$G = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} g_{ij} \quad (4)$$
$$B = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} b_{ij} \quad (5)$$
The domain determination unit 132 calculates, for the image to be determined, the averages over all pixels of the R, G, and B luminances of Equations (3) to (5), and determines the domain using Equation (6).
$$\operatorname*{arg\,max} \left\{ R,\; B,\; \frac{R+G}{2} \right\} \quad (6)$$
When Equation (6) gives $R$, the domain determination unit 132 determines that the domain of the image to be determined is red; when it gives $B$, that the domain is blue; and when it gives $(R+G)/2$, that the domain is yellow. Note that yellow is treated as the sum of red and green.
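This tint-based determination can be sketched as follows, implementing Equations (3) to (6); the function name is illustrative.

```python
import numpy as np

def domain_by_tint(image):
    """Determine the red/blue/yellow domain from the per-channel average
    luminances R, G, B of Equations (3)-(5), taking yellow as (R + G) / 2
    as in Equation (6). image: H x W x 3 RGB array."""
    R, G, B = (float(image[..., c].mean()) for c in range(3))
    scores = {"red": R, "blue": B, "yellow": (R + G) / 2.0}
    return max(scores, key=scores.get)   # argmax over the three candidates
```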
The domain determination unit 132 may also determine the domain to which the image to be determined belongs by combining two or more of the time at which the image was captured, the luminance of the image, and the color and saturation of the image.
For example, consider a case where images are unaffected by the tint of the video on a large display during the daytime but are easily affected by it at night. In this case, the domain determination unit 132 combines the domain determination based on time with the domain determination based on color and saturation.
For example, when the image to be determined was captured between 6:00 and 18:00, the domain determination unit 132 determines that its domain is "daytime". When the image was captured between 18:00 and 6:00 the next day and Equation (7) holds, the domain determination unit 132 determines that its domain is "night, red".
$$\max \left\{ R,\; B,\; \frac{R+G}{2} \right\} = R \quad (7)$$
When the image to be determined was captured between 18:00 and 6:00 the next day and Equation (8) holds, the domain determination unit 132 determines that its domain is "night, blue".
$$\max \left\{ R,\; B,\; \frac{R+G}{2} \right\} = B \quad (8)$$
When the image to be determined was captured between 18:00 and 6:00 the next day and Equation (9) holds, the domain determination unit 132 determines that its domain is "night, yellow".
$$\max \left\{ R,\; B,\; \frac{R+G}{2} \right\} = \frac{R+G}{2} \quad (9)$$
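The combined time-and-tint determination can then be sketched compactly, reusing domain_by_tint from the earlier sketch; the hour boundaries follow the example above.

```python
def domain_by_time_and_tint(capture_hour, image):
    """Combine time-based and tint-based determination: daytime from 6:00 to
    18:00; otherwise night, subdivided by the dominant tint (Equations (7)-(9))."""
    if 6 <= capture_hour < 18:
        return "daytime"
    return "night, " + domain_by_tint(image)
```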
[Domain conversion unit]
Next, the domain conversion unit 133 will be described. The domain conversion unit 133 converts the query image into images of all the domains different from the domain to which the query image belongs, so that a feature amount of the query image is registered for the same domain as that of the gallery image.
The domain conversion unit 133 converts the query image based on its luminance. For example, in the case of domains related to brightness such as morning, noon, and night, the domain conversion unit 133 uniformly multiplies the RGB values of each pixel of the query image to be converted by a coefficient such that the luminance after multiplication matches the average luminance of the conversion destination domain.
Let $(r_{ij}, g_{ij}, b_{ij})$ be the RGB values before conversion and $(r'_{ij}, g'_{ij}, b'_{ij})$ the RGB values after conversion of the pixel at coordinates $(i, j)$ of the image to be converted. The domain conversion unit 133 converts R, for example, using Equation (10); G and B can be converted by replacing $r_{ij}$ in Equation (10) with $g_{ij}$ or $b_{ij}$.
$$r'_{ij} = r_{ij} \times \frac{L'}{\mathbb{E}_{k,l}\left[\, l_{kl} \,\right]} \quad (10)$$
In Equation (10), $L'$ is the average luminance of the conversion destination domain and is set in advance, and $l_{kl}$ is the luminance of the pixel at coordinates $(k, l)$, given by Equation (11), where $\alpha$, $\beta$, and $\gamma$ are preset parameters. Here, $i$ and $k$ both represent horizontal coordinates of image pixels, and $j$ and $l$ both represent vertical coordinates; $k$ and $l$ are distinguished from $i$ and $j$ so that the expected value (average) can be taken over the whole image.
$$l_{kl} = \alpha\, r_{kl} + \beta\, g_{kl} + \gamma\, b_{kl} \quad (11)$$
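A minimal sketch of this luminance-based conversion follows. The default $\alpha$, $\beta$, $\gamma$ weights are example preset parameters, and the clipping to [0, 255] is a practical assumption added here, not stated in the text.

```python
import numpy as np

def convert_luminance(image, L_target, alpha=0.299, beta=0.587, gamma=0.114):
    """Scale all RGB values uniformly so that the average luminance of the
    converted image matches L' of the destination domain (Equations (10)-(11))."""
    img = image.astype(np.float64)
    l = alpha * img[..., 0] + beta * img[..., 1] + gamma * img[..., 2]
    scale = L_target / l.mean()              # coefficient L' / E[l_kl]
    return np.clip(img * scale, 0, 255).astype(np.uint8)
```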
The domain conversion unit 133 also converts the query image based on its color and saturation (tint). For example, in the case of domains related to tint changes such as red, blue, and yellow caused by a large display or the like, the RGB values of each pixel of the query image at hand are uniformly multiplied by a coefficient such that the RGB values after multiplication match the average RGB values of the conversion destination domain.
Let $(r_{ij}, g_{ij}, b_{ij})$ be the RGB values before conversion and $(r'_{ij}, g'_{ij}, b'_{ij})$ the RGB values after conversion of the pixel at coordinates $(i, j)$ of the image to be converted. The domain conversion unit 133 converts R, for example, using Equation (12), where $R'$ is the average pixel value of the conversion destination domain and is set in advance. G and B can be converted by replacing $r_{ij}$ in Equation (12) with $g_{ij}$ or $b_{ij}$ and $R'$ with $G'$ or $B'$.
$$r'_{ij} = r_{ij} \times \frac{R'}{\mathbb{E}_{k,l}\left[\, r_{kl} \,\right]} \quad (12)$$
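Per-channel tint conversion can be sketched the same way; again, clipping is a practical assumption added here.

```python
import numpy as np

def convert_tint(image, rgb_target):
    """Scale each channel so that its average matches the destination domain's
    average RGB values (R', G', B') as in Equation (12)."""
    img = image.astype(np.float64)
    means = img.reshape(-1, 3).mean(axis=0)   # current (R, G, B) averages
    scaled = img * (np.asarray(rgb_target, dtype=np.float64) / means)
    return np.clip(scaled, 0, 255).astype(np.uint8)
```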
The domain conversion unit 133 may also convert the image to be converted into an image of another domain using a converter. The converter converts an image into an image of another domain different from the domain to which the image belongs, and has been trained on the conversion of a plurality of images belonging to each domain. For example, the domain conversion unit 133 uses a GAN (Generative Adversarial Networks) as the converter. In this case, the inference device 10 prepares a sufficient number of images of each domain in advance and trains the GAN. The domain conversion unit 133 then converts the query image into an image of each domain by inputting it into the GAN.
Note that the inference device according to Embodiment 1 converts the domain of the query image into each domain in order to match the domain of the gallery image. This is because, for query images, the increase in the amount of conversion processing is easier to tolerate when real-time processing is desired.
For example, consider converting the domain of gallery images to match the domain of the query image. If the frame rate is 5 FPS, there are five frames per second, and there are several gallery images per frame. Domain conversion would have to be applied to all of those gallery images, making real-time processing difficult to achieve. In contrast, query registration is assumed to occur at most about once every several seconds, and a query image only needs to be registered once, so the frequency of domain conversion can be greatly reduced.
In Embodiment 1, domain conversion is not performed on gallery images. If both the query image and the gallery image were domain-converted, matching accuracy would be expected to deteriorate compared with converting only one of them. For this reason, in Embodiment 1, domain conversion to all domains is performed only on the query image, and matching accuracy is maintained by switching the model used for feature extraction.
[クエリ登録処理]
 次に、推論装置10によるクエリ画像の特徴量の登録処理について説明する。図5は、実施の形態1に係るクエリ画像の特徴量の登録処理の処理手順を示すフローチャートである。
[Query registration process]
Next, the registration processing of the feature amount of the query image by the inference device 10 will be described. FIG. 5 is a flow chart showing a processing procedure of registration processing of a feature amount of a query image according to Embodiment 1. FIG.
 推論装置10では、画像入力部131が、クエリ画像の入力を受け付けると(ステップS11)、ドメイン判定部132が、クエリ画像がいずれのドメインに属するかを判定する(ステップS12)。 In the inference device 10, when the image input unit 131 receives the input of the query image (step S11), the domain determination unit 132 determines to which domain the query image belongs (step S12).
Then, the domain conversion unit 133 converts the query image into an image of another domain different from the domain to which the query image belongs (step S13). The model selection unit 134 selects, from among the models of the model group 122, the model corresponding to the domain of the converted query image (step S14).
The feature amount extraction unit 1351 extracts the feature amount of the domain-converted query image using the model selected by the model selection unit 134 (step S15). Then, the first registration unit 136 registers the feature amount of the query image in the query feature amount data 121 as the feature amount of the query image of the converted domain (step S16).
The domain conversion unit 133 determines whether there is a next domain to be converted (step S17). When the query image has been converted into all domains corresponding to all models, the domain conversion unit 133 determines that there is no next domain to be converted (step S17: No) and ends the processing. When the query image has not yet been converted into all domains corresponding to all models, the domain conversion unit 133 determines that there is a next domain to be converted (step S17: Yes), returns to step S13, and executes the domain conversion processing of the query image for an unconverted domain.
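A minimal sketch of this registration flow follows, assuming hypothetical helpers `judge_domain`, `convert_domain`, and `extract_features` and a dict of per-domain models; the step numbers in the comments refer to FIG. 5.

```python
# A minimal sketch of the query registration loop (steps S11 to S17),
# with all helper names assumed for illustration.
def register_query(query_image, models, query_feature_db,
                   judge_domain, convert_domain, extract_features):
    source_domain = judge_domain(query_image)                  # step S12
    for target_domain in models:                               # loop of step S17
        image = (query_image if target_domain == source_domain
                 else convert_domain(query_image, target_domain))  # step S13
        model = models[target_domain]                          # step S14
        features = extract_features(model, image)              # step S15
        # Register the features as those of the query in the target domain.
        query_feature_db[target_domain] = features             # step S16
```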
[Inference processing]
Next, inference processing by the inference device 10 will be described. FIG. 6 is a flowchart showing a processing procedure of the inference processing according to Embodiment 1.
In the inference device 10, when the image input unit 131 receives the input of a gallery image (step S21), the domain determination unit 132 determines to which domain the gallery image belongs (step S22).
The model selection unit 134 selects, from among the models of the model group 122, the model corresponding to the domain of the gallery image determined in step S22 (step S23). The feature amount extraction unit 1351 extracts the feature amount of the gallery image using the model selected by the model selection unit 134 (step S24).
Subsequently, the matching unit 1352 refers to, among the query feature amount data 121, the feature amount of the query image corresponding to the domain to which the gallery image belongs, as determined in step S22 (step S25). The matching unit 1352 calculates the distance between the feature amount of the gallery image and the referenced feature amount of the query image, compares the calculated distance with a matching threshold, and thereby determines whether the subject of the gallery image is the matching target (step S26). The matching unit 1352 then outputs the matching result (step S27) and ends the processing.
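A minimal sketch of steps S25 to S27 follows, assuming Euclidean distance between feature vectors and a pre-set threshold; the text does not fix the distance measure, so this is only one possible choice.

```python
import numpy as np

# A minimal sketch of the matching step, assuming L2 distance and a
# pre-registered per-domain query feature database.
def match_gallery(gallery_features: np.ndarray,
                  query_feature_db: dict,
                  gallery_domain: str,
                  threshold: float) -> bool:
    query_features = query_feature_db[gallery_domain]              # step S25
    distance = np.linalg.norm(gallery_features - query_features)   # step S26
    is_match = bool(distance <= threshold)
    return is_match                                                # step S27
```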
[Effects of Embodiment 1]
In this way, since the inference device 10 determines the domain of the gallery image and uses the model corresponding to that domain, it can execute feature amount extraction processing appropriate to the determination and improve the accuracy of image matching.
In addition, the inference device 10 converts the domain of the query image into images of all the domains corresponding to the trained models, extracts feature amounts from the query image of each domain using the model corresponding to that domain, and registers them in advance. Thus, by converting the query image into images of all the domains corresponding to the trained models, the inference device 10 can prepare query images of all the domains. Therefore, at matching time, the inference device 10 can compare the feature amount of the gallery image with the feature amount of the query image of the same domain as the gallery image, which reduces the accuracy degradation caused by the query image and the gallery image belonging to different domains.
[Modification of Embodiment 1]
The inference device according to Embodiment 1 may perform class classification. FIG. 7 is a diagram schematically showing an example of another configuration of the inference device according to the modification of Embodiment 1.
As shown in FIG. 7, the inference device 10A according to the modification of Embodiment 1 has a control unit 13A instead of the control unit 13 shown in FIG. 4. The control unit 13A has the same functions as the control unit 13.
The control unit 13A has an inference unit 135A that includes a classification unit 1352A. The classification unit 1352A uses a trained classification model that performs class classification to compute logits from the feature amount (feature vector) of the inference image extracted by the feature amount extraction unit 1351, and determines the class of the subject of the inference image. The classification unit 1352A may register the class classification result as the inference result 124A and may also output it.
FIG. 8 is a flowchart showing a processing procedure of inference processing according to the modification of Embodiment 1. As shown in FIG. 8, when the inference device 10A receives the input of an inference image (step S31), it performs the same processing as steps S22 to S24 shown in FIG. 6. Specifically, the domain determination unit 132 determines the domain of the inference image (step S32). The model selection unit 134 selects the model corresponding to the domain determined in step S32 from the model group 122 (step S33). The feature amount extraction unit 1351 extracts the feature amount of the inference image using the model selected by the model selection unit 134 (step S34).
Then, the classification unit 1352A computes logits from the feature amount of the inference image, performs class classification to determine the class of the subject of the inference image (step S35), and outputs the classification result (step S36).
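A minimal sketch of the logit-based classification follows, assuming the trained classification model reduces to a linear layer with weights W and bias b; the actual model structure is not specified in the text.

```python
import numpy as np

# A minimal sketch of step S35, assuming a linear classification head
# (W and b are assumed parameters of the trained classification model).
def classify(features: np.ndarray, W: np.ndarray, b: np.ndarray) -> int:
    logits = W @ features + b                  # logit computation (step S35)
    probs = np.exp(logits - logits.max())      # numerically stable softmax
    probs /= probs.sum()
    return int(np.argmax(probs))               # predicted class of the subject
```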
As in this inference device 10A, even when class classification is performed, determining the domain of the inference image and using the model corresponding to that domain makes it possible to execute feature amount extraction processing appropriate to the determination and to improve the accuracy of class classification.
[Embodiment 2]
Next, Embodiment 2 will be described. Embodiment 2 describes a training device that trains the models corresponding to the respective domains used by the inference devices 10 and 10A.
In the inference devices 10 and 10A, a trained model is provided for each domain. Therefore, the training device according to Embodiment 2 prepares training images for each domain and trains a model for each domain using the training images of that domain. The training device according to Embodiment 2 prepares the training images for each domain from an existing dataset.
[Training device]
The training device according to Embodiment 2 will be described. FIG. 9 is a diagram schematically showing an example of the configuration of the training device according to Embodiment 2. As shown in FIG. 9, the training device 20 has an input/output unit 21, a storage unit 22, and a control unit 23.
The input/output unit 21 receives input of information and outputs information. The input/output unit 21 is, for example, a communication interface that transmits and receives various kinds of information to and from other devices connected via a network or the like. The input/output unit 21 performs communication between other devices and the control unit 23 (described later) via a telecommunication line such as a LAN or the Internet. The input/output unit 21 is also a device such as a mouse or a keyboard that receives input of various kinds of instruction information to the training device 20 in response to input operations by a user. The input/output unit 21 is further realized by, for example, a liquid crystal display, and displays and outputs a screen whose display is controlled by the training device 20.
The storage unit 22 is realized by a semiconductor memory element such as a RAM or a flash memory, and stores a processing program for operating the training device 20, data used during execution of the processing program, and the like. The storage unit 22 has a dataset 221, training images 222, and a model group 223.
The dataset 221 is, for example, a public dataset such as the MSMT public dataset. The MSMT public dataset contains a wide range of images captured from morning to night.
The training images 222 include a group of training images corresponding to each domain. For example, the training images 222 include a first-domain image group 2221, which is a group of training images for the model corresponding to the first domain, and a second-domain image group 2222, which is a group of training images for the model corresponding to the second domain.
The model group 223 has a plurality of trained models used by the inference unit 135. For example, the model group 223 has a first-domain model 2231 corresponding to the first domain and a second-domain model 2232 corresponding to the second domain.
The control unit 23 controls the entire training device 20. The control unit 23 is, for example, an electronic circuit such as a CPU, or an integrated circuit such as an ASIC or FPGA. The control unit 23 also has an internal memory for storing programs defining various processing procedures and control data, and executes each kind of processing using the internal memory. In addition, the control unit 23 functions as various processing units by running various programs. The control unit 23 has a training image acquisition unit 231 and a training unit 232.
The training image acquisition unit 231 has a dataset acquisition unit 2311 (input unit), a domain determination unit 2312 (determination unit), a domain conversion unit 2313 (conversion unit), and a second registration unit 2314 (registration unit, second registration unit).
The dataset acquisition unit 2311 acquires, as training images, a dataset such as the MSMT public dataset. In addition to public datasets, the dataset acquisition unit 2311 may also acquire real data actually captured by a camera or the like that captures inference images.
The domain determination unit 2312 has the same functions as the domain determination unit 132. The domain determination unit 2312 determines the domain to which a training image belongs based on elements that establish the environment in which the subject is imaged. The elements that establish the environment in which the subject is imaged are, for example, at least one of the time at which the image to be determined was captured, the luminance of the image to be determined, and the color and saturation of the image to be determined. The domain determination unit 2312 determines, for each image included in the dataset, the domain to which the image belongs.
The domain conversion unit 2313 converts a training image into an image of another domain different from the domain to which the training image belongs, based on elements that establish the environment in which the subject is imaged. The elements that establish the environment in which the subject is imaged are, for example, at least one of the luminance of the training image and the color and saturation of the image to be determined.
The second registration unit 2314 registers the training image to be determined as a training image for the model corresponding to the domain determined by the domain determination unit 2312. For example, when the domain determination unit 2312 determines that a certain training image belongs to the first domain, the second registration unit 2314 registers this training image as an image of the first-domain image group 2221. Also, when a certain training image is converted into an image of the second domain by the domain conversion unit 2313, the second registration unit 2314 registers the converted training image as an image of the second-domain image group 2222.
The training unit 232 selects, from among the training images of each domain registered by the second registration unit 2314, the training image group corresponding to the domain of the model to be trained, and executes training of the model using the selected training image group. When the training target is the model corresponding to the first domain, the training unit 232 selects the images of the first-domain image group 2221 as the training images and trains the model corresponding to the first domain. The training unit 232 may use a known mechanism such as back propagation and repeat updating the parameters of each model composed of a neural network until a predetermined termination condition is reached.
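A minimal sketch of per-domain training with back propagation follows, assuming each model is a torch.nn.Module, a per-domain DataLoader, and cross-entropy loss; the loss function, optimizer, and termination condition used here are illustrative assumptions, not part of the text.

```python
import torch

# A minimal sketch of step S63: training one domain's model until a
# (here, fixed-epoch) termination condition is reached.
def train_domain_model(model, domain_loader, epochs=10, lr=1e-3):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):                    # assumed termination condition
        for images, labels in domain_loader:   # images of one domain only
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()                    # back propagation
            optimizer.step()                   # parameter update
```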
[Domain determination unit]
Next, the domain determination unit 2312 will be described. Like the domain determination unit 132, the domain determination unit 2312 determines the domain to which a training image belongs using time, luminance, color, and saturation.
Specifically, in the case of domains related to time zones such as morning, noon, and night, the domain determination unit 132 determines the domain of the image to be determined by comparing the time at which the image was captured with predetermined morning, noon, and night time zones.
When a certain dataset is to be divided into morning-domain and night-domain datasets based on luminance, a luminance threshold is set in advance. The domain determination unit 2312 then determines that the domain is morning when the average value L of the luminance of all pixels of the image to be determined is equal to or greater than the threshold, and determines that the domain is night when the average value L is less than the threshold.
When a certain dataset is to be divided into domain-specific datasets based on the color and saturation (tint) of the background portion of person images, the domain determination unit 2312 calculates the average over all pixels of each of the R, G, and B luminance values of the image to be determined, and determines, according to which average is the largest, whether the domain to which the image to be determined belongs is, for example, the red domain, the blue domain, or the yellow domain.
The domain determination unit 2312 may also determine the domain to which the image to be determined belongs by combining two or more of the time at which the image to be determined was captured, the luminance of the image to be determined, and the color and saturation of the image to be determined.
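A minimal sketch of luminance-based and tint-based domain determination follows, assuming an (H, W, 3) RGB array; the BT.601 luminance weights and the mapping of a G-dominant image to the yellow domain are assumptions made for illustration only.

```python
import numpy as np

# A minimal sketch of luminance-based determination; the luminance
# weights (BT.601) are an assumption, the text only says "luminance".
def judge_domain_by_luminance(image: np.ndarray, threshold: float) -> str:
    lum = 0.299 * image[..., 0] + 0.587 * image[..., 1] + 0.114 * image[..., 2]
    return "morning" if lum.mean() >= threshold else "night"

# A minimal sketch of tint-based determination by the largest channel mean;
# assigning a G-dominant image to the yellow domain is an assumption.
def judge_domain_by_tint(image: np.ndarray) -> str:
    r, g, b = image.reshape(-1, 3).mean(axis=0)   # per-channel means
    if r >= g and r >= b:
        return "red"
    if b >= r and b >= g:
        return "blue"
    return "yellow"
```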
[Domain conversion unit]
When a sufficient number of training images cannot be secured for each domain, the domain conversion unit 2313 converts the images of the dataset into images of other domains to generate training images for each domain.
The domain conversion unit 2313 performs the domain conversion on the basis of luminance, color, saturation (tint), and the like. In this case, it is assumed that the tendency of the data belonging to each domain, in terms of luminance and tint, is known for each domain.
For example, consider a case where the domains are defined as morning, noon, and night by luminance, where the average luminance value L over all pixels of each image (written as Equation (13)) in the morning, noon, and night domains is known to approximately follow a normal distribution, and where the mean and variance of that distribution are known.
[Equation (13): definition of the average luminance value L over all pixels of an image]
The domain conversion unit 2313 performs the domain conversion so that the mean and variance of the average luminance value L of the images of the training dataset at hand match the mean and variance of each domain.
Specifically, for the k-th person image of the available training dataset, write x_k := (R_k, G_k, B_k). Let u be the mean vector of the target domain of the conversion, and let C be its variance-covariance matrix.
Then, the domain conversion unit 2313 transforms each pixel value of the k-th image of the available training dataset (Equation (14)) as shown in Equation (15).
[Equation (14): the pixel values of the k-th image]
[Equation (15): the transformation applied to each pixel value]
In Equation (15), V denotes the variance-covariance matrix, and E denotes the operation of taking the expected value vector.
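Since Equation (15) itself is not reproduced here, the following is a reconstruction under the assumption that it is the standard whitening-and-recoloring map that makes the per-pixel RGB mean and covariance of the source images match the target domain's mean vector u and variance-covariance matrix C.

```python
import numpy as np
from scipy.linalg import sqrtm

# A minimal sketch under the whitening-and-recoloring assumption; the
# exact form of Equation (15) is not given in this text.
def recolor_to_domain(pixels: np.ndarray, u: np.ndarray, C: np.ndarray) -> np.ndarray:
    """pixels: (N, 3) RGB values pooled from the source training images."""
    mean = pixels.mean(axis=0)                        # E[x] of the source set
    V = np.cov(pixels, rowvar=False)                  # source covariance matrix
    A = np.real(sqrtm(C)) @ np.linalg.inv(np.real(sqrtm(V)))
    return (pixels - mean) @ A.T + u                  # matched to target u, C
```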
For example, consider a case where the domains are defined as daytime, nighttime (red), nighttime (blue), and nighttime (yellow) by color and saturation (tint), where the RGB values of each image in the daytime, nighttime (red), nighttime (blue), and nighttime (yellow) domains (see Equation (16) for the R value, Equation (17) for the G value, and Equation (18) for the B value) are known to approximately follow normal distributions, and where their means and variances are known.
[Equation (16): the R values of an image]
[Equation (17): the G values of an image]
[Equation (18): the B values of an image]
In this case, the domain conversion unit 2313 performs the domain conversion so that the means and variances of the RGB values of the training dataset at hand (R, G, and B shown in Equations (16) to (18)) match the means and variances of each domain.
Let μ be the mean of the R values of the target domain of the conversion, and let σ be their standard deviation. The domain conversion unit 2313 transforms each pixel value of the k-th image of the available training dataset as shown in Equation (19).
[Equation (19): the transformation applied to each R pixel value]
In Equation (19), V denotes the variance, and E denotes the operation of taking the expected value. Note that G and B can be transformed by replacing R_k in Equation (19) with G_k or B_k, and r_ij^k with g_ij^k or b_ij^k.
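Similarly, a minimal sketch of the Equation (19) style per-channel transform follows, reconstructed under the assumption that each R value is standardized by the source-set mean and variance and rescaled to the target domain's μ and σ.

```python
import numpy as np

# A minimal sketch under the per-channel standardize-and-rescale
# assumption; apply the same function to the G or B values analogously.
def transform_channel(r: np.ndarray, mu: float, sigma: float) -> np.ndarray:
    """r: R values pooled from the source training images."""
    return (r - r.mean()) / np.sqrt(r.var()) * sigma + mu
```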
[Training image acquisition processing]
Next, training image acquisition processing by the training device 20 will be described. FIG. 10 is a flowchart showing a processing procedure of the training image acquisition processing according to Embodiment 2.
As shown in FIG. 10, in the training device 20, the dataset acquisition unit 2311 acquires a dataset (step S41). Then, the domain determination unit 2312 refers to an arbitrarily selected image to be determined from the dataset acquired in step S41 (step S42) and determines the domain to which this image belongs (step S43). The second registration unit 2314 registers the image to be determined as a training image for the model corresponding to the domain determined in step S43 (step S44).
Then, the domain determination unit 2312 determines whether there is a next image to be determined in the dataset (step S45). When the domains of all the images of the dataset have been determined, the domain determination unit 2312 determines that there is no next image to be determined (step S45: No) and ends the processing. When the domains of all the images of the dataset have not yet been determined, the domain determination unit 2312 determines that there is a next image to be determined (step S45: Yes), returns to step S42, and executes the domain determination for the next image.
FIG. 11 is a flowchart showing another processing procedure of the training image acquisition processing according to Embodiment 2. When a sufficient number of training images cannot be secured for each domain, for example, the training device 20 converts the images of the dataset into images of other domains to generate training images for each domain.
Specifically, as shown in FIG. 11, when the training device 20 receives, as the conversion target domain, the input of a domain for which a sufficient number of training images cannot be secured (step S51), the dataset acquisition unit 2311 acquires a dataset (step S52).
Then, the domain determination unit 2312 refers to an arbitrarily selected image to be converted from the dataset acquired in step S52 (step S53) and determines the domain to which this image belongs (step S54).
Then, when this image does not belong to the conversion target domain, the domain conversion unit 2313 converts this image into an image of the conversion target domain (step S55). The second registration unit 2314 registers the image whose domain was converted in step S55 as a training image for the model corresponding to the conversion target domain (step S56).
Then, the training device 20 determines whether there is a next image to be converted (step S57). When a sufficient number of training images has been secured for the conversion target domain, the training device 20 determines that there is no next image to be converted (step S57: No) and ends the processing. When a sufficient number of training images has not been secured for the conversion target domain, the training device 20 determines that there is a next image to be converted (step S57: Yes), returns to step S53, and executes the conversion processing into the conversion target domain for the next image.
[Training processing]
Next, training processing by the training device 20 will be described. FIG. 12 is a flowchart showing a processing procedure of the training processing according to Embodiment 2.
As shown in FIG. 12, when the training device 20 receives the input of a training target domain (step S61), the training unit 232 selects the training image group corresponding to the training target domain (step S62). The training unit 232 executes training of the model using the selected training image group (step S63).
[Effects of Embodiment 2]
In this way, the training device 20 determines the domain of each image of the dataset and prepares a training image group for each domain. The training device 20 then trains a model for each domain using the training image group belonging to that domain. Therefore, according to the training device 20, the model corresponding to each domain can be trained appropriately, and the inference accuracy of the models can be improved.
Also, when a sufficient number of training images cannot be secured for each domain, the training device 20 converts the images of the dataset into images of the domain to be secured to generate training images for each domain. The training device 20 can thus secure a sufficient number of training images for each domain. Therefore, according to the training device 20, model training can be executed appropriately for any domain, and the inference accuracy of the models can be improved.
[Modification of Embodiment 2]
As in the training device 20A shown in FIG. 13, a control unit 23A may be provided in which the domain determination unit 2312 shown in FIG. 9 is replaced with a domain determination unit 2312A.
When dividing by luminance, for example, the domain determination unit 2312A sets a luminance threshold and a margin width in advance, determines that an image belongs to the morning domain when its average luminance is equal to or greater than (threshold - margin width), and determines that it belongs to the night domain when its average luminance is below (threshold + margin width).
In the determination by the domain determination unit 2312, the number of training images near the threshold tends to be small, and accuracy may degrade when the model of each domain is applied to inference images near the threshold. In contrast, with the determination by the domain determination unit 2312A, the number of training images near the threshold can be increased, and the degradation in accuracy of the model of each domain can also be reduced.
[Evaluation experiments]
Evaluation experiments A0, A1, and B were performed on the inference devices 10 and 10A and the training devices 20 and 20A according to Embodiments 1 and 2.
The evaluation indices common to evaluation experiments A0, A1, and B will be described. In evaluation experiments A0, A1, and B, evaluation was performed using rank-k and mAP (mean Average Precision). rank-k is the average over all queries of "the probability that, when the gallery is sorted in ascending order of distance to a given query, at least one image of the same person appears among the top k images". rank-k takes a value between 0 and 1, and the larger the value, the better the accuracy.
mAP is the average over all queries of "the average over k of the precision (the proportion of images of the same person among the top k gallery images for a given query)". mAP takes a value between 0 and 1, and the larger the value, the better the accuracy.
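A minimal sketch of both metrics as defined above follows, assuming that for each query a distance vector over all gallery items and a boolean array marking the same-person gallery items are given.

```python
import numpy as np

# rank-k: fraction of queries whose top-k nearest gallery items
# contain at least one same-person image.
def rank_k(distances, positives, k):
    hits = []
    for d, pos in zip(distances, positives):
        order = np.argsort(d)                    # gallery sorted by distance
        hits.append(bool(pos[order][:k].any()))  # any true match in top k?
    return float(np.mean(hits))

# mAP: mean over queries of the average precision at the ranks
# where same-person images appear.
def mean_average_precision(distances, positives):
    aps = []
    for d, pos in zip(distances, positives):
        sorted_pos = pos[np.argsort(d)]
        ranks = np.flatnonzero(sorted_pos) + 1   # 1-based ranks of true matches
        precision_at_hits = np.arange(1, len(ranks) + 1) / ranks
        aps.append(precision_at_hits.mean() if len(ranks) else 0.0)
    return float(np.mean(aps))
```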
[Evaluation experiment A0]
Evaluation experiment A0 will be described. Evaluation experiment A0 evaluated whether there is a difference in accuracy depending on whether models are switched when person matching is performed outdoors.
The MSMT dataset was used in evaluation experiment A0. The MSMT dataset consists of a training image group and an inference image group, and contains a wide range of images captured from morning to night.
At training time, following the processing procedure shown in FIG. 10, the domain of each image of the MSMT dataset was determined, and the models corresponding to the respective domains were trained using training images divided by domain according to the determined domains. At inference time, the domain of the gallery image was determined, and inference was performed while switching models according to the domain of the gallery image. Evaluation experiment A0 evaluated the case where the query image and the gallery image belong to the same domain.
In evaluation experiment A0, as an example, the MSMT dataset was divided into training images of three domains (for example, morning, noon, and night) on the basis of luminance. Evaluation experiment A0 adopts thresholds such that the numbers of images belonging to the respective domains in the training image group are equal. Regarding the division of the training images, the case of dividing with the margin described in the modification of Embodiment 2 was also evaluated. At inference time, the domain of the inference image was determined based on luminance.
The procedure for model training and evaluation is as follows. First, the luminance boundary values that separate the domains are determined. Specifically, the domain to which images whose average luminance value L over all pixels, shown in Equation (2), is larger than θ2·(1−α) belong is called the morning domain for convenience. The domain to which images whose average luminance value L is smaller than θ2·(1+α) and larger than θ1·(1−α) belong is called the day domain for convenience. The domain to which images whose average luminance value L is smaller than θ1·(1+α) belong is called the night domain for convenience.
For the thresholds, θ1 = 61.7 and θ2 = 85.6 were used, and for the margin, each of the patterns α ∈ {0.00, 0.04, 0.08, 0.12, 0.16} was tried. In the MSMT training image group, the mean and standard deviation of the luminance are 76.1 and 27.7, respectively.
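A minimal sketch of this three-way split with margin follows, using the band boundaries as reconstructed above; images whose average luminance falls in an overlap region are registered in both adjacent domains.

```python
# A minimal sketch of the margin-based split, with the band-to-domain
# assignment following the reconstructed description above.
def split_with_margin(images_with_L, theta1=61.7, theta2=85.6, alpha=0.12):
    morning, day, night = [], [], []
    for image, L in images_with_L:                # L: mean luminance of image
        if L > theta2 * (1 - alpha):
            morning.append(image)
        if theta1 * (1 - alpha) < L < theta2 * (1 + alpha):
            day.append(image)
        if L < theta1 * (1 + alpha):
            night.append(image)
    return morning, day, night
```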
Then, the training devices 20 and 20A divide the training image group by domain. The MSMT dataset is thereby divided into training image groups of the morning, noon, and night domains. The training devices 20 and 20A then train the model corresponding to each domain using the training image group of that domain. At inference time, the inference device 10 performs domain determination and divides the inference image group by domain. This produces query image groups and gallery image groups of the morning, noon, and night domains.
Subsequently, each model in the inference device 10 is evaluated. The morning, noon, or night model is applied to the corresponding morning, noon, or night query image group and morning, noon, or night gallery image group, and the distances between feature vectors are calculated for each domain to compute the rank-k and mAP values.
Next, an outline of the procedure for training and evaluating the general-purpose model will be given. In the training of the general-purpose model, the MSMT dataset is used as-is to train the general-purpose model. At inference time, the inference image group is divided by domain. This produces query image groups and gallery image groups of the morning, noon, and night domains.
Subsequently, the general-purpose model is evaluated. The general-purpose model is applied to the morning/noon/night query image groups and the morning/noon/night gallery image groups, and the distances between feature vectors are calculated for each domain to compute the rank-k and mAP values.
The results of evaluation experiment A0 are shown in Table 1.
[Table 1: rank-k and mAP results of evaluation experiment A0]
As shown in Table 1, the models trained for each domain according to Embodiments 1 and 2 (for example, experimental data A0-13) achieved higher inference accuracy than the general-purpose model (experimental data A0-00). Thus, switching models according to the domain of the gallery image as in Embodiment 1 yielded effective results.
Here, dataset division with a margin by the domain determination unit 2312A of the training device 20A (experimental data A0-13) was more effective than simple dataset division by the domain determination unit 2312 of the training device 20 (experimental data A0-10). In the case of dataset division with a margin, accuracy peaks around α = 0.12, so in this example α = 0.12 should be set.
[Evaluation experiment A1]
Evaluation experiment A1 will be described. Evaluation experiment A1 evaluated whether, when the query image and the gallery image belong to different domains, there is a difference in accuracy depending on whether domain conversion of the query image is performed, that is, whether the feature amounts of the query image are registered for each domain.
The MSMT dataset was used in evaluation experiment A1. At training time, the domain of each image of the MSMT dataset was determined, and the models corresponding to the respective domains were trained using training images divided by domain according to the determined domains. At inference time, the domain of the gallery image was determined, and inference was performed while switching models according to the domain of the gallery image. Evaluation experiment A1 evaluated the case where the query image and the gallery image belong to different domains.
In evaluation experiment A1, the training images were given the same settings as in evaluation experiment A0. The margin was fixed at α = 0.12. The training image group is divided into three domains (for example, morning, noon, and night). The MSMT dataset is thereby divided into training image groups of the morning, noon, and night domains. The model corresponding to each domain is then trained using the training image group of that domain.
At inference time, the inference device 10 performs domain determination and divides the inference image group by domain. This produces query image groups and gallery image groups of the morning, noon, and night domains.
The inference device 10 adopts luminance-based domain conversion of the query image. The average luminance L' of the conversion target domain is set to the average luminance of the image group belonging to that domain in the training image group. Subsequently, each model in the inference device 10 is evaluated. The morning, noon, or night model is applied to the corresponding morning, noon, or night query image group and morning, noon, or night gallery image group, and the distances between feature vectors are calculated for each domain to compute the rank-k and mAP values.
When the query image is not converted at inference time, in evaluating each domain's model, the dedicated morning/noon/night model is applied to the noon+night/night+morning/morning+noon query image groups as-is and to the morning/noon/night gallery image groups, and the distances between feature vectors are calculated for each domain to compute the rank-1 and mAP values.
Next, an outline of the procedure for training and evaluating the general-purpose model will be given. In the training of the general-purpose model, the MSMT dataset is used as-is to train the general-purpose model. At inference time, the inference image group is divided by domain. This produces query image groups and gallery image groups of the morning, noon, and night domains.
Subsequently, the general-purpose model is evaluated. The general-purpose model is applied to image groups obtained by domain-converting the noon+night/night+morning/morning+noon query image groups to morning/noon/night, and to the morning/noon/night gallery image groups, and the distances between feature vectors are calculated for each domain to compute the rank-k and mAP values. When domain conversion of the query is not performed, the general-purpose model is applied to the noon+night/night+morning/morning+noon query image groups as-is and to the morning/noon/night gallery image groups, and the distances between feature vectors are calculated for each domain to compute the rank-1 and mAP values.
The results of evaluation experiment A1 are shown in Table 2.
[Table 2: rank-k and mAP results of evaluation experiment A1]
As shown in Table 2, both for the general-purpose model and for the models trained for each domain according to Embodiments 1 and 2, higher inference accuracy was obtained when matching was performed using the feature amounts of the domain-converted query image. Also, even when domain conversion of the query image was not performed, switching models according to the domain of the gallery image resulted in higher inference accuracy.
Thus, evaluation experiment A1 showed that when the query image and the gallery image belong to different domains, performing domain conversion of the query image in the inference device 10 is effective.
[Evaluation experiment B]
Next, evaluation experiment B will be described. Evaluation experiment B evaluated which of the per-domain training images obtained by division or by conversion is more suitable as the training images for the models when the method is applied to real data rather than a public dataset.
The training images by division are the training images of each domain prepared by the dataset division by the domain determination unit 2312 of the training device 20, shown in the processing procedure of FIG. 10. The training images by conversion are the training images of each domain prepared by generating training images through the domain conversion by the domain conversion unit 2313, shown in the processing procedure of FIG. 11.
As training images, a training image group prepared by determining the domains of the MSMT dataset and dividing it by domain, and a training image group generated for each domain by domain-converting the images of the MSMT dataset, were prepared.
A real-data-like dataset was used for the inference images. The real-data-like dataset contains a wide range of images captured from morning to night and also includes difficulties peculiar to real data that are not present in public datasets, such as the influence of color and saturation (tint) from large displays. Since it is difficult to obtain real data itself, in this experiment the inference image group of Market1501, domain-converted by transforming the luminance of the images of the public dataset, was used as a substitute.
At inference time, the domains are divided based on time and tint. For time, a daytime or nighttime label has already been assigned to each image, so those labels are used. For tint, the domains are divided into red, blue, and yellow as described above. In this evaluation experiment B, the query image and the gallery image belong to the same domain.
The procedure for training and evaluating the models using the training images by division will be described. First, the training device 20 divides the training image group by domain. The MSMT dataset is thereby divided into training image groups of the daytime, nighttime-red, nighttime-blue, and nighttime-yellow domains. The training device 20 trains the model corresponding to each domain using the training image group of that domain.
At inference time, the inference device 10 performs domain determination and divides the inference image group by domain. This produces query image groups and gallery image groups of the daytime, nighttime-red, nighttime-blue, and nighttime-yellow domains. Subsequently, each model in the inference device 10 is evaluated. The dedicated daytime/nighttime-red/nighttime-blue/nighttime-yellow model is applied to the daytime/nighttime-red/nighttime-blue/nighttime-yellow query image groups and the daytime/nighttime-red/nighttime-blue/nighttime-yellow gallery image groups, and the distances between feature vectors are calculated for each domain to compute the rank-1 and mAP values.
The procedure for training and evaluating the models using the training images by conversion will be described. First, the training device 20 converts the training image group into each domain. The MSMT dataset is thereby converted into the daytime, nighttime-red, nighttime-blue, and nighttime-yellow domains. Luminance-based conversion is adopted as the conversion, performed so as to match the per-domain means and variances of the RGB values of the real data images. The training device 20 then trains the model corresponding to each domain using the training image group of that domain. The inference device 10 evaluates each model in the same way as when the training images by division are used. The models to be evaluated include the general-purpose model in addition to the models corresponding to the respective domains.
The results of evaluation experiment B are shown in Table 3.
[Table 3: rank-1 and mAP results of evaluation experiment B]
As shown in Table 3, both for the general-purpose model and for the models trained for each domain, training using the training image groups generated by domain conversion resulted in higher accuracy than using the training images by division. Hence, the result was obtained that, as training images for the models, the per-domain training images by conversion are more suitable than the per-domain training images by division.
Furthermore, switching models according to the domain of the gallery image was more effective than the general-purpose model. When the method is applied to real data, the dataset may contain few images or may not cover the domains encountered at inference time, so it is desirable for the training device 20 to perform domain conversion to generate training images.
In the embodiments, domains based on the morning, noon, and night time zones were described as examples, but domains are not limited to these. For example, domains may be based on differences in weather or lighting (light sources). Weather-based domains include, for example, sunny, cloudy, rainy, and snowy. Domains based on the position of the sun, which changes with the season and the time of day, include front-lit and backlit. Domains based on a person's posture may also be set; in this case they include standing upright, sitting on a chair or the like, and doing a handstand.
[System configuration of the embodiments]
Each component of the inference devices 10 and 10A and the training devices 20 and 20A is functionally conceptual and does not necessarily need to be physically configured as illustrated. That is, the specific form of distribution and integration of the functions of the inference devices 10 and 10A and the training devices 20 and 20A is not limited to the illustrated one, and all or part of them can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.
All or any part of the processing performed in the inference devices 10 and 10A and the training devices 20 and 20A may be realized by a CPU, a GPU (Graphics Processing Unit), and a program analyzed and executed by the CPU or GPU. Each kind of processing performed in the inference devices 10 and 10A and the training devices 20 and 20A may also be realized as hardware by wired logic.
Of the processing described in the embodiments, all or part of the processing described as being performed automatically can also be performed manually. Alternatively, all or part of the processing described as being performed manually can be performed automatically by known methods. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters described above and shown in the drawings can be changed as appropriate unless otherwise specified.
[Program]
FIG. 14 is a diagram showing an example of a computer that realizes the inference devices 10 and 10A and the training devices 20 and 20A by executing a program. The computer 1000 has, for example, a memory 1010 and a CPU 1020. The computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.
The memory 1010 includes a ROM 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100. A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.
The hard disk drive 1090 stores, for example, an OS (Operating System) 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program defining each processing of the inference devices 10 and 10A and the training devices 20 and 20A is implemented as the program module 1093 in which code executable by the computer 1000 is written. The program module 1093 is stored, for example, in the hard disk drive 1090. For example, the program module 1093 for executing processing similar to the functional configurations of the inference devices 10 and 10A and the training devices 20 and 20A is stored in the hard disk drive 1090. The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
The setting data used in the processing of the embodiments described above is stored as program data 1094, for example, in the memory 1010 or the hard disk drive 1090. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as necessary and executes them.
The program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090; they may be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.) and read by the CPU 1020 via the network interface 1070.
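For illustration only, the step in which the CPU 1020 reads setting data (cf. program data 1094) into memory before executing a process might be sketched as follows; the file name and the JSON encoding are assumptions of this sketch, since the embodiment does not prescribe how the setting data is stored.

# Illustrative only: load the setting data used by the inference and
# training processes. The file name and JSON encoding are assumptions;
# the embodiment fixes neither.
import json
from pathlib import Path


def load_settings(path: str = "settings.json") -> dict:
    """Read setting data from storage into memory before execution."""
    return json.loads(Path(path).read_text(encoding="utf-8"))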
Although an embodiment to which the invention made by the present inventors is applied has been described above, the present invention is not limited by the description and drawings that form part of this disclosure. That is, other embodiments, examples, operation techniques, and the like made by those skilled in the art on the basis of this embodiment are all included within the scope of the present invention.
10, 10A Inference device
11, 21 Input/output unit
12, 22 Storage unit
13, 23 Control unit
20, 20A Training device
121 Query feature data
122, 223 Model group
123, 123A Inference result
131 Image input unit
132 Domain determination unit
133 Domain conversion unit
134 Model selection unit
135, 135A Inference unit
136 First registration unit
221 Data set
222 Training image
231 Training image acquisition unit
232 Training unit
1351 Feature extraction unit
1352 Matching unit
1352A Classification unit
2311 Data set acquisition unit
2312 Domain determination unit
2313 Domain conversion unit
2314 Second registration unit

Claims (7)

  1.  A processing apparatus comprising:
      an input unit that receives an input of an image to be determined; and
      a determination unit that determines, on the basis of elements that establish the environment in which a subject is imaged, to which of a plurality of domains each defined by environmental conditions the image to be determined belongs.
  2.  The processing apparatus according to claim 1, wherein the determination unit determines to which of the plurality of domains the image to be determined belongs on the basis of at least one of a time at which the image to be determined was captured, a luminance of the image to be determined, and a color and saturation of the image to be determined.
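For illustration only, the determination of claim 2 might be sketched in Python as follows; the two domain labels ("day"/"night"), the thresholds, and the daytime-hours rule are assumptions of this sketch and are not fixed by the claim.

# A minimal sketch of the determination unit of claim 2, assuming two
# domains ("day"/"night") and illustrative thresholds; the claim fixes
# neither the domain set nor the decision rule.
from datetime import datetime

import numpy as np
from PIL import Image


def determine_domain(path: str, captured_at: datetime) -> str:
    """Assign an image to a domain from capture time, luminance, and saturation."""
    hsv = np.asarray(Image.open(path).convert("HSV"), dtype=np.float32)
    mean_saturation = hsv[..., 1].mean() / 255.0  # mean of the S channel
    mean_luminance = hsv[..., 2].mean() / 255.0   # mean of the V channel

    # Illustrative rule: a bright, colorful image captured in daytime
    # hours falls in the "day" domain; everything else is "night".
    is_daytime_hour = 6 <= captured_at.hour < 18
    if mean_luminance > 0.45 and mean_saturation > 0.2 and is_daytime_hour:
        return "day"
    return "night"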
  3.  The processing apparatus according to claim 1 or 2, wherein
      the input unit receives an input of an inference image as the image to be determined, and
      the determination unit determines a domain to which the inference image belongs,
      the processing apparatus further comprising:
      a selection unit that selects, on the basis of the determination by the determination unit, a model corresponding to the domain to which the image to be determined belongs from among a plurality of models respectively corresponding to the plurality of domains; and
      an inference unit that performs inference on the inference image using the model selected by the selection unit.
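For illustration only, the selection unit of claim 3 can be thought of as a lookup from a domain label to the corresponding model; the dictionary representation below is an assumption of this sketch, not part of the claim. The inference unit then simply applies the selected model to the inference image.

# A minimal sketch of the selection unit of claim 3. The model group is
# assumed to be a dict keyed by domain label; the models themselves are
# any callables mapping an image array to an output.
from typing import Callable, Dict

import numpy as np

Model = Callable[[np.ndarray], np.ndarray]


class ModelSelector:
    def __init__(self, model_group: Dict[str, Model]):
        # One model per domain, e.g. {"day": day_model, "night": night_model}.
        self.model_group = model_group

    def select(self, domain: str) -> Model:
        """Return the model corresponding to the determined domain."""
        return self.model_group[domain]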
  4.  The processing apparatus according to claim 3, wherein
      the plurality of models are models that extract features of an image,
      a matching threshold is set for each domain, and
      the inference unit comprises:
      an extraction unit that extracts a feature of the inference image using the model selected by the selection unit; and
      a matching unit that calculates a distance between the feature of the inference image and a feature of an image in which a matching target appears, compares the calculated distance with the matching threshold set for the domain to which the inference image belongs, and determines whether the subject of the inference image is the matching target.
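For illustration only, the matching unit of claim 4 might compare a feature distance against a per-domain threshold as sketched below; the Euclidean metric and the threshold values are assumptions of this sketch. Setting a separate threshold per domain allows, for example, a more tolerant criterion in a noisier night domain.

# A minimal sketch of the matching unit of claim 4. The Euclidean metric
# and the per-domain threshold values are assumptions of this sketch.
import numpy as np

MATCH_THRESHOLDS = {"day": 0.8, "night": 1.1}  # illustrative values


def is_match(query_feature: np.ndarray, target_feature: np.ndarray,
             domain: str) -> bool:
    """Judge whether the inference image's subject is the matching target."""
    distance = float(np.linalg.norm(query_feature - target_feature))
    # Compare against the threshold set for the image's domain.
    return distance <= MATCH_THRESHOLDS[domain]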
  5.  The processing apparatus according to claim 1 or 2, wherein
      the input unit receives an input of a training image as the image to be determined, and
      the determination unit determines a domain to which the training image belongs,
      the processing apparatus further comprising a registration unit that registers the training image as a training image for a model corresponding to the domain determined by the determination unit.
  6.  The processing apparatus according to claim 5, further comprising a training unit that selects, from among the training images of the domains registered by the registration unit, the training images corresponding to the domain of a model to be trained, and trains the model using the selected training images.
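For illustration only, the registration unit of claim 5 and the training unit of claim 6 might be sketched together as follows; the in-memory registry and the train callback are assumptions of this sketch.

# A minimal sketch of the registration unit (claim 5) and training unit
# (claim 6). The in-memory registry and the `train` callback are
# assumptions of this sketch.
from collections import defaultdict
from typing import Callable, Dict, List


class TrainingImageRegistry:
    def __init__(self) -> None:
        self._by_domain: Dict[str, List[str]] = defaultdict(list)

    def register(self, image_path: str, domain: str) -> None:
        """Registration unit: file the image under its determined domain."""
        self._by_domain[domain].append(image_path)

    def train_domain_model(self, domain: str,
                           train: Callable[[List[str]], object]) -> object:
        """Training unit: train the domain's model on that domain's images only."""
        return train(self._by_domain[domain])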
  7.  A processing method executed by a processing apparatus, the processing method comprising:
      a step of receiving an input of an image to be determined; and
      a step of determining, on the basis of elements that establish the environment in which a subject is imaged, to which of a plurality of domains each defined by environmental conditions the image to be determined belongs.
PCT/JP2021/036325 2021-09-30 2021-09-30 Processing device and processing method WO2023053419A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/036325 WO2023053419A1 (en) 2021-09-30 2021-09-30 Processing device and processing method


Publications (1)

Publication Number Publication Date
WO2023053419A1 (en)

Family

ID=85782113

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/036325 WO2023053419A1 (en) 2021-09-30 2021-09-30 Processing device and processing method

Country Status (1)

Country Link
WO (1) WO2023053419A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020170486A1 (en) * 2019-02-18 2020-08-27 三菱電機株式会社 Image processing device, image processing method, and image processing program
WO2020202591A1 (en) * 2019-03-29 2020-10-08 日本電気株式会社 Model generation device, model adjustment device, model generation method, model adjustment method, and recording medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SAWADA, AZUSA: "Rough Domain Adaptation through Model Selection for Neural Networks", IEICE TECHNICAL REPORT, vol. 119, no. 193, August 2019 (2019-08-01), pages 109 - 113 *


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 21959453
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE