CN113706428B - Image generation method and device - Google Patents

Image generation method and device

Info

Publication number
CN113706428B
CN113706428B (application CN202110749973.5A)
Authority
CN
China
Prior art keywords
image
target
occlusion
face
network
Prior art date
Legal status
Active
Application number
CN202110749973.5A
Other languages
Chinese (zh)
Other versions
CN113706428A (en)
Inventor
陈玉辉
杨彭举
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202110749973.5A
Publication of CN113706428A
Application granted
Publication of CN113706428B

Classifications

    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06T3/02 Affine transformations
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/20076 Probabilistic image processing
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application provides an image generation method and device: a target non-occlusion face image to be fused and a corresponding target occlusion object image are acquired; the target non-occlusion face image and the target occlusion object image are fused to obtain a fusion image to be optimized; and the fusion image to be optimized is input into a pre-trained generator network to obtain a target fusion image output by the generator network. The generator network belongs to a generative adversarial network obtained by training based on a discriminator network and an identity consistency network, wherein the discriminator network is used for constraining the detail information of the fused image, and the identity consistency network is used for constraining identity consistency. In this way, an effective occlusion face image can be generated.

Description

Image generation method and device
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image generating method and apparatus.
Background
Face recognition technology is widely applied in fields such as access control and attendance, security inspection and clearance, security monitoring, and financial payment. In the related art, face images can be recognized based on a pre-trained face recognition network; however, the recognition accuracy of existing face recognition networks is low for face images in which part of the face is occluded (which may be referred to as occluded face images).
In order to improve the recognition accuracy of the face recognition network on occluded face images, sample images in which part of the face is occluded can be obtained, and the face recognition network can be trained based on the obtained sample images, so that the trained face recognition network can effectively recognize occluded face images.
However, face images acquired in actual scenes are often non-occlusion face images; therefore, a method is needed to generate effective occlusion face images.
Disclosure of Invention
The embodiment of the application aims to provide an image generation method and device for generating an effective occlusion face image. The specific technical scheme is as follows:
in order to achieve the above object, an embodiment of the present application discloses an image generating method, including:
acquiring a target non-occlusion face image to be fused and a corresponding target occlusion object image;
fusing the target non-occlusion face image and the target occlusion object image to obtain a fused image to be optimized;
inputting the fusion image to be optimized into a pre-trained generator network to obtain a target fusion image output by the generator network; the generator network belongs to a generative adversarial network obtained by training based on a discriminator network and an identity consistency network, wherein the discriminator network is used for constraining the detail information of the fused image, and the identity consistency network is used for constraining identity consistency.
Optionally, the target occlusion object image is acquired from an occlusion face image matched with the target non-occlusion face image.
Optionally, the acquiring the target non-occlusion face image to be fused and the corresponding target occlusion object image includes:
acquiring a target non-occlusion face image to be fused and a plurality of occlusion face images containing target occlusion objects;
determining an occlusion face image matched with the target non-occlusion face image from the occlusion face images based on the face pose and/or the face attribute, and taking the occlusion face image as a target occlusion face image;
and acquiring a target occlusion object image in the target occlusion face image.
Optionally, the acquiring the target occlusion object image in the target occlusion face image includes:
based on an image segmentation algorithm, extracting an image of a region occupied by the target occlusion from the target occlusion face image to serve as an occlusion image to be processed;
and carrying out Gaussian blur processing on the to-be-processed occlusion object image to obtain a target occlusion object image.
Optionally, the fusing the target non-occlusion face image and the target occlusion object image to obtain a fused image to be optimized includes:
Acquiring face key points in the target non-occlusion face image and face key points in the target occlusion object image;
calculating affine transformation parameters between the face key points in the target non-occlusion face image and the face key points in the target occlusion object image;
and fusing the target non-occlusion face image and the target occlusion object image based on the affine transformation parameters to obtain a fused image to be optimized.
Optionally, the fusing the target non-occlusion face image and the target occlusion object image based on the affine transformation parameters to obtain a fused image to be optimized includes:
based on the affine transformation parameters, fusing the target non-occlusion face image and the target occlusion object image according to a preset formula to obtain a fused image to be optimized;
wherein, the preset formula is:
where A_mix represents the fusion image to be optimized, T represents the affine transformation parameters, α represents a preset fusion coefficient, G[I_c] represents the target occlusion object image, I_o represents the occlusion face image matched with the target non-occlusion face image, and I represents the target non-occlusion face image.
Optionally, the generator network is obtained through training by adopting the following steps:
obtaining a non-occlusion face sample image, a target occlusion object sample image and an occlusion face sample image containing a target occlusion object;
fusing the non-occlusion face sample image and the target occlusion object sample image to obtain a fused sample image to be optimized;
inputting the fusion sample image to be optimized into a generator network to be trained to obtain a target fusion sample image;
inputting the target fusion sample image and the non-occlusion face sample image into a pre-trained identity consistency network to obtain feature maps for determining whether the identities of the faces in the target fusion sample image and the non-occlusion face sample image are consistent;
inputting the target fusion sample image and the occlusion face sample image containing the target occlusion object into a discriminator network to be trained to obtain the probability that the target fusion sample image is a real image and the probability that the occlusion face sample image containing the target occlusion object is a real image;
calculating a feature loss value of the identity consistency network based on the feature maps corresponding to the target fusion sample image and the non-occlusion face sample image;
calculating a loss value of the discriminator network based on the probability that the target fusion sample image is a real image and the probability that the occlusion face sample image containing the target occlusion object is a real image;
calculating a loss value of the generator network based on the difference between the target fusion sample image and the fusion sample image to be optimized;
adjusting network parameters of the generator network and the discriminator network based on a total loss value, and continuing training until the generator network and the discriminator network converge; wherein the total loss value is determined based on the loss value of the generator network, the loss value of the discriminator network, and the feature loss value of the identity consistency network.
Optionally, the calculating, based on the feature maps corresponding to the target fusion sample image and the non-occlusion face sample image, a feature loss value of the identity consistency network includes:
for each preset channel, calculating the distance between the feature maps of the target fusion sample image and of the non-occlusion face sample image corresponding to the preset channel;
calculating the average value of the distances corresponding to all preset channels as the feature loss value of the identity consistency network;
The calculating a loss value of the generator network based on a difference between the target fusion sample image and the fusion sample image to be optimized, comprising:
and calculating a loss value of the generator network based on the distance between the target fusion sample image and the fusion sample image to be optimized and/or the difference value of the total variation of the target fusion sample image and the fusion sample image to be optimized.
Optionally, the total loss value is:
L = L_gan + γ·L_re + μ·L_ip
where L represents the total loss value, L_gan represents the loss value of the discriminator network, L_re represents the loss value of the generator network, L_ip represents the feature loss value of the identity consistency network, γ represents a first preset coefficient, and μ represents a second preset coefficient.
In order to achieve the above object, an embodiment of the present application discloses an image generating apparatus, including:
the image acquisition module is used for acquiring the target non-occlusion face image to be fused and the corresponding target occlusion object image;
the image fusion module is used for fusing the target non-occlusion face image and the target occlusion object image to obtain a fusion image to be optimized;
the image generation module is used for inputting the fusion image to be optimized into a pre-trained generator network to obtain a target fusion image output by the generator network; the generator network belongs to a generative adversarial network obtained by training based on a discriminator network and an identity consistency network, wherein the discriminator network is used for constraining the detail information of the fused image, and the identity consistency network is used for constraining identity consistency.
Optionally, the target occlusion object image is acquired from an occlusion face image matched with the target non-occlusion face image.
Optionally, the image acquisition module includes:
the first image acquisition sub-module is used for acquiring target non-occlusion face images to be fused and a plurality of occlusion face images containing target occlusion objects;
the image matching sub-module is used for determining an occlusion face image matched with the target non-occlusion face image from the multiple occlusion face images based on the face gesture and/or the face attribute, and taking the occlusion face image as a target occlusion face image;
and the second image acquisition sub-module is used for acquiring the target occlusion object image in the target occlusion face image.
Optionally, the second image acquisition sub-module includes:
the image segmentation unit is used for extracting an image of the area occupied by the target occlusion object from the target occlusion face image based on an image segmentation algorithm, and taking the image as an occlusion object image to be processed;
and the blurring processing unit is used for carrying out Gaussian blur processing on the to-be-processed occlusion object image to obtain a target occlusion object image.
Optionally, the image fusion module includes:
the face key point acquisition sub-module is used for acquiring the face key points in the target non-occlusion face image and the face key points in the target occlusion object image;
an affine transformation parameter calculation sub-module, configured to calculate affine transformation parameters between the face key points in the target non-occlusion face image and the face key points in the target occlusion object image;
and the image fusion sub-module is used for fusing the target non-occlusion face image and the target occlusion object image based on the affine transformation parameters to obtain a fusion image to be optimized.
Optionally, the image fusion sub-module is specifically configured to fuse the target non-occlusion face image and the target occlusion object image according to a preset formula based on the affine transformation parameters, so as to obtain a fusion image to be optimized;
wherein, the preset formula is:
where A_mix represents the fusion image to be optimized, T represents the affine transformation parameters, α represents a preset fusion coefficient, G[I_c] represents the target occlusion object image, I_o represents the occlusion face image matched with the target non-occlusion face image, and I represents the target non-occlusion face image.
Optionally, the apparatus further includes:
the sample image acquisition module is used for acquiring a non-occlusion face sample image, a target occlusion object sample image and an occlusion face sample image containing a target occlusion object;
the to-be-optimized fusion sample image acquisition module is used for fusing the non-occlusion face sample image and the target occlusion object sample image to obtain a to-be-optimized fusion sample image;
the target fusion sample image acquisition module is used for inputting the fusion sample image to be optimized into a generator network to be trained to obtain a target fusion sample image;
the feature map acquisition module is used for inputting the target fusion sample image and the non-occlusion face sample image into a pre-trained identity consistency network to obtain feature maps for determining whether the identities of the faces in the target fusion sample image and the non-occlusion face sample image are consistent;
the probability acquisition module is used for inputting the target fusion sample image and the occlusion face sample image containing the target occlusion object into a discriminator network to be trained to obtain the probability that the target fusion sample image is a real image and the probability that the occlusion face sample image containing the target occlusion object is a real image;
the first loss value calculation module is used for calculating the feature loss value of the identity consistency network based on the feature maps corresponding to the target fusion sample image and the non-occlusion face sample image;
the second loss value calculation module is used for calculating the loss value of the discriminator network based on the probability that the target fusion sample image is a real image and the probability that the occlusion face sample image containing the target occlusion object is a real image;
a third loss value calculation module, configured to calculate a loss value of the generator network based on the difference between the target fusion sample image and the fusion sample image to be optimized;
the training module is used for adjusting network parameters of the generator network and the discriminator network based on the total loss value, and continuing training until the generator network and the discriminator network converge; wherein the total loss value is determined based on the loss value of the generator network, the loss value of the discriminator network, and the feature loss value of the identity consistency network.
Optionally, the first loss value calculation module is specifically configured to calculate, for each preset channel, the distance between the feature maps of the target fusion sample image and of the non-occlusion face sample image corresponding to the preset channel; and to calculate the average value of the distances corresponding to all preset channels as the feature loss value of the identity consistency network;
And the third loss value calculation module is specifically configured to calculate a loss value of the generator network based on a distance between the target fusion sample image and the fusion sample image to be optimized and/or a difference value of total variation between the target fusion sample image and the fusion sample image to be optimized.
Optionally, the total loss value is:
L = L_gan + γ·L_re + μ·L_ip
where L represents the total loss value, L_gan represents the loss value of the discriminator network, L_re represents the loss value of the generator network, L_ip represents the feature loss value of the identity consistency network, γ represents a first preset coefficient, and μ represents a second preset coefficient.
In order to achieve the above object, the embodiments of the present application further disclose an electronic device, where the electronic device includes a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the image generating method according to the first aspect when executing the program stored in the memory.
Embodiments of the present application also provide a computer readable storage medium having a computer program stored therein, which when executed by a processor implements the image generation method according to the first aspect described above.
Embodiments of the present application also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform any of the above-described image generation methods.
The beneficial effects of the embodiment of the application are that:
the image generation method provided by the embodiment of the application acquires a target non-occlusion face image to be fused and a corresponding target occlusion object image; fuses the target non-occlusion face image and the target occlusion object image to obtain a fusion image to be optimized; and inputs the fusion image to be optimized into a pre-trained generator network to obtain a target fusion image output by the generator network. The generator network belongs to a generative adversarial network obtained by training based on a discriminator network and an identity consistency network, wherein the discriminator network is used for constraining the detail information of the fused image, and the identity consistency network is used for constraining identity consistency.
Because the generator network belongs to a generative adversarial network obtained by training based on the discriminator network and the identity consistency network, where the discriminator network constrains the detail information of the fused image and the identity consistency network constrains identity consistency, an image obtained based on the generator network preserves the identity of the face and is highly realistic. That is, the target fusion image contains the target occlusion object, is consistent with the identity of the face in the target non-occlusion face image, and is highly realistic. In other words, the image generation method provided by the embodiment of the application can generate an effective occlusion face image.
Of course, not all of the above-described advantages need be achieved simultaneously in practicing any one of the products or methods of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other embodiments may be obtained according to these drawings without inventive effort to a person skilled in the art.
Fig. 1 is a flowchart of an image generating method according to an embodiment of the present application;
FIG. 2 is a flowchart of another image generation method according to an embodiment of the present application;
FIG. 3 is a flowchart of another image generation method according to an embodiment of the present application;
FIG. 4 is a flowchart of another image generation method according to an embodiment of the present application;
FIG. 5 is a flowchart of a training generator network according to an embodiment of the present application;
fig. 6 is a schematic flow chart of generating a target fusion image according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a framework of a generative adversarial network according to an embodiment of the present application;
Fig. 8 is a block diagram of an image generating apparatus according to an embodiment of the present application;
fig. 9 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. Based on the embodiments herein, a person of ordinary skill in the art would be able to obtain all other embodiments based on the disclosure herein, which are within the scope of the disclosure herein.
In order to improve the recognition accuracy of the face recognition network on occluded face images, sample images in which part of the face is occluded can be obtained, and the face recognition network can be trained based on the obtained sample images, so that the trained face recognition network can effectively recognize occluded face images. However, in the related art, no corresponding method is provided to generate an effective occlusion face image.
In order to solve the above problems, the embodiment of the application provides an image generation method, which can be applied to an electronic device, wherein the electronic device is used for generating an occlusion face image, and the generated occlusion face image can be used for training a face recognition network.
Referring to fig. 1, fig. 1 is a flowchart of an image generating method according to an embodiment of the present application, where the method may include the following steps:
s101: and acquiring the target non-occlusion face image to be fused and a corresponding target occlusion object image.
S102: and fusing the target non-occlusion face image and the target occlusion object image to obtain a fused image to be optimized.
S103: and inputting the fusion image to be optimized into a pre-trained generator network to obtain a target fusion image output by the generator network.
The generator network belongs to a generative adversarial network obtained by training based on a discriminator network and an identity consistency network, wherein the discriminator network is used for constraining the detail information of the fused image, and the identity consistency network is used for constraining identity consistency.
Because the generator network belongs to a generative adversarial network obtained by training based on the discriminator network and the identity consistency network, where the discriminator network constrains the detail information of the fused image and the identity consistency network constrains identity consistency, an image obtained based on the generator network preserves the identity of the face and is highly realistic. That is, the target fusion image contains the target occlusion object, is consistent with the identity of the face in the target non-occlusion face image, and is highly realistic. In other words, the image generation method provided by the embodiment of the application can generate an effective occlusion face image.
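For illustration only, the three steps S101-S103 can be sketched in Python as follows; the function and parameter names (generate_occluded_face, fuse_fn, alpha) are assumptions introduced here and are not part of the claimed method, and the concrete fusion rule and generator are described in the embodiments below.

import torch

def generate_occluded_face(face_img, occluder_img, generator, fuse_fn, alpha=0.7):
    # S101 inputs: face_img is the target non-occlusion face image and
    # occluder_img is the corresponding target occlusion object image.
    # S102: fuse the two images to obtain the fusion image to be optimized.
    fused_to_optimize = fuse_fn(face_img, occluder_img, alpha)
    # S103: refine the coarse fusion with the pre-trained generator network.
    with torch.no_grad():
        target_fusion = generator(fused_to_optimize.unsqueeze(0)).squeeze(0)
    return target_fusion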
In one embodiment, for step S101, the target non-occlusion face image is the face image that currently needs to be fused, and this face image does not contain an occlusion object covering the face. The target occlusion object image, i.e., the image that currently needs to be fused with the target non-occlusion face image, contains only the target occlusion object. The target occlusion object may be glasses, a mask, or another item.
In one embodiment, the target occlusion object image may be acquired from an occlusion face image matched with the target non-occlusion face image.
In the embodiment of the application, an occlusion face image (which may be referred to as a target occlusion face image) that includes the target occlusion object and is matched with the target non-occlusion face image may be obtained, and then, the target occlusion object image may be obtained from the target occlusion face image.
The target occlusion face image is matched with the target non-occlusion face image, that is, the matching degree of the target occlusion object image obtained from the target occlusion face image and the target non-occlusion face image is also higher, and accordingly, the target occlusion object image and the target non-occlusion face image are fused, so that the authenticity of the fused image to be optimized can be improved, and further, the authenticity of the generated target fused image can be improved.
Accordingly, in one embodiment, referring to fig. 2, the step S101 may include the following steps based on fig. 1:
s1011: and acquiring the target non-occlusion face image to be fused and a plurality of occlusion face images containing target occlusion objects.
S1012: and determining an occlusion face image matched with the target non-occlusion face image from a plurality of occlusion face images based on the face pose and/or the face attribute, and taking the occlusion face image as the target occlusion face image.
S1013: and acquiring a target occlusion object image in the target occlusion face image.
In the embodiment of the application, the face gesture can be represented by the head angle of the face; the face attribute may include information of the age of the face, the sex of the face, and the like.
Based on the face pose and/or the face attribute, a degree of matching of the occlusion face image with the target non-occlusion face image can be determined, and the degree of matching can represent the degree of matching of the target occlusion object in the occlusion face image with the target non-occlusion face image.
Based on the above processing, the image of the area occupied by the target occlusion object is obtained from the target occlusion face image to perform image fusion, and the target occlusion face image is an image matched with the target non-occlusion face image, that is, the determined matching degree of the target occlusion object image and the target non-occlusion face image is also higher, and accordingly, the two images are fused, so that the authenticity of the fused image to be optimized can be improved, and further, the authenticity of the generated target fused image can be improved.
In one embodiment, image matching may be performed in the order of head angle of the face, age of the face, sex of the face.
For example, based on the head angle of the face, the target non-occlusion face image is compared with each occlusion face image (which may be referred to as a first occlusion face image) to determine the first occlusion face image with the highest matching degree as the target occlusion face image.
If there are a plurality of the second occlusion face images with the highest matching degree of the head angle (which may be referred to as second occlusion face images), the target non-occlusion face image may be compared with each second occlusion face image based on the age of the face, so as to determine the second occlusion face image with the highest matching degree as the target occlusion face image.
If there are a plurality of occlusion face images with the highest degree of age matching (may be referred to as third occlusion face images), the target non-occlusion face image may be compared with each third occlusion face image based on the sex of the face, and the third occlusion face image with the highest degree of matching may be determined as the target occlusion face image.
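For illustration only, the matching order described above (head angle of the face first, then age of the face, then gender of the face) can be sketched as follows; the attribute keys (yaw, age, gender) and the use of absolute differences as the matching score are assumptions and not limitations of the embodiment.

def match_occluded_face(target_attrs, candidates):
    # target_attrs / candidates[i]: dicts with assumed keys "yaw", "age", "gender".
    # Returns the index of the occlusion face image that best matches the target.
    diffs = [abs(c["yaw"] - target_attrs["yaw"]) for c in candidates]
    pool = [i for i, d in enumerate(diffs) if d == min(diffs)]          # best head-angle match
    if len(pool) > 1:                                                   # tie: compare face age
        age_diffs = {i: abs(candidates[i]["age"] - target_attrs["age"]) for i in pool}
        pool = [i for i in pool if age_diffs[i] == min(age_diffs.values())]
    if len(pool) > 1:                                                   # tie: prefer the same gender
        same = [i for i in pool if candidates[i]["gender"] == target_attrs["gender"]]
        pool = same or pool
    return pool[0]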
In one embodiment, the target non-occluded face image and the plurality of occluded face images may be homologous images, for example, the target non-occluded face image and the plurality of occluded face images may be face images acquired by a monitoring device at a street intersection; or the target non-occlusion face image and the multiple occlusion face images can be face images acquired by monitoring equipment at an airport security inspection entrance.
The target non-occlusion face image and the multiple occlusion face images are homologous images, so that the matching degree of the target non-occlusion face image and the target occlusion face image can be further improved, the authenticity of the fusion image to be optimized can be improved, and the authenticity of the generated target fusion image can be improved.
In one embodiment, referring to fig. 3, the step S1013 may include the following steps on the basis of fig. 2:
s10131: and extracting an image of the area occupied by the target shielding object from the target shielding face image based on an image segmentation algorithm, and taking the image as an image of the shielding object to be processed.
S10132: and carrying out Gaussian blur processing on the to-be-processed occlusion object image to obtain a target occlusion object image.
In the embodiment of the application, the image of the area occupied by the target occlusion object (namely, the image of the occlusion object to be processed) can be extracted from the image of the target occlusion face based on the image segmentation algorithm, and further, the Gaussian blur processing can be performed on the image of the occlusion object to be processed, and the processed image of the occlusion object is the image of the target occlusion object.
For example, the to-be-processed occlusion image may be extracted from the target occlusion face image based on a full convolution network (Fully Convolutional Networks, FCN); or, the image of the to-be-processed shielding object can be extracted from the image of the target shielding face based on the Roberts operator.
Based on the processing, gaussian blur processing is performed on the to-be-processed occlusion object image, so that image fusion is performed on the to-be-processed occlusion object image, the image edge of the to-be-optimized fusion image is smoother, and the authenticity of the generated target fusion image can be improved.
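For illustration only, steps S10131 and S10132 can be sketched with OpenCV as follows, assuming that some segmentation algorithm (for example an FCN) has already produced a binary mask of the region occupied by the target occlusion object; the function and parameter names are assumptions.

import cv2
import numpy as np

def extract_occluder(occluded_face, occluder_mask, ksize=15, sigma=0):
    # occluded_face: HxWx3 target occlusion face image; occluder_mask: HxW binary mask.
    # S10131: keep only the pixels of the region occupied by the target occlusion object.
    to_process = occluded_face * occluder_mask[..., None].astype(occluded_face.dtype)
    # S10132: Gaussian blur so that the extracted occluder has softer edges for fusion.
    return cv2.GaussianBlur(to_process, (ksize, ksize), sigma)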
In one embodiment, referring to fig. 4, the step S102 may include the following steps based on fig. 1:
s1021: and acquiring the face key points in the target non-occlusion face image and the face key points in the target occlusion object image.
S1022: and calculating affine transformation parameters between the face key points in the target non-occlusion face image and the face key points in the target occlusion object image.
S1023: and fusing the target non-occlusion face image and the target occlusion object image based on affine transformation parameters to obtain a fused image to be optimized.
In the embodiment of the application, after the target occlusion face image is determined, the face key points in the target occlusion face image and the target occlusion object image can be respectively determined based on a face key point detection algorithm. For example, the determined key points of the face may be 3 points, or may be 5 points, or may be 21 points, but not limited thereto.
The affine transformation parameters may represent transformation relations between face key points in the target non-occlusion face image and face key points in the target occlusion object image.
For example, if the coordinates of the face key points in the target non-occlusion face image are Tp and the coordinates of the face key points in the target occlusion object image are Tq, then Tp = T × Tq, and T (i.e., the affine transformation parameters) can be obtained by using an SVD (Singular Value Decomposition) algorithm.
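For illustration only, T can be estimated from the matched key points by solving Tp = T × Tq in the least-squares sense; the sketch below uses NumPy, whose least-squares solver is SVD-based, and the function name estimate_affine is an assumption.

import numpy as np

def estimate_affine(tp, tq):
    # tp: Nx2 face key points in the target non-occlusion face image.
    # tq: Nx2 face key points in the target occlusion object image.
    tq_h = np.hstack([tq, np.ones((len(tq), 1))])        # homogeneous coordinates [x, y, 1]
    # np.linalg.lstsq solves the over-determined system tq_h @ T ≈ tp with an SVD-based solver.
    T, *_ = np.linalg.lstsq(tq_h, tp, rcond=None)         # shape 3x2
    return T.T                                            # 2x3 affine transformation matrix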
In one embodiment, an alpha-blending framework may be employed for image fusion. For example, the target non-occlusion face image and the target occlusion object image can be fused according to a preset formula based on affine transformation parameters to obtain a fused image to be optimized.
Wherein, the preset formula is:
where A_mix represents the fusion image to be optimized, T represents the affine transformation parameters, α represents a preset fusion coefficient, G[I_c] represents the target occlusion object image, I_o represents the target occlusion face image, and I represents the target non-occlusion face image. α may be 0.7 or 0.8, but is not limited thereto.
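Since the preset formula itself is not reproduced in this text, the following is only a plausible alpha-blending sketch consistent with the variables listed above (warp the target occlusion object image with T, then blend it over the face with coefficient α inside the occluder region); it is an assumption for illustration, not the exact preset formula.

import cv2
import numpy as np

def alpha_blend_fuse(face_img, occluder_img, T, alpha=0.7):
    # face_img: target non-occlusion face image I; occluder_img: target occlusion object image G[I_c].
    h, w = face_img.shape[:2]
    warped = cv2.warpAffine(occluder_img, T, (w, h))             # align the occluder to the face with T
    region = warped.sum(axis=2, keepdims=True) > 0               # rough support mask of the occluder
    blended = alpha * warped + (1 - alpha) * face_img            # alpha blending with coefficient alpha
    return np.where(region, blended, face_img).astype(face_img.dtype)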
In one embodiment, referring to fig. 5, fig. 5 is a flowchart of a training generator network provided in an embodiment of the present application, and the method may include the following steps:
S501: and obtaining a non-occlusion face sample image, a target occlusion object sample image and an occlusion face sample image containing a target occlusion object.
S502: and fusing the non-occlusion face sample image and the target occlusion object sample image to obtain a fused sample image to be optimized.
S503: and inputting the fusion sample image to be optimized into a generator network to be trained to obtain a target fusion sample image.
S504: and inputting the target fusion sample image and the non-occlusion face sample image into a pre-trained identity consistency network to obtain feature maps for determining whether the identities of the faces in the target fusion sample image and the non-occlusion face sample image are consistent.
S505: and inputting the target fusion sample image and the occlusion face sample image containing the target occlusion object into a discriminator network to be trained to obtain the probability that the target fusion sample image is a real image and the probability that the occlusion face sample image containing the target occlusion object is a real image.
S506: and calculating the feature loss value of the identity consistency network based on the feature maps corresponding to the target fusion sample image and the non-occlusion face sample image.
S507: and calculating a loss value of the discriminator network based on the probability that the target fusion sample image is a real image and the probability that the occlusion face sample image containing the target occlusion object is a real image.
S508: and calculating a loss value of the generator network based on the difference between the target fusion sample image and the fusion sample image to be optimized.
S509: and adjusting network parameters of the generator network and the discriminator network based on the total loss value, and continuing training until the generator network and the discriminator network converge.
In one embodiment, before the network parameters of the generator network are adjusted, the trained identity consistency network can already be obtained; further, in the process of adjusting the network parameters of the generator network, the network parameters of the identity consistency network are kept unchanged, so that when training is completed, identity consistency can be ensured for images obtained based on the generator network.
For example, the identity-consistent network to be trained may be trained in advance based on a preset sample image. The first preset sample image and the second preset sample image can be input into an identity-consistent network to be trained, the identity-consistent network can obtain feature images of the first preset sample image and the second preset sample image, the probability of identity consistency of faces in the first preset sample image and the second preset sample image is obtained based on the feature images, further, the difference value between the probability and the preset label is calculated and used as a loss value of the identity-consistent network, and network parameters of the identity-consistent network are adjusted according to the loss value until convergence. The preset label indicates whether the identities of the faces in the first preset sample image and the second preset sample image are consistent. And the loss value used for adjusting the network parameters of the identity-consistent network is different from the characteristic loss value of the identity-consistent network.
The total loss value is determined based on the loss value of the generator network, the loss value of the discriminator network, and the feature loss value of the identity consistency network. The generator network, the discriminator network, and the identity consistency network form a generative adversarial network (GAN).
The generator network may be an autoencoder (AE) network, which performs representation learning on the input data with the input data as the learning target, to produce the output data. For example, the generator network may include an encoder that converts the input data into encoded data and a decoder that converts the encoded data into output data.
The output data of the discriminator network is a label corresponding to the input data, representing the probability that the input data is a real image; that is, the discriminator network is used to determine whether the input image is a real image or a fused image. The discriminator network may be a deep learning network; for example, it may include a convolution layer for feature extraction of the input image and an activation function layer that generates the corresponding label based on the result of feature extraction.
The output data of the identity consistency network is a label corresponding to the target fusion sample image and the non-occlusion face sample image, representing the probability that the identities of the faces in the target fusion sample image and the non-occlusion face sample image are consistent. That is, the identity consistency network is used to determine whether the identities of the faces in the two input images are consistent. The identity consistency network may be a deep learning network; for example, it may include a convolution layer and an activation function layer, where the convolution layer performs feature extraction on the two input images respectively to obtain their image features, the image features of the two images are compared, and the activation function layer obtains the probability that the identities of the faces in the two images are consistent based on the comparison result.
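For illustration only, the three networks can be sketched in PyTorch as follows; the layer structure and sizes are assumptions introduced here and do not limit the concrete architecture of the generator, discriminator, or identity consistency network.

import torch
import torch.nn as nn

class Generator(nn.Module):                       # G: autoencoder-style generator
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
                                     nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU())
        self.decoder = nn.Sequential(nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
                                     nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh())
    def forward(self, x):
        return self.decoder(self.encoder(x))      # encoded data -> output image

class Discriminator(nn.Module):                   # D: probability that the input is a real image
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
                                 nn.Conv2d(64, 1, 4, 2, 1), nn.AdaptiveAvgPool2d(1),
                                 nn.Flatten(), nn.Sigmoid())
    def forward(self, x):
        return self.net(x)

class IdentityNetwork(nn.Module):                 # P: per-channel feature maps for the identity constraint
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 64, 3, 1, 1), nn.ReLU(),
                                      nn.Conv2d(64, 128, 3, 1, 1), nn.ReLU())
    def forward(self, x):
        return self.features(x)                   # one feature map per preset channel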
In one embodiment, the step S506 may include the following steps:
for each preset channel, calculating the distance between the feature maps of the target fusion sample image and of the non-occlusion face sample image corresponding to the preset channel; and calculating the average value of the distances corresponding to all preset channels as the feature loss value of the identity consistency network.
In the embodiment of the application, the identity consistency network may perform convolution processing on an input image to obtain the feature map corresponding to each preset channel of the image. The number of preset channels is determined based on the number of convolution kernels of the convolution layer in the identity consistency network.
That is, based on the identity consistency network, the feature maps of the two input images in each preset channel can be obtained respectively; then, for each preset channel, the distance between the feature maps of the two images corresponding to that channel can be calculated, and the average value of the distances over the preset channels can be calculated to obtain the feature loss value of the identity consistency network.
In addition, the network parameters can be adjusted using the average loss value over a preset number of training iterations, and the corresponding feature loss value of the identity consistency network can be expressed by formula (1).
In formula (1), L_ip represents the feature loss value of the identity consistency network, k represents the preset number of times, δ[P(I_gen)] represents the feature map of the target fusion sample image in one channel, δ[P(I)] represents the feature map of the non-occlusion face sample image in that channel, and the loss is the average of the distances between the feature maps over the preset channels.
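For illustration only, this channel-averaged feature distance can be sketched as follows, assuming the identity consistency network P returns an N×C×H×W feature tensor and using the L1 distance per channel (the exact distance metric of formula (1) is not reproduced in this text).

import torch

def identity_feature_loss(P, fused_img, face_img):
    # L_ip: average over the preset channels of the distance between the feature maps
    # of the target fusion sample image and of the non-occlusion face sample image.
    f_gen = P(fused_img)                                    # N x C x H x W feature maps
    f_ref = P(face_img)
    per_channel = (f_gen - f_ref).abs().mean(dim=(2, 3))    # assumed L1 distance per channel
    return per_channel.mean()                               # average over channels (and batch)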
The step S508 may include the steps of:
and calculating a loss value of the generator network based on the distance between the target fusion sample image and the fusion sample image to be optimized and/or the difference value of the total variation of the target fusion sample image and the fusion sample image to be optimized.
In one implementation, the distance between the target fusion sample image and the fusion sample image to be optimized may be calculated as a loss value for the generator network.
For example, the distance between the target fusion sample image and the fusion sample image to be optimized may be calculated using equation (2).
L_1 = ‖I_mix − I_gen‖_1 (2)
where L_1 represents the distance between the target fusion sample image and the fusion sample image to be optimized, I_mix represents the fusion sample image to be optimized, and I_gen represents the target fusion sample image.
In one implementation, the difference in total variation of the target fusion sample image and the fusion sample image to be optimized may be calculated as a loss value for the generator network.
For example, the difference of the total variation of the target fusion sample image and the fusion sample image to be optimized may be calculated based on formula (3).
In formula (3), L_TV represents the difference of the total variation between the target fusion sample image and the fusion sample image to be optimized, I_mix represents the fusion sample image to be optimized, and I_gen represents the target fusion sample image.
In one implementation, a weighted sum of the distance between the target fusion sample image and the fusion sample image to be optimized and the difference of the total variation of the target fusion sample image and the fusion sample image to be optimized can be calculated to obtain a loss value of the generator network.
For example, the loss value of the generator network may be expressed by formula (4).
L_re = L_1 + β·L_TV (4)
where L_re represents the loss value of the generator network, and β represents a third preset coefficient.
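For illustration only, and under the assumption that L_TV in formula (3) compares the total variation of the two images (the formula itself is not reproduced in this text), the generator loss of formula (4) can be sketched as:

import torch

def total_variation(img):
    # Assumed total variation: sum of absolute differences between neighbouring pixels.
    tv_h = (img[..., 1:, :] - img[..., :-1, :]).abs().sum()
    tv_w = (img[..., :, 1:] - img[..., :, :-1]).abs().sum()
    return tv_h + tv_w

def generator_loss(fused_to_optimize, target_fusion, beta=1.0):
    # L_re = L_1 + beta * L_TV (formula (4)); beta is the third preset coefficient.
    l1 = (fused_to_optimize - target_fusion).abs().sum()                                  # formula (2)
    l_tv = (total_variation(target_fusion) - total_variation(fused_to_optimize)).abs()    # assumed formula (3)
    return l1 + beta * l_tv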
In one embodiment, the network parameters may also be adjusted using the average loss value over the preset number of training iterations, and the loss value of the discriminator network may be expressed by formula (5).
In formula (5), L_gan represents the loss value of the discriminator network, k represents the preset number of times, D(x_i) represents the probability that the target fusion sample image is a real image, and D(G(z_i)) represents the probability that the occlusion face sample image containing the target occlusion object is a real image.
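Since formula (5) itself is not reproduced in this text, the following sketch uses the standard GAN discriminator objective averaged over the batch as an assumed stand-in; d_real and d_fake are the discriminator outputs for the real occlusion face sample images and the target fusion sample images, respectively.

import torch

def discriminator_loss(d_real, d_fake, eps=1e-8):
    # Assumed standard GAN objective (stand-in for formula (5)), averaged over the batch.
    return -(torch.log(d_real + eps) + torch.log(1.0 - d_fake + eps)).mean()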
In one embodiment, the total loss value is:
L = L_gan + γ·L_re + μ·L_ip
where L represents the total loss value, L_gan represents the loss value of the discriminator network, L_re represents the loss value of the generator network, L_ip represents the feature loss value of the identity consistency network, γ represents a first preset coefficient, and μ represents a second preset coefficient. For example, γ may be 0.1 or 0.2; μ may be 0.1 or 0.2, but is not limited thereto.
That is, the network parameters of the generator network and the discriminator network may be adjusted based on the total loss value until convergence is reached.
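For illustration only, one training iteration based on the total loss can be sketched as follows, reusing the loss functions sketched above; the alternating generator/discriminator update and the optimizer setup are assumptions, and the identity consistency network P is kept frozen as described above.

import torch

def train_step(G, D, P, fused_to_optimize, real_occluded, face_img, opt_g, opt_d,
               gamma=0.1, mu=0.1, beta=1.0, eps=1e-8):
    # Discriminator update (L_gan): real occlusion face samples vs. generated fusion samples.
    target_fusion = G(fused_to_optimize)
    l_gan = discriminator_loss(D(real_occluded), D(target_fusion.detach()), eps)
    opt_d.zero_grad(); l_gan.backward(); opt_d.step()
    # Generator update: adversarial term plus gamma * L_re plus mu * L_ip (P stays unchanged).
    adv = -torch.log(D(target_fusion) + eps).mean()
    l_re = generator_loss(fused_to_optimize, target_fusion, beta)
    l_ip = identity_feature_loss(P, target_fusion, face_img)
    g_loss = adv + gamma * l_re + mu * l_ip
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    # Report the total loss L = L_gan + gamma * L_re + mu * L_ip.
    return (l_gan + gamma * l_re + mu * l_ip).item()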
Referring to fig. 6, fig. 6 is a schematic flow chart of generating a target fusion image according to an embodiment of the present application.
The original image is the non-occlusion face image to be fused. For each non-occlusion face image to be fused (i.e., the target non-occlusion face image in the embodiment of the present application), an occlusion face image (i.e., the target occlusion face image) matched with the target non-occlusion face image can be determined from the occlusion face images containing the target occlusion object (i.e., sunglasses), and the image corresponding to the target occlusion object in the target occlusion face image (i.e., the target occlusion object image) can be obtained. The target occlusion object image is the image in fig. 6 containing only the sunglasses.
Then, each target non-occlusion face image is fused with the corresponding target occlusion object image to obtain a fused image (i.e., the fusion image to be optimized), and the fusion image to be optimized is input into the generator network of the generative adversarial network to generate an image (i.e., to obtain the target fusion image).
Referring to fig. 7, fig. 7 is a schematic diagram of a framework of a generative adversarial network according to an embodiment of the present application.
In fig. 7, G denotes the generator network, D denotes the discriminator network, and P denotes the identity consistency network. I denotes a non-occlusion face sample image. I_mix denotes the fusion sample image to be optimized. I_gen denotes the target fusion sample image. I_real denotes an occlusion face sample image containing the target occlusion object (the target occlusion object in fig. 7 is a pair of sunglasses). L_re denotes the loss value of the generator network, L_ip denotes the feature loss value of the identity consistency network, and L_gan denotes the loss value of the discriminator network.
Based on the same inventive concept, the embodiment of the present application further provides an image generating apparatus, referring to fig. 8, fig. 8 is a structural diagram of the image generating apparatus provided in the embodiment of the present application, where the apparatus may include:
the image acquisition module 801 is configured to acquire a target non-occlusion face image to be fused and a corresponding target occlusion object image;
the image fusion module 802 is configured to fuse the target non-occlusion face image and the target occlusion object image to obtain a fusion image to be optimized;
the image generation module 803 is configured to input the fusion image to be optimized into a pre-trained generator network to obtain a target fusion image output by the generator network; the generator network belongs to a generative adversarial network obtained by training based on a discriminator network and an identity consistency network, wherein the discriminator network is used for constraining the detail information of the fused image, and the identity consistency network is used for constraining identity consistency.
Optionally, the target occlusion object image is acquired from an occlusion face image matched with the target non-occlusion face image.
Optionally, the image acquisition module 801 includes:
the first image acquisition sub-module is used for acquiring target non-occlusion face images to be fused and a plurality of occlusion face images containing target occlusion objects;
the image matching sub-module is used for determining an occlusion face image matched with the target non-occlusion face image from the multiple occlusion face images based on the face gesture and/or the face attribute, and taking the occlusion face image as a target occlusion face image;
and the second image acquisition sub-module is used for acquiring the target occlusion object image in the target occlusion face image.
Optionally, the second image acquisition sub-module includes:
the image segmentation unit is used for extracting an image of the area occupied by the target occlusion object from the target occlusion face image based on an image segmentation algorithm, and taking the image as an occlusion object image to be processed;
and the blurring processing unit is used for carrying out Gaussian blur processing on the to-be-processed occlusion object image to obtain a target occlusion object image.
Optionally, the image fusion module 802 includes:
the face key point acquisition sub-module is used for acquiring the face key points in the target non-occlusion face image and the face key points in the target occlusion object image;
an affine transformation parameter calculation sub-module, configured to calculate affine transformation parameters between the face key points in the target non-occlusion face image and the face key points in the target occlusion object image;
and the image fusion sub-module is used for fusing the target non-occlusion face image and the target occlusion object image based on the affine transformation parameters to obtain a fusion image to be optimized.
Optionally, the image fusion sub-module is specifically configured to fuse the target non-occlusion face image and the target occlusion object image according to a preset formula based on the affine transformation parameters, so as to obtain a fusion image to be optimized;
wherein, the preset formula is:
where I_mix represents the fusion image to be optimized, T represents the affine transformation parameters, α represents a preset fusion coefficient, G[I_c] represents the target occlusion object image, I_o represents the occlusion face image matched with the target non-occlusion face image, and I represents the target non-occlusion face image.
Optionally, the apparatus further includes:
the sample image acquisition module is used for acquiring a non-occlusion face sample image, a target occlusion object sample image and an occlusion face sample image containing a target occlusion object;
the to-be-optimized fusion sample image acquisition module is used for fusing the non-occlusion face sample image and the target occlusion object sample image to obtain a to-be-optimized fusion sample image;
the target fusion sample image acquisition module is used for inputting the fusion sample image to be optimized into a generator network to be trained to obtain a target fusion sample image;
the feature map acquisition module is used for inputting the target fusion sample image and the non-occlusion face sample image into a pre-trained identity coincidence network to obtain a feature map for determining whether the identities of faces in the target fusion sample image and the non-occlusion face sample image are consistent;
the probability acquisition module is used for inputting the target fusion sample image and the occlusion face sample image containing the target occlusion object into a to-be-trained discriminator network to obtain the probability that the target fusion sample image is a real image and the probability that the occlusion face sample image containing the target occlusion object is a real image;
the first loss value calculation module is used for calculating the feature loss value of the identity-consistent network based on the feature maps corresponding to the target fusion sample image and the non-occlusion face sample image respectively;
the second loss value calculation module is used for calculating the loss value of the discriminator network based on the probability that the target fusion sample image is a real image and the probability that the occlusion face sample image containing the target occlusion object is a real image;
a third loss value calculation module, configured to calculate a loss value of the generator network based on a difference between the target fusion sample image and the fusion sample image to be optimized;
the training module is used for adjusting network parameters of the generator network and the discriminator network based on the total loss value, and continuing training until the generator network and the discriminator network converge; wherein the total loss value is determined based on the loss value of the generator network, the loss value of the discriminator network, and the feature loss value of the identity-consistent network.
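For illustration, one possible PyTorch-style training step covering the modules above is sketched below; the binary cross-entropy adversarial loss (which assumes the discriminator outputs probabilities), the L1 distances, and the optimizer handling are assumptions rather than the specific choices of this embodiment:

    import torch
    import torch.nn.functional as F

    def train_step(generator, discriminator, identity_net,
                   fused_coarse, clean_face, real_occluded,
                   g_opt, d_opt, gamma=10.0, mu=1.0):
        """One illustrative training step; loss forms and weights are placeholders."""
        # Generator refines the coarse fused sample image.
        target_fused = generator(fused_coarse)

        # Feature loss from the pre-trained identity consistency network.
        loss_ip = F.l1_loss(identity_net(target_fused), identity_net(clean_face))

        # Adversarial term: the generator tries to make its output look real.
        p_fake = discriminator(target_fused)
        loss_gan = F.binary_cross_entropy(p_fake, torch.ones_like(p_fake))

        # Generator loss from the difference to the coarse fused input.
        loss_re = F.l1_loss(target_fused, fused_coarse)

        # Total loss L = L_gan + gamma * L_re + mu * L_ip (coefficients are presets).
        total = loss_gan + gamma * loss_re + mu * loss_ip
        g_opt.zero_grad()
        total.backward()
        g_opt.step()

        # Discriminator update: real occlusion face samples vs. detached fakes.
        p_real = discriminator(real_occluded)
        p_fake_d = discriminator(target_fused.detach())
        d_loss = (F.binary_cross_entropy(p_real, torch.ones_like(p_real))
                  + F.binary_cross_entropy(p_fake_d, torch.zeros_like(p_fake_d)))
        d_opt.zero_grad()
        d_loss.backward()
        d_opt.step()
        return total.item(), d_loss.item()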
Optionally, the first loss value calculation module is specifically configured to: for each preset channel, calculate the distance between the feature map of the target fusion sample image and the feature map of the non-occlusion face sample image corresponding to that preset channel; and calculate the average of the distances corresponding to all preset channels as the feature loss value of the identity-consistent network;
and the third loss value calculation module is specifically configured to calculate the loss value of the generator network based on the distance between the target fusion sample image and the fusion sample image to be optimized and/or the difference between the total variations of the target fusion sample image and the fusion sample image to be optimized.
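A small sketch of these two loss calculations, assuming the identity-consistent network outputs feature maps of shape (C, H, W) and using an L2 per-channel distance plus an L1/total-variation term as illustrative choices:

    import torch

    def identity_feature_loss(feat_fused, feat_clean):
        """Average, over all preset channels, of the per-channel feature distance.

        feat_fused / feat_clean: (C, H, W) feature maps from the identity network.
        """
        per_channel = torch.linalg.vector_norm(feat_fused - feat_clean, dim=(1, 2))
        return per_channel.mean()

    def generator_loss(target_fused, fused_coarse):
        """Distance plus total-variation difference between the two images."""
        def total_variation(x):
            # Sum of absolute horizontal and vertical pixel differences.
            return (torch.abs(x[..., :, 1:] - x[..., :, :-1]).mean()
                    + torch.abs(x[..., 1:, :] - x[..., :-1, :]).mean())

        dist = torch.abs(target_fused - fused_coarse).mean()
        return dist + torch.abs(total_variation(target_fused) - total_variation(fused_coarse))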
Optionally, the total loss value is:
L = L_gan + γ·L_re + μ·L_ip
wherein L represents the total loss value, L_gan represents the loss value of the discriminator network, L_re represents the loss value of the generator network, L_ip represents the feature loss value of the identity-consistent network, γ represents a first preset coefficient, and μ represents a second preset coefficient.
The embodiment of the present application further provides an electronic device, as shown in fig. 9, including a processor 901, a communication interface 902, a memory 903, and a communication bus 904, where the processor 901, the communication interface 902, and the memory 903 communicate with each other through the communication bus 904,
a memory 903 for storing a computer program;
the processor 901 is configured to execute a program stored in the memory 903, and implement the following steps:
acquiring a target non-occlusion face image to be fused and a corresponding target occlusion object image;
fusing the target non-occlusion face image and the target occlusion object image to obtain a fused image to be optimized;
inputting the fusion image to be optimized into a pre-trained generator network to obtain a target fusion image output by the generator network; the generator network belongs to a generative adversarial network obtained by training based on a discriminator network and an identity consistency network, wherein the discriminator network is used for constraining the detail information of the fused image, and the identity consistency network is used for constraining identity consistency.
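Putting the three steps together, an illustrative inference path through the pretrained generator might look like the following; fuse_fn stands for the fusion step described above, and the tensor layout and normalization are assumptions:

    import torch

    def generate_target_fusion(generator, target_face, occluder_img, fuse_fn):
        """Coarse fusion followed by refinement with the pretrained generator.

        fuse_fn is the fusion step sketched earlier; the generator is assumed to
        take and return NCHW float tensors scaled to [0, 1].
        """
        coarse = fuse_fn(target_face, occluder_img)          # fusion image to be optimized
        x = torch.from_numpy(coarse).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        with torch.no_grad():
            refined = generator(x)                           # target fusion image
        out = refined.squeeze(0).permute(1, 2, 0).clamp(0, 1) * 255.0
        return out.byte().numpy()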
The communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, the bus is represented by only one bold line in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The Memory may include random access Memory (Random Access Memory, RAM) or may include Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but also digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
In yet another embodiment provided herein, there is also provided a computer-readable storage medium having stored therein a computer program which, when executed by a processor, implements the steps of any of the image generation methods described above.
In yet another embodiment provided herein, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform any of the image generation methods of the above embodiments.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a related manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus, electronic device, computer-readable storage medium, and computer program product embodiments are described relatively briefly since they are substantially similar to the method embodiments; for relevant details, reference may be made to the description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application. Any modifications, equivalent substitutions, improvements, etc. that are within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (7)

1. An image generation method, the method comprising:
acquiring a target non-occlusion face image to be fused and a corresponding target occlusion object image;
fusing the target non-occlusion face image and the target occlusion object image to obtain a fused image to be optimized;
inputting the fusion image to be optimized into a pre-trained generator network to obtain a target fusion image output by the generator network; the generator network belongs to a generative adversarial network obtained by training based on a discriminator network and an identity consistency network, wherein the discriminator network is used for constraining the detail information of the fused image, and the identity consistency network is used for constraining identity consistency;
the target occlusion object image is acquired from the occlusion face image matched with the target non-occlusion face image;
the fusing of the target non-occlusion face image and the target occlusion object image to obtain a fused image to be optimized comprises the following steps:
Acquiring face key points in the target non-occlusion face image and face key points in the target occlusion object image;
calculating affine transformation parameters between the face key points in the target non-occlusion face image and the face key points in the target occlusion object image;
based on the affine transformation parameters, fusing the target non-occlusion face image and the target occlusion object image according to a preset formula to obtain a fused image to be optimized;
wherein, in the preset formula, I_mix represents the fusion image to be optimized, T represents the affine transformation parameters, α represents a preset fusion coefficient, G[I_c] represents the target occlusion object image, I_o represents the occlusion face image matched with the target non-occlusion face image, I represents the target non-occlusion face image, and the remaining term represents the areas other than the area mapped by the target occlusion object image as determined based on the affine transformation parameters.
2. The method according to claim 1, wherein the acquiring the target non-occlusion face image and the corresponding target occlusion image to be fused comprises:
acquiring a target non-occlusion face image to be fused and a plurality of occlusion face images containing target occlusion objects;
determining, from the plurality of occlusion face images and based on the face pose and/or the face attribute, an occlusion face image matched with the target non-occlusion face image, as the target occlusion face image;
and acquiring the target occlusion object image from the target occlusion face image.
3. The method of claim 2, wherein the acquiring a target occlusion image of the target occlusion face image comprises:
extracting, based on an image segmentation algorithm, the image of the region occupied by the target occlusion object from the target occlusion face image, as the occlusion object image to be processed;
and performing Gaussian blur processing on the occlusion object image to be processed to obtain the target occlusion object image.
4. The method according to claim 1, wherein the generator network is obtained by training as follows:
obtaining a non-occlusion face sample image, a target occlusion object sample image and an occlusion face sample image containing a target occlusion object;
fusing the non-occlusion face sample image and the target occlusion object sample image to obtain a fused sample image to be optimized;
inputting the fusion sample image to be optimized into a generator network to be trained to obtain a target fusion sample image;
inputting the target fusion sample image and the non-occlusion face sample image into a pre-trained identity consistency network to obtain feature maps for determining whether the identities of the faces in the target fusion sample image and the non-occlusion face sample image are consistent;
inputting the target fusion sample image and the occlusion face sample image containing the target occlusion object into a to-be-trained discriminator network to obtain the probability that the target fusion sample image is a real image and the probability that the occlusion face sample image containing the target occlusion object is a real image;
calculating a feature loss value of the identity-consistent network based on the feature maps corresponding to the target fusion sample image and the non-occlusion face sample image respectively;
calculating a loss value of the discriminator network based on the probability that the target fusion sample image is a real image and the probability that the occlusion face sample image containing the target occlusion object is a real image;
calculating a loss value of the generator network based on the difference between the target fusion sample image and the fusion sample image to be optimized;
adjusting network parameters of the generator network and the discriminator network based on the total loss value, and continuing training until the generator network and the discriminator network converge; wherein the total loss value is determined based on the loss value of the generator network, the loss value of the discriminator network, and the feature loss value of the identity-consistent network.
5. The method according to claim 4, wherein
the calculating the feature loss value of the identity-consistent network based on the feature maps corresponding to the target fusion sample image and the non-occlusion face sample image respectively comprises the following steps:
for each preset channel, calculating the distance between the feature map of the target fusion sample image and the feature map of the non-occlusion face sample image corresponding to that preset channel;
calculating the average value of the distances corresponding to all preset channels as the feature loss value of the identity-consistent network;
the calculating a loss value of the generator network based on a difference between the target fusion sample image and the fusion sample image to be optimized, comprising:
and calculating a loss value of the generator network based on the distance between the target fusion sample image and the fusion sample image to be optimized and/or the difference value of the total variation of the target fusion sample image and the fusion sample image to be optimized.
6. The method of claim 4, wherein the total loss value is:
L = L_gan + γ·L_re + μ·L_ip
wherein L represents the total loss value, L_gan represents the loss value of the discriminator network, L_re represents the loss value of the generator network, L_ip represents the feature loss value of the identity-consistent network, γ represents a first preset coefficient, and μ represents a second preset coefficient.
7. An image generation apparatus, the apparatus comprising:
the image acquisition module is used for acquiring the target non-occlusion face image to be fused and the corresponding target occlusion object image;
the image fusion module is used for fusing the target non-occlusion face image and the target occlusion object image to obtain a fusion image to be optimized;
the image generation module is used for inputting the fusion image to be optimized into a pre-trained generator network to obtain a target fusion image output by the generator network; the generator network belongs to a generative adversarial network obtained by training based on a discriminator network and an identity consistency network, wherein the discriminator network is used for constraining the detail information of the fused image, and the identity consistency network is used for constraining identity consistency;
the target occlusion object image is acquired from the occlusion face image matched with the target non-occlusion face image;
the image fusion module is specifically used for acquiring the face key points in the target non-occlusion face image and the face key points in the target occlusion object image;
Calculating affine transformation parameters between the face key points in the target non-occlusion face image and the face key points in the target occlusion object image;
based on the affine transformation parameters, fusing the target non-occlusion face image and the target occlusion object image according to a preset formula to obtain a fused image to be optimized;
wherein, in the preset formula, I_mix represents the fusion image to be optimized, T represents the affine transformation parameters, α represents a preset fusion coefficient, G[I_c] represents the target occlusion object image, I_o represents the occlusion face image matched with the target non-occlusion face image, I represents the target non-occlusion face image, and the remaining term represents the areas other than the area mapped by the target occlusion object image as determined based on the affine transformation parameters.
CN202110749973.5A 2021-07-02 2021-07-02 Image generation method and device Active CN113706428B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110749973.5A CN113706428B (en) 2021-07-02 2021-07-02 Image generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110749973.5A CN113706428B (en) 2021-07-02 2021-07-02 Image generation method and device

Publications (2)

Publication Number Publication Date
CN113706428A CN113706428A (en) 2021-11-26
CN113706428B true CN113706428B (en) 2024-01-05

Family

ID=78648337

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110749973.5A Active CN113706428B (en) 2021-07-02 2021-07-02 Image generation method and device

Country Status (1)

Country Link
CN (1) CN113706428B (en)


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1461457A (en) * 2001-04-09 2003-12-10 皇家菲利浦电子有限公司 Method of blending digital pictures
US7853061B2 (en) * 2007-04-26 2010-12-14 General Electric Company System and method to improve visibility of an object in an imaged subject
CN110084775B (en) * 2019-05-09 2021-11-26 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and storage medium
CN111754415B (en) * 2019-08-28 2022-09-27 北京市商汤科技开发有限公司 Face image processing method and device, image equipment and storage medium
JP7102554B2 (en) * 2019-09-30 2022-07-19 ベイジン・センスタイム・テクノロジー・デベロップメント・カンパニー・リミテッド Image processing methods, equipment and electronic devices

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008077408A (en) * 2006-09-21 2008-04-03 Namco Bandai Games Inc Program, information storage medium, and image generation system
CN102724581A (en) * 2012-05-31 2012-10-10 福州瑞芯微电子有限公司 Method for realizing pixel superposition based on 2D (two-dimensional) image accelerator
CN106023275A (en) * 2015-03-31 2016-10-12 株式会社摩如富 Image synthesis apparatus and image synthesis method
CN108876705A (en) * 2017-11-23 2018-11-23 北京旷视科技有限公司 Image synthetic method, device and computer storage medium
WO2019128508A1 (en) * 2017-12-28 2019-07-04 Oppo广东移动通信有限公司 Method and apparatus for processing image, storage medium, and electronic device
JP2020013543A (en) * 2018-07-20 2020-01-23 哈爾濱工業大学(深セン) Model clothing recommendation method based upon generative adversarial network
CN109886167A (en) * 2019-02-01 2019-06-14 中国科学院信息工程研究所 One kind blocking face identification method and device
WO2021027759A1 (en) * 2019-08-15 2021-02-18 Huawei Technologies Co., Ltd. Facial image processing
CN110633748A (en) * 2019-09-16 2019-12-31 电子科技大学 Robust automatic face fusion method
CN110738161A (en) * 2019-10-12 2020-01-31 电子科技大学 face image correction method based on improved generation type confrontation network
CN111028142A (en) * 2019-11-25 2020-04-17 泰康保险集团股份有限公司 Image processing method, apparatus and storage medium
CN111127308A (en) * 2019-12-08 2020-05-08 复旦大学 Mirror image feature rearrangement repairing method for single sample face recognition under local shielding
CN111369646A (en) * 2020-03-09 2020-07-03 南京理工大学 Expression synthesis method integrating attention mechanism
CN111914628A (en) * 2020-06-19 2020-11-10 北京百度网讯科技有限公司 Training method and device of face recognition model
CN112949605A (en) * 2021-04-13 2021-06-11 杭州欣禾圣世科技有限公司 Semantic segmentation based face makeup method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Application of ALPHA blending in seamless projection; Huang Sanfa et al.; Computer Applications and Software; Vol. 24, No. 12; pp. 161-186 *
A facial expression generation method based on an improved conditional generative adversarial network; Wang Xianxian et al.; Journal of Chinese Computer Systems; Vol. 41, No. 9; pp. 1987-1992 *
Robust facial expression recognition based on generative adversarial networks; Yao Naiming et al.; Acta Automatica Sinica; Vol. 44, No. 5; pp. 865-877 *
Gradient-domain fusion of 3D head PET/CT images; Jin Zhenyi et al.; Chinese Journal of Medical Physics; Vol. 34, No. 3; pp. 246-250 *

Also Published As

Publication number Publication date
CN113706428A (en) 2021-11-26

Similar Documents

Publication Publication Date Title
CN111079570B (en) Human body key point identification method and device and electronic equipment
CN108898086B (en) Video image processing method and device, computer readable medium and electronic equipment
CN114913565B (en) Face image detection method, model training method, device and storage medium
CN109325954B (en) Image segmentation method and device and electronic equipment
CN110222573B (en) Face recognition method, device, computer equipment and storage medium
JP6362333B2 (en) Image processing apparatus, image processing method, and program
CN107507153B (en) Image denoising method and device
CN110909663B (en) Human body key point identification method and device and electronic equipment
CN112560753B (en) Face recognition method, device, equipment and storage medium based on feature fusion
CN109377508B (en) Image processing method and device
CN113869293A (en) Lane line recognition method and device, electronic equipment and computer readable medium
CN112749726B (en) Training method and device for target detection model, computer equipment and storage medium
CN109977832B (en) Image processing method, device and storage medium
CN111414879A (en) Face shielding degree identification method and device, electronic equipment and readable storage medium
CN113792853B (en) Training method of character generation model, character generation method, device and equipment
CN111027412B (en) Human body key point identification method and device and electronic equipment
CN112085701A (en) Face ambiguity detection method and device, terminal equipment and storage medium
CN110909665A (en) Multitask image processing method and device, electronic equipment and storage medium
CN113591566A (en) Training method and device of image recognition model, electronic equipment and storage medium
CN115393815A (en) Road information generation method and device, electronic equipment and computer readable medium
CN111583159B (en) Image complement method and device and electronic equipment
CN116524206B (en) Target image identification method and device
CN111079624B (en) Sample information acquisition method and device, electronic equipment and medium
CN113706428B (en) Image generation method and device
CN114255493A (en) Image detection method, face detection device, face detection equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant