CN112733946B - Training sample generation method and device, electronic equipment and storage medium - Google Patents

Training sample generation method and device, electronic equipment and storage medium

Info

Publication number
CN112733946B
Authority
CN
China
Prior art keywords
face image
target
source
training data
neural network
Prior art date
Legal status
Active
Application number
CN202110050175.3A
Other languages
Chinese (zh)
Other versions
CN112733946A
Inventor
杨博文
尹榛菲
邵婧
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority to CN202110050175.3A
Publication of CN112733946A
Application granted
Publication of CN112733946B
Legal status: Active

Classifications

    • G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/22 — Pattern recognition; matching criteria, e.g. proximity measures
    • G06F18/253 — Pattern recognition; fusion techniques of extracted features
    • G06N3/08 — Neural networks; learning methods
    • G06V40/168 — Human faces; feature extraction; face representation
    • G06V40/172 — Human faces; classification, e.g. identification
    • G06V40/45 — Spoof detection, e.g. liveness detection; detection of the body part being alive


Abstract

The disclosure provides a training sample generation method and device, an electronic device and a storage medium. The method includes: acquiring a source training data set and a target face image generated in a target scene; generating a synthetic face image corresponding to the target face image based on the target face image and a source face image in the source training data set that corresponds to the target face image; and expanding the source training data set based on the synthetic face image to obtain an expanded training data set. In this way, the fused synthetic face images expand the target face images in sample space, and the expanded training data set covers, to a certain extent, more training data of the target scene, so that a neural network trained on it achieves higher detection accuracy in the target scene (for example, a brand-new environment).

Description

Training sample generation method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of computer vision, in particular to a training sample generation method, a training sample generation device, electronic equipment and a storage medium.
Background
Face recognition, as an important research direction in the field of computer vision, is widely applied in scenarios such as mobile phone unlocking and access control. However, because face pictures are easy to obtain, face recognition systems are vulnerable to counterfeiting attacks such as printed photos and video replay, which creates potential safety hazards; therefore, living body detection, which distinguishes real faces from fake ones, is an indispensable link in a face recognition system.
Current living body detection methods use a living body detection neural network to automatically identify whether an input face picture shows a real person or a prosthesis. Compared with other face tasks (such as face detection), living body detection is easily affected by diverse attack means and attack materials, so a trained living body detection model often cannot adapt well to a new attack environment.
Here, because the amount of training data generated in a new attack environment is usually relatively small, directly adding the new training data to the existing data set cannot effectively improve the detection accuracy of the neural network in that new attack environment.
Disclosure of Invention
The embodiment of the disclosure at least provides a training sample generation method, a training sample generation device, electronic equipment and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a method for generating training data, including:
acquiring a source training data set and a target face image generated by a target scene;
generating a synthetic face image corresponding to the target face image based on the target face image and a source face image corresponding to the target face image in the source training dataset;
and expanding the source training data set based on the synthesized face image to obtain an expanded training data set.
With the training data generation method, the target face image can be fused with a source face image in the source training data set, so that the resulting synthetic face image contains both the image features of the source face image and the image features of the target face image. When relatively few target face images are generated in the target scene, the synthetic face images obtained by fusion expand the target face images in sample space, and the expanded training data set covers, to a certain extent, more training data of the target scene, so that the trained neural network achieves relatively high detection accuracy in the target scene (for example, a brand-new environment).
In one possible implementation manner, the generating a composite face image corresponding to the target face image based on the target face image and a source face image corresponding to the target face image in the source training dataset includes:
extracting the source face image paired with the target face image from the source training dataset;
inputting the target face image and a source face image paired with the target face image into a trained image generation neural network for feature fusion processing to obtain the synthesized face image, wherein the synthesized face image is fused with the content features of the source face image and the style features of the target face image.
In the embodiment of the disclosure, the fusion processing between the paired target face image and source face image can be performed by the image generation neural network, which keeps the operation simple, saves time and improves efficiency.
In one possible implementation, the image-generating neural network is trained as follows:
acquiring a paired source face image sample and a paired target face image sample;
respectively extracting the characteristics of the paired source face image sample and target face image sample to obtain content characteristic information of the source face image sample and style characteristic information of the target face image sample;
and performing at least one round of training on the image generation neural network to be trained based on the content characteristic information of the source face image sample and the style characteristic information of the target face image sample.
Here, in order to enhance the significance of the features of the target face image of the target scene in applications such as subsequent living body detection, style feature information, which has an important influence on living body detection, can be extracted from the target face image, and content feature information can be extracted from the source face image sample, so that the trained image generation neural network can better adapt to the subsequent detection network.
In a possible implementation manner, the performing at least one round of training on the image generation neural network to be trained based on the content characteristic information of the source face image sample and the style characteristic information of the target face image sample includes:
for the current round of training, taking the content characteristic information of the source face image sample and the style characteristic information of the target face image sample as input characteristics of the image generation neural network to be trained, and determining the fusion characteristic information output by the image generation neural network to be trained;
When the first similarity between the fusion characteristic information and the content characteristic information is smaller than a first threshold value and the second similarity between the fusion characteristic information and the style characteristic information is smaller than a second threshold value, adjusting network parameters of the image generation neural network, and performing the next training;
and training is stopped until the first similarity between the fusion characteristic information and the content characteristic information is larger than or equal to a first threshold value and the second similarity between the fusion characteristic information and the style characteristic information is larger than or equal to a second threshold value.
Here, the aim is to obtain a fused image whose style characteristics are prominent while the content characteristics are preserved; therefore, the first similarity between the fusion characteristic information and the content characteristic information and the second similarity between the fusion characteristic information and the style characteristic information can be used to constrain the training conditions, so that the trained network meets the requirements of the scene.
In a possible implementation manner, the extracting the source face image paired with the target face image from the source training dataset includes:
determining the type of the living body label to which the target face image belongs;
And extracting a source face image with the same living body label category as the living body label category of the target face image from the source training data set, and taking the source face image as a source face image matched with the target face image.
In one possible implementation, after the obtaining the extended training data set, the method further comprises:
and performing at least one round of training on the living body detection neural network to be trained based on the extended training data set to obtain a trained living body detection neural network.
Here, the expanded training data set takes the target scene into account well, so that the trained living body detection neural network is better compatible with the target scene and has higher detection accuracy.
In a possible implementation manner, the performing at least one training round on the living body detection neural network to be trained based on the extended training data set to obtain a trained living body detection neural network includes:
respectively obtaining characteristic information of each face image in the extended training data set by utilizing a living body detection neural network to be trained;
determining a target loss function value corresponding to the living body detection neural network to be trained based on the obtained characteristic information;
And under the condition that the target loss function value does not meet the preset condition, performing next training on the living body detection neural network to be trained until the target loss function value meets the preset condition.
In one possible implementation manner, the feature information of each face image in the extended training dataset includes:
the first characteristic information of the source face image, the second characteristic information of the target face image and the third characteristic information of the synthesized face image in the extended training data set.
In a possible implementation manner, the determining, based on the obtained feature information, the objective loss function value corresponding to the living body detection neural network to be trained includes:
determining a first objective loss function value for measuring the difference of training data in the same living body label category and a second objective loss function value for measuring the feature distribution condition of face images of different sources based on the first feature information, the second feature information and the third feature information;
and determining the target loss function value corresponding to the living body detection neural network to be trained based on the first target loss function value and the second target loss function value.
According to the embodiment of the disclosure, the living body detection neural network is synchronously adjusted based on the first objective loss function value for measuring the difference of training data in the same living body label category and the second objective loss function value for measuring the feature distribution of face images from different sources, so that the trained network can enable samples in the same category to have more similar expression in the feature space, and the accuracy of classification results can be improved.
In one possible embodiment, the determining the first objective loss function value for measuring the difference of the training data in the same living label category based on the first feature information, the second feature information, and the third feature information includes:
selecting two first face images of the same living body label category from the training data set, and selecting two second face images of different living body label categories; the sources of the two first face images are different, and the sources of the two second face images are different;
determining a first image similarity between the two first face images based on the characteristic information of the two first face images; and determining a second image similarity between the two second face images based on the feature information of the two second face images;
And summing the first image similarity and the second image similarity to obtain the first target loss function value.
Here, by calculating the image similarity between samples, samples of the same category can be pulled closer together and samples of different categories pushed apart, so that the trained network can classify better.
In one possible implementation manner, determining a second objective loss function value for measuring feature distribution conditions of face images from different sources based on the first feature information, the second feature information and the third feature information includes:
based on the first feature information, the second feature information and the third feature information, respectively determining first distribution feature information for representing feature distribution of each source face image in the extended training data set, second distribution feature information for representing feature distribution of each target face image in the extended training data set and third distribution feature information for representing feature distribution of each synthetic face image in the extended training data set;
determining a first feature distribution similarity between each target face image and each synthesized face image based on the similarity between the second distribution feature information and the third distribution feature information; splicing the second distribution characteristic information and the third distribution characteristic information to obtain spliced distribution characteristic information;
Determining second feature distribution similarity between each source face image and each synthesized face image based on the first distribution feature information and the similarity between the spliced distribution feature information;
and summing the first characteristic distribution similarity and the second characteristic distribution similarity to obtain the second objective loss function value.
Here, the three kinds of distribution feature information can be processed at the feature distribution level, so that the extended training data set is compared as a whole in terms of feature distribution, and living body classification can be performed better.
In a possible implementation manner, the determining, based on the obtained feature information, the objective loss function value corresponding to the living body detection neural network to be trained includes:
determining a target loss function value corresponding to a living body detection neural network to be trained by using first characteristic information of each source face image in the extended training data set determined by the living body detection neural network to be trained and first source characteristic information extracted from the source face image by using the trained source living body detection neural network;
the source living body detection neural network is obtained by training each source face image sample and a living body label type marked on each source face image sample.
Here, so that the trained living body detection neural network maintains high accuracy in the target scene without reducing the detection accuracy in the scene corresponding to the existing training data set, the first source characteristic information extracted from the source face image by the trained source living body detection neural network can be used, and the network parameters can be adjusted based on the similarity between this source characteristic information and the first characteristic information, thereby achieving the above objective.
In a second aspect, an embodiment of the present disclosure further provides a generating device of training data, including:
the acquisition module is used for acquiring a source training data set and a target face image generated by a target scene;
the generation module is used for generating a synthetic face image corresponding to the target face image based on the target face image and a source face image corresponding to the target face image in the source training data set;
and the expansion module is used for expanding the source training data set based on the synthesized face image to obtain an expanded training data set.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication over the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the method of generating training data as described in any of the first aspect and its various embodiments.
In a fourth aspect, the disclosed embodiments also provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor performs the steps of the method of generating training data according to the first aspect and any of its various embodiments.
For the effects of the training data generating apparatus, the electronic device, and the computer-readable storage medium, reference is made to the description of the training data generating method; details are not repeated here.
The foregoing objects, features and advantages of the disclosure will be more readily apparent from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the embodiments are briefly described below. The drawings are incorporated in and constitute a part of the specification; they show embodiments consistent with the present disclosure and, together with the description, serve to illustrate the technical solutions of the present disclosure. It should be understood that the following drawings illustrate only certain embodiments of the present disclosure and are therefore not to be considered limiting of its scope; a person of ordinary skill in the art may obtain other related drawings from these drawings without inventive effort.
FIG. 1 shows a flowchart of a method for generating training data provided by an embodiment of the present disclosure;
fig. 2 shows an application schematic diagram of a training data generating method provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a training data generating apparatus according to an embodiment of the present disclosure;
fig. 4 shows a schematic diagram of an electronic device provided by an embodiment of the disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. The components of the embodiments of the present disclosure, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be made by those skilled in the art based on the embodiments of this disclosure without making any inventive effort, are intended to be within the scope of this disclosure.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The term "and/or" is used herein to describe only one relationship, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist together, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Research has shown that current living body detection methods use a living body detection neural network to automatically identify whether an input face picture shows a real person or a prosthesis. Compared with other face tasks (such as face detection), living body detection is easily affected by diverse attack means and attack materials, so a trained living body detection model often cannot adapt well to a new attack environment.
Here, because the amount of training data generated in a new attack environment is usually relatively small, directly adding the new training data to the existing data set cannot effectively improve the detection accuracy of the neural network in that new attack environment.
Based on the above study, the present disclosure provides a method, an apparatus, an electronic device, and a storage medium for generating a training sample, which extend a training data set through image feature fusion, so that the trained neural network can better adapt to various scenes.
To facilitate understanding of the present embodiment, the method for generating training data disclosed in the embodiment of the present disclosure is first described in detail. The execution subject of the method is generally a computer device with certain computing capability, for example a terminal device, a server, or another processing device; the terminal device may be a user equipment (UE), a mobile device, a user terminal, a cellular telephone, a cordless telephone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the method for generating training data may be implemented by a processor invoking computer-readable instructions stored in a memory.
Referring to fig. 1, a flowchart of a method for generating training data according to an embodiment of the present disclosure is shown, where the method includes steps S101 to S103, where:
S101: acquiring a source training data set and a target face image generated by a target scene;
s102: generating a synthetic face image corresponding to the target face image based on the target face image and a source face image corresponding to the target face image in the source training dataset;
s103: and expanding the source training data set based on the synthesized face image to obtain an expanded training data set.
Here, to facilitate understanding of the training data generation method provided by the embodiments of the present disclosure, its application scenario is first described in detail. The method is mainly applied in the preparation of the training data set before a living body detection neural network is trained. Living body detection is easily affected by diverse attack means and attack materials, so a trained living body detection model cannot adapt well to a new attack environment, mainly because the training data generated in the new attack environment is usually relatively small; if the new training data is directly added to the existing training data set, the detection accuracy of the trained neural network in the new attack environment cannot be effectively improved.
In order to solve the above-mentioned problems, the embodiment of the present disclosure provides a scheme for implementing training data set expansion through image synthesis, so that a neural network trained based on the training data set expansion can better adapt to the detection requirements under a new attack environment.
The source training data set may be collected in advance; for example, it may be a collection of face images obtained under widely used face living body attack modes, or a collection of images obtained under other living body attack modes. The target scene here may correspond to the new attack environment described above, in which the number of target face images generated is usually relatively small.
It can be seen that, in the embodiment of the present disclosure, the target scene corresponding to the target face image differs from the scene corresponding to the source training data set; the different scenes mainly refer to different acquisition environments of the face images. For example, if the acquisition environments corresponding to the source training data set include a paper attack mode, a mobile phone screen attack mode and the like, a face image acquired under a new attack environment, such as a computer screen attack mode, can be used as a target face image of the target scene.
In this case, the training data generation method provided by the embodiment of the present disclosure can generate a synthetic face image based on the target face image and the source face image corresponding to it in the source training data set. The synthetic face image fuses the image features of the target face image and the image features of the source face image, so after the source training data set is expanded with synthetic face images, the expanded training data set contains more face images that conform to the target scene, and a neural network trained on such an expanded training data set can better meet the requirements of the target scene.
In the embodiment of the disclosure, there may be multiple target face images, and in the process of image synthesis, the method may be performed for all the target face images, for example, each source face image in the source training dataset may be traversed, and each traversed source face image is synthesized with each target face image, so that there may be multiple synthesized face images corresponding to each target face image. In addition, the embodiment of the disclosure can also perform a synthesis operation on a part of source face images in the source training data set for each target face image.
It should be noted that a training data set used to train a living body detection neural network generally includes both face images with real person labels and face images with prosthesis labels. To avoid possible interference between labels of different categories, the living body label category of the target face image is determined first; for example, if the target face image carries a real person label, source face images with the same real person label can be selected from the source training data set, and the selected source face images are then synthesized with the target face image.
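As an illustration of this pairing and expansion logic, the following sketch (in Python; the function and variable names are hypothetical, and synthesize stands in for the image generation neural network described below) pairs each target face image only with source face images that carry the same living body label and appends the resulting synthetic images to the extended training data set:

```python
from typing import Callable, List, Tuple

# A sample is (image, label); label 1 = real person, 0 = prosthesis (assumed convention).
Sample = Tuple[object, int]

def expand_training_set(
    source_set: List[Sample],
    target_set: List[Sample],
    synthesize: Callable[[object, object], object],
) -> List[Sample]:
    """Pair each target face image with same-label source face images and
    add the synthesized images (plus the targets themselves) to the source set."""
    extended = list(source_set)
    for target_image, target_label in target_set:
        extended.append((target_image, target_label))
        for source_image, source_label in source_set:
            # avoid label interference: only fuse images of the same living body label
            if source_label != target_label:
                continue
            # content comes from the source image, style from the target image
            fused = synthesize(source_image, target_image)
            extended.append((fused, target_label))
    return extended
```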
The training data generation method provided by the embodiment of the disclosure can utilize the trained image generation neural network to realize the related operation of image synthesis.
Here, the source face image paired with the target face image may first be extracted from the source training data set, and then the target face image and the paired source face image may be input into the trained image generation neural network for feature fusion processing, so as to obtain a synthetic face image in which the content features of the source face image and the style features of the target face image are fused.
In the embodiment of the present disclosure, the source face image paired with the target face image may refer to a face image whose living body label category is the same as that of the target face image. Here, the two face images of the same living body label category (i.e., the paired target face image and source face image) can be input directly into the image generation neural network, and a synthetic face image fusing the content features of the source face image and the style features of the target face image can be obtained.
In the embodiment of the present disclosure, the content features of the source face image and the style features of the target face image are selected for fusion mainly because, in subsequent living body detection applications, style features such as the living body attack type (e.g., mobile phone screen attack or paper attack) have a greater influence on the living body detection result, whereas content features of the face image, such as the size of facial features or the distance between the eyebrows, can be weakened. Therefore, to better adapt to living body detection in a new target scene, the corresponding style features are extracted from the target face image and the corresponding content features are extracted from the source face image.
The image generation neural network learns the mapping between two input face images and one output synthetic face image, and can be trained according to the following steps:
step one, acquiring paired source face image samples and target face image samples;
Step two, respectively extracting features of the paired source face image sample and target face image sample to obtain content feature information of the source face image sample and style feature information of the target face image sample;
Step three, performing at least one round of training on the image generation neural network to be trained based on the content feature information of the source face image sample and the style feature information of the target face image sample.
Here, the source face image sample and the target face image sample may be regarded as a paired set of face image samples. In order to achieve the technical purpose of migrating style features of a target face image sample onto a source face image, in the embodiment of the disclosure, under the condition of respectively extracting features of paired source face image samples and target face image samples, content feature information of the source face image sample and style feature information of the target face image sample can be determined, and one or more rounds of network training can be performed based on the content feature information of the source face image sample and style feature information of the target face image sample.
The content feature information can be used to represent the face in the source face image sample and is a relatively high-level feature; in a specific application, it may be information describing the face, such as the face contour or the distance between the two eyes. The style feature information may be a lower-level feature close to the image texture, for example information such as the material presented by the target face image sample.
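One common way to realize this split between content and style (a sketch under the assumption of a small VGG-like convolutional encoder; the patent does not prescribe a particular backbone) is to take deeper feature maps as content information and texture statistics of shallower feature maps, such as Gram matrices, as style information:

```python
import torch
import torch.nn as nn

class TwoLevelEncoder(nn.Module):
    """Toy encoder: shallow block for style statistics, deep block for content."""
    def __init__(self):
        super().__init__()
        self.shallow = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
        self.deep = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
                                  nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        shallow_feat = self.shallow(x)          # close to image texture -> style
        content_feat = self.deep(shallow_feat)  # higher-level face structure -> content
        return content_feat, shallow_feat

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Channel-wise Gram matrix, a standard summary of style/texture statistics."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

encoder = TwoLevelEncoder()
source_img = torch.randn(1, 3, 112, 112)   # source face image sample
target_img = torch.randn(1, 3, 112, 112)   # target face image sample

content_info, _ = encoder(source_img)       # content feature information (source)
_, target_shallow = encoder(target_img)
style_info = gram_matrix(target_shallow)    # style feature information (target)
```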
Here, to ensure that the image generation neural network can generate a synthetic face image sample fusing the content feature information of the source face image sample and the style feature information of the target face image sample, each round of training may proceed according to the following steps:
Step one, for the current round of training, taking the content characteristic information of the source face image sample and the style characteristic information of the target face image sample as input characteristics of the image generation neural network to be trained, and determining the fusion characteristic information output by the image generation neural network to be trained;
Step two, adjusting the network parameters of the image generation neural network and performing the next round of training when the first similarity between the fusion characteristic information and the content characteristic information is smaller than a first threshold and the second similarity between the fusion characteristic information and the style characteristic information is smaller than a second threshold;
Step three, stopping training when the first similarity between the fusion characteristic information and the content characteristic information is greater than or equal to the first threshold and the second similarity between the fusion characteristic information and the style characteristic information is greater than or equal to the second threshold.
Here, in each round of training, the similarity between the fusion characteristic information output by the image generation neural network to be trained and the content characteristic information and style characteristic information input into the network can be determined. When the first similarity between the fusion characteristic information and the content characteristic information is not large enough, this indicates, to a certain extent, that the synthetic face image sample corresponding to the fusion characteristic information does not contain enough content features; in this case, the proportion of content features in the fusion features can be increased by adjusting the network parameters of the image generation neural network. Similarly, when the second similarity between the fusion characteristic information and the style characteristic information is not large enough, this indicates, to a certain extent, that the synthetic face image sample does not contain enough style features; in this case, the proportion of style features in the fusion features can be increased by adjusting the network parameters of the image generation neural network. In this way, a trained image generation neural network can be obtained through repeated iterative training.
It should be noted that the first threshold and the second threshold can be selected according to different application scenarios. Taking the second threshold as an example, in practical applications it should be neither too large nor too small: an overly large second threshold would cause the synthetic face image samples to ignore the content features relevant to applications such as living body detection, while an overly small second threshold would prevent the synthetic face image samples from supporting applications such as living body detection based on the style features. Accordingly, the second threshold can be determined, for example, by selecting a style ratio of 0.6.
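A minimal sketch of the per-round training condition described above is given below; the names are hypothetical, cosine similarity is used only as one possible similarity measure, the content and style feature tensors are assumed to share a shape, and the loss term is a placeholder rather than the patent's prescribed objective:

```python
import torch
import torch.nn.functional as F

FIRST_THRESHOLD = 0.8    # required similarity to the content features (assumed value)
SECOND_THRESHOLD = 0.6   # required similarity to the style features (assumed value)

def feature_similarity(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return F.cosine_similarity(a.flatten(1), b.flatten(1), dim=1).mean()

def train_generator(generator, optimizer, content_info, style_info, max_rounds=100):
    for _ in range(max_rounds):
        fused_info = generator(content_info, style_info)             # fusion feature information
        sim_content = feature_similarity(fused_info, content_info)   # first similarity
        sim_style = feature_similarity(fused_info, style_info)       # second similarity
        if sim_content >= FIRST_THRESHOLD and sim_style >= SECOND_THRESHOLD:
            break                                                     # training condition met
        # otherwise adjust the network parameters and run the next round
        loss = (1 - sim_content) + (1 - sim_style)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return generator
```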
It can be appreciated that the image generation neural network in the embodiments of the present disclosure implements a style migration operation. In a specific application, this style migration operation may be implemented with a style migration network based on whitening and coloring transforms (WCT2). WCT2 can reconstruct the synthetic face image sample from features at different levels: the lower the feature level used (the shallower the corresponding feature extraction layer), the weaker the stylization, and the higher the feature level used (the deeper the corresponding feature extraction layer), the stronger the stylization of the corresponding synthetic face image sample. In the embodiment of the present disclosure, the degree of stylization may be controlled by a stylization degree parameter, for example set to 0.6.
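The degree-of-stylization control mentioned here is often realized as a linear blend between the stylized (whitened-and-colored) features and the original content features; the sketch below illustrates that blending step under this assumption and is not the actual WCT2 interface:

```python
import torch

def blend_stylization(colored_feat: torch.Tensor,
                      content_feat: torch.Tensor,
                      alpha: float = 0.6) -> torch.Tensor:
    """Blend the whitening-and-coloring output with the content features.
    alpha is the stylization degree; 0.6 is the example value from the embodiment."""
    return alpha * colored_feat + (1.0 - alpha) * content_feat
```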
The embodiment of the disclosure can train the living body detection neural network by using the extended training data, namely, train the living body detection neural network to be trained at least one round based on the extended training data set, and obtain the trained living body detection neural network.
The extended training data set may include not only a source face image included in the source training data set, but also a target face image generated in a target scene, and further a synthetic face image in which image features of the source face image and the target face image are combined.
In addition, the living body detection neural network in the embodiment of the present disclosure mainly performs binary classification, identifying any input face image as a real person or a prosthesis. In practical applications, a multi-class recognition operation that determines the specific attack mode of a face image judged to be a prosthesis can also be performed, which is not specifically limited in the embodiment of the present disclosure. For ease of illustration, the following description takes binary classification as an example.
In embodiments of the present disclosure, the living neural network may be trained as follows:
Step one, utilizing a living body detection neural network to be trained to respectively obtain characteristic information of each face image in an extended training data set;
step two, determining a target loss function value corresponding to the living body detection neural network to be trained based on the obtained characteristic information;
and thirdly, under the condition that the target loss function value does not meet the preset condition, performing next training on the living body detection neural network to be trained until the target loss function value meets the preset condition.
Here, for each source face image in the extended training data set, the first feature information of the source face image can be determined by using the living body detection neural network to be trained, and similarly, for each target face image in the extended training data set, the second feature information of the target face image can be determined by using the living body detection neural network to be trained, and similarly, for each synthetic face image in the extended training data set, the third feature information of the synthetic face image can be determined by using the living body detection neural network to be trained.
The first feature information, the second feature information and the third feature information may be feature information relevant to living body detection and recognition. This feature information changes as the network parameters of the living body detection neural network are adjusted, so that the features extracted by the trained network support living body detection better.
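For the loss sketches in the following paragraphs, assume a living body detection backbone that maps a batch of face images to feature vectors; the names below (backbone, src_feat, tgt_feat, syn_feat) are hypothetical and are reused in the later snippets:

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(            # stand-in for the living body detection network body
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 128),
)

src_imgs = torch.randn(8, 3, 112, 112)   # source face images from the extended set
tgt_imgs = torch.randn(8, 3, 112, 112)   # target face images
syn_imgs = torch.randn(8, 3, 112, 112)   # synthetic face images

src_feat = backbone(src_imgs)   # first feature information
tgt_feat = backbone(tgt_imgs)   # second feature information
syn_feat = backbone(syn_imgs)   # third feature information
```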
Here, whether for the plurality of source face images in the extended training data set (the source face image subset), the plurality of target face images (the target face image subset), or the plurality of synthetic face images (the synthetic face image subset), each subset may include both face images with real person labels and face images with prosthesis labels.
In the case where the first feature information, the second feature information, and the third feature information are extracted, the feature distributions of the three subsets, namely the source face image subset, the target face image subset, and the synthetic face image subset, are relatively independent. In order to achieve living body classification across the three subsets, the embodiment of the present disclosure can establish corresponding objective loss functions at two levels, the face image level and the feature distribution level, so as to achieve a good living body classification effect. These two aspects are explained below.
First aspect: for the face image level, a first objective loss function value for measuring the variability of training data within the same living label category may be determined here based on the first feature information, the second feature information, and the third feature information. This objective loss function value may be determined specifically as follows:
Step one, selecting two first face images of the same living body label category from a training data set, and selecting two second face images of different living body label categories; the sources of the two first face images are different, and the sources of the two second face images are different;
step two, determining first image similarity between two first face images based on the characteristic information of the two first face images; and determining a second image similarity between the two second face images based on the feature information of the two second face images;
and thirdly, summing the first image similarity and the second image similarity to obtain a first objective loss function value.
The two first face images and the two second face images here each include two of a source face image, a target face image, and a synthesized face image. In the embodiment of the disclosure, the two corresponding first face images and the two corresponding second face images may be selected based on the type of the living body tag, and in practical application, one of the selected first face images and one of the selected second face images may be the same image or may be different images, which is not limited in detail herein.
For two first face images belonging to the same living body label category selected in the training data set, for example, two first face images with real person labels, a first image similarity between the two first face images can be determined; similarly, for two first face images with prosthetic labels, the corresponding first image similarity may also be determined. For two second face images belonging to different living body label categories selected in the training data set, for example, one second face image with a real person label and one second face image with a prosthesis label are included, and the second image similarity between the two second face images can be determined.
In order to achieve living body classification, the first objective loss function determined here needs to increase the first image similarity as much as possible and decrease the second image similarity. Here, the above-described description about the first image similarity and the second image similarity may be specifically made with reference to an exemplary diagram shown in fig. 2.
As shown in fig. 2, in the case where the first feature information, the second feature information, and the third feature information are extracted, feature distributions (respectively labeled as distribution 1, distribution 2, and distribution 3) of the three subsets of the source face image subset, the target face image subset, and the synthesized face image subset are relatively independent.
For distribution 1, the corresponding 11 and 12 in the distribution 1 are respectively corresponding to source face images with real person labels and prosthesis labels; for distribution 2, corresponding 21 and 22 in the distribution 2 are respectively corresponding to target face images with real person labels and prosthesis labels; for distribution 3, corresponding 31 and 32 in this distribution 3 correspond to the synthetic face image with the real person tag and the prosthetic tag, respectively.
The calculation of the first image similarity corresponds to pulling closer the image similarity between face images that belong to the same living body label across distribution 1, distribution 2 and distribution 3; for example, the first image similarity may be determined between the face image labeled 11 in distribution 1 and the face image labeled 21 in distribution 2, with the aim of enabling the trained living body detection neural network to pull closer the distance between a source face image and a target face image that both carry the real person label.
The calculation of the second image similarity corresponds to pushing apart face images that belong to different living body labels across distribution 1, distribution 2 and distribution 3; for example, the second image similarity may be determined between the face image labeled 11 in distribution 1 and the face image labeled 22 in distribution 2, with the aim of enabling the trained living body detection neural network to push apart the distance between a source face image carrying the real person label and a target face image carrying the prosthesis label.
In the process of generating the synthetic face image with the image generation neural network, a certain difference is usually required between the generated synthetic face image and the input source face image in order to enhance the image generation effect; therefore, in a specific application, the image similarity between distribution 1 and distribution 3 need not be constrained by the above relationship.
Based on the above principles, embodiments of the present disclosure may determine a first objective loss function value for measuring variability of training data within the same class of living labels.
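A sketch of this image-level term is given below. The sign convention, pulling same-label cross-source pairs together and pushing different-label pairs apart via cosine similarity, is one reading of the description above, and the assumption that the first half of each batch carries the real person label is purely illustrative:

```python
import torch
import torch.nn.functional as F

def first_objective_loss(feat_a_same: torch.Tensor, feat_b_same: torch.Tensor,
                         feat_a_diff: torch.Tensor, feat_b_diff: torch.Tensor) -> torch.Tensor:
    """feat_*_same: features of face images from different sources with the same label.
    feat_*_diff: features of face images from different sources with different labels."""
    first_sim = F.cosine_similarity(feat_a_same, feat_b_same, dim=-1).mean()   # pull closer
    second_sim = F.cosine_similarity(feat_a_diff, feat_b_diff, dim=-1).mean()  # push apart
    # increase the first image similarity, decrease the second image similarity
    return (1.0 - first_sim) + second_sim

# example pairing: real-label source vs. real-label target features,
# and real-label source vs. prosthesis-label target features (label split assumed)
loss1 = first_objective_loss(src_feat[:4], tgt_feat[:4], src_feat[:4], tgt_feat[4:])
```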
Second aspect: for the feature distribution level, a second objective loss function value for measuring feature distribution conditions of face images of different sources may be determined based on the first feature information, the second feature information, and the third feature information. This objective loss function value may be determined specifically as follows:
step one, based on the first feature information, the second feature information and the third feature information, respectively determining first distribution feature information for representing feature distribution of each source face image in the extended training data set, second distribution feature information for representing feature distribution of each target face image in the extended training data set and third distribution feature information for representing feature distribution of each synthetic face image in the extended training data set;
Step two, based on the similarity between the second distribution characteristic information and the third distribution characteristic information, determining the first characteristic distribution similarity between each target face image and each synthesized face image; splicing the second distribution characteristic information and the third distribution characteristic information to obtain spliced distribution characteristic information;
step three, determining second feature distribution similarity between each source face image and each synthesized face image based on the first distribution feature information and the similarity between the spliced distribution feature information;
and step four, summing the first feature distribution similarity and the second feature distribution similarity to obtain the second objective loss function value.
Here, in the case where the first feature information for the source face image, the second feature information for the target face image, and the third feature information for the synthesized face image are determined, the corresponding first distribution feature information, the corresponding second distribution feature information, and the corresponding third distribution feature information may be determined for the source face image subset, the target face image subset, and the synthesized face image subset, respectively.
Here, in order to pull the above three face image subsets closer at the feature distribution level, the similarity between the second distribution feature information and the third distribution feature information, i.e., the first feature distribution similarity, may be determined first. The greater the first feature distribution similarity, the closer, to a certain extent, the feature distributions corresponding to the target face image subset and the synthetic face image subset are; as shown in fig. 2, after this first pulling-closer operation, distribution 2 (the pulled-in distribution 2) and distribution 3 (the pulled-in distribution 3) are closer together.
After the second distribution feature information and the third distribution feature information are spliced, the similarity between the first distribution feature information and the spliced distribution feature information, i.e., the second feature distribution similarity, may be determined. The greater the second feature distribution similarity, the closer, to a certain extent, the source face image subset is to the feature distribution corresponding to the two spliced face image subsets; as shown in fig. 2, after this second pulling-closer operation, distribution 1 (the pulled-in distribution 1) and the spliced distribution 2 and distribution 3 are closer together.
It should be noted that, when performing the pulling-closer operation, one distribution may be kept fixed while the other distribution is moved closer to it based on the distribution similarity, or a specific target position may be selected and both distributions moved closer to that position.
Based on the above principle, the embodiment of the disclosure may determine a second objective loss function value for measuring the feature distribution situation of the face images from different sources.
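A sketch of this distribution-level term follows, under two assumptions that the patent does not fix: the distribution feature information of a subset is summarized by the mean feature vector of that subset, and the splice of the target and synthetic distribution features is realized by pooling over the union of the two subsets:

```python
import torch
import torch.nn.functional as F

def second_objective_loss(src_feat: torch.Tensor,
                          tgt_feat: torch.Tensor,
                          syn_feat: torch.Tensor) -> torch.Tensor:
    src_dist = src_feat.mean(dim=0)   # first distribution feature information
    tgt_dist = tgt_feat.mean(dim=0)   # second distribution feature information
    syn_dist = syn_feat.mean(dim=0)   # third distribution feature information

    # first feature-distribution similarity: target subset vs. synthetic subset
    first_dist_sim = F.cosine_similarity(tgt_dist, syn_dist, dim=0)

    # "spliced" distribution feature: here, pooled over the union of the two subsets
    spliced_dist = torch.cat([tgt_feat, syn_feat], dim=0).mean(dim=0)
    # second feature-distribution similarity: source subset vs. spliced distribution
    second_dist_sim = F.cosine_similarity(src_dist, spliced_dist, dim=0)

    # pull the distributions closer by maximising both similarities
    return (1.0 - first_dist_sim) + (1.0 - second_dist_sim)

loss2 = second_objective_loss(src_feat, tgt_feat, syn_feat)
```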
After the feature distribution approximation operation is performed, a hyperplane can be easily found to effectively divide the face image belonging to the real person tag and the face image belonging to the prosthesis tag in the extended training data set, so that the identification accuracy of the living body detection neural network is improved.
In the embodiment of the disclosure, under the condition that the first objective loss function value and the second objective loss function value are determined, the objective loss function value corresponding to the living body detection neural network to be trained can be determined. Once the objective loss function value of one training round is determined to not meet the preset condition, the next training round can be performed according to the method until the objective loss function value meets the preset condition, and the training is stopped.
It should be noted that, the preset condition may be determined for the first objective loss function value, the second objective loss function value, and the entire objective loss function value separately, or may be determined for any combination of the three function values, which is not specifically limited in the embodiment of the present disclosure.
According to the training data generation method provided by the embodiments of the disclosure, the above network training method can effectively mine the face image characteristics of the target scene, so that the requirements of living body detection applications in the target scene can be better met. In order to prevent the trained living body detection neural network from degrading the performance of living body detection applications in existing scenarios, the network may additionally be constrained by a source-domain anti-forgetting strategy.
Here, the target loss function value corresponding to the living body detection neural network to be trained may be determined by using the first feature information of each source face image in the extended training data set, as extracted by the living body detection neural network to be trained, together with the first source feature information extracted from the same source face images by the trained source living body detection neural network.
The source living body detection neural network is obtained by training each source face image sample and the living body label category marked on each source face image sample.
In a specific application, the objective loss function value may be determined by the difference between the first feature information and the first source feature information. The smaller the difference, the more it indicates, to a certain extent, that the living body detection neural network currently being trained has not deviated excessively from the trained source living body detection neural network in the feature space, i.e. that the performance of the trained living body detection neural network in the source domain will not degrade.
In the embodiment of the present disclosure, when the objective loss function value is not sufficiently small, this feature-space deviation can be penalized by adjusting the parameters of the living body detection neural network.
It should be noted that the training of the living body detection neural network in the embodiments of the disclosure may be driven jointly by the target loss function set for the source-domain anti-forgetting constraint strategy, the first target loss function measuring the difference of training data within the same living body label category, and the second target loss function measuring the feature distribution of face images from different sources. The joint constraint of these loss functions ensures that, while the recognition performance of the trained living body detection neural network in the new target scene is improved, its recognition performance in existing scenes does not degrade.
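As a rough illustration of how the source-domain anti-forgetting term and the overall objective could be assembled, the sketch below treats the anti-forgetting term as a mean-squared difference between the features produced by the network being trained and those produced by the frozen source network, and simply weights and sums the three terms. The specific difference measure and weights are assumptions, not fixed by the disclosure.

```python
import torch
import torch.nn.functional as F

def anti_forgetting_loss(first_feats: torch.Tensor,
                         first_source_feats: torch.Tensor) -> torch.Tensor:
    """Penalize drift of the network being trained away from the already
    trained source liveness network on the same source face images.
    first_feats:        features of the source face images extracted by the
                        liveness network currently being trained.
    first_source_feats: features of the same images extracted by the frozen
                        source network (no gradient flows through them)."""
    return F.mse_loss(first_feats, first_source_feats.detach())

def total_objective(l_anti_forget: torch.Tensor,
                    l_first: torch.Tensor,
                    l_second: torch.Tensor,
                    w_af: float = 1.0, w_1: float = 1.0, w_2: float = 1.0) -> torch.Tensor:
    """Combine the anti-forgetting term with the first and second objective
    loss function values; the weights are illustrative assumptions."""
    return w_af * l_anti_forget + w_1 * l_first + w_2 * l_second
```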
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
Based on the same inventive concept, the embodiment of the disclosure further provides a training data generating device corresponding to the training data generating method, and since the principle of solving the problem of the device in the embodiment of the disclosure is similar to that of the training data generating method in the embodiment of the disclosure, the implementation of the device can refer to the implementation of the method, and the repetition is omitted.
Referring to fig. 3, a schematic diagram of a training data generating apparatus according to an embodiment of the disclosure is shown, where the apparatus includes: an acquisition module 301, a generation module 302 and an expansion module 303; wherein:
the acquisition module 301 is configured to acquire a source training data set and a target face image generated by a target scene;
a generating module 302, configured to generate a composite face image corresponding to the target face image based on the target face image and a source face image corresponding to the target face image in the source training dataset;
the expansion module 303 is configured to expand the source training data set based on the synthesized face image, and obtain an expanded training data set.
The embodiment of the disclosure can fuse the target face image based on the source face image in the source training data set, so that the fused synthesized face image not only contains the image characteristics of the source face image, but also contains the image characteristics of the target face image. Under the condition that the target face images generated by the target scene are relatively few, the synthesized face images obtained by fusion can be used for expanding the target face images in a sample space, and the training data set obtained by expansion can cover more training data in the target scene to a certain extent, so that the detection accuracy of the trained neural network in the target scene (for example, a brand new environment) is relatively high.
In a possible implementation manner, the generating module 302 is configured to generate a composite face image corresponding to the target face image based on the target face image and the source face image corresponding to the target face image in the source training dataset according to the following steps:
extracting a source face image paired with the target face image from the source training data set;
inputting the target face image and the source face image paired with the target face image into a trained image generation neural network for feature fusion processing to obtain a synthesized face image, wherein the synthesized face image fuses the content features of the source face image and the style features of the target face image.
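A common operator for this kind of content/style fusion is adaptive instance normalization (AdaIN); the disclosure does not name a specific operator, so the sketch below, including the blending factor used as the stylization-degree parameter, is only one plausible realization.

```python
import torch

def adain_fuse(content_feat: torch.Tensor,
               style_feat: torch.Tensor,
               alpha: float = 1.0,
               eps: float = 1e-5) -> torch.Tensor:
    """Fuse content features of the source face image with the style
    statistics of the target face image. Shapes are assumed to be (N, C, H, W).
    AdaIN is an assumed choice; the disclosure only requires that the
    synthesized image fuse content features and style features."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    # Re-normalize the content features with the target-style statistics.
    stylized = s_std * (content_feat - c_mean) / c_std + s_mean
    # alpha plays the role of the stylization-degree parameter: 0 keeps the
    # original content features, 1 applies the target style fully.
    return alpha * stylized + (1.0 - alpha) * content_feat
```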
In one possible implementation, the generating module 302 is configured to train the image generation neural network according to the following steps:
acquiring a paired source face image sample and a paired target face image sample;
respectively extracting features of the paired source face image sample and target face image sample to obtain content feature information of the source face image sample and style feature information of the target face image sample;
and performing at least one round of training on the image generation neural network to be trained based on the content characteristic information of the source face image sample and the style characteristic information of the target face image sample.
In one possible implementation, the generating module 302 is configured to perform at least one round of training on the image generation neural network to be trained, based on the content feature information of the source face image sample and the style feature information of the target face image sample, according to the following steps:
aiming at the current training, taking the content characteristic information of the source face image sample and the style characteristic information of the target face image sample as input characteristics of an image generation neural network to be trained, and determining fusion characteristic information output by the image generation neural network to be trained;
under the condition that the first similarity between the fusion characteristic information and the content characteristic information is smaller than a first threshold value and the second similarity between the fusion characteristic information and the style characteristic information is smaller than a second threshold value, adjusting network parameters of the image generation neural network, and performing the next training;
and training is stopped until the first similarity between the fusion characteristic information and the content characteristic information is larger than or equal to a first threshold value and the second similarity between the fusion characteristic information and the style characteristic information is larger than or equal to a second threshold value.
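The round-by-round stopping rule described above could look roughly like the following sketch; the generator interface, the use of cosine similarity, the threshold values and the loss used to push the similarities upward are all illustrative assumptions.

```python
import torch.nn.functional as F

def train_image_generator(generator, optimizer, content_feat, style_feat,
                          thr1: float = 0.8, thr2: float = 0.8,
                          max_rounds: int = 1000):
    """Train the image generation network until the fused features are close
    enough to both the content features and the style features."""
    for _ in range(max_rounds):
        fused = generator(content_feat, style_feat)
        # First similarity: fused features vs. content features of the source sample.
        sim_content = F.cosine_similarity(fused.flatten(1),
                                          content_feat.flatten(1)).mean()
        # Second similarity: fused features vs. style features of the target sample.
        sim_style = F.cosine_similarity(fused.flatten(1),
                                        style_feat.flatten(1)).mean()
        if sim_content >= thr1 and sim_style >= thr2:
            break  # both similarities reached their thresholds, stop training
        # Otherwise adjust the network parameters and run the next round.
        loss = (1.0 - sim_content) + (1.0 - sim_style)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return generator
```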
In a possible implementation manner, the generating module 302 is configured to extract, from the source training dataset, a source face image paired with the target face image according to the following steps:
Determining the type of the living body label to which the target face image belongs;
and extracting the source face image with the same living body label category as the living body label category of the target face image from the source training data set as the source face image matched with the target face image.
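For the pairing step, a minimal sketch might look like the following; representing the source training set as a list of (image, label) pairs and choosing randomly among the label-matched candidates are assumptions.

```python
import random

def pair_source_image(target_label, source_dataset):
    """Return a source face image whose liveness label (e.g. real person vs.
    prosthesis/spoof) matches the label of the given target-scene image."""
    candidates = [image for image, label in source_dataset if label == target_label]
    return random.choice(candidates) if candidates else None
```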
In one possible embodiment, the apparatus further includes:
the training module 304 is configured to perform at least one training round on the living neural network to be trained based on the extended training data set after the extended training data set is obtained, so as to obtain a trained living neural network.
In a possible implementation manner, the training module 304 is configured to perform at least one training round on the living neural network to be trained based on the extended training data set according to the following steps to obtain a trained living neural network:
respectively obtaining feature information of each face image in the extended training data set by utilizing the living body detection neural network to be trained;
determining a target loss function value corresponding to the living body detection neural network to be trained based on the obtained characteristic information;
and under the condition that the target loss function value does not meet the preset condition, performing the next training on the living body detection neural network to be trained until the target loss function value meets the preset condition.
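A rough sketch of this round-based training loop is given below; the data-loader interface, the loss threshold used as the "preset condition", and the maximum number of rounds are assumptions.

```python
def train_liveness_network(net, optimizer, extended_loader, loss_fn,
                           loss_threshold: float = 0.05, max_rounds: int = 100):
    """Train the liveness detection network on the extended training data set
    until the target loss function value satisfies the preset condition."""
    for _ in range(max_rounds):
        total, batches = 0.0, 0
        for images, labels, sources in extended_loader:
            feats = net(images)                      # feature info per face image
            loss = loss_fn(feats, labels, sources)   # target loss function value
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
            batches += 1
        if batches and total / batches < loss_threshold:
            break  # preset condition satisfied, stop training
    return net
```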
In one possible implementation, the feature information of each face image in the extended training data set includes:
first feature information of each source face image, second feature information of each target face image and third feature information of each synthesized face image in the extended training data set.
In a possible implementation manner, the training module 304 is configured to determine a target loss function value corresponding to the living body detection neural network to be trained based on the feature information of each face image according to the following steps:
determining a first objective loss function value for measuring the difference of training data in the same living body label category and a second objective loss function value for measuring the feature distribution condition of face images of different sources based on the first feature information, the second feature information and the third feature information;
and determining the target loss function value corresponding to the living body detection neural network to be trained based on the first target loss function value and the second target loss function value.
In one possible implementation, the training module 304 is configured to determine a first objective loss function value for measuring a difference of training data in the same living label category based on the first feature information, the second feature information, and the third feature information according to the following steps:
Selecting two first face images of the same living body label category from the training data set, and selecting two second face images of different living body label categories; the sources of the two first face images are different, and the sources of the two second face images are different;
determining first image similarity between two first face images based on the characteristic information of the two first face images; determining second image similarity between the two second face images based on the characteristic information of the two second face images;
and summing the first image similarity and the second image similarity to obtain a first target loss function value.
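The pair-selection logic above could be sketched as follows, with cosine similarity standing in for the unspecified similarity measure; the disclosure only states that the two pairwise similarities are summed to form the first objective loss function value.

```python
import torch.nn.functional as F

def first_objective_loss(feat_same_a, feat_same_b, feat_diff_a, feat_diff_b):
    """feat_same_a / feat_same_b: features of two face images with the SAME
    liveness label but from different sources (source, target or synthesized).
    feat_diff_a / feat_diff_b: features of two face images with DIFFERENT
    liveness labels, also from different sources."""
    first_image_similarity = F.cosine_similarity(feat_same_a, feat_same_b, dim=0)
    second_image_similarity = F.cosine_similarity(feat_diff_a, feat_diff_b, dim=0)
    # The first objective loss function value is the sum of the two similarities.
    return first_image_similarity + second_image_similarity
```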
In a possible implementation manner, the training module 304 is configured to determine a second objective loss function value for measuring feature distribution situations of face images from different sources based on the first feature information, the second feature information, and the third feature information according to the following steps:
based on the first feature information, the second feature information and the third feature information, respectively determining first distribution feature information for representing feature distribution of each source face image in the extended training data set, second distribution feature information for representing feature distribution of each target face image in the extended training data set and third distribution feature information for representing feature distribution of each synthetic face image in the extended training data set;
Determining first feature distribution similarity between each target face image and each synthesized face image based on the similarity between the second distribution feature information and the third distribution feature information; splicing the second distribution characteristic information and the third distribution characteristic information to obtain spliced distribution characteristic information;
determining second feature distribution similarity between each source face image and each synthesized face image based on the first distribution feature information and the similarity between the spliced distribution feature information;
and summing the first feature distribution similarity and the second feature distribution similarity to obtain the second objective loss function value.
In a possible implementation manner, the training module 304 is configured to determine the objective loss function value corresponding to the living body detection neural network to be trained based on the obtained feature information according to the following steps:
determining a target loss function value corresponding to the living body detection neural network to be trained by using first characteristic information of each source face image in the extended training data set determined by the living body detection neural network to be trained and first source characteristic information extracted from the source face image by using the trained source living body detection neural network;
The source living body detection neural network is obtained by training each source face image sample and a living body label type marked on each source face image sample.
The process flow of each module in the apparatus and the interaction flow between the modules may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.
The embodiment of the disclosure further provides an electronic device, as shown in fig. 4, which is a schematic structural diagram of the electronic device provided by the embodiment of the disclosure, including: a processor 401, a memory 402, and a bus 403. The memory 402 stores machine-readable instructions executable by the processor 401 (e.g., execution instructions corresponding to the acquisition module 301, the generation module 302, and the expansion module 303 in the apparatus of fig. 3), and when the electronic device is running, the processor 401 communicates with the memory 402 through the bus 403, and when the machine-readable instructions are executed by the processor 401, the following processing is performed:
acquiring a source training data set and a target face image generated by a target scene;
generating a synthetic face image corresponding to the target face image based on the target face image and a source face image corresponding to the target face image in the source training dataset;
And expanding the source training data set based on the synthesized face image to obtain an expanded training data set.
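Taken together, the three processing steps executed by the processor might be sketched as follows; the helper names (pair_source_image from the earlier sketch, and the generator callable) and the dataset representation are illustrative assumptions.

```python
def generate_extended_training_data(source_dataset, target_images, generator):
    """Pair each target-scene face image with a label-matched source face
    image, synthesize a fused face image, and append it to the source
    training data set to obtain the extended training data set."""
    extended = list(source_dataset)
    for target_image, target_label in target_images:
        source_image = pair_source_image(target_label, source_dataset)
        if source_image is None:
            continue  # no source image with a matching liveness label
        synthetic_image = generator(source_image, target_image)  # content + style fusion
        extended.append((synthetic_image, target_label))
    return extended
```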
The disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the training data generation method described in the method embodiments above. Wherein the storage medium may be a volatile or nonvolatile computer readable storage medium.
The embodiments of the present disclosure further provide a computer program product carrying program code, where the instructions included in the program code may be used to perform the steps of the training data generation method described in the foregoing method embodiments; reference may be made to the foregoing method embodiments for details, which are not repeated here.
Wherein the above-mentioned computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in essence or a part contributing to the prior art or a part of the technical solution, or in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present disclosure. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Finally, it should be noted that the foregoing embodiments are merely specific implementations of the present disclosure, intended to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person familiar with the technical field may, within the technical scope disclosed herein, still modify or readily conceive of changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of the technical features thereof; such modifications, changes or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the disclosure, and shall be included within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (14)

1. A method of generating training data, comprising:
acquiring a source training data set and a target face image generated by a target scene;
generating a synthetic face image corresponding to the target face image based on the target face image and a source face image corresponding to the target face image in the source training data set, wherein the synthetic face image is fused with the content characteristics of the source face image and the style characteristics of the target face image, and the stylization degree in the synthetic face image is controlled through a stylization degree parameter;
and expanding the source training data set based on the synthesized face image to obtain an expanded training data set, and performing at least one round of training on a living body detection neural network to be trained based on the expanded training data set to obtain a trained living body detection neural network, wherein the target loss function for training the living body detection neural network comprises a first target loss function and a second target loss function, the first target loss function is used for measuring the difference of training data in the same living body label category, and the second target loss function is used for measuring the characteristic distribution situation of face images of different sources, wherein the face images of different sources comprise the source face image, the target face image and the synthesized face image in the expanded training data set.
2. The method of claim 1, wherein the generating a composite face image corresponding to the target face image based on the target face image and a source face image in the source training dataset corresponding to the target face image comprises:
extracting the source face image paired with the target face image from the source training dataset;
and inputting the target face image and a source face image matched with the target face image into a trained image generation neural network to perform feature fusion processing, so as to obtain the synthesized face image.
3. The method of claim 2, wherein the image-generating neural network is trained as follows:
acquiring a paired source face image sample and a paired target face image sample;
respectively extracting the characteristics of the paired source face image sample and target face image sample to obtain content characteristic information of the source face image sample and style characteristic information of the target face image sample;
and performing at least one round of training on the image generation neural network to be trained based on the content characteristic information of the source face image sample and the style characteristic information of the target face image sample.
4. A method according to claim 3, wherein the training at least one round of the image generation neural network to be trained based on the content feature information of the source face image sample and the style feature information of the target face image sample comprises:
aiming at the current training, taking the content characteristic information of the source face image sample and the style characteristic information of the target face image sample as input characteristics of the image generation neural network to be trained, and determining fusion characteristic information output by the image generation neural network to be trained;
when the first similarity between the fusion characteristic information and the content characteristic information is smaller than a first threshold value and the second similarity between the fusion characteristic information and the style characteristic information is smaller than a second threshold value, adjusting network parameters of the image generation neural network, and performing the next training;
and training is stopped until the first similarity between the fusion characteristic information and the content characteristic information is larger than or equal to a first threshold value and the second similarity between the fusion characteristic information and the style characteristic information is larger than or equal to a second threshold value.
5. The method according to any one of claims 2 to 4, wherein the extracting the source face image paired with the target face image from the source training dataset comprises:
determining the type of the living body label to which the target face image belongs;
and extracting a source face image with the same living body label category as the living body label category of the target face image from the source training data set, and taking the source face image as a source face image matched with the target face image.
6. The method according to any one of claims 1 to 4, wherein the performing at least one training round on the living neural network to be trained based on the extended training data set to obtain a trained living neural network comprises:
respectively obtaining characteristic information of each face image in the extended training data set by utilizing a living body detection neural network to be trained;
determining a target loss function value corresponding to the living body detection neural network to be trained based on the obtained characteristic information;
and under the condition that the target loss function value does not meet the preset condition, performing next training on the living body detection neural network to be trained until the target loss function value meets the preset condition.
7. The method of claim 6, wherein the feature information of each face image in the extended training data set comprises:
the first characteristic information of the source face image, the second characteristic information of the target face image and the third characteristic information of the synthesized face image in the extended training data set.
8. The method of claim 7, wherein determining the objective loss function value corresponding to the living neural network to be trained based on the obtained feature information comprises:
determining a first objective loss function value for measuring the difference of training data in the same living body label category and a second objective loss function value for measuring the feature distribution condition of face images of different sources based on the first feature information, the second feature information and the third feature information;
and determining the target loss function value corresponding to the living body detection neural network to be trained based on the first target loss function value and the second target loss function value.
9. The method of claim 8, wherein the determining a first objective loss function value for measuring variability of training data within the same category of living labels based on the first characteristic information, the second characteristic information, and the third characteristic information comprises:
Selecting two first face images of the same living body label category from the training data set, and selecting two second face images of different living body label categories; the sources of the two first face images are different, and the sources of the two second face images are different;
determining a first image similarity between the two first face images based on the characteristic information of the two first face images; and determining a second image similarity between the two second face images based on the feature information of the two second face images;
and summing the first image similarity and the second image similarity to obtain the first target loss function value.
10. The method according to claim 8 or 9, wherein determining a second objective loss function value for measuring feature distribution of face images of different sources based on the first feature information, the second feature information, and the third feature information, comprises:
based on the first feature information, the second feature information and the third feature information, respectively determining first distribution feature information for representing feature distribution of each source face image in the extended training data set, second distribution feature information for representing feature distribution of each target face image in the extended training data set and third distribution feature information for representing feature distribution of each synthetic face image in the extended training data set;
Determining a first feature distribution similarity between each target face image and each synthesized face image based on the similarity between the second distribution feature information and the third distribution feature information; splicing the second distribution characteristic information and the third distribution characteristic information to obtain spliced distribution characteristic information;
determining second feature distribution similarity between each source face image and each synthesized face image based on the first distribution feature information and the similarity between the spliced distribution feature information;
and summing the first characteristic distribution similarity and the second characteristic distribution similarity to obtain the second objective loss function value.
11. The method of claim 7, wherein determining the objective loss function value corresponding to the living neural network to be trained based on the obtained feature information comprises:
determining a target loss function value corresponding to a living body detection neural network to be trained by using first characteristic information of each source face image in the extended training data set determined by the living body detection neural network to be trained and first source characteristic information extracted from the source face image by using the trained source living body detection neural network;
The source living body detection neural network is obtained by training each source face image sample and a living body label type marked on each source face image sample.
12. A training data generation apparatus, comprising:
the acquisition module is used for acquiring a source training data set and a target face image generated by a target scene;
the generating module is used for generating a synthesized face image corresponding to the target face image based on the target face image and a source face image corresponding to the target face image in the source training data set, wherein the synthesized face image is fused with the content characteristics of the source face image and the style characteristics of the target face image, and the stylization degree in the synthesized face image is controlled through a stylization degree parameter;
the expansion module is used for expanding the source training data set based on the synthesized face image to obtain an expanded training data set;
the training module is used for carrying out at least one round of training on the living body detection neural network to be trained based on the extended training data set to obtain a trained living body detection neural network, wherein the target loss function for training the living body detection neural network comprises a first target loss function and a second target loss function, the first target loss function is used for measuring the difference of training data in the same living body label category, the second target loss function is used for measuring the characteristic distribution condition of face images with different sources, and the face images with different sources comprise source face images, target face images and synthesized face images in the extended training data set.
13. An electronic device, comprising: a processor, a memory and a bus, said memory storing machine readable instructions executable by said processor, said processor and said memory communicating over the bus when the electronic device is running, said machine readable instructions when executed by said processor performing the steps of the training data generation method according to any of claims 1 to 11.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the training data generation method according to any of claims 1 to 11.
CN202110050175.3A 2021-01-14 2021-01-14 Training sample generation method and device, electronic equipment and storage medium Active CN112733946B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110050175.3A CN112733946B (en) 2021-01-14 2021-01-14 Training sample generation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110050175.3A CN112733946B (en) 2021-01-14 2021-01-14 Training sample generation method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112733946A CN112733946A (en) 2021-04-30
CN112733946B true CN112733946B (en) 2023-09-19

Family

ID=75593155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110050175.3A Active CN112733946B (en) 2021-01-14 2021-01-14 Training sample generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112733946B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112990378B (en) * 2021-05-08 2021-08-13 腾讯科技(深圳)有限公司 Scene recognition method and device based on artificial intelligence and electronic equipment
CN113469295B (en) * 2021-09-02 2021-12-03 北京字节跳动网络技术有限公司 Training method for generating model, polyp recognition method, device, medium, and apparatus
CN116051926B (en) * 2023-01-12 2024-04-16 北京百度网讯科技有限公司 Training method of image recognition model, image recognition method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107767328A (en) * 2017-10-13 2018-03-06 上海交通大学 The moving method and system of any style and content based on the generation of a small amount of sample
CN108133238A (en) * 2017-12-29 2018-06-08 国信优易数据有限公司 A kind of human face recognition model training method and device and face identification method and device
CN108961358A (en) * 2017-05-22 2018-12-07 阿里巴巴集团控股有限公司 A kind of method, apparatus and electronic equipment obtaining samples pictures
CN109308681A (en) * 2018-09-29 2019-02-05 北京字节跳动网络技术有限公司 Image processing method and device
CN109344716A (en) * 2018-08-31 2019-02-15 深圳前海达闼云端智能科技有限公司 Training method, detection method, device, medium and equipment of living body detection model
CN111292384A (en) * 2020-01-16 2020-06-16 西安交通大学 Cross-domain diversity image generation method and system based on generation type countermeasure network
CN112052759A (en) * 2020-08-25 2020-12-08 腾讯科技(深圳)有限公司 Living body detection method and device
WO2020258902A1 (en) * 2019-06-24 2020-12-30 商汤集团有限公司 Image generating and neural network training method, apparatus, device, and medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109191382B (en) * 2018-10-18 2023-12-05 京东方科技集团股份有限公司 Image processing method, device, electronic equipment and computer readable storage medium
CN110008842A (en) * 2019-03-09 2019-07-12 同济大学 A kind of pedestrian's recognition methods again for more losing Fusion Model based on depth

Also Published As

Publication number Publication date
CN112733946A (en) 2021-04-30

Similar Documents

Publication Publication Date Title
CN112733946B (en) Training sample generation method and device, electronic equipment and storage medium
CN107330408B (en) Video processing method and device, electronic equipment and storage medium
CN107330904A (en) Image processing method, image processing device, electronic equipment and storage medium
CN110188829B (en) Neural network training method, target recognition method and related products
JP2016081212A (en) Image recognition device, image recognition method, and image recognition program
CN113128271A (en) Counterfeit detection of face images
CN108198172B (en) Image significance detection method and device
CN107895155A (en) A kind of face identification method and device
CN113469040B (en) Image processing method, device, computer equipment and storage medium
CN112818767A (en) Data set generation method, data set forgery detection device, electronic device, and storage medium
CN113255575A (en) Neural network training method and device, computer equipment and storage medium
Kahatapitiya et al. Context-aware automatic occlusion removal
Rathgeb et al. Effects of image compression on face image manipulation detection: A case study on facial retouching
Zhang et al. Symmetry-aware face completion with generative adversarial networks
CN113723310B (en) Image recognition method and related device based on neural network
Li et al. Multitarget tracking of pedestrians in video sequences based on particle filters
CN114360015A (en) Living body detection method, living body detection device, living body detection equipment and storage medium
CN112084371B (en) Movie multi-label classification method and device, electronic equipment and storage medium
Karanwal et al. NABILD: Noise And Blur Invariant Local Descriptor for Face Recognition
CN112883831A (en) Living body detection method and device, electronic equipment and storage medium
CN115705758A (en) Living body identification method, living body identification device, electronic device, and storage medium
CN107992853B (en) Human eye detection method and device, computer equipment and storage medium
Sekhar et al. An object-based splicing forgery detection using multiple noise features
Khas et al. Facial Occlusion Detection and Reconstruction Using GAN
KR102451552B1 (en) Content analysis system for authenticity verifying of content based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant