CN113705362B - Training method and device of image detection model, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113705362B
CN113705362B (granted publication of application CN202110888202.4A)
Authority
CN
China
Prior art keywords
detection model
image detection
sample
label
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110888202.4A
Other languages
Chinese (zh)
Other versions
CN113705362A (en)
Inventor
黄泽斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110888202.4A
Publication of CN113705362A
Application granted
Publication of CN113705362B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The disclosure provides a training method and apparatus for an image detection model, an electronic device, and a storage medium, relating to the field of artificial intelligence, in particular to computer vision and deep learning, and applicable to scenarios such as face recognition and living body detection. The specific implementation scheme is as follows: obtain training data, where the training data comprises a sample image and a label of the sample image; determine a teacher image detection model corresponding to the label of the sample image, the teacher image detection model being trained on training data carrying that label; determine a feature vector of the sample image according to the teacher image detection model; and adjust the coefficients of an initial student image detection model according to the sample image, the label of the sample image, and the feature vector, so as to train the model.

Description

Training method and device of image detection model, electronic equipment and storage medium
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to computer vision and deep learning, is applicable to scenarios such as face recognition and living body detection, and specifically relates to a training method and apparatus for an image detection model, an electronic device, and a storage medium.
Background
With the development of image detection technology, image detection models can be applied to living body detection, a technique that automatically determines whether the face in a given image or video comes from a real person present at the scene or from a spoofed face. Living body detection is an important technical means of preventing face-based attacks and fraud, and is widely used in industries and settings involving remote identity authentication, such as banking, insurance, internet finance, and e-commerce.
Disclosure of Invention
The disclosure provides a training method and device for an image detection model, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a training method of an image detection model, including: obtaining training data, wherein the training data comprises: a sample image and a label of the sample image; determining a teacher image detection model corresponding to the label of the sample image; the teacher image detection model is obtained by training data with the labels; determining a feature vector of the sample image according to the teacher image detection model; and according to the sample image, the label of the sample image and the feature vector, performing coefficient adjustment on an initial student image detection model so as to realize training.
According to another aspect of the present disclosure, there is provided a training apparatus of an image detection model, including: the system comprises an acquisition module for acquiring training data, wherein the training data comprises: a sample image and a label of the sample image; a determining module, configured to determine a teacher image detection model corresponding to a label of the sample image; the teacher image detection model is obtained by training data with the labels; the determining module is further used for determining the feature vector of the sample image according to the teacher image detection model; and the training module is used for carrying out coefficient adjustment on the initial student image detection model according to the sample image, the label of the sample image and the feature vector so as to realize training.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method according to the embodiments of the first aspect of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described in embodiments of the first aspect of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a training method of an image detection model according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 6 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 7 is a schematic diagram according to a sixth embodiment of the present disclosure;
fig. 8 is a block diagram of an electronic device for implementing a training method of an image detection model in accordance with an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
With the development of image detection technology, image detection models can be applied to living body detection, a technique that automatically determines whether the face in a given image or video comes from a real person present at the scene or from a spoofed face. Living body detection is an important technical means of preventing face-based attacks and fraud, and is widely used in industries and settings involving remote identity authentication, such as banking, insurance, internet finance, and e-commerce.
In the related art, an image detection model for living body detection is trained only on images carrying the real-person label or an attack label. Because the number of images per label and their learning difficulty are unbalanced, it is difficult to achieve high detection accuracy on the images corresponding to every label.
In view of the above problems, the present disclosure proposes a training method, apparatus, electronic device, and storage medium for an image detection model.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure. It should be noted that, the training method of the image detection model according to the embodiments of the present disclosure may be applied to the training apparatus of the image detection model according to the embodiments of the present disclosure, and the apparatus may be configured in an electronic device. The electronic device may be a mobile terminal, such as a mobile phone, a tablet computer, a personal digital assistant, or other hardware devices with various operating systems.
As shown in fig. 1, the training method of the image detection model may include the following steps:
step 101, obtaining training data, wherein the training data comprises: sample image and label of sample image.
In the embodiment of the disclosure, the sample image may be captured by an image acquisition device, and the label of the sample image may be obtained from the characteristics of the sample image, for example by judging whether the sample image is an image of a real object. If the sample image is a real-object image, its label is the positive sample label; if it is not, its label is a negative sample label. The sample images and their labels are then used as the training data.
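For illustration only (not part of the disclosure), the labeling rule above may be sketched as follows; all names are hypothetical, and real sample images would be pixel arrays rather than file-name strings.

```python
def label_for(is_real_object: bool) -> str:
    """Assign the label per the rule above: positive sample label for a
    real-object image, negative sample label otherwise."""
    return "positive" if is_real_object else "negative"

# training data = sample images paired with their labels
training_data = [
    {"sample_image": "img_001.png", "label": label_for(True)},
    {"sample_image": "img_002.png", "label": label_for(False)},
]
```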
Step 102, determining a teacher image detection model corresponding to the label of the sample image; the teacher image detection model is obtained through training of training data with labels.
In the embodiment of the disclosure, a teacher image detection model set corresponding to the label of the sample image may be acquired, and a plurality of teacher image detection models in the teacher image detection model set are used as the teacher image detection models corresponding to the label of the sample image. The plurality of teacher image detection models can be obtained through training by adopting corresponding training data with labels.
And step 103, determining the feature vector of the sample image according to the teacher image detection model.
In the embodiment of the present disclosure, for each sample image, the sample image may be input into a teacher image detection model, and a feature vector of the sample image may be determined from an output of the teacher image detection model.
And 104, performing coefficient adjustment on the initial student image detection model according to the sample image, the label of the sample image and the feature vector so as to realize training.
In the embodiment of the disclosure, the sample image may be input into the initial student image detection model, which outputs a predicted feature vector and a predicted label. The predicted feature vector is compared with the feature vector from the teacher, and the predicted label with the label of the sample image, to adjust the coefficients of the initial student image detection model and thereby train the student image detection model. Both the teacher and student image detection models may be living body detection models, and the trained student image detection model may be applied to scenarios such as face recognition and living body detection.
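As a minimal sketch of steps 101 to 104 (illustrative only; `teachers_by_label`, `adjust`, and the toy lambdas are hypothetical stand-ins, not the patent's implementation):

```python
def train_student(training_data, teachers_by_label, student, adjust):
    """Run steps 102-104 for each training sample: select the teacher by
    label, take its feature vector, and adjust the student's coefficients."""
    for item in training_data:
        image, label = item["sample_image"], item["label"]
        teacher = teachers_by_label[label]               # step 102
        feature_vector = teacher(image)                  # step 103
        adjust(student, image, label, feature_vector)    # step 104
    return student

# toy stand-ins: each "teacher" maps an image to a fixed feature vector,
# and "adjust" merely records what a real optimizer step would consume
teachers = {"real": lambda img: [1.0, 0.0],
            "photo_attack": lambda img: [0.0, 1.0]}
updates = []
def record_update(student, image, label, feature_vector):
    updates.append((image, label, feature_vector))

data = [{"sample_image": "a.png", "label": "real"},
        {"sample_image": "b.png", "label": "photo_attack"}]
train_student(data, teachers, student={}, adjust=record_update)
```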
In summary, training data is obtained, where the training data comprises a sample image and a label of the sample image; a teacher image detection model corresponding to the label of the sample image is determined, the teacher image detection model being trained on training data carrying that label; a feature vector of the sample image is determined according to the teacher image detection model; and the coefficients of an initial student image detection model are adjusted according to the sample image, the label of the sample image, and the feature vector, so as to train the model.
In order to accurately determine a teacher image detection model corresponding to a label of a sample image, as shown in fig. 2, fig. 2 is a schematic diagram according to a second embodiment of the present disclosure. In the embodiment of the disclosure, training data corresponding to the labels of different sample images may be used to train the teacher image detection model to determine the teacher image detection model corresponding to the label of the sample image. The embodiment shown in fig. 2 may include the following steps:
Step 201, obtaining training data, wherein the training data includes: sample image and label of sample image.
Step 202, acquiring a teacher image detection model set, where the teacher image detection model set comprises a first teacher image detection model and a plurality of second teacher image detection models. The first teacher image detection model is trained on the training data of the positive sample label and the training data of all types of negative sample labels; each second teacher image detection model is trained on the training data of the positive sample label and the training data of one type of negative sample label.
In an embodiment of the present disclosure, the label of a sample image is either a positive sample label or a negative sample label. The training data of the positive sample label comprises the positive sample label and its corresponding sample images, and the training data of a negative sample label comprises that negative sample label and its corresponding sample images. The first teacher image detection model may then be trained using the positive sample label and its corresponding sample images together with all types of negative sample labels and their corresponding sample images, while each second teacher image detection model is trained using the positive sample label and its corresponding sample images together with one type of negative sample label and its corresponding sample images. The trained first and second teacher image detection models can both output the label corresponding to an input image. It should be noted that, because a second teacher image detection model is trained on the training data of the positive sample label and only one type of negative sample label, it is more prone to overfitting if it uses the same number of parameters and/or network layers as the first teacher image detection model. Therefore, to improve the detection accuracy of each second teacher image detection model on images of its label type, the parameter count of the first teacher image detection model is greater than that of the second teacher image detection models, and/or the number of network layers of the first teacher image detection model is greater than that of the second teacher image detection models.
The positive sample label indicates that the sample image is a real-object image; a negative sample label indicates that the sample image is not, and comprises at least one of the following: a photo attack label, a video attack label, and a mask attack label. The photo attack label indicates that the sample image was captured by shooting a photo, the video attack label indicates that the sample image was captured by shooting a video, and the mask attack label indicates that the sample image was captured by shooting a mask.
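The label taxonomy above can be written out as follows (an illustrative sketch; the string names are hypothetical):

```python
POSITIVE_LABEL = "real"  # sample image shows a real object

# negative sample labels: each marks how the non-real image was produced
NEGATIVE_LABELS = {
    "photo_attack": "image captured by shooting a photo",
    "video_attack": "image captured by shooting a video",
    "mask_attack":  "image captured by shooting a mask",
}

def is_attack(label: str) -> bool:
    """True for any negative sample label."""
    return label in NEGATIVE_LABELS
```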
Step 203, using the first teacher image detection model and the second teacher image detection model including the label in the corresponding training data as the teacher image detection model corresponding to the label.
It can be understood that the first teacher image detection model, trained on the training data of the positive sample label and of every type of negative sample label, performs well on images corresponding to negative sample labels overall; however, because the number of images per negative-label type and their learning difficulty are unbalanced during its training, it is difficult for it to achieve good detection results on the images of every label type. Therefore, to achieve better detection results on each label type, the first teacher image detection model and the second teacher image detection model whose training data includes the label may together serve as the teacher image detection models corresponding to that label.
It should be noted that different labels of the sample image correspond to different teacher image detection models. For example, if the label of the sample image is the photo attack label, the corresponding teacher models are the first teacher image detection model and the second teacher image detection model corresponding to the photo attack label; if the label is the video attack label, they are the first teacher image detection model and the second teacher image detection model corresponding to the video attack label; and if the label is the mask attack label, they are the first teacher image detection model and the second teacher image detection model corresponding to the mask attack label.
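The pairing rule in this step can be sketched as follows (illustrative only; the teacher placeholders `T_all`, `T_photo`, etc. are hypothetical names):

```python
def teachers_for_label(label, first_teacher, second_teachers):
    """Return the teacher models for a negative sample label: the first
    teacher (trained on all label types) plus the second teacher whose
    training data contained this label."""
    if label not in second_teachers:
        raise KeyError(f"no second teacher trained on label {label!r}")
    return first_teacher, second_teachers[label]

second_teachers = {"photo_attack": "T_photo",
                   "video_attack": "T_video",
                   "mask_attack": "T_mask"}
pair = teachers_for_label("video_attack", "T_all", second_teachers)
```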
For example, as shown in fig. 3, living body model 1 may be the first teacher image detection model and living body models 2 to n the different second teacher image detection models. Living body model 1 can be combined with each second teacher image detection model to distill a new living body model: distilling with living body models 1 and 2 improves the new model's detection accuracy on images corresponding to the photo attack label; distilling with living body models 1 and 3 improves its accuracy on images corresponding to the video attack label; and similarly, distilling with living body models 1 and 4 improves its accuracy on images corresponding to the mask attack label.
Step 204, determining the feature vector of the sample image according to the teacher image detection model.
And 205, performing coefficient adjustment on the initial student image detection model according to the sample image, the label of the sample image and the feature vector to realize training.
In the embodiment of the present disclosure, steps 201, 204-205 may be implemented in any manner in each embodiment of the present disclosure, which is not limited thereto, and is not described herein.
In summary, a teacher image detection model set is acquired, comprising a first teacher image detection model and a plurality of second teacher image detection models; the first teacher image detection model is trained on the training data of the positive sample label and of all types of negative sample labels, and each second teacher image detection model is trained on the training data of the positive sample label and of one type of negative sample label. Using the first teacher image detection model together with the second teacher image detection model whose training data includes the label as the teacher image detection models for that label improves the teachers' detection accuracy on the images corresponding to each label, and in turn the student image detection model's detection accuracy on the images corresponding to each label.
To help the student image detection model better learn the features of the teacher image detection model, as shown in fig. 4 (a schematic diagram of a third embodiment of the disclosure), the sample image may be input into the teacher image detection model and the feature vector taken from a network layer other than the fully connected layer. This avoids the loss of knowledge features that occurs once the feature vector has passed through the fully connected layer, which outputs the label corresponding to the image. The embodiment illustrated in fig. 4 may include the following steps:
step 401, acquiring training data, wherein the training data includes: sample image and label of sample image.
Step 402, determining a teacher image detection model corresponding to the label of the sample image; the teacher image detection model is obtained through training of training data with labels.
Step 403, inputting the sample image into the teacher image detection model to obtain the output vector of a target network layer in the teacher image detection model, the target network layer being a network layer of the teacher image detection model other than the fully connected layer.
In the embodiment of the disclosure, the sample image is input into the teacher image detection model, which outputs the label of the sample image via the target network layer followed by the fully connected layer; that is, the output vector of the target network layer becomes the label of the sample image after being processed by the fully connected layer. It should be noted that the target network layer may be any network layer of the teacher image detection model other than the fully connected layer.
And step 404, taking the output vector as a feature vector.
The output vector of the target network layer may then be used as the feature vector. The number of feature vectors of the sample image is at least one.
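The feature extraction above can be sketched with a toy model (illustrative only; `TinyDetector` and its lambda layers are hypothetical stand-ins for a real network, where the target layer's output would come from a forward hook or a truncated forward pass):

```python
class TinyDetector:
    """Toy detector: a stack of 'layers' followed by a fully connected
    head that turns the last hidden vector into a label."""
    def __init__(self, layers, fc):
        self.layers = layers  # target network layers (non-FC)
        self.fc = fc          # fully connected layer producing the label

    def forward(self, x, return_feature=False):
        for layer in self.layers:
            x = layer(x)
        if return_feature:
            return x          # output vector of the target network layer
        return self.fc(x)     # label after the fully connected layer

teacher = TinyDetector(
    layers=[lambda v: [2 * e for e in v],      # toy layer: double
            lambda v: [e + 1 for e in v]],     # toy layer: shift
    fc=lambda v: "real" if sum(v) > 0 else "attack",
)
feature_vector = teacher.forward([0.5, -0.5], return_feature=True)
```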
And step 405, performing coefficient adjustment on the initial student image detection model according to the sample image, the label of the sample image and the feature vector so as to realize training.
In the embodiment of the present disclosure, the steps 401 to 402 and 405 may be implemented in any manner in each embodiment of the present disclosure, which is not limited to this embodiment, and is not repeated herein.
In summary, the sample image is input into the teacher image detection model to obtain the output vector of a target network layer, the target network layer being a network layer other than the fully connected layer; the output vector is used as the feature vector, and the number of feature vectors of the sample image is at least one. This lets the student image detection model better learn the features of the teacher image detection model, improving its detection accuracy on the images corresponding to each label.
To improve the detection accuracy and ease of deployment of the student image detection model, the student image detection model may learn the features of the teacher image detection model. As shown in fig. 5 (a schematic diagram of a fourth embodiment of the disclosure), as one example the coefficients of the initial student image detection model may be adjusted according to the sample image, the label of the sample image, and each feature vector. The embodiment shown in fig. 5 may include the following steps:
step 501, obtaining training data, wherein the training data includes: sample image and label of sample image.
Step 502, determining a teacher image detection model corresponding to a label of a sample image; the teacher image detection model is obtained through training of training data with labels.
In step 503, feature vectors of the sample image are determined according to the teacher image detection model.
In the embodiment of the present disclosure, steps 501 to 503 may be implemented in any manner in each embodiment of the present disclosure, which is not limited to this embodiment, and is not described in detail.
In step 504, for each feature vector of the sample image, a coefficient adjustment is performed on the initial student image detection model according to the sample image, the feature vector, and the label of the sample image.
Optionally, inputting the sample image into a student image detection model to obtain a prediction feature vector and a prediction label of the sample image; determining a first sub-loss function value according to the predicted feature vector, the feature vector and the first sub-loss function; determining a second sub-loss function value according to the prediction label, the label of the sample image and the second sub-loss function; determining a loss function value according to the first sub-loss function value, the second sub-loss function value, the weight of the first sub-loss function and the weight of the second sub-loss function; and carrying out coefficient adjustment on the initial student image detection model according to the loss function value.
That is, there is at least one feature vector of the sample image. For each feature vector, the corresponding sample image may be input into the student image detection model, which outputs a predicted feature vector and a predicted label of the sample image. A first sub-loss function value is then determined from the predicted feature vector, the feature vector of the sample image, and a preset first sub-loss function, and a second sub-loss function value from the predicted label, the label of the sample image, and a preset second sub-loss function. The two sub-loss function values are combined with their corresponding weights to determine the loss function value, and the coefficients of the initial student image detection model are adjusted accordingly; for example, the coefficients that minimize the loss function value are taken as the coefficients of the trained student image detection model.
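The weighted loss combination described above can be sketched as follows. The patent does not fix the form of the two sub-loss functions, so mean squared error for the feature loss and a 0/1 label mismatch are assumed here purely for illustration:

```python
def feature_loss(pred_fv, teacher_fv):
    """First sub-loss (assumed MSE here): distance between the student's
    predicted feature vector and the teacher's feature vector."""
    return sum((p - t) ** 2 for p, t in zip(pred_fv, teacher_fv)) / len(pred_fv)

def label_loss(pred_label, true_label):
    """Second sub-loss (assumed 0/1 mismatch here for illustration)."""
    return 0.0 if pred_label == true_label else 1.0

def total_loss(pred_fv, teacher_fv, pred_label, true_label, w1, w2):
    """Weighted combination of the two sub-loss function values."""
    return (w1 * feature_loss(pred_fv, teacher_fv)
            + w2 * label_loss(pred_label, true_label))

loss = total_loss([1.0, 0.0], [0.5, 0.5], "real", "real", w1=0.7, w2=0.3)
```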
In summary, by performing coefficient adjustment on the initial student image detection model according to the sample image, the feature vector and the label of the sample image for each feature vector of the sample image, the student image detection model can learn the features of the teacher image detection model, and the detection accuracy and the deployment convenience of the student image detection model are improved.
To improve the detection accuracy and ease of deployment of the student image detection model, the student image detection model may learn the features of the teacher image detection model. As shown in fig. 6 (a schematic diagram of a fifth embodiment of the disclosure), as another example at least one feature vector of the sample image may be spliced or weighted-summed to obtain a processed feature vector, and the coefficients of the initial student image detection model adjusted according to the sample image, the label of the sample image, and the processed feature vector. The embodiment shown in fig. 6 includes the following steps:
step 601, acquiring training data, wherein the training data comprises: sample image and label of sample image.
Step 602, determining a teacher image detection model corresponding to a label of a sample image; the teacher image detection model is obtained through training of training data with labels.
And step 603, determining the feature vector of the sample image according to the teacher image detection model.
In the embodiment of the present disclosure, steps 601 to 603 may be implemented in any of the manners described in the embodiments of the present disclosure, which is not limited by this embodiment and is not described in detail again.
Step 604, performing stitching or weighted summation on at least one feature vector of the sample image to obtain a processed feature vector; and according to the sample image, the label of the sample image and the processed feature vector, carrying out coefficient adjustment on the initial student image detection model.
Optionally, inputting the sample image into a student image detection model to obtain a prediction feature vector and a prediction label of the sample image; determining a first sub-loss function value according to the predicted feature vector, the processed feature vector and the first sub-loss function; determining a second sub-loss function value according to the prediction label, the label of the sample image and the second sub-loss function; determining a loss function value according to the first sub-loss function value, the second sub-loss function value, the weight of the first sub-loss function and the weight of the second sub-loss function; and carrying out coefficient adjustment on the initial student image detection model according to the loss function value.
That is, the at least one feature vector of the sample image is spliced or weighted and summed to obtain a processed feature vector, and the sample image is then input into the student image detection model, which outputs a predicted feature vector and a predicted label for the sample image. A first sub-loss function value may be determined according to the predicted feature vector, the processed feature vector, and a preset first sub-loss function, and a second sub-loss function value may be determined according to the predicted label, the label of the sample image, and a preset second sub-loss function. The first sub-loss function value and the second sub-loss function value are each combined with the corresponding weight to determine the loss function value, and the coefficients of the initial student image detection model are adjusted according to the loss function value. For example, when the loss function value is minimum, the coefficients of the corresponding student image detection model are used as the coefficients of the trained student image detection model.
In summary, by splicing or weighting the at least one feature vector of the sample image to obtain a processed feature vector, and performing coefficient adjustment on the initial student image detection model according to the sample image, the label of the sample image, and the processed feature vector, the student image detection model can learn the features of the teacher image detection model, and the detection accuracy and deployment convenience of the student image detection model for the image corresponding to each label are improved.
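The splicing and weighted-summation alternatives for combining the teacher feature vectors can be sketched as below. This is an illustrative sketch only; the function and mode names are assumptions, and weighted summation presumes the vectors share a common dimension.

```python
import numpy as np

def fuse_features(feature_vectors, mode="concat", weights=None):
    """Combine the at least one teacher feature vector into one processed
    feature vector, either by splicing (concatenation) or by weighted
    summation. Equal weights are used when none are given."""
    vs = [np.asarray(v, dtype=float) for v in feature_vectors]
    if mode == "concat":
        # Splicing: stack the vectors end to end.
        return np.concatenate(vs)
    if mode == "weighted_sum":
        # Weighted summation: same-length vectors scaled and added.
        w = (np.ones(len(vs)) / len(vs)) if weights is None \
            else np.asarray(weights, dtype=float)
        return sum(wi * vi for wi, vi in zip(w, vs))
    raise ValueError(f"unknown mode: {mode!r}")
```

The resulting processed feature vector would then play the role of the teacher feature in the loss computation when adjusting the coefficients of the initial student image detection model.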
According to the training method of the image detection model of the embodiments of the present disclosure, training data are obtained, wherein the training data comprise: a sample image and a label of the sample image; a teacher image detection model corresponding to the label of the sample image is determined, the teacher image detection model being obtained by training with labeled training data; a feature vector of the sample image is determined according to the teacher image detection model; and coefficient adjustment is performed on the initial student image detection model according to the sample image, the label of the sample image, and the feature vector, so as to implement training.
In order to achieve the above embodiments, the embodiments of the present disclosure further provide a training device for an image detection model.
Fig. 7 is a schematic diagram of a training apparatus 700 of an image detection model, as shown in fig. 7, according to a sixth embodiment of the present disclosure, including: an acquisition module 710, a determination module 720, and a training module 730.
The obtaining module 710 is configured to obtain training data, where the training data includes: a sample image and a label of the sample image; the determining module 720 is configured to determine a teacher image detection model corresponding to the label of the sample image, the teacher image detection model being obtained by training with labeled training data; the determining module 720 is further configured to determine a feature vector of the sample image according to the teacher image detection model; and the training module 730 is configured to perform coefficient adjustment on the initial student image detection model according to the sample image, the label of the sample image, and the feature vector, so as to implement training.
As one possible implementation of the embodiments of the present disclosure, the tag includes: positive sample labels and negative sample labels of various types; the determining module 720 is specifically configured to: acquiring a teacher image detection model set; the teacher image detection model set comprises: a first teacher image detection model and a plurality of second teacher image detection models; the first teacher image detection model is obtained by training data of positive sample labels and training data of negative sample labels of various types; the second teacher image detection model is obtained by training data of a positive sample label and training data of a type of negative sample label; and taking the first teacher image detection model and the second teacher image detection model which comprises the label in the corresponding training data as the teacher image detection model corresponding to the label.
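The selection of teacher image detection models for a given label can be illustrated as follows. This is a minimal sketch under stated assumptions: the models are represented by placeholder names, and the mapping from each second teacher model to the labels in its training data is a hypothetical dictionary.

```python
def teachers_for_label(label, first_teacher, second_teachers):
    """Select the teacher models corresponding to a label: the first
    teacher model (trained on the positive-sample label and all types of
    negative-sample labels) always applies, plus every second teacher
    model whose training data contained this label.

    second_teachers maps a model name to the set of labels it saw."""
    selected = [first_teacher]
    selected += [model for model, labels in second_teachers.items()
                 if label in labels]
    return selected
```

For a positive-sample label, every second teacher model would be selected alongside the first, since each was trained with positive-sample training data; for a negative-sample label, only the second teacher trained on that attack type joins the first.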
As one possible implementation manner of the embodiments of the present disclosure, the parameter amount of the first teacher image detection model is larger than the parameter amount of the second teacher image detection model; and/or the network layer number of the first teacher image detection model is larger than the network layer number of the second teacher image detection model.
As one possible implementation of the embodiments of the present disclosure, the determining module 720 is further configured to: inputting the sample image into a teacher image detection model to obtain an output vector of a target network layer in the teacher image detection model; the target network layer is a network layer except for a full connection layer in the teacher image detection model; the output vector is taken as a feature vector.
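Taking the output of a target network layer (a layer other than the fully connected layer) as the feature vector can be sketched as below. The sketch stands in for a real network with a hypothetical ordered list of named layer functions; in a deep-learning framework the same effect is typically achieved with a forward hook on the target layer.

```python
def extract_feature(layers, image, target_name):
    """Run the sample image through the teacher model's layers and return
    the output vector of the target network layer. 'layers' is an
    illustrative list of (name, function) pairs; 'target_name' names a
    layer other than the fully connected layer."""
    x = image
    for name, fn in layers:
        x = fn(x)
        if name == target_name:
            return x  # output vector used as the feature vector
    raise KeyError(f"layer {target_name!r} not found")
```

Stopping at the target layer means the fully connected classification head never runs, so the returned vector is an intermediate representation rather than a label score.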
As one possible implementation of the embodiments of the present disclosure, the number of feature vectors of the sample image is at least one; the training module 730 is specifically configured to: for each feature vector of the sample image, performing coefficient adjustment on an initial student image detection model according to the sample image, the feature vector and the label of the sample image; or, at least one feature vector of the sample image is spliced or weighted and summed to obtain a processed feature vector; and according to the sample image, the label of the sample image and the processed feature vector, performing coefficient adjustment on the initial student image detection model.
As one possible implementation of an embodiment of the present disclosure, the training module 730 is further configured to: inputting the sample image into a student image detection model to obtain a prediction feature vector and a prediction label of the sample image; determining a first sub-loss function value according to the predicted feature vector, the feature vector and the first sub-loss function; determining a second sub-loss function value according to the prediction label, the label of the sample image and the second sub-loss function; determining a loss function value according to the first sub-loss function value, the second sub-loss function value, the weight of the first sub-loss function and the weight of the second sub-loss function; and carrying out coefficient adjustment on the initial student image detection model according to the loss function value.
As one possible implementation manner of the embodiments of the present disclosure, the teacher image detection model and the student image detection model are living body detection models; positive sample labels, representing that a sample image is a real object image; negative sample labels, wherein the representation sample image is a non-real object image; the negative-sample label includes at least one of the following labels: photo attack tags, video attack tags, mask attack tags.
According to the training device of the image detection model of the embodiments of the present disclosure, training data are obtained, wherein the training data comprise: a sample image and a label of the sample image; a teacher image detection model corresponding to the label of the sample image is determined, the teacher image detection model being obtained by training with labeled training data; a feature vector of the sample image is determined according to the teacher image detection model; and coefficient adjustment is performed on the initial student image detection model according to the sample image, the label of the sample image, and the feature vector, so as to implement training. The device can train a corresponding teacher image detection model with the sample images corresponding to each of the different labels, so as to determine a plurality of teacher image detection models, and can perform coefficient adjustment on the initial student image detection model according to the feature vectors of the sample images determined by the plurality of teacher image detection models, the sample images, and the labels of the sample images, so that the detection accuracy of the student image detection model on the image corresponding to each label can be improved.
In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of the personal information of the user are all performed on the premise of obtaining the consent of the user, comply with the relevant laws and regulations, and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 8 illustrates a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the apparatus 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The calculation unit 801 performs the respective methods and processes described above, for example, a training method of an image detection model. For example, in some embodiments, the training method of the image detection model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 800 via ROM 802 and/or communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the training method of the image detection model described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the training method of the image detection model in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein. The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (12)

1. A training method of an image detection model, comprising:
obtaining training data, wherein the training data comprises: a sample image and a label of the sample image, the label being used to characterize whether the sample image is a real object image;
acquiring a teacher image detection model set; wherein, the teacher image detection model set comprises: a first teacher image detection model and a plurality of second teacher image detection models; the first teacher image detection model is obtained by training data of positive sample labels in the labels and training data of multiple types of negative sample labels in the labels; the second teacher image detection model is obtained by training the training data of the positive sample label and the training data of one type of negative sample label; the negative-sample tag includes: photo attack tags, video attack tags, mask attack tags;
taking the first teacher image detection model and a second teacher image detection model which comprises the label in the corresponding training data as the teacher image detection model corresponding to the label; the parameter quantity of the first teacher image detection model is larger than that of the second teacher image detection model; and/or the network layer number of the first teacher image detection model is greater than the network layer number of the second teacher image detection model;
determining a feature vector of the sample image according to the teacher image detection model;
inputting the sample image into an initial student image detection model, obtaining a prediction feature vector and a prediction label of the sample image, and determining a first sub-loss function value according to the prediction feature vector and the feature vector; determining a second sub-loss function value according to the prediction label and the label of the sample image; and carrying out coefficient adjustment on the initial student image detection model according to the first sub-loss function value, the second sub-loss function value, the weight of the first sub-loss function and the weight of the second sub-loss function.
2. The method of claim 1, wherein the determining feature vectors of the sample image from the teacher image detection model comprises:
inputting the sample image into the teacher image detection model to obtain an output vector of a target network layer in the teacher image detection model; the target network layer is a network layer except a full connection layer in the teacher image detection model;
and taking the output vector as the characteristic vector.
3. The method of claim 1, wherein the number of feature vectors of the sample image is at least one;
and performing coefficient adjustment on an initial student image detection model according to the sample image, the label of the sample image and the feature vector, wherein the coefficient adjustment comprises the following steps:
for each feature vector of the sample image, performing coefficient adjustment on an initial student image detection model according to the sample image, the feature vector and the label of the sample image;
or,
splicing or weighting and summing at least one characteristic vector of the sample image to obtain a processed characteristic vector; and according to the sample image, the label of the sample image and the processed feature vector, performing coefficient adjustment on an initial student image detection model.
4. The method of claim 1, wherein the determining a first sub-loss function value from the predicted feature vector and the feature vector comprises:
determining a first sub-loss function value according to the predicted feature vector, the feature vector and the first sub-loss function;
the determining a second sub-loss function value according to the prediction label and the label of the sample image includes:
determining a second sub-loss function value according to the prediction label, the label of the sample image and the second sub-loss function;
the performing coefficient adjustment on the initial student image detection model according to the first sub-loss function value, the second sub-loss function value, the weight of the first sub-loss function, and the weight of the second sub-loss function includes:
determining a loss function value according to the first sub-loss function value, the second sub-loss function value, the weight of the first sub-loss function and the weight of the second sub-loss function;
and carrying out coefficient adjustment on the initial student image detection model according to the loss function value.
5. The method of claim 1, wherein the teacher image detection model and the student image detection model are living body detection models;
the positive sample label characterizes the sample image as a real object image;
And the negative sample label characterizes that the sample image is a non-real object image.
6. A training apparatus for an image detection model, comprising:
an acquisition module, configured to acquire training data, wherein the training data comprises: a sample image and a label of the sample image, the label being used to characterize whether the sample image is a real object image;
the determining module is used for acquiring a teacher image detection model set; wherein, the teacher image detection model set comprises: a first teacher image detection model and a plurality of second teacher image detection models; the first teacher image detection model is obtained by training data of positive sample labels in the labels and training data of multiple types of negative sample labels in the labels; the second teacher image detection model is obtained by training the training data of the positive sample label and the training data of one type of negative sample label; the negative-sample tag includes: photo attack tags, video attack tags, mask attack tags; taking the first teacher image detection model and a second teacher image detection model which comprises the label in the corresponding training data as the teacher image detection model corresponding to the label; the parameter quantity of the first teacher image detection model is larger than that of the second teacher image detection model; and/or the network layer number of the first teacher image detection model is greater than the network layer number of the second teacher image detection model;
The determining module is further used for determining the feature vector of the sample image according to the teacher image detection model;
the training module is used for inputting the sample image into an initial student image detection model, obtaining a prediction feature vector and a prediction label of the sample image, and determining a first sub-loss function value according to the prediction feature vector and the feature vector; determining a second sub-loss function value according to the prediction label and the label of the sample image; and carrying out coefficient adjustment on the initial student image detection model according to the first sub-loss function value, the second sub-loss function value, the weight of the first sub-loss function and the weight of the second sub-loss function.
7. The apparatus of claim 6, wherein the means for determining is further for:
inputting the sample image into the teacher image detection model to obtain an output vector of a target network layer in the teacher image detection model; the target network layer is a network layer except a full connection layer in the teacher image detection model;
and taking the output vector as the characteristic vector.
8. The apparatus of claim 6, wherein the number of feature vectors of the sample image is at least one;
The training module is specifically configured to:
for each feature vector of the sample image, performing coefficient adjustment on an initial student image detection model according to the sample image, the feature vector and the label of the sample image;
or,
splicing or weighting and summing at least one characteristic vector of the sample image to obtain a processed characteristic vector; and according to the sample image, the label of the sample image and the processed feature vector, performing coefficient adjustment on an initial student image detection model.
9. The apparatus of claim 6, wherein the training module is further to:
determining a first sub-loss function value according to the predicted feature vector, the feature vector and the first sub-loss function;
determining a second sub-loss function value according to the prediction label, the label of the sample image and the second sub-loss function;
determining a loss function value according to the first sub-loss function value, the second sub-loss function value, the weight of the first sub-loss function and the weight of the second sub-loss function;
and carrying out coefficient adjustment on the initial student image detection model according to the loss function value.
10. The apparatus of claim 6, wherein the teacher image detection model and the student image detection model are in-vivo detection models;
the positive sample label characterizes the sample image as a real object image;
and the negative sample label characterizes that the sample image is a non-real object image.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-5.
CN202110888202.4A 2021-08-03 2021-08-03 Training method and device of image detection model, electronic equipment and storage medium Active CN113705362B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110888202.4A CN113705362B (en) 2021-08-03 2021-08-03 Training method and device of image detection model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110888202.4A CN113705362B (en) 2021-08-03 2021-08-03 Training method and device of image detection model, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113705362A CN113705362A (en) 2021-11-26
CN113705362B true CN113705362B (en) 2023-10-20

Family

ID=78651365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110888202.4A Active CN113705362B (en) 2021-08-03 2021-08-03 Training method and device of image detection model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113705362B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116797782A (en) * 2022-03-09 2023-09-22 北京字跳网络技术有限公司 Semantic segmentation method and device for image, electronic equipment and storage medium
CN115063875B (en) * 2022-08-16 2022-12-16 北京百度网讯科技有限公司 Model training method, image processing method and device and electronic equipment
CN115578614B (en) * 2022-10-21 2024-03-12 北京百度网讯科技有限公司 Training method of image processing model, image processing method and device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3125156A1 (en) * 2015-07-31 2017-02-01 Xiaomi Inc. Method, apparatus and server for image scene determination
WO2019114580A1 (en) * 2017-12-13 2019-06-20 深圳励飞科技有限公司 Living body detection method, computer apparatus and computer-readable storage medium
CN111027060A (en) * 2019-12-17 2020-04-17 电子科技大学 Knowledge distillation-based neural network black box attack type defense method
CN111639710A (en) * 2020-05-29 2020-09-08 北京百度网讯科技有限公司 Image recognition model training method, device, equipment and storage medium
CN112036331A (en) * 2020-09-03 2020-12-04 腾讯科技(深圳)有限公司 Training method, device and equipment of living body detection model and storage medium
CN112418268A (en) * 2020-10-22 2021-02-26 北京迈格威科技有限公司 Target detection method and device and electronic equipment
CN112598643A (en) * 2020-12-22 2021-04-02 百度在线网络技术(北京)有限公司 Depth counterfeit image detection and model training method, device, equipment and medium
WO2021114974A1 (en) * 2019-12-14 2021-06-17 支付宝(杭州)信息技术有限公司 User risk assessment method and apparatus, electronic device, and storage medium
CN113033465A (en) * 2021-04-13 2021-06-25 北京百度网讯科技有限公司 Living body detection model training method, device, equipment and storage medium
CN113052144A (en) * 2021-04-30 2021-06-29 平安科技(深圳)有限公司 Training method, device and equipment of living human face detection model and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11037025B2 (en) * 2019-05-30 2021-06-15 Baidu Usa Llc Systems and methods for adversarially robust object detection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"人脸欺诈检测最新进展及典型方法" [Recent Advances and Typical Methods of Face Spoofing Detection]; 胡永健 (Hu Yongjian) et al.; 《信号处理》 (Journal of Signal Processing); Vol. 37, No. 12; pp. 2261-2277 *

Also Published As

Publication number Publication date
CN113705362A (en) 2021-11-26

Similar Documents

Publication Publication Date Title
CN113705362B (en) Training method and device of image detection model, electronic equipment and storage medium
CN114186564B (en) Pre-training method and device for semantic representation model and electronic equipment
CN112580733B (en) Classification model training method, device, equipment and storage medium
CN112861885B (en) Image recognition method, device, electronic equipment and storage medium
CN115456167B (en) Lightweight model training method, image processing device and electronic equipment
CN114020950B (en) Training method, device, equipment and storage medium for image retrieval model
CN115482395B (en) Model training method, image classification device, electronic equipment and medium
CN114494784A (en) Deep learning model training method, image processing method and object recognition method
CN112949767A (en) Sample image increment, image detection model training and image detection method
CN113627536B (en) Model training, video classification method, device, equipment and storage medium
CN114187459A (en) Training method and device of target detection model, electronic equipment and storage medium
CN115147680B (en) Pre-training method, device and equipment for target detection model
CN112732553A (en) Image testing method and device, electronic equipment and storage medium
CN115359308B (en) Model training method, device, equipment, storage medium and program for identifying difficult cases
CN113627361B (en) Training method and device for face recognition model and computer program product
CN113792876B (en) Backbone network generation method, device, equipment and storage medium
CN113033408B (en) Data queue dynamic updating method and device, electronic equipment and storage medium
CN114120454A (en) Training method and device of living body detection model, electronic equipment and storage medium
CN114494747A (en) Model training method, image processing method, device, electronic device and medium
CN114973333B (en) Character interaction detection method, device, equipment and storage medium
CN116052288A (en) Living body detection model training method, living body detection device and electronic equipment
CN114724144B (en) Text recognition method, training device, training equipment and training medium for model
CN114860411B (en) Multi-task learning method, device, electronic equipment and storage medium
CN113989569B (en) Image processing method, device, electronic equipment and storage medium
CN113361575B (en) Model training method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant