CN113052144A - Training method, device and equipment of living human face detection model and storage medium - Google Patents

Training method, device and equipment of living human face detection model and storage medium

Info

Publication number
CN113052144A
CN113052144A (application CN202110482189.2A)
Authority
CN
China
Prior art keywords
detection model
face detection
meta
living body
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110482189.2A
Other languages
Chinese (zh)
Other versions
CN113052144B (en)
Inventor
喻晨曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority claimed from CN202110482189.2A
Publication of CN113052144A
Application granted
Publication of CN113052144B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a training method for a living body face detection model, applied in the technical field of artificial intelligence, for solving the technical problem that existing living body face detection models do not achieve high prediction accuracy in real scenes. The method provided by the invention comprises the following steps: acquiring face sample image sets belonging to different fields; selecting teacher networks corresponding one-to-one to the fields, and performing two-class training on each corresponding teacher network with the face sample image set; freezing the trained teacher networks; outputting, through each teacher network, the prediction probability that a face image is a living face; training the student network with the average of the output prediction probabilities as its target probability value; and taking the parameters of the trained student network's feature extractor as initial values of the parameters of the feature extractor in the living body face detection model, then performing meta-learning training on the living body face detection model with the face sample image set until the loss function of the living body face detection model converges.

Description

Training method, device and equipment of living human face detection model and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a training method, device, equipment and storage medium for a living body face detection model.
Background
With the continual upgrading of mobile phones and changes in image capture and processing technology, there are various fraudulent means by which a non-living face image can imitate a living face image. Currently, living face images are generally recognized by a trained living body face detection model; when a non-living face is predicted, the user is required to further verify whether the face image is a living face image.
In the training process of a living body face detection model, it has been found that the prediction accuracy of the trained model is limited by the types of face image samples participating in training. For example, when the face image samples used in training are brightly lit, the trained model's accuracy in predicting whether a dimly lit face image is a living face decreases.
Because the living body face detection model trained by the conventional means is difficult to resist environmental noise, such as too strong or too weak illumination, different imaging qualities derived from different imaging devices, and the like, the trained living body face detection model lacks robustness for a new attack type, so that the living body face detection model trained by the conventional training method does not have high prediction accuracy in a real scene.
Disclosure of Invention
The embodiment of the invention provides a training method, a training device, equipment and a storage medium of a living body face detection model, and aims to solve the technical problem that the existing living body face detection model is difficult to resist environmental noise and poor in robustness, so that the existing living body face detection model does not have high prediction accuracy in a real scene.
A method for training a living body face detection model, the method comprising:
acquiring a face sample image set belonging to different fields, wherein the face sample image in the face sample image set carries living or non-living labels;
selecting teacher networks corresponding to the fields one by one, and performing two-class training on the corresponding teacher networks through the face sample image set;
when the loss function of the teacher network is converged and the prediction accuracy of the teacher network is within a preset range after training, freezing the teacher network;
inputting the face image in the face image data set into each teacher network, and outputting the prediction probability that the face image is a living face through each teacher network;
taking the average value of the prediction probability output by each teacher network as the target probability value of the student network, and training the student network through the face image data set;
when the loss function of the student network is converged, acquiring parameters of a feature extractor of the student network;
taking the parameters of the feature extractor of the student network as initial values of the parameters of the feature extractor in the living body face detection model, and performing meta-learning training on the living body face detection model through the face sample image set;
and when the loss function of the living body face detection model is converged, obtaining the trained living body face detection model.
An apparatus for training a living body face detection model, the apparatus comprising:
the system comprises a sample acquisition module, a data acquisition module and a data processing module, wherein the sample acquisition module is used for acquiring a face sample image set belonging to different fields, and the face sample images in the face sample image set carry living or non-living labels;
the first training module is used for selecting teacher networks corresponding to the fields one by one and performing two-class training on the corresponding teacher networks through the face sample image set;
the freezing module is used for freezing the teacher network when the loss function of the teacher network is converged and the prediction accuracy of the teacher network is within a preset range after training;
the probability calculation module is used for inputting the face images in the face image data set into each teacher network and outputting the prediction probability that the face images are the living faces through each teacher network;
the second training module is used for taking the average value of the prediction probabilities output by the teacher networks as the target probability value of the student network and training the student network through the face image data set;
the parameter acquisition module is used for acquiring the parameters of the feature extractor of the student network when the loss function of the student network is converged;
the meta-learning module is used for taking parameters of the feature extractor of the student network as initial values of the parameters of the feature extractor in the living body face detection model and carrying out meta-learning training on the living body face detection model through the face sample image set;
and the convergence module is used for obtaining the trained living body face detection model when the loss function of the living body face detection model converges.
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above-mentioned training method of a living body face detection model when executing the computer program.
A computer-readable storage medium, which stores a computer program that, when executed by a processor, implements the steps of the above-described training method of a living body face detection model.
The invention provides a training method, apparatus, equipment and storage medium for a living body face detection model. Teacher networks in corresponding fields are given two-class training on face sample image sets from different fields, and each teacher network is frozen when its loss function converges and its prediction accuracy after training is within a preset range. Before the student network is trained, the face images in a face image data set are input into each teacher network, which outputs the prediction probability that each face image is a living face; the average of the prediction probabilities output by the teacher networks is taken as the target probability value of the student network, and the student network is trained on the face image data set. When the loss function of the student network converges, the parameters of its feature extractor are acquired and used as initial values of the parameters of the feature extractor in the living body face detection model, which is then given meta-learning training on the face sample image set. When the loss function of the living body face detection model converges, the trained model can be used to detect a face image to be detected. A living body face detection model trained by this scheme is compatible with face images captured in different fields and is highly robust: whether the face images are captured by different mobile phones or under different light intensities, the model maintains high prediction accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a schematic diagram of an application environment of a training method for a living human face detection model according to an embodiment of the present invention;
FIG. 2 is a flowchart of a training method of a live face detection model according to an embodiment of the present invention;
FIG. 3 is a flowchart of meta-learning training of a live-body face detection model according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a training apparatus for a living human face detection model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The training method of the living human face detection model provided by the application can be applied to the application environment shown in fig. 1, wherein the computer equipment is communicated with a server through a network. The computer device may be, but is not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster consisting of a plurality of servers.
In an embodiment, as shown in fig. 2, a method for training a living body face detection model is provided, described taking the computer device in fig. 1 as an example, and comprising the following steps S101 to S108.
S101, obtaining face sample image sets belonging to different fields, wherein the face sample images in the face sample image sets carry living or non-living labels.
It can be understood that the fields of the face sample image sets in this step correspond to face sample images captured by different models of capture device, or captured under different lighting conditions. For example, face sample images shot with an Apple mobile phone can be classified into a first field and those shot with a Huawei mobile phone into a second field; when the camera model of a face sample image is unknown, images shot with illumination intensity within a first threshold range can be classified into a third field, those within a second threshold range into a fourth field, and so on, so that face sample image sets belonging to various fields are obtained.
In this embodiment, the face sample images belonging to the same class or the same field are classified into one class, and then the face sample image sets of different fields can be obtained.
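The field-partitioning rule above can be sketched minimally in plain Python. The field names, device labels and illumination thresholds below are illustrative assumptions for clarity, not values from the patent:

```python
# Hypothetical sketch: partition face sample images into "fields" (domains)
# by capture device when known, otherwise by illumination-intensity band.
def assign_field(sample):
    device = sample.get("device")
    if device == "apple":
        return "field_1"
    if device == "huawei":
        return "field_2"
    # Camera model unknown: fall back to illumination-intensity thresholds.
    lux = sample["illumination"]
    return "field_3" if lux < 100 else "field_4"

def group_by_field(samples):
    """Group samples of one class into per-field image sets."""
    fields = {}
    for s in samples:
        fields.setdefault(assign_field(s), []).append(s)
    return fields
```

In practice the grouping key would come from image metadata or an upstream illumination estimator; the point is only that every sample lands in exactly one field.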
In one embodiment, the face sample image set includes, but is not limited to, public data sets such as CASIA-MFSD, IDIAP, MSU-MFSD, OULU-NPU, LCC FASD, NUAA and FaceForensics++, and also includes data sets curated internally within the company.
And S102, selecting teacher networks corresponding to the fields one by one, and performing two-class training on the corresponding teacher networks through the face sample image set.
In one embodiment, the teacher network may optionally use ResNet-50; the teacher network corresponding to each field may each employ a ResNet-50 neural network. When the face sample image sets cover 10 fields, there are likewise 10 teacher networks, each corresponding to one field's face sample image set.
ResNet is an abbreviation of Residual Network. Residual networks are widely used in object classification and as part of the backbone of classical neural networks for computer vision tasks; ResNet-50 is a typical residual neural network.
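The two-class (living / non-living) training of a teacher can be illustrated without a deep network. The sketch below replaces ResNet-50 with a single logistic unit on a 1-D feature, purely to show binary cross-entropy training; this stand-in architecture is an assumption for illustration, not the patent's:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_teacher(data, lr=0.5, epochs=200):
    """Two-class training on (feature, label) pairs, label 1 = living face,
    0 = non-living. Minimizes binary cross-entropy by gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(w * x + b)
            # (p - y) is the gradient of BCE w.r.t. the pre-activation.
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b
```

One such model would be trained per field, each only on that field's face sample image set.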
S103, when the loss function of the teacher network is converged and the prediction accuracy of the teacher network is within a preset range after training, freezing the teacher network.
In one embodiment, the network freeze may be implemented by a freeze command. It can be understood that the purpose of freezing the model is to stop training it: the current trained parameters are fixed and used only for forward inference through the teacher network.
In one embodiment, the preset range is, for example, 90% prediction accuracy, meaning that the teacher network is frozen when its average error rate (including false rejections and false acceptances) is within 10%.
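The freeze criterion described above (loss has converged and the error rate is within the preset bound) can be sketched as follows. The convergence tolerance and the 10% bound are the example values from the text, used here only for illustration:

```python
def should_freeze(recent_losses, error_rate, tol=1e-4, max_error=0.10):
    """Freeze a teacher once its loss has stopped moving AND its average
    error rate (false rejections plus false acceptances) is within bound."""
    if len(recent_losses) < 2:
        return False
    converged = abs(recent_losses[-1] - recent_losses[-2]) < tol
    return converged and error_rate <= max_error

class Teacher:
    def __init__(self):
        self.frozen = False

    def freeze(self):
        # After freezing, parameters are fixed; the network is used for
        # forward inference only, with no further gradient updates.
        self.frozen = True
```
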
And S104, inputting the face images in the face image data set into each teacher network, and outputting the prediction probability that the face images are living faces through each teacher network.
The image data set is, for example, the open-source ImageNet data set, a large visual database for visual object recognition research. ImageNet has manually annotated more than 14 million image URLs (Uniform Resource Locators) to indicate the objects in each picture, and bounding boxes are provided for at least one million images.
And S105, taking the average value of the prediction probabilities output by the teacher networks as a target probability value of the student network, and training the student network through the face image data set.
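The distillation target in step S105 is simply the mean of the frozen teachers' outputs, as in this minimal sketch:

```python
def student_target(teacher_probs):
    """Target probability for the student: the average of the prediction
    probabilities (image is a living face) output by each frozen teacher."""
    return sum(teacher_probs) / len(teacher_probs)
```
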
In one embodiment, the student network may be selected from ResNet-34, MobileNetV3-Small or MobileNetV2. By performing adversarial learning training on the teacher and student networks, the aim is to make it difficult for the adversarial discriminator to distinguish the teacher network from the student network.
S106, when the loss function of the student network is converged, obtaining the parameters of the feature extractor of the student network.
The target loss of the student network is the sum of the KL divergence between the output probability distributions of the student network and the teacher networks, and the adversarial learning loss.
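The KL-divergence term of that target loss, for the two-class (Bernoulli) outputs involved here, can be sketched as below; the adversarial-learning term is omitted for brevity, and the small epsilon is a numerical-stability assumption:

```python
import math

def bernoulli_kl(p_teacher, p_student, eps=1e-12):
    """KL divergence between two Bernoulli distributions: the teacher-ensemble
    target probability and the student's predicted probability."""
    p, q = p_teacher, p_student
    return (p * math.log((p + eps) / (q + eps))
            + (1 - p) * math.log((1 - p + eps) / (1 - q + eps)))
```

The divergence is zero when the student matches the target exactly and positive otherwise, which is what drives the student toward the averaged teacher output.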
S107, taking the parameters of the feature extractor of the student network as initial values of the parameters of the feature extractor in the living body face detection model, and performing meta-learning training on the living body face detection model through the face sample image set.
Fig. 3 is a flowchart of meta-learning training a living body face detection model according to an embodiment of the present invention, and in one embodiment, as shown in fig. 3, the step of performing meta-learning training on the living body face detection model through the face sample image set includes the following steps S301 to S303:
s301, extracting a sampling set comprising positive samples and negative samples from the face sample image set;
s302, sequentially performing meta-training, meta-testing and meta-optimization on the living human face detection model through the sampling set;
and S303, judging whether the loss function of the living body face detection model in the meta-optimization stage converges; if so, the living body face detection model is judged to be trained; otherwise, the steps from extracting a sampling set comprising positive and negative samples from the face sample image set through judging convergence are repeated in a loop, until the loss function of the living body face detection model in the meta-optimization stage converges.
In one embodiment, the sequentially performing meta-training, meta-testing and meta-optimization on the living human face detection model through the sampling set includes:
dividing the face sample image set into a meta-training set and a meta-testing set;
extracting a plurality of sampling sets from the meta-training set, and training the living human face detection model in a meta-training stage through the sampling sets;
when the loss function of the living body face detection model in the meta-training stage is converged, performing test training on the living body face detection model in the meta-testing stage through the meta-testing set;
when the loss function of the living body face detection model in the meta-test stage is converged, carrying out meta-optimization on the living body face detection model according to the training results of the living body face detection model in the meta-training stage and the meta-test stage;
and when the loss function of the living body face detection model in the meta-optimization stage is converged, judging that the loss function of the living body face detection model is converged.
The face sample image in one field in the face sample image set can be used as a meta-test set, and the face sample images in the other fields can be used as a meta-training set.
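The split described above, where one field serves as the meta-test set and the remaining fields form the meta-training set, can be sketched as a leave-one-field-out rotation (the generator name is an assumption):

```python
def leave_one_domain_out(fields):
    """fields: dict mapping field name -> list of samples.
    Yields (meta_train, meta_test) pairs, holding out one field per episode."""
    for held_out in fields:
        meta_test = fields[held_out]
        meta_train = {k: v for k, v in fields.items() if k != held_out}
        yield meta_train, meta_test
```

Rotating the held-out field across episodes is what forces the model to generalize to a field it has not trained on.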
In one embodiment, the living body face detection model comprises a feature extractor, a depth map estimator and a meta-learning classifier, and the loss function of the living body face detection model in the meta-training stage comprises a first classification loss function, an iteration function of the meta-learning classifier and a first depth map estimation loss function, wherein the first classification loss function is:

L_cls(θ_F, θ_M) = Σ y log M(F(x)) + (1 − y) log(1 − M(F(x)))

wherein θ_F represents the parameters of the feature extractor of the living body face detection model, θ_M represents the parameters of the meta-learning classifier, x represents a face sample image in the sampling set, y represents the value of the true label of that face sample image, F(x) represents the feature extracted from the face sample image by the feature extractor, M(F(x)) represents the output of the meta-learning classifier on that feature, and L_cls(θ_F, θ_M) represents the first classification loss function.
The iterative function of the meta-learning classifier is:

θ'_M = θ_M − α ∇_{θ_M} L_cls(θ_F, θ_M)

wherein θ_M represents the parameters of the meta-learning classifier, α represents a hyper-parameter (the inner-loop learning rate), ∇_{θ_M} denotes the gradient with respect to the parameter θ_M at the i-th training step, and L_cls(θ_F, θ_M) represents the output result of the first classification loss function.
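The inner-loop update θ'_M = θ_M − α ∇ L_cls can be checked numerically on a toy loss. The quadratic loss below stands in for the real classification loss, an assumption made only so the gradient is easy to verify:

```python
def inner_update(theta_m, grad_fn, alpha=0.1):
    """One meta-training (inner-loop) step on the classifier parameters."""
    return theta_m - alpha * grad_fn(theta_m)

def numeric_grad(loss_fn, theta, h=1e-6):
    """Central-difference gradient estimate for a scalar parameter."""
    return (loss_fn(theta + h) - loss_fn(theta - h)) / (2 * h)
```

Repeated application of the update drives the parameter toward the loss minimum, which is the behavior the iteration function above relies on.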
Ordinary parameters are obtained through continued training and learning of the model; hyper-parameters, by contrast, are generally not learned. Hyper-parameters are framework-level settings of a machine learning model, such as the number of clusters in a clustering method or the number of topics in a topic model. Unlike the parameters learned during training, hyper-parameters are usually set manually and tuned by trial and error, or by enumerating combinations from an exhaustive set of candidate values, which is also called grid search.
The first depth map estimation loss function is:

L_Dtrn(θ_F, θ_D) = Σ ||D(F(x)) − I||²

wherein θ_F represents the parameters of the feature extractor of the living body face detection model, θ_D represents the parameters of the depth map estimator, F(x) represents the first feature extracted by the feature extractor from a face sample image in the meta-training set, D(F(x)) represents the output of the depth map estimator for the extracted first feature, and I represents the true depth of the extracted first feature.
In one embodiment, the true depth I of the extracted first feature may be obtained by PRNet, an open-source face 3D reconstruction and dense alignment technique.
This embodiment uses an open-source algorithm such as PRNet to estimate the depth of the feature map. For all labeled data sets, the depth map of a living face image is estimated with PRNet and used as the real depth map; for a non-living face image, the target is set to an all-zero matrix, which is equivalent to an all-black depth map.
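The depth supervision just described can be sketched as follows. The PRNet estimate is replaced by a caller-supplied stand-in, and the map size is an arbitrary illustrative choice:

```python
def depth_target(is_live, estimated_depth, h=32, w=32):
    """Live faces get a real depth map (PRNet-style estimate, passed in);
    non-living images get an all-zero ('all black') depth map."""
    if is_live:
        return estimated_depth
    return [[0.0] * w for _ in range(h)]

def depth_loss(pred, target):
    """Squared L2 distance between predicted and target depth maps,
    matching the Σ ||D(F(x)) - I||^2 form above."""
    return sum((p - t) ** 2
               for row_p, row_t in zip(pred, target)
               for p, t in zip(row_p, row_t))
```
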
Further, the loss function of the living body face detection model in the meta-test stage comprises a second classification loss function and a second depth map estimation loss function, wherein the second classification loss function is:

L_cls-tst = Σ_i [ y log M'_i(F(x)) + (1 − y) log(1 − M'_i(F(x))) ]

wherein N represents the total number of fields in the face sample image set, i represents the number of the field to which the face sample image currently used for meta-testing belongs, F(x) represents the feature extracted from the face sample image by the feature extractor, M'_i(F(x)) represents the output of the updated meta-learning classifier on the extracted feature, and L_cls-tst represents the second classification loss function.
Further, the second depth map estimation loss function is:

L_Dtst(θ_F, θ_D) = Σ ||D(F(x)) − I||²

wherein F(x) represents the second feature extracted by the feature extractor from a face sample image in the meta-test set, D(F(x)) represents the output of the depth map estimator for the extracted second feature, and I represents the true depth of the extracted second feature.
The second depth map estimation loss function has the same expression as the first depth map estimation loss function, but because the samples input to the second come from the meta-test set while the samples input to the first come from the meta-training set, their outputs differ.
Further, the true depth I of the second feature may be obtained by an open source face 3D reconstruction and dense alignment technique PRNet.
In one embodiment, the loss functions of the living body face detection model in the meta-optimization stage comprise a meta-learning classifier parameter loss function, a feature extractor parameter loss function and a depth map estimator parameter loss function. The meta-learning classifier parameter loss function is:

θ_M ← θ_M − β ∇_{θ_M} ( L_cls(θ_F, θ_M) + γ L_cls-tst )

wherein β and γ represent hyper-parameters, L_cls(θ_F, θ_M) represents the output value of the first classification loss function, L_cls-tst represents the output value of the second classification loss function, and ∇_{θ_M} denotes the gradient with respect to the parameter θ_M during training.

It can be understood that when the output value of the meta-learning classifier parameter loss function tends to a steady state, the function has converged, yielding the final parameters θ_M of the meta-learning classifier.
The feature extractor parameter loss function is:

θ_F ← θ_F − β ∇_{θ_F} ( L_cls(θ_F, θ_M) + γ L_Dtrn(θ_F, θ_D) + L_cls-tst + γ L_Dtst(θ_F, θ_D) )

wherein θ_F represents the parameters of the feature extractor of the living body face detection model, β and γ represent hyper-parameters, ∇_{θ_F} denotes the gradient with respect to the parameter θ_F during training, L_Dtst(θ_F, θ_D) represents the output value of the second depth map estimation loss function, L_cls-tst represents the output value of the second classification loss function, L_cls(θ_F, θ_M) represents the output value of the first classification loss function, and L_Dtrn(θ_F, θ_D) represents the output value of the first depth map estimation loss function.

It can be understood that when the output value of the feature extractor parameter loss function tends to a steady state, the function has converged, yielding the final parameters θ_F of the feature extractor of the living body face detection model.
In one embodiment, the depth map estimator parameter loss function is:

θ_D ← θ_D − β ∇_{θ_D} ( L_Dtrn(θ_F, θ_D) + γ L_Dtst(θ_F, θ_D) )

wherein θ_D represents the parameters of the depth map estimator, β and γ represent hyper-parameters, ∇_{θ_D} denotes the gradient with respect to the parameter θ_D during training, L_Dtst(θ_F, θ_D) represents the output value of the second depth map estimation loss function, and L_Dtrn(θ_F, θ_D) represents the output value of the first depth map estimation loss function.

It can be understood that when the output value of the depth map estimator parameter loss function tends to a steady state, the function has converged, yielding the final parameters θ_D of the depth map estimator.
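The meta-optimization stage updates each parameter group (θ_M, θ_F, θ_D) with its own aggregate of meta-training and meta-test losses. The numeric sketch below illustrates one such outer update; the way β and γ combine the two loss gradients is an assumption based on the text, and gradients are passed in as plain numbers for clarity:

```python
def meta_optimize(theta, grad_train, grad_test, beta=0.01, gamma=1.0):
    """One outer (meta-optimization) step for a parameter group:
    theta <- theta - beta * (grad of meta-train loss
                             + gamma * grad of meta-test loss)."""
    return theta - beta * (grad_train + gamma * grad_test)
```

The same update form is applied per group, with the classifier using the classification losses, the feature extractor using all four losses, and the depth estimator using the two depth losses.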
And S108, when the loss function of the living body face detection model is converged, obtaining the trained living body face detection model.
When living body face detection is performed with the trained living body face detection model, the face image to be detected is input into the model, and a prediction result of whether the face image to be detected is a living face is obtained.
At present, because sufficient real business data are lacking for model training and testing, the distribution of real business data is almost unpredictable; the domains of public data sets and business data often differ greatly, and no feasible sampling of the data sets can align the training and test distributions. Therefore, this scheme maximizes the field overlap of the mapped data sets and trains the living body face detection model on face sample image sets from dozens of fields combined with meta-learning, so that the trained model retains high prediction accuracy even when the training samples and the face image actually to be detected come from different fields.
According to the training method of the living body face detection model provided by this embodiment, the model can return a liveness confidence probability and a depth map for the front face of an identity card uploaded by a client within 50 milliseconds on average. After testing on millions of samples intercepted online as suspected model misjudgments, the algorithm identifies samples manually judged to be obvious non-living bodies, such as suspected mobile phone frames and unnaturally distorted images, with a probability of 99.3%, covers more than 87% of non-living-body business scenarios, and handles 40% of novel attacks well in testing.
In the training method of the living body face detection model provided by this embodiment, teacher networks for the corresponding fields are trained for binary classification on face sample image sets from different fields. When the loss function of a teacher network converges and its prediction accuracy after training is within a preset range, the teacher network is frozen. Before training the student network, the face images in the face image data set are input into each teacher network, each teacher network outputs the predicted probability that a face image is a living face, and the average of these probabilities is taken as the student network's target probability value. The student network is then trained on the face image data set, and when its loss function converges, the parameters of its feature extractor are obtained. These parameters are used as the initial values of the feature extractor parameters in the living body face detection model, the living body face detection model is trained by meta-learning on the face sample image set, and when its loss function converges, the trained model is used to detect the face image to be detected. When the model trained by this scheme is used for live face detection, it is compatible with face images captured in different fields and has strong robustness: whether the images are captured with different mobile phones or under different light intensities, the model maintains high prediction accuracy.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In an embodiment, a training device for a living body face detection model is provided, and the training device for the living body face detection model is in one-to-one correspondence with the training method for the living body face detection model in the above embodiment. As shown in fig. 4, the training apparatus 100 for the living body face detection model includes a sample acquisition module 11, a first training module 12, a freezing module 13, a probability calculation module 14, a second training module 15, a parameter acquisition module 16, a meta learning module 17, and a convergence module 18. The functional modules are explained in detail as follows:
the sample acquisition module 11 is used for acquiring face sample image sets belonging to different fields, wherein the face sample images in the face sample image sets carry living or non-living labels;
the first training module 12 is used for selecting teacher networks corresponding to the fields one by one and performing two-class training on the corresponding teacher networks through the face sample image set;
a freezing module 13, configured to freeze the teacher network when the loss function of the teacher network is converged and the prediction accuracy of the teacher network is within a preset range after training;
a probability calculation module 14, configured to input the face image in the face image data set to each teacher network, and output the prediction probability that the face image is a living face through each teacher network;
a second training module 15, configured to train the student network through the face image data set, using an average value of the prediction probabilities output by the teacher networks as a target probability value of the student network;
a parameter obtaining module 16, configured to obtain a parameter of a feature extractor of the student network when a loss function of the student network converges;
a meta-learning module 17, configured to perform meta-learning training on the living body face detection model through the face sample image set, using parameters of the feature extractor of the student network as initial values of parameters of the feature extractor in the living body face detection model;
and the convergence module 18 is configured to obtain the trained living body face detection model when the loss function of the living body face detection model converges.
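The averaging performed by the probability calculation module 14 and consumed by the second training module 15 can be sketched as follows; `distillation_targets` and the teacher callables are hypothetical stand-ins for the frozen teacher networks:

```python
def distillation_targets(images, teachers):
    """For each image, average the teachers' predicted live-face probabilities;
    the averages serve as the student network's soft target probability values."""
    targets = []
    for img in images:
        probs = [teacher(img) for teacher in teachers]  # each teacher's P(live)
        targets.append(sum(probs) / len(probs))
    return targets
```

The student network is then trained on the face image data set to regress these averaged targets, which is how the domain knowledge of the per-field teachers is distilled into a single feature extractor.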
In one embodiment, the meta learning module 17 includes:
the sampling unit is used for extracting a sampling set comprising positive samples and negative samples from the face sample image set;
the meta-training unit is used for sequentially carrying out meta-training, meta-testing and meta-optimization on the living human face detection model through the sampling set;
and the circulating unit is used for judging whether the loss function of the living body face detection model in the meta-optimization stage converges; if so, the living body face detection model is judged to be trained; otherwise, the steps from extracting a sampling set comprising positive and negative samples from the face sample image set through judging whether the loss function in the meta-optimization stage converges are repeated, until the loss function of the living body face detection model in the meta-optimization stage converges.
Further, the meta-training unit specifically includes:
the classification unit is used for dividing the face sample image set into a meta-training set and a meta-testing set;
the first training unit is used for extracting a plurality of sampling sets from the meta-training set and training the living human face detection model in a meta-training stage through the sampling sets;
the first testing unit is used for testing and training the living body face detection model in the meta-testing stage through the meta-testing set when the loss function of the living body face detection model in the meta-training stage is converged;
the first optimization unit is used for carrying out meta-optimization on the living body face detection model according to the training results of the living body face detection model in the meta-training stage and the meta-testing stage when the loss function of the living body face detection model in the meta-testing stage is converged;
and the judging unit is used for judging the convergence of the loss function of the living body face detection model when the loss function of the living body face detection model in the meta-optimization stage converges.
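The cycle implemented by the sampling, meta-training, and judging units above can be outlined as a loop; all five callables below are hypothetical placeholders for the corresponding stages:

```python
def meta_learning_cycle(sample_batch, meta_train, meta_test, meta_optimize,
                        converged, max_rounds=100):
    """Repeat meta-training, meta-testing, and meta-optimization on fresh
    sampling sets until the meta-optimization stage loss converges."""
    for _ in range(max_rounds):
        batch = sample_batch()        # sampling set of positive/negative samples
        meta_train(batch)             # meta-training stage
        meta_test(batch)              # meta-testing stage
        loss = meta_optimize(batch)   # meta-optimization stage loss value
        if converged(loss):
            return True               # model judged to be trained
    return False
```

The `max_rounds` guard is an added safety assumption; the embodiment itself loops purely on convergence of the meta-optimization loss.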
In one embodiment, the living body face detection model comprises a feature extractor, a meta-learning classifier and a depth map estimator, and the loss function of the living body face detection model in the meta-training stage comprises a first classification loss function, an iterative function of the meta-learning classifier and a first depth map estimation loss function;
the first classification loss function is:
LclsF,θM)=∑ylogM(F(x))+(1-y)log(1-M(F(x)))
wherein, thetaFA parameter, θ, of a feature extractor representing the living body face detection modelMParameters representing the meta-learning classifier, x represents the face sample image in the sampling set, y represents the value of the true label of the face sample image in the sampling set, and F (x) represents the value of the true label of the face sample image in the sampling setThe feature extractor of the face detection model extracts the features L of the face sample imageclsF,θM) Representing the first classification loss function;
the iterative function of the meta-learning classifier is:
Figure BDA0003049697740000141
wherein, thetaMParameters representing the meta-learning classifier, alpha representing a hyper-parameter,
Figure BDA0003049697740000142
represents the parameter theta at the i-th trainingMGradient of (a), LclsF,θM) An output result representing the first classification loss function;
the first depth map estimation loss function is:
LDtrnF,θD)=∑||D(F(X))-I||2wherein, thetaFA parameter, θ, of a feature extractor representing the living body face detection modelDParameters representing the depth map estimator, f (x) represents a first feature extracted by a feature extractor of the face detection model on a face sample image in the meta training set, D (f (x)) represents an output result of the depth map estimator on the extracted first feature, and I represents a true depth of the extracted first feature.
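The meta-training stage losses above can be sketched with plain Python scalars. Note the classification loss is implemented here in the conventional negative log-likelihood form (the formula as printed omits the leading minus sign), and every helper name is an assumption:

```python
import math

def classification_loss(probs, labels):
    """Lcls: cross-entropy between classifier outputs M(F(x)) and labels y."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(probs, labels))

def classifier_inner_update(theta_m, grad, alpha=0.01):
    """One classifier iteration: theta_M(i+1) = theta_M(i) - alpha * grad."""
    return [t - alpha * g for t, g in zip(theta_m, grad)]

def depth_estimation_loss(pred_depths, true_depths):
    """LDtrn: sum of squared errors between D(F(x)) and the true depth I."""
    return sum((d - i) ** 2 for d, i in zip(pred_depths, true_depths))
```

In a real implementation the gradient passed to `classifier_inner_update` would be the gradient of `classification_loss` with respect to θM, computed by an autodiff framework.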
Further, the loss function of the living body face detection model in the meta-test stage comprises a second classification loss function and a second depth map estimation loss function;
the second classification loss function is:
Figure BDA0003049697740000151
wherein N represents the total area number of the face sample image set, and i represents the face sample currently used for meta-testThe number of domains to which the image belongs, F (x) represents the features extracted from the face sample image by the feature extractor of the face detection model, M'i(F (x)) represents the output of the meta-learning classifier on the features extracted, Lcls-tstRepresenting the second classification loss function;
the second depth map estimation loss function is:
LDtstF,θD)=∑||D(F(X))-I||2
wherein f (x) represents a second feature extracted by the feature extractor of the face detection model for the face sample images in the meta test set, D (f (x)) represents an output result of the depth map estimator for the second feature extracted, and I represents a true depth of the second feature extracted.
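A minimal sketch of the second classification loss, assuming it applies the same cross-entropy form through each domain's updated classifier M'i; the classifier callables are hypothetical stand-ins:

```python
import math

def meta_test_classification_loss(features, labels, updated_classifiers):
    """Lcls-tst: cross-entropy summed over every updated classifier M'_i."""
    total = 0.0
    for classifier in updated_classifiers:   # one M'_i per meta-training domain
        for f, y in zip(features, labels):
            p = classifier(f)                # M'_i(F(x)): predicted P(live)
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total
```

Scoring the held-out domain's samples through every inner-updated classifier is what makes the meta-test loss reward parameters that generalize across domains.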
Further, the loss function of the living body face detection model in the meta-optimization stage comprises a meta-learning classifier parameter loss function, a feature extractor parameter loss function and a depth map estimator parameter loss function;
the meta-learning classifier parameter loss function is:
Figure BDA0003049697740000152
wherein beta and gamma represent hyper-parameters, LclsF,θM) An output value, L, representing said first classification loss functioncls-tstAn output result representing the second classification loss function,
Figure BDA0003049697740000153
representing a parameter theta during trainingMA gradient of (a);
the feature extractor parameter loss function is:
Figure BDA0003049697740000161
wherein the content of the first and second substances,θFparameters of a feature extractor representing a live face detection model,
Figure BDA0003049697740000163
representing a parameter theta during trainingFGradient of (a), LDtstF,θD) An output value, L, representing the estimated loss function of the second depth mapcls-tstAn output value, L, representing said second classification loss functionclsF,θM) An output value, L, representing said first classification loss functionDtrnF,θD) An output value representing the first depth map estimation loss function;
the depth map estimator parameter loss function is:
Figure BDA0003049697740000162
wherein, thetaDParameters representing the depth map estimator, beta and gamma represent hyper-parameters,
Figure BDA0003049697740000164
representing a parameter theta during trainingDGradient of (a), LDtstF,θD) An output value, L, representing the estimated loss function of the second depth mapDtrnF,θD) An output value representing the first depth map estimation loss function.
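The three parameter updates above are plain gradient steps with step sizes β and γ; the sketch below treats parameters and gradients as flat lists, with the gradient arguments standing in for the gradients of the summed losses:

```python
def meta_optimization_step(theta_m, theta_f, theta_d,
                           grad_m, grad_f, grad_d, beta=0.01, gamma=0.01):
    """Apply the meta-learning classifier, feature extractor, and depth map
    estimator updates: each parameter group moves against its gradient."""
    theta_m = [t - beta * g for t, g in zip(theta_m, grad_m)]    # classifier
    theta_f = [t - gamma * g for t, g in zip(theta_f, grad_f)]   # extractor
    theta_d = [t - gamma * g for t, g in zip(theta_d, grad_d)]   # depth estimator
    return theta_m, theta_f, theta_d
```

In practice each gradient would be obtained by differentiating the combined meta-training and meta-test losses with an autodiff framework rather than supplied by hand.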
When the living body face detection model obtained by the training apparatus provided by this embodiment is used to detect live face images, it is compatible with face images captured in different fields and has strong robustness; whether the face images are captured with different mobile phones or under different light intensities, the model maintains high prediction accuracy.
The terms "first" and "second" in the above modules/units serve only to distinguish different modules/units and do not define priority or any other limiting meaning. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, so that a process, method, system, article, or apparatus comprising a list of steps or modules is not necessarily limited to the steps or modules explicitly listed, but may include other steps or modules not explicitly listed or inherent to such a process, method, article, or apparatus. The division of modules presented in this application is merely a logical division and may be implemented differently in practical applications.
For specific limitations of the training apparatus for the living body face detection model, reference may be made to the above limitations of the training method for the living body face detection model, and details are not repeated here. All or part of the modules in the training device of the living body face detection model can be realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 5. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external server through a network connection. The computer program is executed by the processor to implement the training method of the living body face detection model.
In one embodiment, a computer device is provided, which includes a memory, a processor and a computer program stored on the memory and executable on the processor, and the processor executes the computer program to implement the steps of the training method for a living human face detection model in the above-mentioned embodiments, such as the steps 101 to 108 shown in fig. 2 and other extensions of the method and related steps. Alternatively, the processor, when executing the computer program, implements the functions of the modules/units of the training apparatus for a living body face detection model in the above-described embodiments, such as the functions of the modules 11 to 18 shown in fig. 4. To avoid repetition, further description is omitted here.
The Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like which is the control center for the computer device and which connects the various parts of the overall computer device using various interfaces and lines.
The memory may be used to store the computer programs and/or modules, and the processor may implement various functions of the computer device by running or executing the computer programs and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, video data, etc.) created according to the use of the cellular phone, etc.
The memory may be integrated in the processor or may be provided separately from the processor.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, implements the steps of the training method of the living body face detection model in the above-described embodiments, such as the steps 101 to 108 shown in fig. 2 and extensions of other extensions and related steps of the method. Alternatively, the computer program, when executed by the processor, implements the functions of the modules/units of the training apparatus for a living body face detection model in the above-described embodiments, such as the functions of the modules 11 to 18 shown in fig. 4. To avoid repetition, further description is omitted here.
According to the training method, apparatus, device, and storage medium of the living body face detection model provided by these embodiments, teacher networks for the corresponding fields are trained for binary classification on face sample image sets from different fields. When the loss function of a teacher network converges and its prediction accuracy after training is within a preset range, the teacher network is frozen. Before training the student network, the face images in the face image data set are input into each teacher network, each teacher network outputs the predicted probability that a face image is a living face, and the average of these probabilities is taken as the student network's target probability value. The student network is then trained on the face image data set, and when its loss function converges, the parameters of its feature extractor are obtained and used as the initial values of the feature extractor parameters in the living body face detection model. The living body face detection model is trained by meta-learning on the face sample image set, and when its loss function converges, the trained model can be used to detect the face image to be detected. When the model trained by this scheme is used for live face detection, it is compatible with face images captured in different fields and has strong robustness: whether the images are captured with different mobile phones or under different light intensities, the model maintains high prediction accuracy.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware instructions of a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A training method of a living body face detection model, the method comprising:
acquiring face sample image sets belonging to different fields, wherein the face sample images in the face sample image sets carry living or non-living labels;
selecting teacher networks corresponding to the fields one by one, and performing two-class training on the corresponding teacher networks through the face sample image set;
when the loss function of the teacher network is converged and the prediction accuracy of the teacher network is within a preset range after training, freezing the teacher network;
inputting the face images in the face image data set into each teacher network, and outputting the prediction probability that the face images are living faces through each teacher network;
taking the average value of the prediction probabilities output by the teacher networks as a target probability value of a student network, and training the student network through the face image data set;
when the loss function of the student network is converged, acquiring parameters of a feature extractor of the student network;
taking parameters of a feature extractor of the student network as initial values of the parameters of the feature extractor in the living body face detection model, and performing meta-learning training on the living body face detection model through the face sample image set;
and when the loss function of the living body face detection model is converged, obtaining the trained living body face detection model.
2. The method for training a living body face detection model according to claim 1, wherein the step of performing meta-learning training on the living body face detection model through the face sample image set comprises:
extracting a sampling set comprising positive samples and negative samples from the face sample image set;
sequentially carrying out element training, element testing and element optimization on the living human face detection model through the sampling set;
judging whether the loss function of the living body face detection model in the meta-optimization stage converges; if so, judging that the living body face detection model is trained; otherwise, repeating the steps from extracting a sampling set comprising positive and negative samples from the face sample image set through judging whether the loss function of the living body face detection model in the meta-optimization stage converges, until the loss function of the living body face detection model in the meta-optimization stage converges.
3. The training method of the living body face detection model according to claim 2, wherein the step of performing meta-training, meta-testing and meta-optimization on the living body face detection model sequentially through the sampling set comprises:
dividing the face sample image set into a meta-training set and a meta-testing set;
extracting a plurality of sampling sets from the meta-training set, and training the living human face detection model in a meta-training stage through the sampling sets;
when the loss function of the living body face detection model in the meta-training stage is converged, performing test training on the living body face detection model in the meta-testing stage through the meta-testing set;
when the loss function of the living body face detection model in the meta-test stage is converged, carrying out meta-optimization on the living body face detection model according to the training results of the living body face detection model in the meta-training stage and the meta-test stage;
and when the loss function of the living body face detection model in the meta-optimization stage is converged, judging that the loss function of the living body face detection model is converged.
4. The training method of the living body face detection model according to claim 3, wherein the living body face detection model comprises a feature extractor, a meta learning classifier and a depth map estimator, and the loss function of the living body face detection model in the meta training stage comprises a first classification loss function, an iterative function of the meta learning classifier and a first depth map estimation loss function;
the first classification loss function is:
Lcls(θF, θM) = Σ[y log M(F(x)) + (1 − y) log(1 − M(F(x)))]
wherein θF represents the parameters of the feature extractor of the living body face detection model, θM represents the parameters of the meta-learning classifier, x represents a face sample image in the sampling set, y represents the value of the true label of the face sample image in the sampling set, F(x) represents the feature extracted from the face sample image by the feature extractor of the face detection model, M(F(x)) represents the output of the meta-learning classifier on the extracted feature, and Lcls(θF, θM) represents the first classification loss function;
the iterative function of the meta-learning classifier is:
θM(i+1) = θM(i) − α∇θM(i)Lcls(θF, θM(i))
wherein θM represents the parameters of the meta-learning classifier, α represents a hyper-parameter, ∇θM(i) represents the gradient with respect to the parameter θM at the i-th training iteration, and Lcls(θF, θM) represents the output result of the first classification loss function;
the first depth map estimation loss function is:
LDtrn(θF, θD) = Σ||D(F(x)) − I||²
wherein θF represents the parameters of the feature extractor of the living body face detection model, θD represents the parameters of the depth map estimator, F(x) represents the first feature extracted by the feature extractor of the face detection model from a face sample image in the meta-training set, D(F(x)) represents the output result of the depth map estimator on the extracted first feature, and I represents the true depth of the extracted first feature.
5. The training method of the living body face detection model according to claim 4, wherein the loss function of the living body face detection model in the meta-test stage comprises a second classification loss function and a second depth map estimation loss function;
the second classification loss function is:
Lcls-tst = Σi Σ[y log M'i(F(x)) + (1 − y) log(1 − M'i(F(x)))]
wherein N represents the total number of domains in the face sample image set, i represents the number of the domain to which the face sample image currently used for meta-testing belongs, F(x) represents the feature extracted from the face sample image by the feature extractor of the face detection model, y represents the value of the true label of the face sample image, M'i(F(x)) represents the output of the i-th updated meta-learning classifier on the extracted feature, and Lcls-tst represents the second classification loss function;
the second depth map estimation loss function is:
LDtst(θF, θD) = Σ||D(F(x)) − I||²
wherein F(x) represents the second feature extracted by the feature extractor of the face detection model from the face sample images in the meta-test set, D(F(x)) represents the output result of the depth map estimator on the extracted second feature, and I represents the true depth of the extracted second feature.
6. The training method of the living body face detection model according to claim 5, wherein the loss functions of the living body face detection model in the meta-optimization stage comprise a meta-learning classifier parameter loss function, a feature extractor parameter loss function and a depth map estimator parameter loss function;
the meta-learning classifier parameter loss function is:
θM = θM − β∇θM(Lcls(θF, θM) + Lcls-tst)
wherein β and γ represent hyper-parameters, Lcls(θF, θM) represents the output value of the first classification loss function, Lcls-tst represents the output result of the second classification loss function, and ∇θM represents the gradient with respect to the parameter θM during training;
the feature extractor parameter loss function is:
θF = θF − γ∇θF(Lcls(θF, θM) + LDtrn(θF, θD) + Lcls-tst + LDtst(θF, θD))
wherein θF represents the parameters of the feature extractor of the living body face detection model, ∇θF represents the gradient with respect to the parameter θF during training, LDtst(θF, θD) represents the output value of the second depth map estimation loss function, Lcls-tst represents the output value of the second classification loss function, Lcls(θF, θM) represents the output value of the first classification loss function, and LDtrn(θF, θD) represents the output value of the first depth map estimation loss function;
the depth map estimator parameter loss function is:
θD = θD − γ∇θD(LDtst(θF, θD) + LDtrn(θF, θD))
wherein θD represents the parameters of the depth map estimator, β and γ represent hyper-parameters, ∇θD represents the gradient with respect to the parameter θD during training, LDtst(θF, θD) represents the output value of the second depth map estimation loss function, and LDtrn(θF, θD) represents the output value of the first depth map estimation loss function.
7. A training apparatus of a living body face detection model, the apparatus comprising:
the sample acquisition module is used for acquiring face sample image sets belonging to different fields, wherein the face sample images in the face sample image sets carry living or non-living labels;
the first training module is used for selecting teacher networks corresponding to the fields one by one and performing two-class training on the corresponding teacher networks through the face sample image set;
the freezing module is used for freezing the teacher network when the loss function of the teacher network is converged and the prediction accuracy of the teacher network is within a preset range after training;
the probability calculation module is used for inputting the face images in the face image data set into each teacher network and outputting the prediction probability that the face images are living faces through each teacher network;
the second training module is used for training the student networks through the face image data sets by taking the average value of the prediction probabilities output by the teacher networks as the target probability value of the student networks;
a parameter obtaining module, configured to obtain a parameter of a feature extractor of the student network when a loss function of the student network converges;
the meta-learning module is used for taking parameters of a feature extractor of the student network as initial values of the parameters of the feature extractor in the living body face detection model and carrying out meta-learning training on the living body face detection model through the face sample image set;
and the convergence module is used for obtaining the trained living body face detection model when the loss function of the living body face detection model converges.
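The distillation step performed by the second training module can be sketched as follows. The teacher and student models here are toy callables and all names are illustrative; the only element taken from the claim is that the student's soft target for each image is the mean of the living-face probabilities predicted by the per-domain teacher networks.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_teacher(bias):
    # Each frozen teacher maps an image (here a 4-d feature vector) to the
    # probability that it shows a living face, via a logistic model.
    w = rng.normal(size=4)
    return lambda x: 1.0 / (1.0 + np.exp(-(x @ w + bias)))

teachers = [make_teacher(b) for b in (-0.5, 0.0, 0.5)]  # one teacher per domain

images = rng.normal(size=(8, 4))                         # a mini-batch of "face images"
teacher_probs = np.stack([t(images) for t in teachers])  # shape: (n_teachers, batch)
target_prob = teacher_probs.mean(axis=0)                 # student's target probability value

# Student training would then minimise e.g. binary cross-entropy between its
# own predictions and target_prob (one loss evaluation shown for a dummy student).
student_pred = np.full_like(target_prob, 0.5)
bce = -np.mean(target_prob * np.log(student_pred)
               + (1.0 - target_prob) * np.log(1.0 - student_pred))
```

Averaging the per-domain teachers' soft outputs gives the student a single cross-domain supervision signal, which is the point of this module.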
8. The apparatus for training a living human face detection model according to claim 7, wherein the meta-learning module comprises:
a sampling unit, configured to extract a sampling set comprising positive samples and negative samples from the face sample image set;
a meta-training unit, configured to sequentially perform meta-training, meta-testing and meta-optimization on the living body face detection model through the sampling set;
and a circulating unit, configured to judge whether the loss function of the living body face detection model in the meta-optimization stage has converged; if so, the living body face detection model is judged to be trained; otherwise, the steps from extracting a sampling set comprising positive samples and negative samples from the face sample image set through judging whether the loss function in the meta-optimization stage has converged are repeated, until the loss function of the living body face detection model in the meta-optimization stage converges.
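The loop described by the sampling and circulating units can be sketched as plain control flow. The meta-train, meta-test and meta-optimization phases are stubs (the meta-optimization loss is simply halved each round to simulate convergence); only the loop structure follows the claim, and all names are illustrative.

```python
import random

random.seed(0)

def draw_sampling_set(pos, neg, k=2):
    # Extract a sampling set containing both positive and negative samples.
    return random.sample(pos, k) + random.sample(neg, k)

def meta_train(batch):  # stub for the meta-training phase
    return None

def meta_test(batch):   # stub for the meta-testing phase
    return None

loss = 1.0
def meta_optimize(batch):
    # Stub for the meta-optimization phase: the loss shrinks each round.
    global loss
    loss *= 0.5
    return loss

positives = ["live_1", "live_2", "live_3"]
negatives = ["spoof_1", "spoof_2", "spoof_3"]

rounds = 0
while True:  # circulating unit: repeat until the meta-optimization loss converges
    batch = draw_sampling_set(positives, negatives)
    meta_train(batch)
    meta_test(batch)
    current = meta_optimize(batch)
    rounds += 1
    if current < 1e-3:  # convergence criterion (assumed threshold)
        break
```

Resampling a fresh positive/negative set on every iteration is what makes each round a new meta-task, rather than repeated passes over one fixed batch.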
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the training method of the living body face detection model according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium in which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for training a living body face detection model according to any one of claims 1 to 6.
CN202110482189.2A 2021-04-30 2021-04-30 Training method, device and equipment of living human face detection model and storage medium Active CN113052144B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110482189.2A CN113052144B (en) 2021-04-30 2021-04-30 Training method, device and equipment of living human face detection model and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110482189.2A CN113052144B (en) 2021-04-30 2021-04-30 Training method, device and equipment of living human face detection model and storage medium

Publications (2)

Publication Number Publication Date
CN113052144A true CN113052144A (en) 2021-06-29
CN113052144B CN113052144B (en) 2023-02-28

Family

ID=76518159

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110482189.2A Active CN113052144B (en) 2021-04-30 2021-04-30 Training method, device and equipment of living human face detection model and storage medium

Country Status (1)

Country Link
CN (1) CN113052144B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591736A (en) * 2021-08-03 2021-11-02 北京百度网讯科技有限公司 Feature extraction network, training method of living body detection model and living body detection method
CN113705362A (en) * 2021-08-03 2021-11-26 北京百度网讯科技有限公司 Training method and device of image detection model, electronic equipment and storage medium
CN113705425A (en) * 2021-08-25 2021-11-26 北京百度网讯科技有限公司 Training method of living body detection model, and method, device and equipment for living body detection
CN113743220A (en) * 2021-08-04 2021-12-03 深圳商周智联科技有限公司 Biological characteristic in-vivo detection method and device and computer equipment
CN114495291A (en) * 2022-04-01 2022-05-13 杭州魔点科技有限公司 Method, system, electronic device and storage medium for in vivo detection
CN114663941A (en) * 2022-03-17 2022-06-24 深圳数联天下智能科技有限公司 Feature detection method, model merging method, device, and medium
CN114743243A (en) * 2022-04-06 2022-07-12 平安科技(深圳)有限公司 Human face recognition method, device, equipment and storage medium based on artificial intelligence
CN116993835A (en) * 2023-07-31 2023-11-03 江阴极动智能科技有限公司 Camera calibration method, camera calibration device, electronic equipment and storage medium
CN117576791A (en) * 2024-01-17 2024-02-20 杭州魔点科技有限公司 Living body detection method based on living clues and large model paradigm in vertical field

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017215240A1 (en) * 2016-06-14 2017-12-21 广州视源电子科技股份有限公司 Neural network-based method and device for face feature extraction and modeling, and face recognition
CN110674880A (en) * 2019-09-27 2020-01-10 北京迈格威科技有限公司 Network training method, device, medium and electronic equipment for knowledge distillation
CN110674714A (en) * 2019-09-13 2020-01-10 东南大学 Human face and human face key point joint detection method based on transfer learning
CN110909815A (en) * 2019-11-29 2020-03-24 深圳市商汤科技有限公司 Neural network training method, neural network training device, neural network processing device, neural network training device, image processing device and electronic equipment
CN111639710A (en) * 2020-05-29 2020-09-08 北京百度网讯科技有限公司 Image recognition model training method, device, equipment and storage medium
CN111709409A (en) * 2020-08-20 2020-09-25 腾讯科技(深圳)有限公司 Face living body detection method, device, equipment and medium
CN112052808A (en) * 2020-09-10 2020-12-08 河南威虎智能科技有限公司 Human face living body detection method, device and equipment for refining depth map and storage medium
CN112115783A (en) * 2020-08-12 2020-12-22 中国科学院大学 Human face characteristic point detection method, device and equipment based on deep knowledge migration
US20210049370A1 (en) * 2018-08-01 2021-02-18 Advanced New Technologies Co., Ltd. Abnormality detection method, apparatus, and device for unmanned checkout
CN112418013A (en) * 2020-11-09 2021-02-26 贵州大学 Complex working condition bearing fault diagnosis method based on meta-learning under small sample
CN112541458A (en) * 2020-12-21 2021-03-23 中国科学院自动化研究所 Domain-adaptive face recognition method, system and device based on meta-learning
CN112597885A (en) * 2020-12-22 2021-04-02 北京华捷艾米科技有限公司 Face living body detection method and device, electronic equipment and computer storage medium

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017215240A1 (en) * 2016-06-14 2017-12-21 广州视源电子科技股份有限公司 Neural network-based method and device for face feature extraction and modeling, and face recognition
US20210049370A1 (en) * 2018-08-01 2021-02-18 Advanced New Technologies Co., Ltd. Abnormality detection method, apparatus, and device for unmanned checkout
CN110674714A (en) * 2019-09-13 2020-01-10 东南大学 Human face and human face key point joint detection method based on transfer learning
CN110674880A (en) * 2019-09-27 2020-01-10 北京迈格威科技有限公司 Network training method, device, medium and electronic equipment for knowledge distillation
CN110909815A (en) * 2019-11-29 2020-03-24 深圳市商汤科技有限公司 Neural network training method, neural network training device, neural network processing device, neural network training device, image processing device and electronic equipment
CN111639710A (en) * 2020-05-29 2020-09-08 北京百度网讯科技有限公司 Image recognition model training method, device, equipment and storage medium
CN112115783A (en) * 2020-08-12 2020-12-22 中国科学院大学 Human face characteristic point detection method, device and equipment based on deep knowledge migration
CN111709409A (en) * 2020-08-20 2020-09-25 腾讯科技(深圳)有限公司 Face living body detection method, device, equipment and medium
CN112052808A (en) * 2020-09-10 2020-12-08 河南威虎智能科技有限公司 Human face living body detection method, device and equipment for refining depth map and storage medium
CN112418013A (en) * 2020-11-09 2021-02-26 贵州大学 Complex working condition bearing fault diagnosis method based on meta-learning under small sample
CN112541458A (en) * 2020-12-21 2021-03-23 中国科学院自动化研究所 Domain-adaptive face recognition method, system and device based on meta-learning
CN112597885A (en) * 2020-12-22 2021-04-02 北京华捷艾米科技有限公司 Face living body detection method and device, electronic equipment and computer storage medium

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705362B (en) * 2021-08-03 2023-10-20 北京百度网讯科技有限公司 Training method and device of image detection model, electronic equipment and storage medium
CN113705362A (en) * 2021-08-03 2021-11-26 北京百度网讯科技有限公司 Training method and device of image detection model, electronic equipment and storage medium
CN113591736A (en) * 2021-08-03 2021-11-02 北京百度网讯科技有限公司 Feature extraction network, training method of living body detection model and living body detection method
CN113743220A (en) * 2021-08-04 2021-12-03 深圳商周智联科技有限公司 Biological characteristic in-vivo detection method and device and computer equipment
CN113743220B (en) * 2021-08-04 2024-06-04 深圳商周智联科技有限公司 Biological feature living body detection method and device and computer equipment
CN113705425A (en) * 2021-08-25 2021-11-26 北京百度网讯科技有限公司 Training method of living body detection model, and method, device and equipment for living body detection
CN114663941A (en) * 2022-03-17 2022-06-24 深圳数联天下智能科技有限公司 Feature detection method, model merging method, device, and medium
CN114495291B (en) * 2022-04-01 2022-07-12 杭州魔点科技有限公司 Method, system, electronic device and storage medium for in vivo detection
CN114495291A (en) * 2022-04-01 2022-05-13 杭州魔点科技有限公司 Method, system, electronic device and storage medium for in vivo detection
CN114743243A (en) * 2022-04-06 2022-07-12 平安科技(深圳)有限公司 Human face recognition method, device, equipment and storage medium based on artificial intelligence
CN114743243B (en) * 2022-04-06 2024-05-31 平安科技(深圳)有限公司 Human face recognition method, device, equipment and storage medium based on artificial intelligence
CN116993835A (en) * 2023-07-31 2023-11-03 江阴极动智能科技有限公司 Camera calibration method, camera calibration device, electronic equipment and storage medium
CN117576791A (en) * 2024-01-17 2024-02-20 杭州魔点科技有限公司 Living body detection method based on living clues and large model paradigm in vertical field
CN117576791B (en) * 2024-01-17 2024-04-30 杭州魔点科技有限公司 Living body detection method based on living clues and large model paradigm in vertical field

Also Published As

Publication number Publication date
CN113052144B (en) 2023-02-28

Similar Documents

Publication Publication Date Title
CN113052144B (en) Training method, device and equipment of living human face detection model and storage medium
WO2021159774A1 (en) Object detection model training method and apparatus, object detection method and apparatus, computer device, and storage medium
US8750573B2 (en) Hand gesture detection
WO2021179471A1 (en) Face blur detection method and apparatus, computer device and storage medium
CN111401281A (en) Unsupervised pedestrian re-identification method and system based on deep clustering and sample learning
CN111598182B (en) Method, device, equipment and medium for training neural network and image recognition
WO2021114612A1 (en) Target re-identification method and apparatus, computer device, and storage medium
US20120027252A1 (en) Hand gesture detection
CN112446302B (en) Human body posture detection method, system, electronic equipment and storage medium
CN110807437B (en) Video granularity characteristic determination method and device and computer-readable storage medium
CN110188829B (en) Neural network training method, target recognition method and related products
CN111476268A (en) Method, device, equipment and medium for training reproduction recognition model and image recognition
KR20220107120A (en) Method and apparatus of training anti-spoofing model, method and apparatus of performing anti-spoofing using anti-spoofing model, electronic device, storage medium, and computer program
CN109472193A (en) Method for detecting human face and device
CN107945210B (en) Target tracking method based on deep learning and environment self-adaption
CN113269149B (en) Method and device for detecting living body face image, computer equipment and storage medium
CN114241505B (en) Method and device for extracting chemical structure image, storage medium and electronic equipment
CN113283388B (en) Training method, device, equipment and storage medium of living body face detection model
CN112818888A (en) Video auditing model training method, video auditing method and related device
Ashwinkumar et al. Deep learning based approach for facilitating online proctoring using transfer learning
CN111914068B (en) Method for extracting test question knowledge points
CN112766351A (en) Image quality evaluation method, system, computer equipment and storage medium
CN116484224A (en) Training method, device, medium and equipment for multi-mode pre-training model
CN110704678A (en) Evaluation sorting method, evaluation sorting system, computer device and storage medium
CN112329634B (en) Classroom behavior identification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant