CN116091875B - Model training method, living body detection method, electronic device, and storage medium - Google Patents

Model training method, living body detection method, electronic device, and storage medium Download PDF

Info

Publication number
CN116091875B
Authority
CN
China
Prior art keywords
network
network layer
branch
living body
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310375684.2A
Other languages
Chinese (zh)
Other versions
CN116091875A (en)
Inventor
刘冲冲
付贤强
何武
朱海涛
户磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Dilusense Technology Co Ltd
Original Assignee
Hefei Dilusense Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Dilusense Technology Co Ltd filed Critical Hefei Dilusense Technology Co Ltd
Priority to CN202310375684.2A priority Critical patent/CN116091875B/en
Publication of CN116091875A publication Critical patent/CN116091875A/en
Application granted granted Critical
Publication of CN116091875B publication Critical patent/CN116091875B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application relates to the field of image recognition, and discloses a model training method, a living body detection method, an electronic device, and a storage medium. The model training method comprises the following steps: performing feature extraction on a face image through a plurality of branch networks in a feature extraction network to obtain a plurality of face features; determining a plurality of first prediction probabilities based on the plurality of face features, and obtaining a second prediction probability based on the plurality of first prediction probabilities; and performing iterative training on the feature extraction network. Each branch network has one network layer serving as a specific network layer, and each specific network layer is configured in each iterative training process as follows: the input features of each specific network layer comprise the output features of the previous network layer in the branch network to which it belongs, or further comprise a fused feature obtained by fusing the output features of the previous network layers of at least one of all the specific network layers. The training method of the application improves the accuracy, stability and efficiency of feature extraction of each branch network.

Description

Model training method, living body detection method, electronic device, and storage medium
Technical Field
The embodiment of the application relates to the technical field of image recognition, in particular to a model training method, a living body detection method, electronic equipment and a storage medium.
Background
Face image recognition is a very popular AI technology in recent years and is widely used in production and daily life across various fields. Products employing face image recognition also typically require living body detection techniques to deny authorization to malicious attacks that use props such as photographs, videos, masks, dummy models, head covers, and the like.
The most widely used image-based living body detection technology at present takes a face image as the input of a living body detection model and outputs a result indicating whether the image is of a living body. However, the props available for malicious attacks are constantly changing, and the performance of general living body detection techniques when dealing with entirely new attack types falls far below expectations. To address this problem, some living body detection methods design multiple models or multiple branches, with each model or branch responsible for handling a different attack type. However, such methods need the attack types to be divided manually and rely on the prior knowledge of human experts; the training process is complex and time-consuming, the manual division of attack types carries a certain subjective bias, and there is a larger possibility of misjudgment and a potential safety hazard when processing attack types beyond the prior knowledge.
Disclosure of Invention
The application aims to provide a model training method, a living body detection method, an electronic device and a storage medium, which, by configuring a specific network layer in each of a plurality of branch networks, enable the feature extraction network to automatically determine during training the prosthesis type that each branch network is responsible for processing, fully exploit the potential of each branch network, and greatly improve the accuracy, stability and efficiency of feature extraction of each branch network.
In order to solve the above technical problem, an embodiment of the present application provides a model training method, comprising: extracting features of a face image through a plurality of branch networks included in a feature extraction network to obtain a plurality of face features; determining a plurality of first prediction probabilities that the face image belongs to a living body based on the plurality of face features, and obtaining a second prediction probability that the face image belongs to a living body based on the plurality of first prediction probabilities; and performing iterative training on the feature extraction network; wherein each branch network has one network layer serving as a specific network layer, each specific network layer is located at the same layer position in the branch network to which it belongs, and each specific network layer is configured in each iterative training process as follows: the input features of each specific network layer comprise the output features of the previous network layer in the branch network to which it belongs, or further comprise a fused feature obtained by fusing the output features of the previous network layers of at least one of all the specific network layers.
The embodiment of the application also provides a living body detection method, comprising: inputting a face image to be detected into a trained feature extraction network to obtain a plurality of face features; obtaining a plurality of first prediction probabilities that the face image belongs to a living body according to the plurality of face features, and obtaining a second prediction probability that the face image belongs to a living body based on the plurality of first prediction probabilities; when the second prediction probability is greater than or equal to a preset living body threshold value, determining that the face image to be detected is a living body; and when the second prediction probability is smaller than the preset living body threshold value, determining that the face image to be detected is a prosthesis; wherein the trained feature extraction network is obtained through the model training method in the above embodiment.
The embodiment of the application also provides an electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model training method mentioned in the above embodiments or the living body detection method mentioned in the above embodiments.
The embodiment of the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the model training method mentioned in the above embodiment or the living body detection method mentioned in the above embodiment.
The feature extraction network trained by the model training method provided by the application comprises a plurality of branch networks. In each iterative training process, each branch network has a specific network layer, and the specific network layers are located at the same layer position in their respective branch networks; the input features of a specific network layer comprise the output features of the previous network layer in the branch network to which it belongs, or further comprise a fused feature obtained by fusing the output features of the previous network layers of at least one of all the specific network layers. Therefore, after repeated iterative training, through the specific network layer in each branch network, each branch network can be trained according to its own final output features while the other branch networks are jointly trained through the associated output features of their intermediate network layers. This design, in which the branch networks assist in training one another, reduces redundant training, so that the feature extraction network automatically determines during training the prosthesis type that each branch network is responsible for processing, which solves the problem of misidentification caused by the reliance of the prior art on the subjective human division of prosthesis attack types. Meanwhile, through the configuration of the specific network layer, each branch network has both the capability of distinguishing a living body from a prosthesis according to the output of its own branch network and the capability of distinguishing a living body from a prosthesis according to the outputs of some or all of the branch networks, which fully exploits the potential of each branch network and greatly improves the accuracy, stability and efficiency of feature extraction of each branch network.
Drawings
One or more embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which the figures of the drawings are not to be taken in a limiting sense, unless otherwise indicated.
FIG. 1 is a flow chart of a model training method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a feature extraction network according to an embodiment of the present application;
FIG. 3 is a flow chart of a living body detection method provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that numerous technical details are set forth in the various embodiments of the present application in order to provide a better understanding of the application. However, the claimed application may be practiced without these specific details and with various changes and modifications based on the following embodiments.
Implementation details of the model training method of the present embodiment are described below by way of example. The following details are provided merely for ease of understanding and are not necessary for practicing the present embodiment.
The embodiment of the application relates to a model training method, as shown in fig. 1, comprising the following steps:
Step 101, extracting features of the face image through a plurality of branch networks included in the feature extraction network to obtain a plurality of face features. Each branch network has one network layer serving as a specific network layer, each specific network layer is located at the same layer position in the branch network to which it belongs, and each specific network layer is configured in each iterative training process as follows: the input features of each specific network layer comprise the output features of the previous network layer in the branch network to which it belongs, or further comprise a fused feature obtained by fusing the output features of the previous network layers of at least one of all the specific network layers.
Specifically, in this embodiment, face images in a sample set are input into the feature extraction network to obtain a plurality of face features. The face images in the sample set comprise living body face images and prosthesis face images, and each face image corresponds to a label marking whether the face image belongs to a living body or a prosthesis. The living body face images can be face images of the same person at different shooting angles, with different accessories worn and at different age stages, or such face images of different persons. The prosthesis face images can contain a variety of prosthesis types, for example: a prosthesis face image obtained by shooting a photograph, a prosthesis face image obtained by shooting a dummy model, a prosthesis face image obtained by shooting a real person wearing a head cover, and the like.
In this embodiment, the plurality of branch networks may be branch networks having the same network structure, or may be branch networks having different network structures. It will be appreciated that the specific values of the learnable parameters of the multiple branch networks after the final training is completed may be different, although the network structures of the multiple branch networks are identical. The plurality of branch networks having different network structures may be different in network type, such as: convolutional neural networks, residual neural networks, SVM (support vector machine) networks, and the like; the network types may be the same but the specific network structures are different, for example: the system comprises a neural network comprising different numbers of pooling layers, different numbers of splicing layers, different numbers of full-connection layers and the like, and the neural network internally provided with different connection relations; it is also possible that network configuration parameters are different, such as: neural networks with different convolution kernel sizes, neural networks with different learning rates, neural networks with different back propagation weight attenuation values, and the like.
Each branch network has a specific network layer, and the specific network layer may be the network layer at any same layer position in each branch network. Taking the feature extraction network including 4 branch networks shown in fig. 2 as an example, the specific network layer may be the k-th network layer of each branch network. The input features of the k-th network layer (the specific network layer) of each branch network then fall into two cases. In the first case, the input features include two parts: one part is the output features of the (k-1)-th network layer in the branch network to which the specific network layer belongs (indicated by the solid arrows in fig. 2), and the other part is the fused feature obtained by fusing the output features of the (k-1)-th network layers of at least one of the branch networks (indicated by the addition of the dashed arrows in fig. 2). In the second case, the input features are only the output features of the (k-1)-th network layer of the branch network to which the specific network layer belongs (indicated by the solid arrows in fig. 2).
That is, in each iterative training, the input of a specific network layer must include the output features of the previous network layer of the branch network in which it is located, and may also include the fused output features of the previous network layers of the specific network layers of any n branch networks, where n ranges from 0 to N and N is the number of branch networks.
For example, when n=1 and the selected branch network is branch network 1, the input of the k-th network layer of branch network 1 includes two identical copies of the output features of the (k-1)-th network layer of branch network 1, and the input of the k-th network layer of branch network 2 includes the output features of the (k-1)-th network layer of branch network 2 and the output features of the (k-1)-th network layer of branch network 1. The other branch networks follow by analogy.
As another example, when n=2 and the two selected branch networks are branch network 2 and branch network 3, the input of the k-th network layer of branch network 1 comprises the output features of the (k-1)-th network layer of branch network 1 and the fused feature obtained by fusing the output features of the (k-1)-th network layer of branch network 2 with those of the (k-1)-th network layer of branch network 3; the input of the k-th network layer of branch network 2 comprises the output features of the (k-1)-th network layer of branch network 2 and the same fused feature obtained by fusing the output features of the (k-1)-th network layers of branch networks 2 and 3. The other branch networks follow by analogy.
That is, when n=0, the input of each specific network layer is only the output features of the previous network layer of the branch network in which it is located, i.e., each branch network is trained alone, so that each branch network has the capability of distinguishing a living body from a prosthesis according to the output of its own branch network. When n takes a value in the range of 1 to N, the input of each specific network layer includes the output features of the previous network layer of the branch network in which it is located and the fused output features of the previous network layers of at least one of all the specific network layers. In this way, training one branch network also trains the other branch networks, which achieves auxiliary training and reduces redundant training, and each branch network gains the capability of distinguishing a living body from a prosthesis according to the outputs of some or all of the branch networks.
It should be noted that the specific network layer cannot be the first network layer in a branch network: the input of the first network layer is the face image rather than a feature vector, so there is no previous network layer. The specific network layer must therefore be a non-first layer (the index k of the specific network layer ranges from 2 to K, where K is the number of network layers included in each branch network). In addition, each network layer in each branch network can have other functions, such as a feature shaping function, a feature scaling function and a feature dimension reduction function, in addition to its feature extraction function. Of course, in order to reduce training complexity, it is preferable to set the number of network layers included in each branch network to be the same.
In addition, the inputs to each particular network layer are reconfigured each time training is iterated, i.e., each time a particular network layer input may be different.
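To make the above configuration concrete, the following is a minimal PyTorch-style sketch of a feature extraction network with several branch networks whose specific (k-th) layer optionally receives a fused feature from the selected branches. The module names, layer sizes, and element-wise-sum fusion are illustrative assumptions, not the implementation required by this application.

```python
import torch
import torch.nn as nn

class BranchNetwork(nn.Module):
    """One branch network; the layer at index specific_idx is the 'specific network layer'."""
    def __init__(self, num_layers=4, dim=64, specific_idx=2):
        super().__init__()
        assert specific_idx >= 1, "the specific network layer cannot be the first network layer"
        self.specific_idx = specific_idx
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_layers)
        )

    def forward_until_specific(self, x):
        # Layers before the specific layer; the last output is what other branches may fuse.
        for layer in self.layers[: self.specific_idx]:
            x = layer(x)
        return x

    def forward_from_specific(self, own_feat, fused_feat=None):
        # Input of the specific layer: its own previous-layer output, optionally plus the fused feature.
        x = own_feat if fused_feat is None else own_feat + fused_feat
        for layer in self.layers[self.specific_idx:]:
            x = layer(x)
        return x  # face feature output by this branch network


class FeatureExtractionNetwork(nn.Module):
    def __init__(self, num_branches=4, dim=64):
        super().__init__()
        self.branches = nn.ModuleList(BranchNetwork(dim=dim) for _ in range(num_branches))

    def forward(self, x, selection):
        # 'selection' is a list of 0/1 flags (the random parameters), one per branch network.
        pre_feats = [b.forward_until_specific(x) for b in self.branches]
        chosen = [f for f, s in zip(pre_feats, selection) if s == 1]
        fused = torch.stack(chosen).sum(dim=0) if chosen else None  # fused feature (assumed element-wise sum)
        return [b.forward_from_specific(f, fused) for b, f in zip(self.branches, pre_feats)]


# Example: 4 branch networks; the 2nd and 4th branches are selected for fusion in this iteration.
net = FeatureExtractionNetwork()
face_feat = torch.randn(8, 64)                 # assumed pre-flattened face image features, batch of 8
face_features = net(face_feat, [0, 1, 0, 1])   # list of 4 per-branch face features
```

In this sketch, passing an all-zero selection reproduces the n=0 case in which each branch network is trained alone.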
Step 102, determining a plurality of first prediction probabilities that the face image belongs to the living body based on the plurality of face features, and obtaining a second prediction probability that the face image belongs to the living body based on the plurality of first prediction probabilities.
Specifically, the plurality of first prediction probabilities are the prediction results of the plurality of branch networks for the input face image. The first prediction probabilities can be obtained according to activation functions commonly used in deep learning, for example the sigmoid, tanh, ReLU and Leaky ReLU activation functions; alternatively, the plurality of face features can be input into a classifier commonly used in deep learning to obtain the plurality of first prediction probabilities.
The second prediction probability that the face image belongs to a living body, i.e. the probability value finally used for living body detection, is obtained from the plurality of first prediction probabilities. Specifically, the second prediction probability may be obtained by calculating the average of the plurality of first prediction probabilities; it may be obtained by multiplying the plurality of first prediction probabilities; or the maximum and minimum values may be removed from the plurality of first prediction probabilities and the second prediction probability obtained as the average of the remaining first prediction probabilities.
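As an illustration of this step, the sketch below computes a first prediction probability per branch with a sigmoid over an assumed per-branch linear classifier head, and combines them into a second prediction probability by averaging, one of the options described above; the heads and the averaging rule are assumptions for the example.

```python
import torch
import torch.nn as nn

num_branches, feat_dim = 4, 64
# Assumed: one linear classifier head per branch network, mapping a face feature to a single logit.
heads = nn.ModuleList(nn.Linear(feat_dim, 1) for _ in range(num_branches))

def predict_probabilities(face_features):
    # face_features: list of per-branch face features, each of shape (batch, feat_dim)
    first_probs = [torch.sigmoid(head(f)).squeeze(-1) for head, f in zip(heads, face_features)]
    # Second prediction probability: here the mean of the first prediction probabilities
    # (the text also allows a product, or an average after dropping the maximum and minimum).
    second_prob = torch.stack(first_probs, dim=0).mean(dim=0)
    return first_probs, second_prob
```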
Step 103, performing iterative training on the feature extraction network.
Specifically, a plurality of face images comprising a prosthesis and a living body are input to train a feature extraction network, a specific network layer in the feature extraction network is configured each time in the training process, and training is carried out according to a training method commonly used for deep learning (such as a gradient descent method, a Newton algorithm, a conjugate gradient method and a Levenberg-Marquardt algorithm) to obtain a converged feature extraction network.
In this embodiment, during feature extraction the face features output by each branch network cover both the outputs of the network layers before the specific network layer in the branch network itself and the outputs of the network layers before the specific network layers in the other branch networks, so that the face features finally output by the branch networks are different from and associated with each other. Therefore, through training on these mutually different and mutually associated face features, each branch network can automatically determine the prosthesis type that it is more suitable to process.
In an embodiment, the feature extraction network further comprises an output network and a fusion network. The configuration process of each specific network layer in each iterative training process comprises the following steps: outputting, through the output network, a plurality of random parameters that obey a Bernoulli distribution and whose number equals the number of branch networks, and selecting zero or at least one branch network according to the plurality of random parameters; and, for each specific network layer, fusing through the fusion network the output features of the previous network layers of the specific network layers in the selected zero or at least one branch network to obtain the fused feature.
In this embodiment, the output network is configured to output a plurality of random parameters α_i (i = 1, 2, ..., N, where N is the number of branch networks) that obey a Bernoulli distribution, one per branch network; that is, each random parameter takes the value 0 or 1, where 0 indicates that the corresponding branch network is not selected and 1 indicates that it is selected. Taking the feature extraction network including 4 branch networks shown in fig. 2 as an example, if the 4 random parameters output by the output network are {α_1 = 0, α_2 = 1, α_3 = 0, α_4 = 1}, the 2nd and 4th branch networks are selected. If the 4 random parameters output by the output network are {α_1 = 0, α_2 = 0, α_3 = 0, α_4 = 0}, none of the 4 branch networks is selected.
And for the selected branch network, fusing the output characteristics of the previous network layer of the specific network layer in the selected branch network through a fusion network to obtain the fused characteristics.
Further, in each iterative training process, the probability values corresponding to the plurality of random parameters are hyperparameters of the training process. Since the plurality of random parameters obey a Bernoulli distribution, P(α_i = 1) = p and P(α_i = 0) = 1 - p, where p is a hyperparameter during training. It can be understood that adjusting the probability value p during training indirectly adjusts the values of the random parameters output by the output network, which is equivalent to indirectly adjusting which branch networks are selected each time. Of course, even if the probability values are the same, the values of the corresponding random parameters may be different; for example, the random parameters {0, 1, 0, 1} and {1, 0, 1, 0} have equal corresponding probability values, but the specific branch networks selected are different.
In addition, the vector dimensions of the output features of the previous network layer of the particular network layer in each branched network are the same. Specifically, in order to facilitate the feature fusion of output, the previous network layer of the specific network layer in each branch network has the feature extraction and the feature shaping function, so that the vector dimensions of the output features of the previous network layer are the same.
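A minimal sketch of the configuration process described above, assuming the output network simply draws one Bernoulli(p) random parameter per branch and the fusion network is an element-wise sum of the selected previous-layer output features; both choices are illustrative rather than required by this application.

```python
import torch

def sample_branch_selection(num_branches: int, p: float) -> torch.Tensor:
    # Output network: one Bernoulli(p) random parameter alpha_i per branch (1 = selected, 0 = not selected).
    return torch.bernoulli(torch.full((num_branches,), p))

def fuse_selected(prev_layer_outputs, alphas):
    # Fusion network (assumed here to be an element-wise sum of the selected previous-layer outputs).
    chosen = [f for f, a in zip(prev_layer_outputs, alphas.tolist()) if a == 1.0]
    if not chosen:
        return None  # zero branches selected: no fused feature is added to any specific layer's input
    return torch.stack(chosen).sum(dim=0)

# Example: 4 branch networks, selection probability p = 0.5, previous-layer outputs of shape (batch, dim).
alphas = sample_branch_selection(4, p=0.5)
prev_outputs = [torch.randn(8, 64) for _ in range(4)]
fused_feature = fuse_selected(prev_outputs, alphas)
```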
In one embodiment, iteratively training a feature extraction network includes: constructing a loss function based on the second predictive probability; for a plurality of learnable parameters of the feature extraction network, determining a neighborhood range by taking the minimum loss value of the loss function as the center, and acquiring a plurality of parameter offsets of the plurality of learnable parameters corresponding to the maximum loss value in the neighborhood range; updating the plurality of learnable parameters by adopting a plurality of parameter offsets to obtain a plurality of offset learnable parameters; and training and updating a plurality of offset learning parameters of the feature extraction network according to the loss function until the feature extraction network converges.
It should be noted that, in a conventional training method, the loss value of the loss function only needs to reach a minimum at a certain point, and whether that minimum is stable is not considered. As a result, if the input of the network is slightly disturbed (for example, the prosthesis type of the input face image did not appear in the previous training set, or the quality of the input face image is poor), the network may miss the minimum point and produce a larger loss value. That is, when a feature extraction network trained with the conventional training method is applied to prediction, face images of brand-new prosthesis types that did not appear in the training set are easily misidentified, and the network prediction is unstable.
Based on the above, this embodiment trains the offset learnable parameters of the feature extraction network obtained from the maximum loss value in the neighborhood range, so that even the maximum loss value in the neighborhood range reaches a very small value. As a result, even if the input of the feature extraction network is disturbed, a feature extraction network trained in this way exhibits high stability and strong robustness, and can accurately identify face images of brand-new prosthesis types.
Specifically, a minimum loss value of the loss function is obtained, a neighborhood range is determined according to the minimum loss value, and the neighborhood range can be automatically adjusted according to requirements on training time, network stability and the like. And then acquiring a plurality of parameter offsets of a plurality of learnable parameters corresponding to the maximum loss value in the neighborhood range, namely, the parameter offsets can be simply understood as the difference value of the learnable parameters corresponding to the maximum loss value and the learnable parameters corresponding to the minimum loss value in the neighborhood range, updating the learnable parameters according to the parameter offsets to obtain offset learnable parameters, and training and updating the offset learnable parameters by using a loss function.
The loss function loss constructed based on the second prediction probability uses two hyperparameters r and m that are both greater than 0, where pred^(b) is the second prediction probability that the b-th face image belongs to a living body, y^(b) is the label indicating whether the b-th face image belongs to a living body or a prosthesis (y^(b) = 0 indicates that the b-th face image belongs to a living body, and y^(b) = 1 indicates that it belongs to a prosthesis), b = 1, 2, ..., B, and B is the number of face images.
The calculation formula of the parameter offset is as follows:

offset = q · (∂loss/∂params) / ‖∂loss/∂params‖₂

The formula for updating the learnable parameters with the offset is as follows:

params = params + offset

where offset is the parameter offset, q is a hyperparameter greater than 0, ∂loss/∂params is the gradient of the loss function loss with respect to the learnable parameters params, and ‖·‖₂ denotes the L2 norm.
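The training procedure above can be sketched as follows, assuming the offset takes the form offset = q·∇loss/‖∇loss‖₂ consistent with the symbols defined above, with the model, loss function and optimizer left as placeholders; the two-pass structure (compute the gradient, shift the parameters by the offset, compute the loss at the shifted point, then update) is an assumption about how the described training would typically be realized.

```python
import torch

def train_step(model, loss_fn, optimizer, face_images, labels, q: float):
    """One assumed training step: shift the learnable parameters by the offset toward the
    maximum loss in a neighborhood, compute the loss there, and update from that point."""
    params = [p for p in model.parameters() if p.requires_grad]

    # First pass: gradient of the loss at the current parameters.
    optimizer.zero_grad()
    loss = loss_fn(model(face_images), labels)
    loss.backward()
    grads = [p.grad.detach().clone() if p.grad is not None else torch.zeros_like(p) for p in params]
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12  # L2 norm over all parameters

    # offset = q * gradient / ||gradient||_2 ; params = params + offset
    offsets = [q * g / grad_norm for g in grads]
    with torch.no_grad():
        for p, off in zip(params, offsets):
            p.add_(off)

    # Second pass: the loss at the offset parameters drives the actual update.
    optimizer.zero_grad()
    loss_at_offset = loss_fn(model(face_images), labels)
    loss_at_offset.backward()

    # Remove the offset, then apply the optimizer update to the original parameters.
    with torch.no_grad():
        for p, off in zip(params, offsets):
            p.sub_(off)
    optimizer.step()
    return loss_at_offset.item()
```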
According to the model training method provided by the embodiment of the application, after repeated iterative training, through the specific network layer in each branch network, each branch network can be trained according to its own final output features while the other branch networks are jointly trained through the associated output features of their intermediate network layers. This design, in which the branch networks assist in training one another, reduces redundant training, so that the feature extraction network automatically determines during training the prosthesis type that each branch network is responsible for processing, which solves the problem of misidentification caused by the reliance of the prior art on the subjective human division of prosthesis types. Meanwhile, through the configuration of the specific network layer, each branch network has both the capability of distinguishing a living body from a prosthesis according to the output of its own branch network and the capability of distinguishing a living body from a prosthesis according to the outputs of some or all of the branch networks, which fully exploits the potential of each branch network and greatly improves the accuracy, stability and efficiency of feature extraction of each branch network.
An embodiment of the present application relates to a living body detection method, as shown in fig. 3, including:
step 201, inputting the face image to be detected into a trained feature extraction network to obtain a plurality of face features.
Step 202, obtaining a plurality of first prediction probabilities that the face image belongs to the living body according to the plurality of face features, and obtaining a second prediction probability that the face image belongs to the living body based on the plurality of first prediction probabilities.
In this embodiment, the trained feature extraction network is obtained by the model training method described in the above embodiment. The first prediction probabilities may be obtained according to activation functions commonly used in deep learning, for example the sigmoid, tanh, ReLU and Leaky ReLU activation functions; alternatively, the plurality of face features may be input into a classifier commonly used in deep learning to obtain the first prediction probabilities.
The second prediction probability that the face image belongs to a living body, i.e. the probability value finally used for living body detection, is obtained from the plurality of first prediction probabilities. The second prediction probability may be obtained by calculating the average of the plurality of first prediction probabilities; it may be obtained by multiplying the plurality of first prediction probabilities; or the maximum and minimum values may be removed from the plurality of first prediction probabilities and the second prediction probability obtained as the average of the remaining first prediction probabilities.
Of course, the methods of calculating the first prediction probability and the second prediction probability in the living body detection stage and the training stage are consistent.
Step 203, when the second prediction probability is greater than or equal to a preset living body threshold value, determining that the face image to be detected is a living body; when the second prediction probability is smaller than the preset living body threshold value, determining that the face image to be detected is a prosthesis.
In this embodiment, the preset living body threshold value may be set by self-adjustment according to requirements for identification accuracy and differences in application scenario.
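Putting the detection steps together, a minimal inference sketch follows; the threshold value, the per-branch classifier heads and the averaging rule are assumptions and must match those used during training.

```python
import torch

LIVE_THRESHOLD = 0.5  # preset living body threshold; an assumed value, tuned per application

@torch.no_grad()
def detect_liveness(feature_extraction_network, classifier_heads, face_image):
    face_features = feature_extraction_network(face_image)                        # plurality of face features
    first_probs = [torch.sigmoid(h(f)) for h, f in zip(classifier_heads, face_features)]
    second_prob = torch.stack(first_probs).mean()                                 # same rule as in training
    return "living body" if second_prob.item() >= LIVE_THRESHOLD else "prosthesis"
```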
The above division of the methods into steps is made for clarity of description; when implemented, the steps may be combined into one step or split into multiple steps, and as long as the same logical relationship is included, they are all within the protection scope of this patent. Adding insignificant modifications to the algorithm or flow, or introducing insignificant designs, without altering the core design of the algorithm and flow, is also within the scope of this patent.
An embodiment of the present application relates to an electronic device, as shown in fig. 4, including:
at least one processor 301; and a memory 302 communicatively coupled to the at least one processor 301; wherein the memory 302 stores instructions executable by the at least one processor 301, and the instructions are executed by the at least one processor 301 to enable the at least one processor 301 to perform the model training method mentioned in the above embodiments or the living body detection method mentioned in the above embodiments.
The electronic device includes: one or more processors 301, and a memory 302, one processor 301 being illustrated in fig. 4. The processor 301, the memory 302 may be connected by a bus or otherwise, for example in fig. 4. The memory 302 is a non-volatile computer readable storage medium that can be used to store non-volatile software programs, non-volatile computer executable programs, and modules, such as algorithms corresponding to the processing strategies in the strategy space in the embodiments of the present application, are stored in the memory 302. The processor 301 executes various functional applications of the device and data processing, i.e., implements the above-described model training method or living detection method, by running nonvolatile software programs, instructions, and modules stored in the memory 302.
Memory 302 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store a list of options, etc. In addition, memory 302 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some implementations, the memory 302 may optionally include memory located remotely from the processor 301, which may be connected to an external device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 302 and, when executed by the one or more processors 301, perform the model training method in any of the above-described embodiments or the living body detection method mentioned in the above-described embodiments.
The above product may perform the method provided by the embodiment of the present application, and has the corresponding functional module and beneficial effect of the performing method, and technical details not described in detail in the embodiment of the present application may be referred to the method provided by the embodiment of the present application.
Embodiments of the present application relate to a computer-readable storage medium storing a computer program. The computer program implements the above-described method embodiments when executed by a processor.
That is, it will be understood by those skilled in the art that all or part of the steps in the methods of the embodiments described above may be implemented by a program stored in a storage medium, where the program includes several instructions for causing a device (which may be a single-chip microcomputer, a chip or the like) or a processor to perform all or part of the steps of the methods of the embodiments of the application. The aforementioned storage medium includes: a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of carrying out the application and that various changes in form and details may be made therein without departing from the spirit and scope of the application.

Claims (9)

1. A method of model training, comprising:
extracting features of the face image through a plurality of branch networks included in the feature extraction network to obtain a plurality of face features;
determining a plurality of first prediction probabilities that the face image belongs to a living body based on the plurality of face features, and obtaining a second prediction probability that the face image belongs to the living body based on the plurality of first prediction probabilities;
performing iterative training on the feature extraction network;
wherein, each branch network has one network layer as a specific network layer, each specific network layer has the same layer position in the affiliated branch network and is configured in each iterative training process:
the input features of each specific network layer comprise the output features of the previous network layer in the branch network to which it belongs, or further comprise a fused feature obtained by fusing the output features of the previous network layers of at least one of all the specific network layers;
the feature extraction network further includes: an output network and a fusion network;
the configuration process of each specific network layer in each iterative training process comprises the following steps:
outputting, through the output network, a plurality of random parameters which obey a Bernoulli distribution and whose number is the same as the number of the branch networks, and selecting zero or at least one branch network according to the plurality of random parameters;
and fusing, through the fusion network and for each specific network layer, the output features of the previous network layers of the specific network layers in the selected zero or at least one branch network to obtain the fused feature.
2. The model training method according to claim 1, wherein in each iterative training process, probability values corresponding to the plurality of random parameters are super-parameters in the training process.
3. The model training method of claim 1, wherein the vector dimensions of the output features of the network layer preceding the particular network layer in each branched network are the same.
4. The model training method according to claim 1, wherein the obtaining a second prediction probability that the face image belongs to a living body based on the plurality of first prediction probabilities includes:
multiplying the plurality of first prediction probabilities to obtain a second prediction probability that the face image belongs to a living body.
5. The model training method according to any one of claims 1-4, characterized in that the iterative training of the feature extraction network comprises:
constructing a loss function based on the second predictive probability;
for a plurality of learnable parameters of a feature extraction network, determining a neighborhood range by taking a minimum loss value of the loss function as a center, and acquiring a plurality of parameter offsets of the plurality of learnable parameters corresponding to a maximum loss value in the neighborhood range;
updating the plurality of learnable parameters by adopting the plurality of parameter offsets to obtain a plurality of offset learnable parameters;
training and updating a plurality of offset learnable parameters of the feature extraction network according to the loss function until the feature extraction network converges.
6. The model training method of claim 5, wherein the parameter offset is calculated by the following formula:

offset = q · (∂loss/∂params) / ‖∂loss/∂params‖₂

the offset learnable parameter is calculated by the following formula:

params = params + offset

wherein offset is the parameter offset, q is a hyperparameter greater than 0, ∂loss/∂params is the gradient of the loss function loss with respect to the learnable parameters params, and ‖·‖₂ denotes the L2 norm.
7. A living body detecting method, characterized by comprising:
inputting the face image to be detected into a trained feature extraction network to obtain a plurality of face features;
obtaining a plurality of first prediction probabilities that the face image belongs to a living body according to the plurality of face features, and obtaining a second prediction probability that the face image belongs to the living body based on the plurality of first prediction probabilities;
when the second prediction probability is greater than or equal to a preset living body threshold value, determining that the face image to be detected is a living body; when the second prediction probability is smaller than the preset living body threshold value, determining that the face image to be detected is a prosthesis;
wherein the trained feature extraction network is obtained by the model training method of any one of claims 1-6.
8. An electronic device, comprising:
at least one processor; the method comprises the steps of,
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model training method of any one of claims 1 to 6 or the in vivo detection method of claim 7.
9. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the model training method of any one of claims 1 to 6 or implements the living detection method of claim 7.
CN202310375684.2A 2023-04-11 2023-04-11 Model training method, living body detection method, electronic device, and storage medium Active CN116091875B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310375684.2A CN116091875B (en) 2023-04-11 2023-04-11 Model training method, living body detection method, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310375684.2A CN116091875B (en) 2023-04-11 2023-04-11 Model training method, living body detection method, electronic device, and storage medium

Publications (2)

Publication Number Publication Date
CN116091875A CN116091875A (en) 2023-05-09
CN116091875B true CN116091875B (en) 2023-08-29

Family

ID=86199547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310375684.2A Active CN116091875B (en) 2023-04-11 2023-04-11 Model training method, living body detection method, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN116091875B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993220A (en) * 2019-03-23 2019-07-09 西安电子科技大学 Multi-source Remote Sensing Images Classification method based on two-way attention fused neural network
CN110348322A (en) * 2019-06-19 2019-10-18 西华师范大学 Human face in-vivo detection method and equipment based on multi-feature fusion
CN111709409A (en) * 2020-08-20 2020-09-25 腾讯科技(深圳)有限公司 Face living body detection method, device, equipment and medium
CN112766158A (en) * 2021-01-20 2021-05-07 重庆邮电大学 Multi-task cascading type face shielding expression recognition method
CN113792713A (en) * 2021-11-16 2021-12-14 北京的卢深视科技有限公司 Model training method, face recognition model updating method, electronic device and storage medium
CN114333011A (en) * 2021-12-28 2022-04-12 北京的卢深视科技有限公司 Network training method, face recognition method, electronic device and storage medium
CN115131858A (en) * 2022-06-27 2022-09-30 合肥的卢深视科技有限公司 Model training method, face recognition method, electronic device and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10930010B2 (en) * 2018-05-10 2021-02-23 Beijing Sensetime Technology Development Co., Ltd Method and apparatus for detecting living body, system, electronic device, and storage medium
CN109711481B (en) * 2019-01-02 2021-09-10 京东方艺云科技有限公司 Neural networks for drawing multi-label recognition, related methods, media and devices

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993220A (en) * 2019-03-23 2019-07-09 西安电子科技大学 Multi-source Remote Sensing Images Classification method based on two-way attention fused neural network
CN110348322A (en) * 2019-06-19 2019-10-18 西华师范大学 Human face in-vivo detection method and equipment based on multi-feature fusion
CN111709409A (en) * 2020-08-20 2020-09-25 腾讯科技(深圳)有限公司 Face living body detection method, device, equipment and medium
CN112766158A (en) * 2021-01-20 2021-05-07 重庆邮电大学 Multi-task cascading type face shielding expression recognition method
CN113792713A (en) * 2021-11-16 2021-12-14 北京的卢深视科技有限公司 Model training method, face recognition model updating method, electronic device and storage medium
CN114333011A (en) * 2021-12-28 2022-04-12 北京的卢深视科技有限公司 Network training method, face recognition method, electronic device and storage medium
CN115131858A (en) * 2022-06-27 2022-09-30 合肥的卢深视科技有限公司 Model training method, face recognition method, electronic device and storage medium

Also Published As

Publication number Publication date
CN116091875A (en) 2023-05-09

Similar Documents

Publication Publication Date Title
US11361585B2 (en) Method and system for face recognition via deep learning
KR102420465B1 (en) System and method for designing super resolution deep convolutional neural networks
CN108229381B (en) Face image generation method and device, storage medium and computer equipment
CN106778502B (en) Crowd counting method based on deep residual error network
US11586909B2 (en) Information processing method, information processing apparatus, and computer readable storage medium
CN107862376A (en) A kind of human body image action identification method based on double-current neutral net
CN107832700A (en) A kind of face identification method and system
JP6798614B2 (en) Image recognition device, image recognition method and image recognition program
CN104978764A (en) Three-dimensional face mesh model processing method and three-dimensional face mesh model processing equipment
KR20180055070A (en) Method and device to perform to train and recognize material
JP7257756B2 (en) Image identification device, image identification method, learning device, and neural network
CN110070037B (en) Smooth upgrading method and device for face recognition model and readable storage medium
CN113657195A (en) Face image recognition method, face image recognition equipment, electronic device and storage medium
JP7425362B2 (en) Learning method, high resolution method, learning device and computer program
CN117056957A (en) Verifiable data forgetting privacy protection method and device for minimum and maximum learning model
CN113643183B (en) Non-matching remote sensing image weak supervised learning super-resolution reconstruction method and system
CN116091875B (en) Model training method, living body detection method, electronic device, and storage medium
CN109101858B (en) Action recognition method and device
JP7171478B2 (en) Information processing method and information processing system
US20230055488A1 (en) Method and system for extracting and classifying manufacturing features from three-dimensional model of product
CN116245901A (en) Building rule vector contour extraction method and system based on edge learning
CN115409159A (en) Object operation method and device, computer equipment and computer storage medium
CN115131858A (en) Model training method, face recognition method, electronic device and storage medium
CN112036446B (en) Method, system, medium and device for fusing target identification features
KR102186767B1 (en) Method and Device for Detecting Feature Point of Face Using Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant