WO2021082562A1 - Living body detection method and apparatus, electronic device, storage medium and program product - Google Patents

Living body detection method and apparatus, electronic device, storage medium and program product

Info

Publication number
WO2021082562A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature extraction
level
feature
face image
target face
Prior art date
Application number
PCT/CN2020/105213
Other languages
English (en)
French (fr)
Inventor
张卓翼
蒋程
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司 filed Critical 上海商汤智能科技有限公司
Priority to JP2021550213A priority Critical patent/JP2022522203A/ja
Priority to SG11202111482XA priority patent/SG11202111482XA/en
Publication of WO2021082562A1 publication Critical patent/WO2021082562A1/zh
Priority to US17/463,896 priority patent/US20210397822A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/162Detection; Localisation; Normalisation using pixel segmentation or colour matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Definitions

  • the present disclosure relates to the field of image processing technology, and in particular, to living body detection methods, devices, electronic equipment, storage media, and program products.
  • When face recognition technology is applied to identity verification, the user's face photo is first acquired in real time through an image acquisition device, and the acquired photo is then compared with a pre-stored face photo; if the comparison is consistent, the identity verification passes.
  • The present disclosure provides at least a living body detection method, apparatus, electronic device, and storage medium, which can improve detection efficiency in the living body detection process.
  • An optional implementation of the present disclosure provides a living body detection method, including: determining multiple frames of target face images from an acquired video to be detected based on the similarity between the multiple frames of face images included in the video to be detected; and determining a living body detection result of the video to be detected based on the multiple frames of target face images.
  • An optional implementation of the present disclosure provides a living body detection device, including: an acquiring unit, configured to determine multiple frames of target face images from an acquired video to be detected based on the similarity between the multiple frames of face images included in the video; and a detection unit, configured to determine a living body detection result of the video to be detected based on the multiple frames of target face images.
  • An optional implementation of the present disclosure also provides an electronic device, including a processor and a memory storing machine-readable instructions executable by the processor; when the machine-readable instructions are executed by the processor, the processor is prompted to execute the living body detection method described in the first aspect.
  • An optional implementation of the present disclosure also provides a computer-readable storage medium having a computer program stored thereon; when the computer program is run by an electronic device, the electronic device is prompted to execute the living body detection method described in the first aspect.
  • an optional implementation manner of the present disclosure also provides a computer program product, including machine-executable instructions.
  • When the machine-executable instructions are read and executed by an electronic device, the electronic device is prompted to execute the living body detection method described in the first aspect.
  • Based on the similarity between the multiple frames of face images included in the acquired video to be detected, the present disclosure extracts multiple frames of target face images from the video to be detected, and then determines the living body detection result of the video to be detected based on the multiple frames of target face images.
  • In this way, multiple frames of the user's face images with large differences are used to silently detect whether the user is a living body without requiring any specified action, so the detection efficiency is higher.
  • Fig. 1 shows a flowchart of a living body detection method provided by an embodiment of the present disclosure.
  • Fig. 2A shows a flowchart of a method for extracting a preset number of target face images from a video to be detected according to an embodiment of the present disclosure.
  • Fig. 2B shows a flowchart of a method for extracting a preset number of target face images from a video to be detected according to another embodiment of the present disclosure.
  • FIG. 3A shows a flowchart of the process of obtaining the feature extraction result of each frame of the target face image provided by the embodiment of the present disclosure.
  • FIG. 3B shows a flowchart of the process of performing feature fusion processing on the feature extraction results of the multi-frame target face image provided by an embodiment of the present disclosure to obtain first fused feature data.
  • FIG. 3C shows the process of obtaining the first detection result based on the feature extraction result of each frame of the target face image in the multi-frame target face image in the living body detection method provided by the embodiment of the present disclosure.
  • FIG. 4A shows a flow chart of a method for feature extraction of differential concatenated images provided by an embodiment of the present disclosure.
  • FIG. 4B shows a process of obtaining a second detection result based on the difference image of every two adjacent target face images in a multi-frame target face image in a living body detection method provided by an embodiment of the present disclosure.
  • FIG. 4C shows a flow chart of the process of performing feature fusion on the feature extraction results of the differential cascade image provided by an embodiment of the present disclosure.
  • Fig. 5 shows a flowchart of a living body detection method provided by another embodiment of the present disclosure.
  • FIG. 6A shows a schematic diagram of a living body detection device provided by an embodiment of the present disclosure.
  • Fig. 6B shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
  • FIG. 7 shows a flowchart of the application process of the living body detection method provided by an embodiment of the present disclosure.
  • In order to verify whether the user to be detected is a living body during face recognition, the user to be detected is usually required to perform certain specified actions.
  • the user is required to stand in front of the camera of the terminal device and make a certain specified facial expression according to the prompts in the terminal device.
  • The camera captures a face video; based on the captured face video, it is then detected whether the user has made the specified action, and whether the user who made the specified action is a legitimate user. If the user is a legitimate user, the identity verification passes.
  • This method of living body detection usually consumes a lot of time in the interaction process between the terminal device and the user, resulting in low detection efficiency.
  • To this end, the present disclosure provides a living body detection method and device, which can extract multiple frames of target face images from a video to be detected, obtain a first detection result based on the feature extraction result of each frame of target face image in the multiple frames of target face images, and obtain a second detection result based on the difference image of every two adjacent target face images in the multiple frames of target face images; then, based on the first detection result and the second detection result, the living body detection result of the video to be detected is determined.
  • In this process, the user does not need to make any specified action; instead, multiple frames of the user's face images with large differences are used to silently detect whether the user is a living body, so the detection efficiency is higher.
  • the execution subject of the living body detection method provided in the embodiment of the present disclosure is generally an electronic device with a certain computing capability.
  • the electronic equipment includes, for example, terminal equipment or servers or other processing equipment.
  • The terminal equipment may be User Equipment (UE), mobile equipment, a user terminal, a terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, and so on.
  • the living body detection method can be implemented by a processor invoking a computer-readable instruction stored in a memory.
  • the following takes the execution subject as the terminal device as an example to describe the living body detection method provided by the alternative implementation of the present disclosure.
  • FIG. 1 it is a flowchart of a living body detection method provided by an embodiment of the present disclosure.
  • the method includes steps S101-S104.
  • S101 Extract multiple frames of target face images from the acquired video to be detected.
  • S102 Obtain a first detection result based on the feature extraction result of each frame of the target face image in the multi-frame target face image.
  • S103 Obtain a second detection result based on the difference image of every two adjacent target face images in the multiple frames of target face images.
  • S104 Determine a live body detection result of the video to be detected based on the first detection result and the second detection result.
  • S102 and S103 have no fixed execution order relative to each other.
  • the above S101-S104 will be described in detail below.
  • an image acquisition device is installed in the terminal device, and the original detection video can be instantly acquired through the image acquisition device.
  • Each frame of the original detection video includes a human face.
  • The original detection video can be used directly as the video to be detected; it is also possible to crop the face regions from the frames of the original detection video to obtain the video to be detected.
  • The duration of the video to be detected may be above a preset duration threshold, and the preset duration threshold can be set according to actual needs.
  • the preset duration threshold is 2 seconds, 3 seconds, 4 seconds, and so on.
  • the number of frames of the face image included in the video to be detected is greater than the number of frames of the target face image that needs to be extracted.
  • The number of frames of the target face images may be fixed, or may be determined according to the video length of the video to be detected.
  • the multi-frame target face image is determined from the video to be detected, for example, based on the similarity between the multi-frame face images included in the video to be detected.
  • the multi-frame target face image satisfies at least one of the following two requirements.
  • Requirement 1: The similarity between every two adjacent target face images in the multiple frames of target face images is lower than a first value; a face image that meets this requirement is used as a frame of the target face images.
  • The first value may be a preset value. In this way, the obtained multiple frames of target face images have large differences, and a more accurate detection result can be obtained.
  • Requirement 2: Determine the first target face image in the multiple frames of target face images from the video to be detected; based on the first target face image, determine a second target face image from the multiple frames of consecutive face images in the video to be detected, where the similarity between the second target face image and the first target face image meets a preset similarity requirement.
  • the similarity requirement may include: the second target face image is a face image that has the smallest similarity with the first target face image among the multiple frames of continuous face images. In this way, the obtained multiple target face images have large differences, and the detection results can be obtained with higher accuracy.
  • Specifically, the first target face image in the multiple frames of target face images may be determined in the following manner: the video to be detected is divided into multiple segments, where each segment includes a certain number of consecutive face images; a first target face image is selected from the first segment of the multiple segments; and, based on the first target face image, a second target face image is determined from each of the multiple segments.
  • In this way, the target face images are spread across the entire video to be detected, so that changes in the user's expression over the duration of the video to be detected can be better captured.
  • FIG. 2A is a flowchart of a method for extracting a preset number of target face images from a video to be detected according to an embodiment of the present disclosure, which includes the following steps.
  • S201 Divide the face images in the video to be detected into N image groups in the order of their timestamps, where N = the preset number - 1.
  • the number of face images included in different image groups may be the same or different, and may be specifically set according to actual needs.
  • S202 For the first image group, determine the first frame of face image in the group as the first frame of target face image, and use it as the reference face image to obtain the similarity between each other face image in the group and the reference face image; the face image with the smallest similarity to the reference face image is determined as the second target face image in the group.
  • S203 For each of the other image groups, use the second target face image selected in the previous image group as the reference face image, and obtain the similarity between each frame of face image in the group and the reference face image; the face image with the smallest similarity to the reference face image is taken as the second target face image of the group.
  • Any one of, but not limited to, the following two manners may be adopted to determine the similarity between a certain frame of face image and the reference face image.
  • This frame of face image may be referred to as the first face image, and the reference face image may be referred to as the second face image.
  • any one of the multiple frames of face images may be referred to as a first face image, and another frame of face images may be referred to as a second face image.
  • Manner 1: Based on the pixel value of each pixel in the first face image and the pixel value of each pixel in the second face image, a face difference image between the first face image and the second face image is obtained; according to the pixel values in the face difference image, the variance corresponding to the face difference image is computed, and the variance is used as the similarity between the first face image and the second face image.
  • Specifically, the pixel value of any pixel M in the face difference image equals the pixel value of pixel M' in the first face image minus the pixel value of pixel M'' in the second face image, where the position of M in the face difference image, the position of M' in the first face image, and the position of M'' in the second face image are consistent.
  • The similarity obtained in this manner has the advantage of simple calculation.
  • Manner 2: Perform at least one level of feature extraction on the first face image and the second face image to obtain the feature data corresponding to each of them; then calculate the distance between the feature data corresponding to the two face images, and use the distance as the similarity between the first face image and the second face image. The larger the distance, the smaller the similarity between the first face image and the second face image.
  • a convolutional neural network can be used to perform feature extraction on the first face image and the second face image.
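  • For illustration only, the two manners above could be sketched in Python roughly as follows; the small convolutional network used for Manner 2 is a placeholder assumption, not the network defined in the present disclosure:

```python
import numpy as np
import torch
import torch.nn as nn

def diff_variance_similarity(first: np.ndarray, second: np.ndarray) -> float:
    """Manner 1: build the pixel-wise face difference image and use the variance
    of its pixel values as the similarity score between the two face images."""
    diff = first.astype(np.float32) - second.astype(np.float32)
    return float(np.var(diff))

# Placeholder feature extractor used only to illustrate Manner 2; the disclosure
# does not specify this architecture.
_embedder = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

def feature_distance_similarity(first: np.ndarray, second: np.ndarray) -> float:
    """Manner 2: extract feature data from both face images and take the distance
    between the feature vectors; the larger the distance, the smaller the similarity."""
    batch = torch.from_numpy(np.stack([first, second])).permute(0, 3, 1, 2).float()
    with torch.no_grad():
        feats = _embedder(batch)
    return torch.dist(feats[0], feats[1]).item()
```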
  • For example, assuming there are 20 frames of face images a1-a20 in the video to be detected and the preset number of target face images is 5, the video to be detected is divided into 4 groups in the order of the timestamps, namely: the first group: a1-a5; the second group: a6-a10; the third group: a11-a15; the fourth group: a16-a20.
  • For the first image group, take a1 as the first frame of target face image, and use a1 as the reference face image to obtain the similarities between a2-a5 and a1 respectively. Assuming that the similarity between a3 and a1 is the smallest, a3 is taken as the second target face image in the first image group. For the second image group, take a3 as the reference face image, and obtain the similarities between a6-a10 and a3 respectively. Assuming that the similarity between a7 and a3 is the smallest, a7 is taken as the second target face image in the second image group. For the third image group, take a7 as the reference face image, and obtain the similarities between a11-a15 and a7 respectively.
  • Assuming that the similarity between a14 and a7 is the smallest, a14 is taken as the second target face image in the third image group.
  • For the fourth image group, take a14 as the reference face image, and obtain the similarities between a16-a20 and a14 respectively.
  • Assuming that the similarity between a19 and a14 is the smallest, a19 is taken as the second target face image in the fourth image group.
  • The finally obtained target face images include five frames: a1, a3, a7, a14, and a19.
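  • As a rough Python sketch (not part of the disclosure), the grouping-and-selection procedure of S201-S203 could be implemented as follows, reusing the diff_variance_similarity helper sketched above; the function name and the grouping via numpy.array_split are illustrative assumptions:

```python
import numpy as np
from typing import Callable, List

def select_target_frames(frames: List[np.ndarray],
                         preset_number: int,
                         similarity: Callable[[np.ndarray, np.ndarray], float]
                         ) -> List[int]:
    """Split the frames into (preset_number - 1) groups in timestamp order; the first
    frame of the first group is the first target face image, and in every group the
    frame with the smallest similarity to the current reference becomes the next
    target face image and the new reference (e.g. a1, a3, a7, a14, a19 above)."""
    groups = np.array_split(np.arange(len(frames)), preset_number - 1)
    first = int(groups[0][0])
    targets = [first]
    reference = frames[first]
    for group in groups:
        candidates = [int(i) for i in group if int(i) != first]
        best = min(candidates, key=lambda i: similarity(frames[i], reference))
        targets.append(best)
        reference = frames[best]   # the selected frame becomes the new reference
    return targets

# Example usage with the Manner 1 helper sketched earlier:
# indices = select_target_frames(frames, preset_number=5,
#                                similarity=diff_variance_similarity)
```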
  • In another implementation, the first target face image is first selected from the video to be detected; the remaining face images are then divided into multiple segments, and based on the first target face image, a second target face image is determined from each of the multiple segments.
  • Fig. 2B is a flowchart of a method for extracting a preset number of target face images from a video to be detected according to another embodiment of the present disclosure, including the following steps.
  • S211 Determine the first frame of face image in the video to be detected as the first frame of target face image.
  • S213 For the first image group, use the first frame of target face image as the reference face image, and obtain the similarity between each face image in the group and the reference face image; the face image with the smallest similarity to the reference face image is determined as the second target face image in the first image group.
  • S214 For each of the other image groups, use the second target face image selected in the previous image group as the reference face image, and obtain the similarity between each frame of face image in the group and the reference face image; the face image with the smallest similarity to the reference face image is taken as the second target face image of the group.
  • the determination method of the similarity between the face image and the reference face image is similar to the determination method in FIG. 2A described above, and will not be repeated here.
  • For example, assuming there are 20 frames of face images a1-a20 in the video to be detected and the preset number of target face images is 5, a1 is used as the first frame of target face image, and a2-a20 are divided into 4 groups in the order of their timestamps, namely: the first group: a2-a6; the second group: a7-a11; the third group: a12-a16; the fourth group: a17-a20.
  • For the first image group, use a1 as the reference face image, and obtain the similarities between a2-a6 and a1 respectively. Assuming that the similarity between a4 and a1 is the smallest, a4 is taken as the second target face image in the first image group. For the second image group, take a4 as the reference face image, and obtain the similarities between a7-a11 and a4 respectively. Assuming that the similarity between a10 and a4 is the smallest, a10 is taken as the second target face image in the second image group. For the third image group, take a10 as the reference face image, and obtain the similarities between a12-a16 and a10 respectively.
  • Assuming that the similarity between a13 and a10 is the smallest, a13 is taken as the second target face image in the third image group.
  • For the fourth image group, take a13 as the reference face image, and obtain the similarities between a17-a20 and a13 respectively.
  • Assuming that the similarity between a19 and a13 is the smallest, a19 is taken as the second target face image in the fourth image group.
  • The finally obtained target face images include five frames: a1, a4, a10, a13, and a19.
  • In some embodiments, the living body detection method further includes: acquiring key point information of each frame of face image in the multiple frames of face images included in the video to be detected; and performing alignment processing on the multiple frames of face images based on the key point information of each frame of face image, to obtain aligned multiple frames of face images.
  • For example, the multiple frames of face images in the video to be detected can be sequentially input into a pre-trained face key point detection model to obtain the key point position of each target key point in each frame of face image. Then, based on the obtained key point positions, the first frame of face image is used as the reference image and the other face images are aligned to it, so that the position and angle of the face are consistent across the different face images. This avoids interference from changes in head position and orientation with the subtle changes of the human face.
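  • A minimal sketch of such key-point-based alignment with OpenCV is shown below; the landmark detector (detect_landmarks) is a hypothetical stand-in, and estimating a similarity transform to the first frame's key points is one possible way to realize the alignment described above:

```python
import cv2
import numpy as np
from typing import Callable, List

def align_to_first_frame(frames: List[np.ndarray],
                         detect_landmarks: Callable[[np.ndarray], np.ndarray]
                         ) -> List[np.ndarray]:
    """Warp every frame so that its facial key points line up with those of the
    first frame, making face position and angle consistent across frames."""
    ref_pts = detect_landmarks(frames[0]).astype(np.float32)   # (K, 2) key points
    h, w = frames[0].shape[:2]
    aligned = [frames[0]]
    for frame in frames[1:]:
        pts = detect_landmarks(frame).astype(np.float32)
        # Estimate a similarity transform (rotation + scale + translation).
        matrix, _ = cv2.estimateAffinePartial2D(pts, ref_pts)
        aligned.append(cv2.warpAffine(frame, matrix, (w, h)))
    return aligned
```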
  • In this case, determining multiple frames of target face images from the video to be detected includes: determining the multiple frames of target face images from the aligned multiple frames of face images according to the similarity between the aligned multiple frames of face images.
  • the method for determining the target face image is similar to the above method, and will not be repeated here.
  • In some embodiments, the respective feature extraction results of the multiple frames of target face images may be subjected to feature fusion processing to obtain first fused feature data; the first detection result is then obtained based on the first fused feature data.
  • In this way, the feature data corresponding to each frame of target face image contains the characteristics of subtle changes in the face, so that accurate living body detection can be carried out without requiring the user to make any designated action.
  • FIG. 3A is a flowchart of the process of obtaining the feature extraction result of each frame of the target face image provided by the embodiment of the present disclosure, including the following steps.
  • S301 Perform multi-level feature extraction processing on the target face image to obtain first initial feature data corresponding to each level of first feature extraction processing in the multi-level feature extraction processing.
  • the target face image may be input into the pre-trained first convolutional neural network, and the target face image may be subjected to multi-level first feature extraction processing.
  • In some embodiments, the first convolutional neural network includes multiple convolutional layers; the convolutional layers are connected in sequence, and the output of any convolutional layer serves as the input of the next convolutional layer. The output of each convolutional layer is used as the first initial feature data corresponding to that convolutional layer, i.e., to that level of first feature extraction processing.
  • In addition, a pooling layer, a fully connected layer, and the like can also be provided; for example, a pooling layer is connected after each convolutional layer, and a fully connected layer is connected after the pooling layer, so that the convolutional layer, the pooling layer, and the fully connected layer constitute one level of network structure for performing one level of first feature extraction processing.
  • the specific structure of the first convolutional neural network can be specifically set according to actual needs, and will not be repeated here.
  • the number of convolutional layers in the first convolutional neural network is consistent with the number of stages for performing the first feature extraction process.
  • S302 For each level of first feature extraction processing, fuse the first initial feature data of that level with the first initial feature data of at least one subsequent level of first feature extraction processing, to obtain the first intermediate feature data corresponding to that level, wherein the feature extraction result of the target face image includes the first intermediate feature data corresponding to each level of the multi-level first feature extraction processing.
  • the first feature extraction process at each level can obtain richer facial features, and finally higher detection accuracy can be obtained.
  • Specifically, the first intermediate feature data corresponding to any level of first feature extraction processing can be obtained in the following manner: fuse the first initial feature data of this level with the first intermediate feature data corresponding to the next level of first feature extraction processing, to obtain the first intermediate feature data corresponding to this level, where the first intermediate feature data corresponding to the next level is obtained based on the first initial feature data of that next level.
  • the first feature extraction process at each level can obtain richer facial features, and finally higher detection accuracy can be obtained.
  • For the last level of first feature extraction processing, the first initial feature data obtained by the last level is directly determined as the first intermediate feature data corresponding to the last level of first feature extraction processing.
  • For each of the other levels, the first intermediate feature data corresponding to this level of first feature extraction processing may be obtained in the following manner: up-sample the first intermediate feature data corresponding to the next level of first feature extraction processing to obtain the up-sampled data corresponding to this level; then fuse the up-sampled data corresponding to this level with the first initial feature data of this level to obtain the first intermediate feature data corresponding to this level of first feature extraction processing.
  • In this way, deep features are up-sampled and added to the features of the shallower feature extraction processing, so that deep features flow toward shallow features, enriching the information extracted at shallow levels and increasing detection accuracy.
  • the first initial feature data obtained by the five-level feature extraction process are: V1, V2, V3, V4, and V5.
  • V5 is used as the first intermediate feature data M5 corresponding to the fifth-level first feature extraction process.
  • the first intermediate feature data M5 obtained by the fifth-level first feature extraction process is subjected to up-sampling processing to obtain the up-sampled data M5' corresponding to the fourth-level first feature extraction process.
  • the first intermediate feature data M4 corresponding to the fourth-level first feature extraction process is generated based on V4 and M5'.
  • Similarly, the first intermediate feature data M3 corresponding to the third-level first feature extraction process and the first intermediate feature data M2 corresponding to the second-level first feature extraction process can be obtained.
  • the first intermediate feature data M2 obtained by the second-level first feature extraction processing is subjected to up-sampling processing to obtain the up-sampled data M2' corresponding to the first-level first feature extraction processing. Based on V1 and M2', first intermediate feature data M1 corresponding to the first-level feature extraction process is generated.
  • The up-sampled data corresponding to a level of first feature extraction processing and the first initial feature data of that level can be merged in the following manner to obtain the corresponding first intermediate feature data: the up-sampled data and the first initial feature data are added.
  • Here, adding refers to adding each value in the up-sampled data to the value at the corresponding position in the first initial feature data.
  • After up-sampling the first intermediate feature data corresponding to the next level of first feature extraction processing, the obtained up-sampled data has the same dimensions as the first initial feature data of the current level; after the addition, the resulting first intermediate feature data therefore also has the same dimensions as the first initial feature data corresponding to this level of first feature extraction processing.
  • the dimensions of the first initial feature data corresponding to each level of the first feature extraction process are related to the network settings of each level of the convolutional neural network, which is not limited in this application.
  • the up-sampled data and the first initial feature data may also be spliced.
  • For example, suppose the dimensions of the up-sampled data and the first initial feature data are both m*n*f. After the two are vertically spliced, the dimension of the resulting first intermediate feature data is 2m*n*f; after they are horizontally spliced, the dimension of the resulting first intermediate feature data is m*2n*f.
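  • For illustration, the top-down fusion by up-sampling and element-wise addition could be sketched in PyTorch as follows; it assumes all levels share the same channel count (otherwise an extra projection would be needed, which the disclosure does not detail), and concatenation could be substituted for the addition as noted above:

```python
import torch
import torch.nn.functional as F
from typing import List

def top_down_fuse(initial_feats: List[torch.Tensor]) -> List[torch.Tensor]:
    """Given first initial feature data [V1, ..., Vn] (each of shape (N, C, H, W),
    ordered shallow to deep), return first intermediate feature data [M1, ..., Mn]:
    the deepest level is kept as-is, and every shallower level adds the up-sampled
    next-level result."""
    feats = list(initial_feats)
    fused = [None] * len(feats)
    fused[-1] = feats[-1]                                   # Mn = Vn
    for i in range(len(feats) - 2, -1, -1):
        upsampled = F.interpolate(fused[i + 1],
                                  size=feats[i].shape[-2:],
                                  mode="bilinear",
                                  align_corners=False)      # M(i+1) up-sampled to Vi's size
        fused[i] = feats[i] + upsampled                      # Mi = Vi + upsample(M(i+1))
    return fused
```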
  • FIG. 3B is a flowchart of a process of performing feature fusion processing on the feature extraction results of the multi-frame target face images to obtain first fused feature data according to an embodiment of the present disclosure, including the following steps.
  • S311 For each level of first feature extraction processing, perform fusion processing on the first intermediate feature data corresponding to the multiple frames of target face images at this level, to obtain the intermediate fusion data corresponding to this level of first feature extraction processing.
  • Specifically, the intermediate fusion data corresponding to each level of first feature extraction processing can be obtained in the following manner: based on the first intermediate feature data corresponding to the multiple frames of target face images at this level, generate the feature sequence corresponding to this level of first feature extraction processing; then input the feature sequence into a recurrent neural network for fusion processing to obtain the intermediate fusion data corresponding to this level.
  • the recurrent neural network includes, for example, one or more of Long Short-Term Memory (LSTM), Recurrent Neural Networks (RNN), and Gated Recurrent Unit (GRU) .
  • In this way, for n levels of first feature extraction processing, n intermediate fusion data can finally be obtained.
  • The feature sequence corresponding to a level of first feature extraction processing is obtained as follows: perform global average pooling on the first intermediate feature data corresponding to each frame of target face image at this level to obtain second intermediate feature data, and then arrange the second intermediate feature data of the multiple frames of target face images in the temporal order of the frames to form the feature sequence corresponding to this level.
  • Global average pooling converts the three-dimensional feature data into a feature vector.
  • In this way, the dimensionality of the first intermediate feature data is reduced and the subsequent processing is simplified.
  • For example, suppose the dimension of the first intermediate feature data is 7*7*128, which can be understood as 128 two-dimensional 7*7 matrices stacked together.
  • For each 7*7 two-dimensional matrix, the average value of its elements is calculated. In this way, 128 average values are obtained, and these 128 average values are used as the second intermediate feature data.
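  • As a small illustrative sketch, this global average pooling step simply averages each channel of the first intermediate feature data:

```python
import torch

def global_average_pool(first_intermediate: torch.Tensor) -> torch.Tensor:
    """Turn a (..., C, H, W) first intermediate feature map, e.g. 128 channels of
    7 x 7, into a C-dimensional vector of per-channel averages (the second
    intermediate feature data)."""
    return first_intermediate.mean(dim=(-2, -1))
```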
  • For example, suppose the target face images are b1-b5.
  • If the second intermediate feature data corresponding to each frame of target face image at a certain level of first feature extraction processing are P1, P2, P3, P4, and P5 respectively, then the feature sequence corresponding to this level of first feature extraction processing is: (P1, P2, P3, P4, P5).
  • That is, for any level of first feature extraction processing, after the second intermediate feature data corresponding to each frame of target face image at this level is obtained, the second intermediate feature data of the multiple frames of target face images at this level are arranged according to the temporal order of the frames to obtain the feature sequence.
  • The feature sequences are then input into the corresponding recurrent neural network models to obtain the intermediate fusion data corresponding to each level of first feature extraction processing.
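  • A possible PyTorch sketch of this per-level recurrent fusion is given below; the hidden size and the use of the LSTM's final hidden state as the intermediate fusion data are assumptions made for illustration:

```python
import torch
import torch.nn as nn
from typing import List

class LevelSequenceFusion(nn.Module):
    """One LSTM per feature-extraction level; each LSTM fuses the time-ordered
    sequence of second intermediate feature data into one intermediate fusion vector."""
    def __init__(self, feat_dims: List[int], hidden_dim: int = 128):
        super().__init__()
        self.lstms = nn.ModuleList(
            nn.LSTM(input_size=d, hidden_size=hidden_dim, batch_first=True)
            for d in feat_dims
        )

    def forward(self, sequences: List[torch.Tensor]) -> List[torch.Tensor]:
        # sequences[k] has shape (batch, num_frames, feat_dims[k]), frames in time order.
        fused = []
        for lstm, seq in zip(self.lstms, sequences):
            _, (h_n, _) = lstm(seq)
            fused.append(h_n[-1])          # (batch, hidden_dim) per level, e.g. R1..R5
        return fused
```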
  • Multi-level extraction of features in the target face image can make the finally obtained feature data of the target face image contain richer information, thereby improving the accuracy of living body detection.
  • the intermediate fusion data corresponding to the first feature extraction processing at all levels may be spliced to obtain the first fusion feature data that uniformly characterizes the target face image.
  • the intermediate fusion data corresponding to the multi-level first feature extraction processing may also be spliced, and then the full connection processing is performed to obtain the first fusion feature data.
  • the first fusion feature data can be input to the first classifier to obtain the first detection result.
  • the first classifier is, for example, a softmax classifier.
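  • Continuing the sketch, the intermediate fusion data of all levels could be spliced, passed through a fully connected layer, and fed to a softmax classifier roughly as follows (layer sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class FirstBranchHead(nn.Module):
    """Concatenate the per-level intermediate fusion data (e.g. R1..R5), apply a
    fully connected layer to get the first fused feature data, then classify."""
    def __init__(self, num_levels: int = 5, hidden_dim: int = 128, fused_dim: int = 256):
        super().__init__()
        self.fc = nn.Linear(num_levels * hidden_dim, fused_dim)
        self.classifier = nn.Linear(fused_dim, 2)        # live vs. non-live

    def forward(self, fused_per_level):
        x = torch.cat(fused_per_level, dim=-1)           # splice R1..R5
        first_fused = torch.relu(self.fc(x))             # first fused feature data
        return torch.softmax(self.classifier(first_fused), dim=-1)  # first detection result
```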
  • an example of obtaining the first detection result based on the feature extraction result of each frame of the target face image in the multi-frame target face image is provided.
  • A certain frame of target face image is subjected to five-level first feature extraction processing, and the first initial feature data obtained are V1, V2, V3, V4, and V5 respectively.
  • V5 is directly used as the first intermediate feature data M5 of the fifth-level first feature extraction process.
  • Up-sampling is performed on the first intermediate feature data M5 to obtain up-sampling data M5' of the first feature extraction process at the fourth level.
  • the first initial feature data V4 of the fourth-level first feature extraction process and the up-sampled data M5' are added to obtain the first intermediate feature data M4 of the fourth-level first feature extraction process.
  • Up-sampling is performed on the first intermediate feature data M4 to obtain up-sampled data M4' of the first feature extraction process at the third level.
  • the first initial feature data V3 of the first feature extraction process of the third level and the up-sampled data M4' are added to obtain the first intermediate feature data M3 of the first feature extraction process of the third level.
  • Up-sampling is performed on the first intermediate feature data M3 to obtain up-sampled data M3' of the second-level first feature extraction process.
  • the first initial feature data V2 of the second-level first feature extraction process and the up-sampled data M3' are added to obtain the first intermediate feature data M2 of the second-level first feature extraction process.
  • Up-sampling is performed on the first intermediate feature data M2 to obtain the up-sampled data M2' of the first-level first feature extraction process; the first initial feature data V1 of the first-level first feature extraction process and the up-sampled data M2' are added to obtain the first intermediate feature data M1 of the first-level first feature extraction process.
  • the obtained first intermediate feature data M1, M2, M3, M4, and M5 are used as feature extraction results obtained after feature extraction is performed on the target face image of the frame.
  • The first intermediate feature data corresponding to this frame of target face image at the five levels of first feature extraction processing are globally average pooled to obtain the second intermediate feature data G1, G2, G3, G4, and G5 corresponding to this frame of target face image.
  • Taking five frames of target face images a1-a5 as an example: the second intermediate feature data corresponding to the first frame of target face image a1 under the five-level first feature extraction processing are G11, G12, G13, G14, G15; those of the second frame a2 are G21, G22, G23, G24, G25; those of the third frame a3 are G31, G32, G33, G34, G35; those of the fourth frame a4 are G41, G42, G43, G44, G45; and those of the fifth frame a5 are G51, G52, G53, G54, G55.
  • the feature sequence corresponding to the first-level feature extraction process is: (G11, G21, G31, G41, G51).
  • the feature sequence corresponding to the second-level feature extraction process is: (G12, G22, G32, G42, G52).
  • the feature sequence corresponding to the third-level feature extraction process is: (G13, G23, G33, G43, G53).
  • the feature sequence corresponding to the fourth-level feature extraction process is: (G14, G24, G34, G44, G54).
  • the feature sequence corresponding to the fifth-level feature extraction process is: (G15, G25, G35, G45, G55).
  • the feature sequence (G11, G21, G31, G41, G51) is input to the LSTM network corresponding to the first-level first feature extraction process, and the intermediate fusion data R1 corresponding to the first-level first feature extraction process is obtained.
  • the feature sequence (G12, G22, G32, G42, G52) is input to the LSTM network corresponding to the second-level first feature extraction process to obtain the intermediate fusion data R2 corresponding to the second-level first feature extraction process.
  • the feature sequence (G13, G23, G33, G43, G53) is input to the LSTM network corresponding to the third-level first feature extraction process, and the intermediate fusion data R3 corresponding to the third-level first feature extraction process is obtained.
  • the feature sequence (G14, G24, G34, G44, G54) is input to the LSTM network corresponding to the fourth-level first feature extraction process, and the intermediate fusion data R4 corresponding to the fourth-level first feature extraction process is obtained.
  • The feature sequence (G15, G25, G35, G45, G55) is input to the LSTM network corresponding to the fifth-level first feature extraction process, and the intermediate fusion data R5 corresponding to the fifth-level first feature extraction process is obtained.
  • After the intermediate fusion data R1, R2, R3, R4, and R5 are spliced, they are passed into the fully connected layer for fully connected processing to obtain the first fused feature data. The first fused feature data is then passed to the first classifier to obtain the first detection result.
  • In step S103, the second detection result can be obtained based on the difference image of every two adjacent target face images in the multiple frames of target face images in the following manner: obtain the difference image of every two adjacent target face images, cascade the difference images to obtain a differential cascaded image, and then obtain the second detection result based on the differential cascaded image.
  • the change feature can be better extracted, thereby improving the accuracy of the second detection result.
  • The manner of obtaining the difference image of every two adjacent frames of target face images is similar to the description of Manner 1 for FIG. 2A above, and will not be repeated here.
  • The difference images are cascaded along the color-channel dimension.
  • For example, when each difference image is a three-channel image and two difference images are cascaded, the resulting differential cascaded image is a six-channel image.
  • the number of color channels of different differential images is the same, and the number of pixels is also the same.
  • For example, the representation of a difference image is a 256*1024*3 tensor, where the value of any element Aijk is the pixel value of the pixel at position (i, j) in the k-th color channel.
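  • A minimal NumPy sketch of building the differential cascaded image from the ordered target face images is shown below, assuming the frames are H x W x 3 arrays of the same size:

```python
import numpy as np
from typing import List

def build_differential_cascade(target_frames: List[np.ndarray]) -> np.ndarray:
    """Compute the difference image of every two adjacent target face images and
    cascade them along the color-channel axis; with k+1 frames of 3-channel images
    the result has 3*k channels (e.g. two difference images give a 6-channel image)."""
    diffs = [
        target_frames[i + 1].astype(np.float32) - target_frames[i].astype(np.float32)
        for i in range(len(target_frames) - 1)
    ]
    return np.concatenate(diffs, axis=-1)   # shape: (H, W, 3 * (len(target_frames) - 1))
```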
  • The following method may be adopted to obtain the second detection result based on the differential cascaded image: perform feature extraction processing on the differential cascaded image to obtain the feature extraction result of the differential cascaded image; perform feature fusion on the feature extraction result of the differential cascaded image to obtain second fused feature data; and obtain the second detection result based on the second fused feature data.
  • the change feature can be better extracted, thereby improving the accuracy of the second detection result.
  • FIG. 4A is a flowchart of a method for feature extraction of differential cascaded images according to an embodiment of the present disclosure, including the following steps.
  • S401 Perform multi-level second feature extraction processing on the differential cascaded image to obtain second initial feature data corresponding to each level of second feature extraction processing respectively.
  • the differential concatenated image can be input into the pre-trained second convolutional neural network, and the differential concatenated image can be subjected to multi-level second feature extraction processing.
  • the second convolutional neural network is similar to the above-mentioned first convolutional neural network. It should be noted that the network structure of the second convolutional neural network and the aforementioned first convolutional neural network may be the same or different; when the two structures are the same, the network parameters are different. The number of stages of the first feature extraction process and the number of stages of the second feature extraction process may be the same or different.
  • S402 Obtain a feature extraction result of the differential cascaded image based on the second initial feature data corresponding to the multi-level second feature extraction process respectively.
  • Performing multi-level second feature extraction processing on the differential cascade image can increase the receptive field of feature extraction and enrich the information in the differential cascade image.
  • Specifically, the feature extraction result of the differential cascaded image can be obtained based on the second initial feature data corresponding to the multi-level second feature extraction processing in the following manner: for each level of second feature extraction processing, fuse the second initial feature data of this level with the second initial feature data of at least one preceding level of second feature extraction processing, to obtain the third intermediate feature data corresponding to this level; the feature extraction result of the differential cascaded image includes the third intermediate feature data corresponding to each level of second feature extraction processing.
  • the information obtained by the second feature extraction processing at each level is richer, and this information can better represent the change information in the differential image, so as to improve the accuracy of the second detection result.
  • The specific manner of fusing the second initial feature data of any level of second feature extraction processing with the second initial feature data of at least one preceding level may be: down-sample the second initial feature data of the preceding level of second feature extraction processing to obtain the down-sampled data corresponding to this level; then fuse the down-sampled data corresponding to this level with the second initial feature data of this level to obtain the third intermediate feature data corresponding to this level of second feature extraction processing.
  • In this way, information obtained by the upper-level second feature extraction processing flows to the lower-level second feature extraction processing, so that the information obtained by each level of second feature extraction processing is more abundant.
  • the second initial feature data obtained by the first-level second feature extraction process is determined as the third intermediate feature data corresponding to the second feature extraction process of the first level.
  • For each of the other levels, the third intermediate feature data of that level is obtained in the manner described below, and the third intermediate feature data corresponding to each level of second feature extraction processing is used as the result of feature extraction on the differential cascaded image.
  • Specifically, the third intermediate feature data corresponding to each subsequent level of second feature extraction processing can be obtained in the following manner: down-sample the third intermediate feature data obtained by the previous level of second feature extraction processing to obtain the down-sampled data corresponding to this level, where the dimension of the down-sampled data is the same as that of the second initial feature data obtained by this level; then fuse the down-sampled data with the second initial feature data of this level to obtain the third intermediate feature data corresponding to this level of second feature extraction processing.
  • a 5-level second feature extraction process is performed on the differential concatenated image.
  • The second initial feature data obtained by the five-level second feature extraction processing are respectively: W1, W2, W3, W4, and W5.
  • W1 is used as the third intermediate feature data E1 corresponding to the first-level second feature extraction process.
  • The third intermediate feature data E1 obtained by the first-level second feature extraction process is down-sampled to obtain the down-sampled data E1' corresponding to the second-level second feature extraction process.
  • the third intermediate feature data E2 corresponding to the second-level second feature extraction process is generated based on W2 and E1'.
  • the third intermediate feature data E3 corresponding to the third-level second feature extraction process and the third intermediate feature data E4 corresponding to the fourth-level second feature extraction process are respectively obtained.
  • the third intermediate feature data E4 obtained by the fourth-level second feature extraction process is down-sampled to obtain down-sampled data E4' corresponding to the fifth-level second feature extraction process.
  • The third intermediate feature data E5 corresponding to the fifth-level second feature extraction process is generated based on W5 and E4'.
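  • Mirroring the earlier top-down sketch, the shallow-to-deep fusion by down-sampling and addition could be sketched as follows, again assuming matching channel counts between adjacent levels:

```python
import torch
import torch.nn.functional as F
from typing import List

def shallow_to_deep_fuse(initial_feats: List[torch.Tensor]) -> List[torch.Tensor]:
    """Given second initial feature data [W1, ..., Wn] (each (N, C, H, W), shallow to
    deep), return third intermediate feature data [E1, ..., En]: the first level is
    kept as-is, and every deeper level adds the down-sampled previous-level result."""
    fused = [initial_feats[0]]                               # E1 = W1
    for i in range(1, len(initial_feats)):
        downsampled = F.interpolate(fused[i - 1],
                                    size=initial_feats[i].shape[-2:],
                                    mode="bilinear",
                                    align_corners=False)     # E(i-1) resized to Wi's size
        fused.append(initial_feats[i] + downsampled)          # Ei = Wi + downsample(E(i-1))
    return fused
```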
  • FIG. 4C is a flowchart of the process of performing feature fusion on the feature extraction results of the differential concatenated images according to an embodiment of the present disclosure, including the following steps.
  • S411 Perform global average pooling on the third intermediate feature data of the differential cascaded image at each level of second feature extraction processing, to obtain the fourth intermediate feature data corresponding to each level of second feature extraction processing of the differential cascaded image.
  • the method of performing global average pooling on the third intermediate feature data is similar to the above method of performing global average pooling on the first intermediate feature data, and will not be repeated here.
  • S412 Perform feature fusion on the fourth intermediate feature data corresponding to the second feature extraction processing at each level of the differential cascade image to obtain the second fusion feature data.
  • In this way, the dimensional transformation of the third intermediate feature data simplifies the subsequent processing.
  • the fourth intermediate feature data corresponding to the second feature extraction processing at each level may be spliced, and then input to the fully connected network for fully connected processing to obtain the second fused feature data. After the second fusion feature data is obtained, the second fusion feature data is input to the second classifier to obtain the second detection result.
  • For example, the third intermediate feature data E1 corresponding to the first-level second feature extraction processing undergoes global average pooling to obtain the corresponding fourth intermediate feature data U1; the third intermediate feature data E2 corresponding to the second-level second feature extraction processing undergoes global average pooling to obtain the corresponding fourth intermediate feature data U2; the third intermediate feature data E3 corresponding to the third-level second feature extraction processing undergoes global average pooling to obtain the corresponding fourth intermediate feature data U3; the third intermediate feature data E4 corresponding to the fourth-level second feature extraction processing undergoes global average pooling to obtain the corresponding fourth intermediate feature data U4; and the third intermediate feature data E5 corresponding to the fifth-level second feature extraction processing undergoes global average pooling to obtain the corresponding fourth intermediate feature data U5.
  • the second classifier is, for example, a softmax classifier.
  • In step S104, the living body detection result of the video to be detected can be determined in the following manner: the first detection result and the second detection result are weighted and summed to obtain a target detection result.
  • the first detection result and the second detection result are weighted and summed, and the two detection results are combined to obtain a more accurate living body detection result.
  • the weights corresponding to the first detection result and the second detection result can be specifically set according to actual needs, and are not limited here. In an example, their respective weights can be the same.
  • The target detection result indicates whether the face is a living body. For example, when the weighted sum is greater than or equal to a certain threshold, the face in the video to be detected is determined to be a live face; otherwise, it is a non-living face.
  • the threshold may be obtained when the first convolutional neural network and the second convolutional neural network are trained. For example, the two convolutional neural networks can be trained through multiple labeled samples, and then the weighted summation value after the training of the positive sample and the weighted summation value after the training of the negative sample are obtained to obtain the threshold.
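  • A minimal sketch of the weighted summation and thresholding described above; the weights and the threshold are placeholders, since the disclosure only states that they can be set according to actual needs or obtained during training.

```python
def fuse_detection_results(first_result, second_result,
                           w1=0.5, w2=0.5, threshold=0.5):
    # Weighted summation of the two detection results, then threshold the value.
    score = w1 * first_result + w2 * second_result
    return "live" if score >= threshold else "not live"

print(fuse_detection_results(0.92, 0.88))  # -> "live" with these placeholder values
```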
  • a living body detection method is also provided, and the living body detection method is implemented by a living body detection model.
  • The living body detection model includes: a first sub-model, a second sub-model, and a calculation module. The first sub-model includes a first feature extraction network, a first feature fusion network, and a first classifier, and the second sub-model includes a second feature extraction network, a second feature fusion network, and a second classifier. The living body detection model is obtained by training with the sample face videos in a training sample set, where each sample face video is labeled with information indicating whether it is a living body.
  • the first feature extraction network is used to obtain the first detection result based on the feature extraction result of each frame of the target face image in the multi-frame target face image.
  • the second feature extraction network is used to obtain a second detection result based on the difference image of every two adjacent target face images in the multi-frame target face image.
  • the calculation module is used to obtain the living body detection result based on the first detection result and the second detection result.
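  • A structural sketch of the described composition (first sub-model, second sub-model, calculation module). The sub-models are left as stand-in callables, and equal weights are assumed for the calculation module; both are assumptions for illustration.

```python
class LivenessDetectionModel:
    def __init__(self, first_sub_model, second_sub_model, w1=0.5, w2=0.5):
        self.first_sub_model = first_sub_model    # frames -> first detection result
        self.second_sub_model = second_sub_model  # difference images -> second detection result
        self.w1, self.w2 = w1, w2

    def __call__(self, target_faces, difference_images):
        first = self.first_sub_model(target_faces)
        second = self.second_sub_model(difference_images)
        return self.w1 * first + self.w2 * second  # calculation module

# Stand-in sub-models returning fixed scores, just to show the data flow.
model = LivenessDetectionModel(lambda frames: 0.9, lambda diffs: 0.8)
print(model(None, None))  # 0.85
```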
  • The embodiment of the present disclosure can extract multiple frames of target face images from the video to be detected, obtain the first detection result based on the feature extraction result of each frame of target face image, and obtain the second detection result based on the difference image of every two adjacent target face images in the multiple frames of target face images; the live detection result of the video to be detected is then determined based on the first detection result and the second detection result.
  • In this method, the user does not need to perform any specified action; instead, multiple frames of the user's face images with large differences are used to silently detect whether the user is a living body, so the detection efficiency is higher.
  • Those skilled in the art can understand that, in the above methods of the specific implementations, the writing order of the steps does not imply a strict execution order or impose any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • another embodiment of the present disclosure also provides a living body detection method, which includes the following steps.
  • S501 Based on the acquired similarity between the multiple frames of face images included in the to-be-detected video, extract multiple frames of target face images from the to-be-detected video.
  • S502 Determine a live body detection result of the video to be detected based on multiple frames of target face images.
  • For the specific implementation of step S501, please refer to the implementation of step S101 above, which will not be repeated here.
  • In the embodiment of the present disclosure, multiple frames of target face images are extracted from the video to be detected, where the similarity between adjacent target face images is lower than the first value, and the live detection result of the video to be detected is then determined based on these target face images. This does not require the user to make any specified action; instead, the user's multi-frame face images with large differences are used to silently detect whether the user is a living body, so the detection efficiency is higher.
  • Determining the live detection result of the video to be detected based on the multiple frames of target face images includes: obtaining a first detection result based on the feature extraction result of each frame of target face image in the multiple frames of target face images, and/or obtaining a second detection result based on the difference image of every two adjacent target face images in the multiple frames of target face images; and determining the live detection result of the video to be detected based on the first detection result and/or the second detection result.
  • the first detection result is obtained, and the first detection result is used as the target detection result, or the first detection result is processed to obtain the target detection result.
  • the second detection result is obtained, and the second detection result is used as the target detection result, or the second detection result is processed to obtain the target detection result.
  • In another possible implementation, both the first detection result and the second detection result are acquired, and the live detection result for the video to be detected is determined based on them; for example, the first detection result and the second detection result are weighted and summed to obtain the living body detection result.
  • Based on a similar concept, the embodiment of the present disclosure also provides a living body detection device corresponding to the living body detection method. Since the principle by which the device solves the problem is similar to the above living body detection method of the embodiment of the disclosure, the implementation of the device can refer to the implementation of the method, and repeated descriptions are omitted here.
  • FIG. 6A is a schematic diagram of a living body detection device provided by an embodiment of the present disclosure.
  • the device includes: an acquisition unit 61 and a detection unit 62.
  • the acquiring unit 61 is configured to determine a multi-frame target face image from the video to be detected based on the similarity between the acquired multiple frames of face images included in the video to be detected.
  • the detection unit 62 is configured to determine the live detection result of the video to be detected based on multiple frames of target face images.
  • the similarity between every two adjacent target face images in the multi-frame target face image is lower than the first value.
  • The acquiring unit 61 is further configured to: determine the first target face image of the multi-frame target face images from the video to be detected; and determine a second target face image from the multi-frame continuous face images of the video to be detected based on the first target face image, where the similarity between the second target face image and the first target face image meets a preset similarity requirement.
  • The acquiring unit 61 is further configured to: divide the video to be detected into multiple segments, where each segment includes a certain number of consecutive face images; select the first target face image from the first segment of the multiple segments; and determine a second target face image from each of the multiple segments based on the first target face image.
  • The acquiring unit 61 is further configured to: compare the similarity between all face images in the first segment and the first target face image, and use the face image with the smallest similarity as the second target face image of the first segment; and, for each of the other segments, compare the similarity between all face images in that segment and the second target face image of the previous segment, and use the face image with the smallest similarity as the second target face image of that segment, where the other segments are the segments other than the first segment.
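  • A minimal sketch of the segment-based selection performed by the acquiring unit, assuming the frames are already cropped and aligned face images and that `similarity` is a callable such as the variance-based score described in the next paragraph; the exact segment boundaries used here are an assumption.

```python
def select_target_faces(frames, num_targets, similarity):
    # Split the video into num_targets - 1 segments, take the first frame of the
    # first segment as the first target, then from each segment pick the frame
    # with the smallest similarity to the previous segment's target.
    assert num_targets >= 2 and len(frames) >= num_targets
    n_segments = num_targets - 1
    seg_len = len(frames) // n_segments
    segments = [frames[i * seg_len:(i + 1) * seg_len] for i in range(n_segments - 1)]
    segments.append(frames[(n_segments - 1) * seg_len:])  # last segment takes the rest

    targets = [segments[0][0]]                  # first target face image
    for seg in segments:
        reference = targets[-1]
        candidates = [f for f in seg if f is not reference]
        targets.append(min(candidates, key=lambda f: similarity(reference, f)))
    return targets

# Hypothetical usage: select_target_faces(frames, num_targets=5, similarity=face_similarity),
# where face_similarity is, for example, the variance-based score sketched below.
```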
  • The similarity between multiple frames of face images is obtained as follows: select two frames of face images from the multiple frames of face images as a first face image and a second face image; obtain a face difference image of the first face image and the second face image based on the pixel value of each pixel in the first face image and the pixel value of each pixel in the second face image; obtain the variance corresponding to the face difference image according to the pixel value of each pixel in the face difference image; and take the variance as the similarity between the first face image and the second face image.
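  • A minimal NumPy sketch of the variance-based similarity just described: build the face difference image pixel by pixel, then use the variance of its pixel values as the similarity score.

```python
import numpy as np

def face_similarity(img_a, img_b):
    # Face difference image: pixel-wise difference of the two face images
    # (cast to a signed type so the subtraction does not wrap around).
    diff = img_a.astype(np.int32) - img_b.astype(np.int32)
    # The variance of the difference image's pixel values is used as the similarity.
    return float(np.var(diff))

a = np.random.randint(0, 256, (112, 112, 3), dtype=np.uint8)  # hypothetical face images
b = np.random.randint(0, 256, (112, 112, 3), dtype=np.uint8)
print(face_similarity(a, b))
```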
  • Before the multiple frames of target face images are extracted from the acquired video to be detected, the acquiring unit 61 is further configured to: acquire key point information of each frame of face image in the multiple frames of face images included in the video to be detected; perform alignment processing on the multiple frames of face images based on the key point information of each frame of face image, to obtain aligned multiple frames of face images; and determine the multiple frames of target face images from the aligned multiple frames of face images based on the similarity between them.
  • The detection unit 62 includes: a first detection module and/or a second detection module, and a determination module. The first detection module is configured to obtain the first detection result based on the feature extraction result of each frame of target face image in the multiple frames of target face images; the second detection module is configured to obtain the second detection result based on the difference image of every two adjacent target face images in the multiple frames of target face images; and the determination module is configured to determine the live detection result of the video to be detected based on the first detection result and/or the second detection result.
  • the first detection module is further configured to: perform feature fusion processing on the respective feature extraction results of the multiple frames of target face images to obtain the first fusion feature data; and obtain the first detection result based on the first fusion feature data.
  • The feature extraction result of each frame of target face image includes: first intermediate feature data corresponding to each level of first feature extraction processing, obtained by performing multi-level first feature extraction processing on the target face image. The first detection module is further configured to: for each level of first feature extraction processing, perform fusion processing on the first intermediate feature data corresponding to the multiple frames of target face images in that level of first feature extraction processing, to obtain the intermediate fusion data corresponding to that level; and obtain the first fusion feature data based on the intermediate fusion data corresponding to each of the multiple levels of first feature extraction processing.
  • The first detection module is further configured to: obtain the feature sequence corresponding to this level of first feature extraction processing based on the first intermediate feature data corresponding to the multiple frames of target face images in this level of first feature extraction processing; and input the feature sequence into a recurrent neural network for fusion processing, to obtain the intermediate fusion data corresponding to this level of first feature extraction processing.
  • The first detection module is further configured to: perform global average pooling on the first intermediate feature data corresponding to each frame of target face image in this level of first feature extraction processing, to obtain the second intermediate feature data corresponding to the multiple frames of target face images at this level; and arrange, according to the time order of the multiple frames of target face images, the second intermediate feature data corresponding to the multiple frames of target face images at this level, to obtain the feature sequence.
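  • A minimal sketch of this per-level fusion, assuming PyTorch, a single-layer LSTM as the recurrent network (the disclosure also mentions RNN and GRU as options), and a hypothetical channel width.

```python
import torch
import torch.nn as nn

def fuse_level_features(per_frame_features, lstm):
    # per_frame_features: first intermediate feature data of each target face image
    # at one level, a list of T tensors of shape (C, H, W), already in time order.
    pooled = [f.mean(dim=(-2, -1)) for f in per_frame_features]  # global average pooling
    sequence = torch.stack(pooled, dim=0).unsqueeze(1)           # (T, batch=1, C) feature sequence
    _, (h_n, _) = lstm(sequence)                                 # fusion by the recurrent network
    return h_n[-1]                                               # intermediate fusion data

channels = 64                                                    # assumed width
lstm = nn.LSTM(input_size=channels, hidden_size=channels)
frame_features = [torch.randn(channels, 28, 28) for _ in range(5)]
intermediate_fusion = fuse_level_features(frame_features, lstm)
```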
  • the first detection module is further configured to: after splicing the intermediate fusion data corresponding to the multi-level first feature extraction processing respectively, perform full connection processing to obtain the first fusion feature data.
  • The first detection module is configured to obtain the feature extraction result of each frame of target face image in the following way: perform multi-level feature extraction processing on the target face image to obtain the first initial feature data corresponding to each level of first feature extraction processing in the multi-level feature extraction processing; and, for each level of first feature extraction processing, perform fusion processing on the first initial feature data of that level and the first initial feature data of at least one level of first feature extraction processing subsequent to that level, to obtain the first intermediate feature data corresponding to that level. The feature extraction result of the target face image includes the first intermediate feature data corresponding to each level of first feature extraction processing in the multi-level first feature extraction processing.
  • The first detection module is further configured to: perform fusion processing on the first initial feature data of this level of first feature extraction processing and the first intermediate feature data corresponding to the next (lower) level of first feature extraction processing, to obtain the first intermediate feature data corresponding to this level, where the first intermediate feature data corresponding to the lower level of first feature extraction processing is obtained based on the first initial feature data of that lower level.
  • The first detection module is further configured to: up-sample the first intermediate feature data corresponding to the lower level of first feature extraction processing of this level, to obtain the up-sampled data corresponding to this level of first feature extraction processing; and fuse the up-sampled data corresponding to this level with the first initial feature data corresponding to this level, to obtain the first intermediate feature data corresponding to this level of first feature extraction processing.
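  • A minimal sketch of this up-sample-and-fuse step for the first feature extraction branch, assuming PyTorch, bilinear up-sampling, equal channel widths across levels, and element-wise addition as the fusion (addition is the fusion option described earlier in the disclosure).

```python
import torch
import torch.nn.functional as F

def upsample_and_fuse(lower_level_intermediate, this_level_initial):
    # Up-sample the deeper level's first intermediate feature data (e.g. M5) to the
    # spatial size of this level's first initial feature data (e.g. V4), then add.
    upsampled = F.interpolate(
        lower_level_intermediate,
        size=this_level_initial.shape[-2:],
        mode="bilinear",
        align_corners=False,
    )
    return upsampled + this_level_initial

V = [torch.randn(1, 64, s, s) for s in (56, 28, 14, 7, 4)]  # hypothetical V1..V5
M = [None] * 5
M[4] = V[4]                                # M5 = V5 (last level)
for k in range(3, -1, -1):                 # M4 .. M1
    M[k] = upsample_and_fuse(M[k + 1], V[k])
```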
  • The second detection module is further configured to: cascade the difference images of every two adjacent target face images in the multiple frames of target face images to obtain a differential cascade image; and obtain the second detection result based on the differential cascade image.
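  • A minimal sketch of building the differential cascade image, assuming PyTorch tensors with three colour channels per frame; T target face images yield T-1 difference images concatenated along the channel dimension into a 3*(T-1)-channel image, consistent with the example given for s difference images.

```python
import torch

def build_difference_cascade(target_faces):
    # target_faces: list of T tensors of shape (3, H, W), in time order.
    diffs = [later - earlier for earlier, later in zip(target_faces, target_faces[1:])]
    return torch.cat(diffs, dim=0)           # differential cascade image, (3*(T-1), H, W)

faces = [torch.rand(3, 112, 112) for _ in range(5)]
cascade = build_difference_cascade(faces)    # shape: (12, 112, 112)
```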
  • The second detection module is further configured to: perform feature extraction processing on the differential cascade image to obtain the feature extraction result of the differential cascade image; perform feature fusion on the feature extraction result of the differential cascade image to obtain the second fusion feature data; and obtain the second detection result based on the second fusion feature data.
  • The second detection module is further configured to: perform multi-level second feature extraction processing on the differential cascade image to obtain the second initial feature data corresponding to each level of second feature extraction processing; and obtain the feature extraction result of the differential cascade image based on the second initial feature data corresponding to each of the multiple levels of second feature extraction processing.
  • The second detection module is further configured to: for each level of second feature extraction processing, perform fusion processing on the second initial feature data of that level and the second initial feature data of at least one level of second feature extraction processing preceding that level, to obtain the third intermediate feature data corresponding to that level. The feature extraction result of the differential cascade image includes the third intermediate feature data corresponding to each of the multiple levels of second feature extraction processing.
  • The second detection module is further configured to: down-sample the second initial feature data of the upper-level second feature extraction processing of this level, to obtain the down-sampled data corresponding to this level of second feature extraction processing; and fuse the down-sampled data corresponding to this level with the second initial feature data of this level, to obtain the third intermediate feature data corresponding to this level of second feature extraction processing.
  • The second detection module is further configured to: perform global average pooling on the third intermediate feature data of the differential cascade image in each of the multiple levels of second feature extraction processing, to obtain the fourth intermediate feature data corresponding to the differential cascade image at each level of second feature extraction processing; and perform feature fusion on the fourth intermediate feature data corresponding to each level of second feature extraction processing, to obtain the second fusion feature data.
  • the second detection module is further configured to: after the fourth intermediate feature data corresponding to the multi-level second feature extraction processing is spliced, then the full connection processing is performed to obtain the second fused feature data.
  • the determining module is further used to: perform a weighted summation of the first detection result and the second detection result to obtain the living body detection result.
  • An optional implementation manner of the present disclosure also provides an electronic device 600.
  • As shown in FIG. 6B, which is a schematic structural diagram of the electronic device 600 provided by an optional implementation of the present disclosure, the device includes a processor 610 and a memory 620; the memory 620 is used to store instructions executable by the processor and includes an internal memory 621 and an external memory 622.
  • the memory 621 here is also called an internal memory, and is used to temporarily store calculation data in the processor 610 and data exchanged with an external memory 622 such as a hard disk.
  • the processor 610 exchanges data with the external memory 622 through the memory 621.
  • When the electronic device 600 runs, the machine-readable instructions are executed by the processor, so that the processor 610 performs the following operations: extract multiple frames of target face images from the acquired video to be detected; obtain a first detection result based on the feature extraction result of each frame of target face image in the multiple frames of target face images; obtain a second detection result based on the difference image of every two adjacent target face images in the multiple frames of target face images; and determine the live detection result of the video to be detected based on the first detection result and the second detection result. Alternatively, the processor 610 performs the following operations: determine multiple frames of target face images from the video to be detected based on the similarity between the multiple frames of face images included in the acquired video to be detected; and determine the live detection result of the video to be detected based on the multiple frames of target face images.
  • An optional implementation of the present disclosure also provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, it executes the steps of the living body detection method in the foregoing optional method implementations.
  • the computer-readable storage medium may be a non-volatile storage medium.
  • In addition, referring to FIG. 7, the embodiment of the present disclosure also discloses an example of a specific application of the living body detection method provided in the disclosed embodiments.
  • the execution subject of the living body detection method is the cloud server 1; the cloud server 1 is in communication connection with the user terminal 2. Refer to the following steps for the interaction process between the two.
  • S701 The user terminal 2 uploads the user's video to the cloud server 1; that is, the user terminal 2 uploads the acquired user video to the cloud server 1.
  • S702 The cloud server 1 performs face key point detection. After receiving the user video sent by the user terminal 2, the cloud server 1 performs face key point detection on each frame of the user video. When the detection fails, skip to S703; when the detection succeeds, skip to S705.
  • S703 The cloud server 1 feeds back the reason for the detection failure to the user terminal 2; at this time, the reason for the detection failure is that no face is detected.
  • After receiving the reason for the detection failure fed back by the cloud server 1, the user terminal 2 executes S704: reacquire the user video, and jump to S701.
  • S705 The cloud server 1 crops each frame image in the user video according to the detected key points of the face to obtain the video to be detected.
  • S706 The cloud server 1 performs alignment processing on each frame of face image in the video to be detected based on the face key points.
  • S707 The cloud server 1 filters multiple frames of target face images from the video to be detected.
  • S708 The cloud server 1 inputs the multiple frames of target face images into the first sub-model of the living body detection model, and inputs the difference image between every two adjacent frames of target face images into the second sub-model of the living body detection model for detection.
  • the first sub-model is used to obtain the first detection result based on the feature extraction result of each frame of the target face image in the multi-frame target face image.
  • the second sub-model is used to obtain a second detection result based on the difference image of every two adjacent target face images in the multi-frame target face image.
  • S709 After obtaining the first detection result and the second detection result output by the living body detection model, the cloud server 1 obtains the living body detection result according to the first detection result and the second detection result.
  • S710 The living body detection result is fed back to the user terminal 2. Through the above process, living body detection is performed on a piece of video acquired from the user terminal 2.
  • The computer program product of the living body detection method provided by the optional implementations of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the steps of the living body detection method described in the above optional method implementations. For details, please refer to the above optional method implementations, which will not be repeated here.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of this optional implementation scheme.
  • each functional unit in each optional implementation manner of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a non-volatile computer-readable storage medium executable by a processor.
  • Based on this understanding, the technical solution of the present disclosure in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several machine-executable instructions for causing an electronic device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the method described in each optional implementation of the present disclosure.
  • The aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure provides a living body detection method and apparatus, an electronic device, a storage medium, and a computer program product. The method includes: determining multiple frames of target face images from a video to be detected based on the similarity between multiple frames of face images included in the acquired video to be detected; and determining a living body detection result of the video to be detected based on the multiple frames of target face images.

Description

活体检测方法、装置、电子设备、存储介质及程序产品
相关申请的交叉引用
本专利申请要求于2019年10月31日提交的、申请号为201911063398.2、发明名称为“活体检测方法、装置、电子设备及存储介质”的中国专利申请的优先权,该申请的全文以引用的方式并入本文中。
技术领域
本公开涉及图像处理技术领域,具体而言,涉及活体检测方法、装置、电子设备、存储介质及程序产品。
背景技术
人脸识别技术被应用于身份验证时,首先通过图像采集设备实时获取用户的人脸照片,然后将实时获取的人脸照片与预存的人脸照片进行比对,如果比对一致,则身份验证通过。
发明内容
有鉴于此,本公开至少提供一种活体检测方法、装置、电子设备及存储介质,能够提升活体检测过程中的检测效率。
第一方面,本公开可选实现方式还提供一种活体检测方法,包括:基于获取到的待检测视频中包括的多帧人脸图像之间的相似度,从所述待检测视频中确定多帧目标人脸图像;基于所述多帧目标人脸图像,确定所述待检测视频的活体检测结果。
第二方面,本公开可选实现方式提供一种活体检测装置,包括:获取单元,用于基于获取到的待检测视频中包括的多帧人脸图像之间的相似度,从所述待检测视频中确定多帧目标人脸图像;检测单元,用于基于所述多帧目标人脸图像,确定所述待检测视频的活体检测结果。
第三方面,本公开可选实现方式还提供一种电子设备,处理器、存储有所述处理器可执行的机器可读指令的存储器,所述机器可读指令被所述处理器执行时,促使所述处理器执行上述第一方面所述的活体检测方法。
第四方面,本公开可选实现方式还提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被电子设备运行时,促使所述电子设备执行上述第一方面所述的活体检测方法。
第五方面,本公开可选实现方式还提供一种计算机程序产品,包括机器可执行指令,所述机器可执行指令被电子设备读取并执行时,促使所述电子设备执行上述第一方面所述的活体检测方法。
本公开基于获取到的待检测视频中包括的多帧人脸图像之间的相似度,从待检测视频中提取多帧目标人脸图像,然后基于多帧目标人脸图像,确定待检测视频的活体检测结果,利用用户的多帧差别较大的人脸图像来静默式地检测用户是否为活体,检测效率更高。
附图说明
图1示出了本公开实施例所提供的一种活体检测方法的流程图。
图2A示出了本公开实施例所提供的一种从待检测视频中提取预设数量的目标人脸图像的方法的流程图。
图2B示出了本公开另一实施例提供的一种从待检测视频中提取预设数量的目标人脸图像的方法的流程图。
图3A示出了本公开实施例所提供得到每帧目标人脸图像的特征提取结果的过程的流程图。
图3B示出了本公开实施例所提供将所述多帧目标人脸图像的特征提取结果进行特征融合处理得到第一融合特征数据的过程的流程图。
图3C示出了本公开实施例所提供的一种活体检测方法中,基于多帧目标人脸图像中每帧目标人脸图像的特征提取结果,得到第一检测结果的过程。
图4A示出了本公开实施例所提供的一种对差分级联图像进行特征提取的方式的流程图。
图4B示出了本公开实施例所提供的一种活体检测方法中,基于多帧目标人脸图像中每两帧相邻的目标人脸图像的差分图像,得到第二检测结果的过程。
图4C示出了本公开实施例所提供的对差分级联图像的特征提取结果进行特征融合的过程的流程图。
图5示出了本公开另一实施例所提供的一种活体检测方法的流程图。
图6A示出了本公开实施例所提供的一种活体检测装置的示意图。
图6B示出了本公开实施例所提供的一种电子设备的示意图。
图7示出本公开实施例提供的活体检测方法应用过程的流程图。
具体实施方式
为使本公开可选实现方式的目的、技术方案和优点更加清楚,下面将结合本公开可选实现方式中附图,对本公开可选实现方式中的技术方案进行清楚、完整地描述,显然,所描述的可选实现方式仅仅是本公开一部分可选实现方式,而不是全部的可选实现方式。通常在此处附图中描述和示出的本公开可选实现方式的组件可以以各种不同的配置来布置和设计。因此,以下对在附图中提供的本公开的可选实现方式的详细描述并非旨在限制要求保护的本公开的范围,而是仅仅表示本公开的选定可选实现方式。基于本公开的可选实现方式,本领域技术人员在没有做出创造性劳动的前提下所获得的所有其他可选实现方式,都属于本公开保护的范围。
当前在基于图像识别的方法进行人脸活体检测的时候,为了在人脸识别时验证待检测用户是否为活体,通常需要待检测用户做出某些指定的动作。以银行***对用户进行身份验证为例,需要用户站在终端设备的摄像头前边,并按照终端设备中的提示做出某种指定的表情动作。在用户做出指定动作的时候,摄像头获取人脸视频,然后基于获取的人脸视频检测用户是否做出指定动作,并检测做出指定动作的用户是否为合法用户。若该用户是合法用户,则身份验证通过。这种活体检测方式通常会在终端设备与用户的交互过程中耗费大量的时间,导致检测效率较低。
本公开提供了一种活体检测方法及装置,能够从待检测视频中提取多帧目标人脸图像,然后基于多帧目标人脸图像中每帧目标人脸图像的特征提取结果得到第一检测结果,并基于多帧目标人脸图像中每两帧相邻的目标人脸图像的差分图像得到第二检测结果;然后基于第一检测结果和第二检测结果,确定待检测视频的活体检测结果。在该方法中,不需要用户做出任何的指定动作,而是利用用户的多帧差别较大的人脸图像来静默式地来检测用户是否为活体,检测效率更高。
同时,若非法登录者通过翻拍屏幕获得的人脸视频试图进行欺骗,则由于通过翻拍所获得的图像会丢失大量原始图像的图像信息,由于图像信息的丢失造成无法检测到用户外表细微变化,进而可以判断出不是活体,因此本申请提供的方法能够有效抵御屏幕翻拍的攻击手段。
应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步定义和解释。
为便于对本可选实现方式进行理解,首先对本公开实施例所公开的一种活体检测方法进行详细介绍,本公开实施例所提供的活体检测方法的执行主体一般为具有一定计算 能力的电子设备,该电子设备例如包括:终端设备或服务器或其它处理设备,终端设备可以为用户设备(User Equipment,UE)、移动设备、用户终端、终端、蜂窝电话、无绳电话、个人数字处理(Personal Digital Assistant,PDA)、手持设备、计算设备、车载设备、可穿戴设备等。在一些可能的实现方式中,该活体检测方法可以通过处理器调用存储器中存储的计算机可读指令的方式来实现。
下面以执行主体为终端设备为例对本公开可选实现方式提供的活体检测方法加以说明。
参见图1所示,为本公开实施例提供的活体检测方法的流程图,方法包括步骤S101-S104。
S101:从获取到的待检测视频中提取多帧目标人脸图像。
S102:基于所述多帧目标人脸图像中每帧目标人脸图像的特征提取结果,得到第一检测结果。
S103:基于所述多帧目标人脸图像中每两帧相邻的目标人脸图像的差分图像,得到第二检测结果。
S104:基于所述第一检测结果和所述第二检测结果,确定所述待检测视频的活体检测结果。
其中,S102和S103无执行的先后顺序。下面分别对上述S101-S104加以详细说明。
I:在上述步骤S101中,在终端设备中安装有图像获取装置,通过该图像获取装置能够即时获取原始检测视频。在原始检测视频的每帧图像中,包括有人脸。可以将原始检测视频作为待检测视频;也可以对原始检测视频中包括的人脸部位进行图像截取,以获得待检测视频。
为了提升检测精度,检测视频的视频时长可以在预设时长阈值以上,该预设时长范围可以根据实际需要进行具体设定,例如该预设时长阈值为2秒、3秒、4秒等。
待检测视频中包括的人脸图像的帧数,大于需要提取的目标人脸图像的帧数。目标人脸检测图像的帧数,可以是固定的,也可以是根据待检测视频的视频长度来确定的。
在得到待检测视频后,要从待检测视频中提取多帧目标人脸图像。示例性的,在本公开一可选实现方式中,例如基于待检测视频中包括的多帧人脸图像之间的相似度,从所述待检测视频中确定所述多帧目标人脸图像。在基于待检测视频中包括的多帧人脸图像之间的相似度,确定多帧目标人脸图像时,多帧目标人脸图像满足下述两个要求中的至少一种。
要求一、多帧目标人脸图像中每两帧相邻的目标人脸图像之间的相似度低于第一数值。例如,可以将待检测视频中任一帧人脸图像作为基准图像,分别确定其余的各帧人脸图像与基准图像之间的相似度,并从中取相似度低于第一数值的每帧人脸图像作为目标人脸图像中的一帧。其中,第一数值可以是预设的一个数值。这样,所得到的多张目标人脸图像之间具有较大差别,进而能够以较高精度得到检测结果。
要求二、从所述待检测视频中确定所述多帧目标人脸图像中的第一目标人脸图像;基于所述第一目标人脸图像,从所述待检测视频的多帧连续人脸图像中确定第二目标人脸图像,其中,所述第二目标人脸图像与所述第一目标人脸图像之间的相似度满足预设的相似度要求。相似度要求可以包括:所述第二目标人脸图像为所述多帧连续的人脸图像中与所述第一目标人脸图像之间的相似度最小的人脸图像。这样,所得到的多张目标人脸图像之间具有较大差别,进而能够以较高精度得到检测结果。
在一些例子中,可以采用下述方式确定多帧目标人脸图像中的第一目标人脸图像:将所述待检测视频划分为多个片段,其中,每个片段包括一定数量的连续的人脸图像;从所述多个片段的第一片段中选取第一目标人脸图像。并基于所述第一目标人脸图像,从所述多个片段的每个片段中确定第二目标人脸图像。
通过划分多个片段的方式来确定目标人脸图像,能够将目标人脸图像分散到整段待检测视频,进而更好的捕捉用户在待检测视频持续时长内表情的变化。
具体的实现过程例如下述图2A所示。图2A为本公开实施例提供的一种从待检测视频中提取预设数量的目标人脸图像的方法的流程图,包括以下步骤。
S201:按照待检测视频中各帧人脸图像对应的时间戳的先后顺序,将待检测视频中包括的人脸图像依级划分为N个图像组;其中,N=预设数量-1。这里,N个图像组中,不同图像组中所包括的人脸图像的数量可以相同,也可以不同,具体可以根据实际的需要进行设定。
S202:针对第一个图像组,将该图像组中的第一帧人脸图像确定为第一帧目标人脸图像,并将该第一帧目标人脸图像作为基准人脸图像,获取该图像组中所有人脸图像与该基准人脸图像之间的相似度;将与该基准人脸图像之间的相似度最小的人脸图像确定为该图像组中的第二目标人脸图像。
S203:针对其它每个图像组,将上一个图像组中的第二目标人脸图像作为基准人脸图像,获取该图像组中各帧人脸图像与该基准人脸图像之间的相似度;将与该基准人脸图像之间相似度最小的人脸图像作为该图像组的第二目标人脸图像。
在具体实施中,可以采用但不限于下述两种方式中任一种确定某帧人脸图像与基准人脸图像之间的相似度。可以将该帧人脸图像称为第一人脸图像,将基准人脸图像称为第二人脸图像。
需要说明的是,对于要求一中多帧人脸图像之间的相似度,也可以用这两种方式进行计算。这种情况下,可以将多帧人脸图像中的任一帧人脸图像称为第一人脸图像,将另一帧人脸图像称为第二人脸图像。
方式一、基于所述第一人脸图像中每个像素点的像素值、和所述第二人脸图像中每个像素点的像素值,得到所述第一人脸图像和所述第二人脸图像的人脸差分图像;根据所述人脸差分图像中每个像素点的像素值,得到所述人脸差分图像对应的方差;将所述方差作为所述第一人脸图像和所述第二人脸图像之间的相似度。这里,人脸差分图像中任一像素点M的像素值=第一人脸图像中像素点M’的像素值-第二人脸图像中像素点M”的像素值。其中,像素点M在人脸差分图像中的位置,像素点M’在该人脸图像中的位置、以及像素点M”在基准人脸图像中的位置一致。得到的方差越大,则该人脸图像与基准人脸图像之间的相似度越小。通过该方法得到的相似度,具有运算简单的特征。
方式二、对第一人脸图像与第二人脸图像分别进行至少一级特征提取,得到第一人脸图像和第二人脸图像分别对应的特征数据;然后计算第一人脸图像和第二人脸图像分别对应的特征数据之间的距离,并将该距离作为第一人脸图像和第二人脸图像之间的相似度。距离越大,则第一人脸图像与第二人脸图像之间的相似度越小。这里,可以采用卷积神经网络对第一人脸图像和第二人脸图像进行特征提取。
例如,待检测视频中的人脸图像有20帧,分别为a1-a20,目标人脸图像的预设数量为5,则按照时间戳的先后顺序,将待检测视频划分为4个分组,分别为:第一组:a1-a5;第二组:a6-a10;第三组:a11-a15;第四组:a16-a20。
针对第一个图像组,以a1作为第一帧目标人脸图像,并将a1作为基准人脸图像,获取a2-a5分别与a1之间的相似度。假设a3与a1之间的相似度最小,则将a3作为该第一个图像组中的第二目标人脸图像。针对第二个图像组,以a3作为基准人脸图像,并获取a6-a10分别与a3之间的相似度。假设a7与a3之间的相似度最小,则将a7作为第二个图像组中的第二目标人脸图像。针对第三个图像组,以a7作为基准人脸图像,并获取a11-a15分别与a7之间的相似度。假设a14与a7之间的相似度最小,则将a14作为第三个图像组中的第二目标人脸图像。针对第四个图像组,以a14作为基准人脸图像,并获取a16-a20分别与a14之间的相似度。假设a19与a14之间的相似度最小,则将a19作为第四个图像组中的第二目标人脸图像。则最终得到的目标人脸图像包括:a1、a3、a7、a14、a19共五帧。
在一些例子中,从待检测视频中选取第一目标人脸图像;然后将其余的其他人脸图像划分为多个片段,并基于第一目标人脸图像,从多个片段中根据该第一目标人脸图像确定第二目标人脸图像。
具体的实现过程例如下述图2B所示。图2B为本公开另一实施例提供的一种从待检测视频中提取预设数量的目标人脸图像的方法的流程图,包括以下步骤。
S211:将待检测视频中的第一帧人脸图像确定为第一帧目标人脸图像。
S212:按照待检测视频中各帧人脸图像对应的时间戳的先后顺序,将待检测视频中包括的除第一帧目标人脸图像外的人脸图像依级划分为N个图像组;其中,N=预设数量-1。
S213:针对第一个图像组,将第一帧目标人脸图像作为基准人脸图像,获取该图像组中所有人脸图像与该基准人脸图像之间的相似度;将与该基准人脸图像之间的相似度最小的人脸图像确定为该第一个图像组中的第二目标人脸图像。
S214:针对其它每个图像组,将上一个图像组中的第二目标人脸图像作为基准人脸图像,获取该图像组中各帧人脸图像与该基准人脸图像之间的相似度;将与该基准人脸图像之间相似度最小的人脸图像作为该图像组的第二目标人脸图像。
这里,人脸图像和基准人脸图像之间的相似度的确定方式,与上述图2A中的确定方式类似,在此不再赘述。
例如:待检测视频中的人脸图像有20帧,分别为a1-a20,目标人脸图像的预设数量为5,将a1作为第一帧目标人脸图像,则按照时间戳的先后顺序,将a2-a20划分为4个分组,分别为:第一组:a2-a6;第二组:a7-a11;第三组:a12-a16;第四组:a17-a20。
针对第一个图像组,将a1作为基准人脸图像,获取a2-a6分别与a1之间的相似度。假设a4与a1之间的相似度最小,则将a4作为该第一个图像组中的第二目标人脸图像。针对第二个图像组,以a4作为基准人脸图像,并获取a7-a11分别与a4之间的相似度。假设a10与a4之间的相似度最小,则将a10作为第二个图像组中的第二目标人脸图像。针对第三个图像组,以a10作为基准人脸图像,并获取a12-a16分别与a10之间的相似度。假设a13与a10之间的相似度最小,则将a13作为第三个图像组中的第二目标人脸图像。针对第四个图像组,以a13作为基准人脸图像,并获取a17-a20分别与a13之间的相似度。假设a19与a13之间的相似度最小,则将a19作为第四个图像组中的第二目标人脸图像。则最终得到的目标人脸图像包括:a1、a4、a10、a13、a19共五帧。
另外,在本公开一些例子中,为了避免由于用户整体发生位移,例如头部位置、方向变化对人体外表细微变化所造成的干扰,在从待检测视频中提取预设数量的目标人脸图像之前,活体检测方法还包括:获取所述待检测视频包括的多帧人脸图像中每帧人脸图像的关键点信息;基于所述多帧人脸图像中每帧人脸图像的关键点信息,对所述多帧人脸图像进行对齐处理,得到对齐处理后的多帧人脸图像。
例如,确定待检测人脸视频中的多帧人脸图像中,每帧人脸图像中的至少三个目标关键点的关键点位置;基于各帧人脸图像中的目标关键点的关键点位置,以对应时间戳最早的人脸图像作为基准图像,对除基准图像外的其他各帧人脸图像进行关键点对齐处理,得到与所述其他各帧人脸图像分别对应的对齐人脸图像。
这里,可以将待检测视频中的多帧人脸图像依级输入至预先训练的人脸关键点检测模型中,得到每帧人脸图像中各个目标关键点的关键点位置,然后基于得到的目标关键点的关键点位置,以第一帧人脸图像为基准图像,对除第一帧人脸图像外的其他人脸图像进行对齐处理,使得人脸在不同人脸图像中的位置、角度均保持一致。避免头部位置、方向变化对人体人脸细微变化造成的干扰。
在该种情况下,基于所述获取到的待检测视频中包括的所述多帧人脸图像之间的相似度,从所述待检测视频中确定多帧目标人脸图像,包括:基于所述对齐处理后的多帧人脸图像中之间的相似度,从所述对齐处理后的多帧人脸图像中确定所述多帧目标人脸图像。这里确定目标人脸图像的方式,与上述方式类似,在此不再赘述。
Ⅱ:在上述步骤S102中,可以将所述多帧目标人脸图像的各自特征提取结果进行特征融合处理,得到第一融合特征数据;基于所述第一融合特征数据,得到所述第一检测结果。
通过对多帧目标人脸图像进行多维度的特征提取和时序上的特征融合,使得各帧目标人脸图像对应的特征数据中,包含了人脸细微变化的特点,进而在不需要用户做出任何指定动作的前提下,进行精确的活体检测。
首先,对获取每帧目标人脸图像的特征提取结果的具体方式加以说明。
图3A为本公开实施例提供得到每帧目标人脸图像的特征提取结果的过程的流程图,包括以下步骤。
S301:对所述目标人脸图像进行多级特征提取处理,得到所述多级特征提取处理中每级第一特征提取处理分别对应的第一初始特征数据。
此处,可以将目标人脸图像输入至预先训练的第一卷积神经网络中,对目标人脸图像进行多级的第一特征提取处理。
一种可选实现方式中,该第一卷积神经网络中包括多个卷积层;多个卷积层依级相连,任一卷积层的输出,为该卷积层的下一个卷积层的输入。且每个卷积层的输出,作为与该卷积层对应的第一中间特征数据。
另一种可选实现方式中,在多层卷积层之间,还可以设置池化层、全连接层等;例如在每个卷积层之后连接一池化层,并在池化层后连接一全连接层,使得卷积层、池化层、和全连接层,构成一级进行第一特征提取处理的网络结构。
第一卷积神经网络的具体结构,可以根据实际需要进行具体设置,在此不再赘述。
第一卷积神经网络中卷积层的数量与进行第一特征提取处理的级数一致。
S302:针对每级所述第一特征提取处理,根据该级第一特征提取处理的第一初始特征数据、与该级第一特征提取处理后续的至少一级第一特征提取处理的第一初始特征数据进行融合处理,得到该级第一特征提取处理对应的第一中间特征数据,其中,所述目标人脸图像的特征提取结果包括所述多级第一特征提取处理中每级第一特征提取处理分别对应的第一中间特征数据。
这样,使得每一级第一特征提取处理得到更丰富的人脸特征,从而最终得到更高的检测精度。
此处,可以采用下述方式得到任一级第一特征提取处理对应的第一中间特征数据:对该级第一特征提取处理的第一初始特征数据与该级第一特征提取处理的下级第一特征提取处理对应的第一中间特征数据进行融合处理,得到所述该级第一特征提取处理对应的第一中间特征数据,其中,所述下级第一特征提取处理对应的第一中间特征数据是基于所述下级第一特征提取处理的第一初始特征数据得到的。
这样,使得每一级第一特征提取处理得到更丰富的人脸特征,从而最终得到更高的检测精度。
具体地,针对除最后一级外的其他每级第一特征提取处理,基于该级第一特征提取处理得到的第一初始特征数据,以及下一级第一特征提取处理得到的第一中间特征数据,得到与该级第一特征提取处理对应的第一中间特征数据;针对最后一级第一特征提取处理,将最后一级第一特征提取处理得到的第一初始特征数据,确定为该最后一级第一特征提取处理对应的第一中间特征数据。
这里,可以采用下述方式得到与该级第一特征提取处理对应的第一中间特征数据:对该级第一特征提取处理的下级第一特征提取处理对应的第一中间特征数据进行上采样,得到该级第一特征提取处理对应的上采样数据;融合该级第一特征提取处理对应的上采样数据和第一初始特征数据,得到该级第一特征提取处理对应的第一中间特征数据。
将深层特征提取处理的特征调整通道数后进行上采样,和浅层特征提取处理的特征相加,从而使得深层特征能够向浅层特征流动,因此丰富了浅层特征提取处理提取到的信息,增加了检测精度。
例如,对目标人脸图像进行5级第一特征提取处理。5级特征提取处理得到的第一初始特征数据分别为:V1、V2、V3、V4以及V5。
针对第5级第一特征提取处理,将V5作为该第5级第一特征提取处理对应的第一中间特征数据M5。针对第4级第一特征提取处理,将第5级第一特征提取处理得到的第一中间特征数据M5进行上采样处理,得到第4级第一特征提取处理对应的上采样数据M5’。基于V4以及M5’生成第4级第一特征提取处理对应的第一中间特征数据M4。
类似的,可以得到第3级第一特征提取处理对应的第一中间特征数据M3。可以得 到第2级第一特征提取处理对应的第一中间特征数据M2。
针对第1级第一特征提取处理,将第2级第一特征提取处理得到的第一中间特征数据M2进行上采样处理,得到第1级第一特征提取处理对应的上采样数据M2’。基于V1以及M2’生成第1级第一特征提取处理对应的第一中间特征数据M1。
可以采用下述方式融合该级第一特征提取处理对应的上采样数据和第一初始特征数据,得到该级第一特征提取处理对应的第一中间特征数据:将所述上采样数据和所述第一初始特征数据相加。这里,相加是指将上采样数据中,每一个数据的数据值,与第一初始特征数据中对应位置数据的数据值相加。
对下一级第一特征提取处理对应的第一中间特征数据进行上采样后,得到的上采样数据和本级第一特征提取处理对应的第一初始特征数据的维度相同,在将上采样数据和第一初始特征数据相加后,得到的第一中间特征数据的维度,也与本级第一特征提取处理对应的第一初始特征数据的维度相同。
在一些例子中,每一级第一特征提取处理对应的第一初始特征数据的维度和卷积神经网络各级的网络设置相关,本申请对此不作限制。
另外一种可选实现方式中,也可以将上采样数据和第一初始特征数据进行拼接。
例如上采样数据、和第一初始特征数据的维度均为m*n*f,将两者进行纵向拼接后,得到的第一中间特征数据的维度为:2m*n*f。将两者进行横向拼接后,得到的第一中间特征数据的维度为:m*2n*f。
下面,对将所述多帧目标人脸图像的特征提取结果进行特征融合处理,得到第一融合特征数据的过程加以详细说明。
图3B为本公开实施例提供将所述多帧目标人脸图像的特征提取结果进行特征融合处理得到第一融合特征数据的过程的流程图,包括以下步骤。
S311:针对每级第一特征提取处理,对所述多帧目标人脸图像在该级第一特征提取处理中分别对应的第一中间特征数据进行融合处理,得到该级第一特征提取处理对应的中间融合数据。
这里,可以采用下述方式得到每级第一特征提取处理对应的中间融合数据:基于所述多帧目标人脸图像在该级第一特征提取处理中分别对应的第一中间特征数据,得到与该级第一特征提取处理对应的特征序列;将所述特征序列输入到循环神经网络进行融合处理,得到该级第一特征提取处理对应的中间融合数据。
通过将各目标人脸图像进行空间变化上的特征融合,能够更好的提取到人脸随时间变化而发生细微变化的特征,从而增加活体检测的精度。
这里,循环神经网络例如包括:长短期记忆网络(Long Short-Term Memory,LSTM)、循环神经网络(Recurrent Neural Networks,RNN)、门控循环单元(Gated Recurrent Unit,GRU)中一种或者多种。
若第一特征提取处理有n级,则最终能够得到n个中间融合数据。
在另一可选实现方式中,基于所述多帧目标人脸图像在该级第一特征提取处理中分别对应的第一中间特征数据,得到与该级第一特征提取处理对应的特征序列之前,还包括:针对所述多帧目标人脸图像中的每帧目标人脸图像在该级第一特征提取处理中对应的第一中间特征数据进行全局平均池化处理,得到所述多帧目标人脸图像在该级第一特征提取处理分别对应的第二中间特征数据;所述基于所述多帧目标人脸图像在该级第一特征提取处理中分别对应的第一中间特征数据,得到与该级第一特征提取处理对应的特征序列,具体为:按照所述多帧目标人脸图像的时间顺序,基于所述多帧目标人脸图像在该级第一特征提取处理分别对应的第二中间特征数据,得到所述特征序列。
这里,全局平均池化,能够将三维特征数据转换为二维特征数据。从而将第一中间特征数据进行维度上的转化,简化后续的处理过程。
若某一目标人脸图像在某级第一特征提取处理中,得到的第一中间特征数据的维度为7*7*128,其可以理解为将128个7*7的二维矩阵叠加在一起。在对该第一中间 特征数据进行全局平均池化时,针对每一个7*7的二维矩阵,计算该二维矩阵中各个元素的值的均值。最终,能够得到128个均值,将128个均值作为第二中间特征数据。
例如目标人脸图像分别为:b1-b5。每帧目标人脸图像在某一级第一特征提取处理对应的第二中间特征数据分别为:P1、P2、P3、P4以及P5,则由该5帧目标人脸图像的第二中间特征数据得到的该级第一特征提取处理对应的特征序列为:(P1,P2,P3,P4,P5)。
针对任一级第一特征提取处理,在得到各帧目标人脸图像在该级第一特征提取处理分别对应的第二中间特征数据后,基于各帧目标人脸图像的时间顺序,排列所述多帧目标人脸图像在该级第一特征提取处理分别对应的第二中间特征数据,可以得到所述特征序列。
在得到各级第一特征提取处理分别对应的将与该级第一特征提取处理对应的特征序列后,将特征序列分别输入至对应的循环神经网络模型中,得到与各级第一特征提取处理对应的中间融合数据。
312:基于所述多级第一特征提取处理分别对应的中间融合数据,得到所述第一融合特征数据。
多层级提取目标人脸图像中的特征,可以使得最终得到的目标人脸图像的特征数据包含有更加丰富的信息,从而提升活体检测的精度。
在一个例子中,可以将各级第一特征提取处理分别对应的中间融合数据进行拼接,得到统一表征目标人脸图像的第一融合特征数据。在另一个例子中,也可以将所述多级第一特征提取处理分别对应的中间融合数据进行拼接后,进行全连接处理,得到所述第一融合特征数据。
进一步将各个中间融合数据进行融合,使得第一融合特征数据受到每级第一特征提取处理分别对应的中间融合数据的影响,从而使得所提取出来的第一融合特征数据能够更好的表征多帧目标人脸图像的特征。
在得到第一融合特征数据后,可以将第一融合特征数据输入至第一分类器,得到第一检测结果。第一分类器例如为softmax分类器。
如图3C所示,提供一种基于多帧目标人脸图像中每帧目标人脸图像的特征提取结果,得到第一检测结果的示例,在该示例中,对某一帧目标人脸图像进行5级特征提取处理,得到的第一初始特征数据分别为:V1、V2、V3、V4以及V5。
基于第一初始特征数据V5生成第五级第一特征提取处理的第一中间特征数据M5。
对第一中间特征数据M5进行上采样,得到第四级第一特征提取处理的上采样数据M5’。将第四级第一特征提取处理的第一初始特征数据V4和上采样数据M5’相加,得到第四级第一特征提取处理的第一中间特征数据M4。对第一中间特征数据M4进行上采样,得到第三级第一特征提取处理的上采样数据M4’。将第三级第一特征提取处理的第一初始特征数据V3和上采样数据M4’相加,得到第三级第一特征提取处理的第一中间特征数据M3。对第一中间特征数据M3进行上采样,得到第二级第一特征提取处理的上采样数据M3’。将第二级第一特征提取处理的第一初始特征数据V2和上采样数据M3’相加,得到第二级第一特征提取处理的第一中间特征数据M2。对第一中间特征数据M2进行上采样,得到第一级第一特征提取处理的上采样数据M2’;将第一级第一特征提取处理的第一初始特征数据V1和上采样数据M2’相加,得到第一级第一特征提取处理的第一中间特征数据M1。将得到的第一中间特征数据M1、M2、M3、M4以及M5作为对该帧目标人脸图像进行特征提取后,得到的特征提取结果。
然后,针对每帧目标人脸图像,将该目标人脸图像在五级第一特征提取处理分别对应的第一中间特征数据进行平均池化,得到该帧目标人脸图像,在五级第一特征提取处理下,分别对应的第二中间特征数据G1、G2、G3、G4以及G5。
假设目标人脸图像有5帧,按照时间戳的先后顺序依次为a1-a5,第一帧目标人脸图像a1在五级第一特征提取处理下分别对应的第二中间特征数据为:G11、G12、G13、G14、G15;第二帧目标人脸图像a2在五级第一特征提取处理下分别对应的第二中间特 征数据为:G21、G22、G23、G24、G25;第三帧目标人脸图像a3在五级第一特征提取处理下分别对应的第二中间特征数据为:G31、G32、G33、G34、G35;第四帧目标人脸图像a4在五级第一特征提取处理下分别对应的第二中间特征数据为:G41、G42、G43、G44、G45;第五帧目标人脸图像a5在五级第一特征提取处理下分别对应的第二中间特征数据为:G51、G52、G53、G54、G55。
那么,第一级特征提取处理对应的特征序列为:(G11,G21,G31,G41,G51)。第二级特征提取处理对应的特征序列为:(G12,G22,G32,G42,G52)。第三级特征提取处理对应的特征序列为:(G13,G23,G33,G43,G53)。第四级特征提取处理对应的特征序列为:(G14,G24,G34,G44,G54)。第五级特征提取处理对应的特征序列为:(G15,G25,G35,G45,G55)。
然后将特征序列(G11,G21,G31,G41,G51)输入至与第一级第一特征提取处理对应的LSTM网络,得到与第一级第一特征提取处理对应的中间融合数据R1。将特征序列(G12,G22,G32,G42,G52)输入至与第二级第一特征提取处理对应的LSTM网络,得到与第二级第一特征提取处理对应的中间融合数据R2。将特征序列(G13,G23,G33,G43,G53)输入至与第三级第一特征提取处理对应的LSTM网络,得到与第三级第一特征提取处理对应的中间融合数据R3。将特征序列(G14,G24,G34,G44,G54)输入至与第四级第一特征提取处理对应的LSTM网络,得到与第四级第一特征提取处理对应的中间融合数据R4。将特征序列(G15,G25,G35,G45,G55)输入至与第五级第一特征提取处理对应的LSTM网络,得到与第二级第一特征提取处理对应的中间融合数据R5。
将中间融合数据R1、R2、R3、R4以及R5拼接后,传入全连接层,进行全连接处理,得到第一融合特征数据。然后将第一融合特征数据传入至第一分类器,得到第一检测结果。
Ⅲ:在上述步骤S103中,可以采用下述方式基于所述多帧目标人脸图像中每两帧相邻的目标人脸图像的差分图像,得到第二检测结果。
对所述多帧目标人脸图像中每两帧相邻的目标人脸图像的差分图像进行级联处理,得到差分级联图像;基于所述差分级联图像,得到所述第二检测结果。
在多帧差分级联图像中,能够更好的提取变化特征,从而提升第二检测结果的精度。
具体地,每两帧相邻的目标人脸图像的差分图像的获取方式,与上述图2A中方式一的描述类似,在此不再赘述。
将差分图像进行级联处理时,使差分图像进行颜色通道上的级联。例如,若差分图像为三通道图像,则将两张差分图像级联后,得到的差分级联图像为六通道的图像。
在具体实施中,不同的差分图像的颜色通道数量一致,像素点数量也一致。
例如,若差分图像的颜色通道数量为3,像素点数量为256*1024,则差分图像的表示向量为:256*1024*3。其中,该表示向量中任一元素Aijk的元素值,为像素点Aij’在第k个颜色通道的像素值。
若差分图像有s个,则将s个差分图像进行级联,得到差分级联图像的维度为:256*1024*(3×s)。
在一种可选实现方式中,可以采用下述方式基于差分级联图像,得到第二检测结果:对所述差分级联图像进行特征提取处理,得到所述差分级联图像的特征提取结果;对所述差分级联图像的特征提取结果进行特征融合,得到第二融合特征数据;基于所述第二融合特征数据,得到所述第二检测结果。
在多帧差分级联图像中,能够更好的提取变化特征,从而提升第二检测结果的精度。
下面先通过下述图4A对差分级联图像进行特征提取处理的具体过程加以详细描述。图4为本公开实施例提供一种对差分级联图像进行特征提取的方式的流程图,包括以下步骤。
S401:对所述差分级联图像进行多级第二特征提取处理,得到与每级第二特征提取处理分别对应的第二初始特征数据。
此处,可以将差分级联图像输入至预先训练的第二卷积神经网络中,对差分级联图像进行多级第二特征提取处理。该第二卷积神经网络与上述第一卷积神经网路类似。需要注意的是,第二卷积神经网络和上述第一卷积神经网络的网络结构可以相同,也可以不同;在两者结构相同的情况下,网络参数不同。第一特征提取处理的级数,与第二特征提取处理的级数可以相同,也可以不同。
S402:基于多级第二特征提取处理分别对应的第二初始特征数据,得到所述差分级联图像的特征提取结果。
对差分级联图像进行多级第二特征提取处理,可以增加特征提取的感受野,丰富差分级联图像中的信息。
示例性的,可以采用下述方式基于多级第二特征提取处理分别对应的第二初始特征数据,得到所述差分级联图像的特征提取结果:针对每级第二特征提取处理,对该级第二特征提取处理的第二初始特征数据,与该级第二特征提取处理之前的至少一级第二特征提取处理的第二初始特征数据进行融合处理,得到该级第二特征提取处理对应的第三中间特征数据;所述差分级联图像的特征提取结果,包括所述多级第二特征提取处理分别对应的第三中间特征数据。
这样,每级第二特征提取处理得到的信息更加丰富,这些信息能够更好的表征差分图像中的变化信息,以提升第二检测结果的精度。
此处,对任一级第二特征提取处理的第二初始特征数据,与该级第二特征提取处理之前的至少一级第二特征提取处理的第二初始特征数据进行融合处理的具体方式可以为:对该级第二特征提取处理的上级第二特征提取处理的第二初始特征数据进行下采样,得到该级第二特征提取处理对应的下采样数据;对该级第二特征提取处理对应的下采样数据和所述第二初始特征数据进行融合处理,得到该级第二特征提取处理对应的第三中间特征数据。
将多级第二特征提取处理得到的信息,由上级第二特征提取处理,向下级第二特征提取处理流动,使得每级第二特征提取处理得到的信息更加丰富。
具体地:针对第一级第二特征提取处理,将第一级第二特征提取处理得到的第二初始特征数据,确定为该级第二特征提取处理对应的第三中间特征数据。
针对其他各级第二特征提取处理,基于该级第二特征提取处理得到的第二初始特征数据,以及上一级第二特征提取处理得到的第三中间特征数据,得到与该级第二特征提取处理对应的第三中间特征数据。
将各级第二特征提取处理分别对应的第三中间特征数据作为对差分级联图像进行特征提取的结果。
可以采用下述方式得到各级第二特征提取处理对应的第三中间特征数据:对上一级第二特征提取处理得到的第三中间特征数据进行下采样,得到该级第二特征提取处理对应的下采样数据,其中,该级第二特征提取处理对应的下采样数据的向量维度,与基于该级第二特征提取处理得到的第二初始特征数据的维度相同;基于该级第二特征提取处理对应的下采样数据以及第二初始特征数据,得到该级第二特征提取处理对应的第三中间特征数据。
例如,图4B所示提供的示例中,对差分级联图像进行5级第二特征提取处理。
5级第二特征提取处理得到的第二初始特征数分别为:W1、W2、W3、W4以及W5。
针对第一级第二特征提取处理,将W1作为该第一级第二特征提取处理对应的第三中间特征数据E1。针对第二级第二特征提取处理,将第一级第二特征提取处理得到的第三中间特征数据E1进行下采样处理,得到第二级第一特征提取处理对应的下采样数据E1’。基于W2以及E1’生成第二级第二特征提取处理对应的第三中间特征数据E2。
类似的,分别得到第三级第二特征提取处理对应的第三中间特征数据E3和第四级第二特征提取处理对应的第三中间特征数据E4。
针对第五级第二特征提取处理,将第四级第二特征提取处理得到的第三中间特征数据E4进行下采样处理,得到第五级第二特征提取处理对应的下采样数据E4’。基于W5以及E4’生成第五级第二特征提取处理对应的第五中间特征数据E5。
下面通过图4C对所述差分级联图像的特征提取结果进行特征融合,得到第二融合特征数据的过程加以详细描述。图4C为本公开实施例提供对差分级联图像的特征提取结果进行特征融合的过程的流程图,包括以下步骤。
S411:对所述差分级联图像在各级第二特征提取处理中的第三中间特征数据分别进行全局平均池化处理,得到所述差分级联图像在各级第二特征提取处理分别对应的第四中间特征数据。
这里,对第三中间特征数据进行全局平均池化的方式与上述对第一中间特征数据进行全局平均池化的方式类似,在此不再赘述。
S412:对所述差分级联图像在各级第二特征提取处理分别对应的第四中间特征数据进行特征融合,得到所述第二融合特征数据。
将第三中间特征数据进行维度上的转化,可以简化后续的处理过程。
可以对各级第二特征提取处理分别对应的第四中间特征数据进行拼接后,输入至全连接网络进行全连接处理,得到第二融合特征数据。在得到第二融合特征数据后,将第二融合特征数据输入至第二分类器,得到第二检测结果。
例如在图4B示出的示例中,第一级第二特征提取处理对应的第三中间特征数据E1经过全局平均池化后,得到对应的第四中间特征数据U1;第二级第二特征提取处理对应的第三中间特征数据E2经过全局平均池化后,得到对应的第四中间特征数据U2;第三级第二特征提取处理对应的第三中间特征数据E3经过全局平均池化后,得到对应的第四中间特征数据U3;第四级第二特征提取处理对应的第三中间特征数据E4经过全局平均池化后,得到对应的第四中间特征数据U4;第五级第二特征提取处理对应的第三中间特征数据E5经过全局平均池化后,得到对应的第四中间特征数据U5。将第四中间特征数据U1、U2、U3、U4以及U5拼接后,输入至全连接层,进行全连接处理,得到第二融合特征数据,然后将第二融合特征数据输入至第二分类器中,得到第二检测结果。
第二分类器例如为softmax分类器。
Ⅳ:在上述S104中,可以采用下述方式确定检测结果:将第一检测结果和第二检测结果进行加权求和,得到目标检测结果。
将第一检测结果和第二检测结果进行加权求和,综合两个检测结果,可以得到更准确的活体检测结果。
第一检测结果、第二检测结果分别对应的权重可以根据实际的需要进行具体设置,这里不做限定。在一个例子中,其各自对应的权重可以相同。
将第一检测结果和第二检测结果进行加权求和后,根据所得到的数值,可以判断出目标检测结果为是否为活体。例如,当该数值大于等于某一阈值时,待检测视频中的人脸为活体的人脸;否则,为非活体的人脸。所述阈值可以在上述第一卷积神经网络和第二卷积神经网络进行训练时获得。例如,可以通过带标注的多个样本训练这两个卷积神经网络,然后得到正样本训练后的加权求和值,以及负样本训练后的加权求和值,从而得到该阈值。
在本公开另一实施例中,还提供一种活体检测方法,该活体检测方法通过活体检测模型实现。活体检测模型包括:第一子模型、第二子模型、以及计算模块;其中第一子模型包括:第一特征提取网络、第一特征融合网络、以及第一分类器;第二子模型包括:第二特征提取网络、第二特征融合网络、及第二分类器;活体检测模型为利用训练样本集中的样本人脸视频训练得到的,样本人脸视频标注有是否为活体的标注信息。
其中:第一特征提取网络用于基于所述多帧目标人脸图像中每帧目标人脸图像 的特征提取结果,得到第一检测结果。第二特征提取网络用于基于所述多帧目标人脸图像中每两帧相邻的目标人脸图像的差分图像,得到第二检测结果。计算模块,用于基于第一检测结果和第二检测结果,得到活体检测结果。
本公开实施例能够从待检测视频中提取多帧目标人脸图像,然后基于多帧目标人脸图像中每帧目标人脸图像的特征提取结果得到第一检测结果,并基于多帧目标人脸图像中每两帧相邻的目标人脸图像的差分图像得到第二检测结果;然后基于第一检测结果和第二检测结果,确定待检测视频的活体检测结果。在该方法中,不需要用户做出任何的指定动作,而是利用用户的多帧差别较大的人脸图像来静默式地来检测用户是否为活体,检测效率更高。
同时,若非法登录者通过翻拍屏幕获得的人脸视频试图进行欺骗,则由于通过翻拍所获得的图像会丢失大量原始图像的图像信息,由于图像信息的丢失造成无法检测到用户外表细微变化,进而可以判断出不是活体,因此本申请提供的方法能够有效抵御屏幕翻拍的攻击手段。
本领域技术人员可以理解,在具体实施方式的上述方法中,各步骤的撰写顺序并不意味着严格的执行顺序而对实施过程得到任何限定,各步骤的具体执行顺序应当以其功能和可能的内在逻辑确定。
参见图5所示,本公开另一实施例还提供一种活体检测方法,包括以下步骤。
S501:基于获取到的待检测视频中包括的多帧人脸图像之间的相似度,从待检测视频中提取多帧目标人脸图像。
S502:基于多帧目标人脸图像,确定待检测视频的活体检测结果。
步骤S501的具体实现方式请参见上文步骤S101的实现方式,在此不再赘述。
本公开实施例通过待检测视频中提取多帧目标人脸图像,且多帧目标人脸图像中的相邻目标人脸图像之间的相似度低于第一数值,然后基于目标人脸图像,确定待检测视频的活体检测结果,不需要用户做出任何的指定动作,而是利用用户的多帧差别较大的人脸图像来静默式地检测用户是否为活体,检测效率更高。
同时,若非法登录者通过翻拍屏幕获得的人脸视频试图进行欺骗,则由于通过翻拍所获得的图像会丢失大量原始图像的图像信息,由于图像信息的丢失造成无法检测到用户外表细微变化,进而可以判断出不是活体,因此本申请提供的方法能够有效抵御屏幕翻拍的攻击手段。
在一种可能的实施方式中,基于多帧目标人脸图像,确定待检测视频的活体检测结果,包括:基于多帧目标人脸图像中每帧目标人脸图像的特征提取结果,得到第一检测结果,和/或基于多帧目标人脸图像中每两帧相邻的目标人脸图像的差分图像,得到第二检测结果;基于第一检测结果和/或第二检测结果,确定待检测视频的活体检测结果。
其中,得到第一检测结果和第二检测结果的实现方式可以分别参照上文S102和S103的描述,这里不再赘述。
在一种可能的实现方式中,获取第一检测结果,并将第一检测结果作为目标检测结果,或者,将第一检测结果进行处理后得到目标检测结果。
在另一种可能的实现方式中,获取第二检测结果,并将第二检测结果作为目标检测结果,或者,将第二检测结果进行处理后得到目标检测结果。
在另一种可能的实施方式中,获取第一检测结果和第二检测结果,并基于第一检测结果和第二检测结果,确定针对待检测视频的活体检测结果,例如,将第一检测结果和第二检测结果进行加权求和,得到活体检测结果。
基于类似的构思,本公开实施例中还提供了与活体检测方法对应的活体检测装置,由于本公开实施例中的装置解决问题的原理与本公开实施例上述活体检测方法相似,因此装置的实施可以参见方法的实施,重复之处不再赘述。
参照图6A所示,为本公开实施例提供的一种活体检测装置的示意图,装置包括:获取单元61和检测单元62。
获取单元61,用于基于获取到的待检测视频中包括的多帧人脸图像之间的相似度,从待检测视频中确定多帧目标人脸图像。
检测单元62,用于基于多帧目标人脸图像,确定待检测视频的活体检测结果。
在一些例子中,多帧目标人脸图像中每两帧相邻的目标人脸图像之间的相似度低于第一数值。
在一些例子中,获取单元61还用于:从待检测视频中确定多帧目标人脸图像中的第一目标人脸图像;基于第一目标人脸图像,从待检测视频的多帧连续人脸图像中确定第二目标人脸图像,其中,第二目标人脸图像与第一目标人脸图像之间的相似度满足预设的相似度要求。
在一些例子中,获取单元61还用于:将待检测视频划分为多个片段,其中,每个片段包括一定数量的连续的人脸图像;从多个片段的第一片段中选取第一目标人脸图像;基于第一目标人脸图像,从多个片段的每个片段中确定第二目标人脸图像。
在一些例子中,获取单元61还用于:比较第一片段中的所有人脸图像和第一目标人脸图像的相似度,将相似度最小的人脸图像作为第一片段的第二目标人脸图像;对其他片段中的每个片段,比较该片段中的所有人脸图像和该片段的上一片段的第二目标人脸图像的相似度,将相似度最小的人脸图像作为该片段的第二目标人脸图像,其中,其他片段为多个片段除第一片段外的片段。
在一些例子中,多帧人脸图像之间的相似度是基于以下方式得到的:从多帧人脸图像中选择两帧人脸图像作为第一人脸图像和第二人脸图像;基于第一人脸图像中每个像素点的像素值、和第二人脸图像中每个像素点的像素值,得到第一人脸图像和第二人脸图像的人脸差分图像;根据人脸差分图像中每个像素点的像素值,得到人脸差分图像对应的方差;将方差作为第一人脸图像和第二人脸图像之间的相似度。
在一些例子中,在从获取到的待检测视频中提取多帧目标人脸图像之前,获取单元61还用于:获取待检测视频包括的多帧人脸图像中每帧人脸图像的关键点信息;基于多帧人脸图像中每帧人脸图像的关键点信息,对多帧人脸图像进行对齐处理,得到对齐处理后的多帧人脸图像;基于对齐处理后的多帧人脸图像中之间的相似度,从对齐处理后的多帧人脸图像中确定多帧目标人脸图像。
在一些例子中,检测单元62包括:第一检测模块和/或第二检测模块、以及确定模块;其中,第一检测模块用于基于多帧目标人脸图像中每帧目标人脸图像的特征提取结果,得到第一检测结果;第二检测模块用于基于多帧目标人脸图像中每两帧相邻的目标人脸图像的差分图像,得到第二检测结果;确定模块用于基于第一检测结果和/或第二检测结果,确定待检测视频的活体检测结果。
在一些例子中,第一检测模块还用于:将多帧目标人脸图像各自的特征提取结果进行特征融合处理,得到第一融合特征数据;基于第一融合特征数据,得到第一检测结果。
在一些例子中,每帧目标人脸图像的特征提取结果包括:对目标人脸图像进行多级第一特征提取处理得到与每级第一特征提取处理分别对应的第一中间特征数据;第一检测模块还用于:针对每级第一特征提取处理,对多帧目标人脸图像在该级第一特征提取处理中分别对应的第一中间特征数据进行融合处理,得到该级第一特征提取处理对应的中间融合数据;基于多级第一特征提取处理分别对应的中间融合数据,得到第一融合特征数据。
在一些例子中,第一检测模块还用于:基于多帧目标人脸图像在该级第一特征提取处理中分别对应的第一中间特征数据,得到与该级第一特征提取处理对应的特征序列;将特征序列输入到循环神经网络进行融合处理,得到该级第一特征提取处理对应的中间融合数据。
在一些例子中,第一检测模块还用于:针对多帧目标人脸图像中的每帧目标人脸图像在该级第一特征提取处理中对应的第一中间特征数据进行全局平均池化处理,得到多帧目标人脸图像在该级第一特征提取处理分别对应的第二中间特征数据;按照多帧目标人脸图像的时间顺序,排列多帧目标人脸图像在该级第一特征提取处理分别对应的 第二中间特征数据,得到特征序列。
在一些例子中,第一检测模块还用于:将多级第一特征提取处理分别对应的中间融合数据进行拼接后,进行全连接处理,得到第一融合特征数据。
在一些例子中,第一检测模块用于采用下述方式得到每帧目标人脸图像的特征提取结果:对目标人脸图像进行多级特征提取处理,得到多级特征提取处理中每级第一特征提取处理分别对应的第一初始特征数据;针对每级第一特征提取处理,根据该级第一特征提取处理的第一初始特征数据、与该级第一特征提取处理后续的至少一级第一特征提取处理的第一初始特征数据进行融合处理,得到该级第一特征提取处理对应的第一中间特征数据,其中,目标人脸图像的特征提取结果包括多级第一特征提取处理中每级第一特征提取处理分别对应的第一中间特征数据。
在一些例子中,第一检测模块还用于:对该级第一特征提取处理的第一初始特征数据与该级第一特征提取处理的下级第一特征提取处理对应的第一中间特征数据进行融合处理,得到该级第一特征提取处理对应的第一中间特征数据,其中,下级第一特征提取处理对应的第一中间特征数据是基于下级第一特征提取处理的第一初始特征数据得到的。
在一些例子中,第一检测模块还用于:对该级第一特征提取处理的下级第一特征提取处理对应的第一中间特征数据进行上采样,得到该级第一特征提取处理对应的上采样数据;融合该级第一特征提取处理对应的上采样数据和该级第一特征提取处理对应的第一初始特征数据,得到该级第一特征提取处理对应的第一中间特征数据。
在一些例子中,第二检测模块还用于:对多帧目标人脸图像中每两帧相邻的目标人脸图像的差分图像进行级联处理,得到差分级联图像;基于差分级联图像,得到第二检测结果。
在一些例子中,第二检测模块还用于:对差分级联图像进行特征提取处理,得到差分级联图像的特征提取结果;对差分级联图像的特征提取结果进行特征融合,得到第二融合特征数据;基于第二融合特征数据,得到第二检测结果。
在一些例子中,第二检测模块还用于:对差分级联图像进行多级第二特征提取处理,得到与每级第二特征提取处理分别对应的第二初始特征数据;基于多级第二特征提取处理分别对应的第二初始特征数据,得到差分级联图像的特征提取结果。
在一些例子中,第二检测模块还用于:针对每级第二特征提取处理,对该级第二特征提取处理的第二初始特征数据,与该级第二特征提取处理之前的至少一级第二特征提取处理的第二初始特征数据进行融合处理,得到该级第二特征提取处理对应的第三中间特征数据;差分级联图像的特征提取结果,包括多级第二特征提取处理分别对应的第三中间特征数据。
在一些例子中,第二检测模块还用于:对该级第二特征提取处理的上级第二特征提取处理的第二初始特征数据进行下采样,得到该级第二特征提取处理对应的下采样数据;对该级第二特征提取处理对应的下采样数据和该级第二特征提取处理的第二初始特征数据进行融合处理,得到该级第二特征提取处理对应的第三中间特征数据。
在一些例子中,第二检测模块还用于:对差分级联图像在多级第二特征提取处理中各自的第三中间特征数据分别进行全局平均池化处理,得到差分级联图像在多级第二特征提取处理分别对应的第四中间特征数据;对差分级联图像在多级第二特征提取处理分别对应的第四中间特征数据进行特征融合,得到第二融合特征数据。
在一些例子中,第二检测模块还用于:将多级第二特征提取处理分别对应的第四中间特征数据进行拼接后,进行全连接处理,得到第二融合特征数据。
在一些例子中,确定模块还用于:将第一检测结果和第二检测结果进行加权求和,得到活体检测结果。
关于装置中的各模块和/或单元的处理流程、以及各模块和/或单元之间的交互流程的描述可以参照上述方法实施例中的相关说明,这里不再详述。
本公开可选实现方式还提供了一种电子设备600,如图6B所示,为本公开可选 实现方式提供的电子设备600结构示意图,包括:处理器610、存储器620;存储器620用于存储处理器可执行指令,包括内存621和外部存储器622。这里的内存621也称内部存储器,用于暂时存放处理器610中的运算数据,以及与硬盘等外部存储器622交换的数据,处理器610通过内存621与外部存储器622进行数据交换。
当电子设备600运行时,机器可读指令被处理器执行,使得处理器610执行以下操作:从获取到的待检测视频中提取多帧目标人脸图像;基于多帧目标人脸图像中每帧目标人脸图像的特征提取结果,得到第一检测结果;基于多帧目标人脸图像中每两帧相邻的目标人脸图像的差分图像,得到第二检测结果;基于第一检测结果和第二检测结果,确定待检测视频的活体检测结果。
或者执行以下操作:基于获取到的待检测视频中包括的多帧人脸图像之间的相似度,从待检测视频中提取多帧目标人脸图像;基于多帧目标人脸图像,确定待检测视频的活体检测结果。
本公开可选实现方式还提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器运行时执行上述方法可选实现方式中的活体检测方法的步骤。其中,计算机可读存储介质可以是非易失性存储介质。
另外,参见图7所示,本公开实施例还公开一种将公开实施例提供的活体检测方法进行具体应用的示例。
在该示例中,活体检测方法的执行主体为云端服务器1;云端服务器1与使用端2通信连接。两者的交互过程参见下述步骤。
S701:使用端2将用户视频上传云端服务器1。使用端2将获取的用户视频上传至云端服务器1。
S702:云端服务器1进行人脸关键点检测。云端服务器1在接收到使用端2发送的用户视频后,对用户视频中的各帧图像进行人脸关键点检测。检测失败时,跳转至S703;检测成功时,跳转至S705。
S703:检云端服务器1向使用端2反馈检测失败的原因;此时,检测失败的原因为:未检测到人脸。
使用端2在接收到云端服务器1反馈的检测失败的原因后,执行S704:重新获取用户视频,并跳转至S701。
S705:云端服务器1根据检测到的人脸关键点,对用户视频中的各帧图像进行裁剪,得到待检测视频。
S706:云端服务器1基于人脸关键点对待检测视频中的各帧人脸图像进行对齐处理。
S707:云端服务器1从待检测视频中筛选多帧目标人脸图像。
S708:云端服务器1将多帧目标人脸图像,输入至活体检测模型中的第一子模型;并将每相邻的两帧目标人脸图像之间的差分图像,输入至活体检测模型中的第二子模型,进行检测。
其中,第一子模型,用于基于所述多帧目标人脸图像中每帧目标人脸图像的特征提取结果,得到第一检测结果。第二子模型,用于基于所述多帧目标人脸图像中每两帧相邻的目标人脸图像的差分图像,得到第二检测结果。
S709:云端服务器1得到活体检测模型输出的第一检测结果和第二检测结果后,根据第一检测结果和第二检测结果,得到活体检测结果。
S710:将活体检测结果反馈至使用端2。
通过上述过程,实现了对从使用端2获取的一段视频的活体检测过程。
本公开可选实现方式所提供的活体检测方法的计算机程序产品,包括存储了程序代码的计算机可读存储介质,所述程序代码包括的指令可用于执行上述方法可选实现方式中所述的活体检测方法的步骤,具体可参见上述方法可选实现方式,在此不再赘述。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的***和装置的具体工作过程,可以参考前述方法可选实现方式中的对应过程,在此不再赘述。在本公开所提供的几个可选实现方式中,应该理解到,所揭露的***、装置和方法,可以通过其它的方式实现。以上所描述的装置可选实现方式仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,又例如,多个单元或组件可以结合或者可以集成到另一个***,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些通信接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本可选实现方式方案的目的。
另外,在本公开各个可选实现方式中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个处理器可执行的非易失的计算机可读取存储介质中。基于这样的理解,本公开的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干机器可执行指令,用以使得一台电子设备(可以是个人计算机,服务器,或者网络设备等)执行本公开各个可选实现方式所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
最后应说明的是:以上所述可选实现方式,仅为本公开的具体实施方式,用以说明本公开的技术方案,而非对其限制,本公开的保护范围并不局限于此,尽管参照前述可选实现方式对本公开进行了详细的说明,本领域的普通技术人员应当理解:任何熟悉本技术领域的技术人员在本公开揭露的技术范围内,其依然可以对前述可选实现方式所记载的技术方案进行修改或可轻易想到变化,或者对其中部分技术特征进行等同替换;而这些修改、变化或者替换,并不使相应技术方案的本质脱离本公开可选实现方式技术方案的精神和范围,都应涵盖在本公开的保护范围之内。因此,本公开的保护范围应所述以权利要求的保护范围为准。

Claims (29)

  1. 一种活体检测方法,其特征在于,包括:
    基于获取到的待检测视频中包括的多帧人脸图像之间的相似度,从所述待检测视频中确定多帧目标人脸图像;
    基于所述多帧目标人脸图像,确定所述待检测视频的活体检测结果。
  2. 根据权利要求1所述的活体检测方法,其特征在于,所述多帧目标人脸图像中每两帧相邻的目标人脸图像之间的相似度低于第一数值。
  3. 根据权利要求1或2所述的活体检测方法,其特征在于,从获取到的所述待检测视频中提取所述多帧目标人脸图像,包括:
    从所述待检测视频中确定所述多帧目标人脸图像中的第一目标人脸图像;
    基于所述第一目标人脸图像,从所述待检测视频的多帧连续人脸图像中确定第二目标人脸图像,其中,所述第二目标人脸图像与所述第一目标人脸图像之间的相似度满足预设的相似度要求。
  4. 根据权利要求3所述的活体检测方法,其特征在于,所述方法还包括:
    将所述待检测视频划分为多个片段,其中,每个片段包括一定数量的连续的人脸图像;
    从所述待检测视频中确定所述多帧目标人脸图像中的所述第一目标人脸图像,包括:
    从所述多个片段的第一片段中选取第一目标人脸图像;
    基于所述第一目标人脸图像,从所述待检测视频的所述多帧连续人脸图像中确定所述第二目标人脸图像,包括:
    基于所述第一目标人脸图像,从所述多个片段的每个片段中确定第二目标人脸图像。
  5. 根据权利要求4所述的活体检测方法,其特征在于,从所述多个片段的每个片段中确定所述第二目标人脸图像包括:
    比较所述第一片段中的所有人脸图像和所述第一目标人脸图像的相似度,将相似度最小的人脸图像作为所述第一片段的所述第二目标人脸图像;
    对其他片段中的每个片段,比较该片段中的所有人脸图像和该片段的上一片段的第二目标人脸图像的相似度,将相似度最小的人脸图像作为该片段的第二目标人脸图像,其中,所述其他片段为所述多个片段除第一片段外的片段。
  6. 根据权利要求1-5任一项所述的活体检测方法,其特征在于,所述多帧人脸图像之间的相似度是基于以下方式得到的:
    从多帧人脸图像中选择两帧人脸图像作为第一人脸图像和第二人脸图像;
    基于所述第一人脸图像中每个像素点的像素值、和所述第二人脸图像中每个像素点的像素值,得到所述第一人脸图像和所述第二人脸图像的人脸差分图像;
    根据所述人脸差分图像中每个像素点的像素值,得到所述人脸差分图像对应的方差;
    将所述方差作为所述第一人脸图像和所述第二人脸图像之间的所述相似度。
  7. 根据权利要求1-6任一项所述的活体检测方法,其特征在于,在从所述获取到的待检测视频中提取所述多帧目标人脸图像之前,还包括:
    获取所述待检测视频包括的多帧人脸图像中每帧人脸图像的关键点信息;
    基于所述多帧人脸图像中每帧人脸图像的关键点信息,对所述多帧人脸图像进行对齐处理,得到对齐处理后的多帧人脸图像;
    基于所述获取到的待检测视频中包括的所述多帧人脸图像之间的相似度,从所述待检测视频中确定所述多帧目标人脸图像,包括:
    基于所述对齐处理后的多帧人脸图像中之间的相似度,从所述对齐处理后的多帧人脸图像中确定所述多帧目标人脸图像。
  8. 根据权利要求1-7任一项所述的活体检测方法,其特征在于,基于所述多帧目标人脸图像,确定所述待检测视频的所述活体检测结果包括:
    基于所述多帧目标人脸图像中每帧目标人脸图像的特征提取结果,得到第一检测结果,和/或基于所述多帧目标人脸图像中每两帧相邻的目标人脸图像的差分图像,得到第二检测结果;
    基于所述第一检测结果和/或所述第二检测结果,确定所述待检测视频的活体检测结果。
  9. 根据权利要求8所述的活体检测方法,其特征在于,基于所述多帧目标人脸图像中每帧目标人脸图像的所述特征数据,得到所述第一检测结果,包括:
    将所述多帧目标人脸图像各自的特征提取结果进行特征融合处理,得到第一融合特征数据;
    基于所述第一融合特征数据,得到所述第一检测结果。
  10. 根据权利要求9所述的活体检测方法,其特征在于,每帧所述目标人脸图像的特征提取结果包括:
    对所述目标人脸图像进行多级第一特征提取处理得到与每级第一特征提取处理分别对应的第一中间特征数据;
    将所述多帧目标人脸图像各自的特征提取结果进行特征融合处理,得到所述第一融合特征数据,包括:
    针对每级第一特征提取处理,对所述多帧目标人脸图像在该级第一特征提取处理中分别对应的第一中间特征数据进行融合处理,得到该级第一特征提取处理对应的中间融合数据;
    基于所述多级第一特征提取处理分别对应的中间融合数据,得到所述第一融合特征数据。
  11. 根据权利要求10所述的活体检测方法,其特征在于,对所述多帧目标人脸图像在该级第一特征提取处理中分别对应的所述第一中间特征数据进行融合处理,得到该级第一特征提取处理对应的所述中间融合数据,包括:
    基于所述多帧目标人脸图像在该级第一特征提取处理中分别对应的所述第一中间特征数据,得到与该级第一特征提取处理对应的特征序列;
    将所述特征序列输入到循环神经网络进行融合处理,得到该级第一特征提取处理对应的所述中间融合数据。
  12. 根据权利要求11所述的活体检测方法,其特征在于,基于所述多帧目标人脸图像在该级第一特征提取处理中分别对应的所述第一中间特征数据,得到与该级第一特征提取处理对应的所述特征序列之前,还包括:
    针对所述多帧目标人脸图像中的每帧目标人脸图像在该级第一特征提取处理中对应的第一中间特征数据进行全局平均池化处理,得到所述多帧目标人脸图像在该级第一特征提取处理分别对应的第二中间特征数据;
    基于所述多帧目标人脸图像在该级第一特征提取处理中分别对应的所述第一中间特征数据,得到与该级第一特征提取处理对应的所述特征序列,包括:
    按照所述多帧目标人脸图像的时间顺序,排列所述多帧目标人脸图像在该级第一特征提取处理分别对应的所述第二中间特征数据,得到所述特征序列。
  13. 根据权利要求10至12中任一项所述的活体检测方法,其特征在于,基于所述多级第一特征提取处理对应的所述中间融合数据,得到所述第一融合特征数据,包括:
    将所述多级第一特征提取处理分别对应的所述中间融合数据进行拼接后,进行全连接处理,得到所述第一融合特征数据。
  14. 根据权利要求8至13中任一项所述的活体检测方法,其特征在于,采用下述方式得到每帧目标人脸图像的特征提取结果:
    对所述目标人脸图像进行多级特征提取处理,得到所述多级特征提取处理中每级第一特征提取处理分别对应的第一初始特征数据;
    针对每级所述第一特征提取处理,根据该级第一特征提取处理的第一初始特征数据、与该级第一特征提取处理后续的至少一级第一特征提取处理的第一初始特征数据进行融合处理,得到该级第一特征提取处理对应的第一中间特征数据,其中,所述目标人脸图像的特征提取结果包括所述多级第一特征提取处理中每级第一特征提取处理分别对应的第一中间特征数据。
  15. 根据权利要求14所述的活体检测方法,其特征在于,根据该级第一特征提取处理的所述第一初始特征数据、与该级第一特征提取处理后续的至少一级第一特征提取处理的所述第一初始特征数据进行融合处理,得到该级第一特征提取处理对应的所述第一中间特征数据,包括:
    对该级第一特征提取处理的所述第一初始特征数据与该级第一特征提取处理的下级第一特征提取处理对应的第一中间特征数据进行融合处理,得到所述该级第一特征提取处理对应的所述第一中间特征数据,其中,所述下级第一特征提取处理对应的所述第一中间特征数据是基于所述下级第一特征提取处理的第一初始特征数据得到的。
  16. 根据权利要求15所述的活体检测方法,其特征在于,对该级第一特征提取处理的所述第一初始特征数据与该级第一特征提取处理的所述下级第一特征提取处理对应的所述第一中间特征数据进行融合处理,得到所述该级第一特征提取处理对应的所述第一中间特征数据,包括:
    对该级第一特征提取处理的下级第一特征提取处理对应的所述第一中间特征数据进行上采样,得到该级第一特征提取处理对应的上采样数据;
    融合该级第一特征提取处理对应的所述上采样数据和该级第一特征提取处理对应的所述第一初始特征数据,得到该级第一特征提取处理对应的所述第一中间特征数据。
  17. 根据权利要求8-16任一项所述的活体检测方法,其特征在于,基于所述多帧目标人脸图像中每两帧相邻的目标人脸图像的所述差分图像,得到所述第二检测结果,包括:
    对所述多帧目标人脸图像中每两帧相邻的目标人脸图像的所述差分图像进行级联处理,得到差分级联图像;
    基于所述差分级联图像,得到所述第二检测结果。
  18. 根据权利要求17所述的活体检测方法,其特征在于,基于所述差分级联图像, 得到所述第二检测结果,包括:
    对所述差分级联图像进行特征提取处理,得到所述差分级联图像的特征提取结果;
    对所述差分级联图像的所述特征提取结果进行特征融合,得到第二融合特征数据;
    基于所述第二融合特征数据,得到所述第二检测结果。
  19. 根据权利要求18所述的活体检测方法,其特征在于,对所述差分级联图像进行特征提取处理,得到所述差分级联图像的所述特征提取结果,包括:
    对所述差分级联图像进行多级第二特征提取处理,得到与每级第二特征提取处理分别对应的第二初始特征数据;
    基于所述多级第二特征提取处理分别对应的所述第二初始特征数据,得到所述差分级联图像的所述特征提取结果。
  20. 根据权利要求19所述的活体检测方法,其特征在于,基于所述多级第二特征提取处理分别对应的所述第二初始特征数据,得到所述差分级联图像的所述特征提取结果,包括:
    针对每级第二特征提取处理,对该级第二特征提取处理的第二初始特征数据,与该级第二特征提取处理之前的至少一级第二特征提取处理的第二初始特征数据进行融合处理,得到该级第二特征提取处理对应的第三中间特征数据;
    所述差分级联图像的特征提取结果,包括所述多级第二特征提取处理分别对应的第三中间特征数据。
  21. 根据权利要求20所述的活体检测方法,其特征在于,对该级第二特征提取处理的所述第二初始特征数据,与该级第二特征提取处理之前的至少一级第二特征提取处理的所述第二初始特征数据进行融合处理,得到所述每级第二特征提取处理对应的所述第三中间特征数据,包括:
    对该级第二特征提取处理的上级第二特征提取处理的第二初始特征数据进行下采样,得到该级第二特征提取处理对应的下采样数据;
    对该级第二特征提取处理对应的所述下采样数据和该级第二特征提取处理的所述第二初始特征数据进行融合处理,得到该级第二特征提取处理对应的所述第三中间特征数据。
  22. 根据权利要求20或21所述的活体检测方法,其特征在于,对所述差分级联图像的所述特征提取结果进行特征融合,得到所述第二融合特征数据之前,还包括:
    对所述差分级联图像在所述多级第二特征提取处理中各自的第三中间特征数据分别进行全局平均池化处理,得到所述差分级联图像在所述多级第二特征提取处理分别对应的第四中间特征数据;
    对所述差分级联图像的所述特征提取结果进行特征融合,得到所述第二融合特征数据,包括:
    对所述差分级联图像在所述多级第二特征提取处理分别对应的所述第四中间特征数据进行特征融合,得到所述第二融合特征数据。
  23. 根据权利要求22所述的活体检测方法,其特征在于,对所述差分级联图像在所述多级第二特征提取处理分别对应的所述第四中间特征数据进行特征融合,得到所述第二融合特征数据,包括:
    将所述多级第二特征提取处理分别对应的所述第四中间特征数据进行拼接后,进行 全连接处理,得到所述第二融合特征数据。
  24. 根据权利要求8-23任一项所述的活体检测方法,其特征在于,基于所述第一检测结果和所述第二检测结果,确定所述待检测视频的所述活体检测结果,包括:
    将所述第一检测结果和所述第二检测结果进行加权求和,得到所述活体检测结果。
  25. 一种活体检测装置,其特征在于,包括:
    获取单元,用于基于获取到的待检测视频中包括的多帧人脸图像之间的相似度,从所述待检测视频中确定多帧目标人脸图像;
    检测单元,用于基于所述多帧目标人脸图像,确定所述待检测视频的活体检测结果。
  26. 根据权利要求25所述的活体检测装置,其特征在于,所述检测单元包括:第一检测模块和/或第二检测模块、以及确定模块;其中,
    所述第一检测模块,用于基于所述多帧目标人脸图像中每帧目标人脸图像的特征提取结果,得到第一检测结果;
    所述第二检测模块,用于基于所述多帧目标人脸图像中每两帧相邻的目标人脸图像的差分图像,得到第二检测结果;
    所述确定模块,用于基于所述第一检测结果和/或所述第二检测结果,确定所述待检测视频的活体检测结果。
  27. 一种电子设备,其特征在于,包括:处理器、存储有所述处理器可执行的机器可读指令的存储器,其中,所述机器可读指令被所述处理器执行时,促使所述处理器执行如权利要求1至24任一项所述的活体检测方法。
  28. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被电子设备运行时,促使所述电子设备执行如权利要求1至24任一项所述的活体检测方法。
  29. 一种计算机程序产品,包括机器可执行指令,其特征在于,所述机器可执行指令被电子设备读取并执行时,促使所述电子设备执行如1至24任一项所述的活体检测方法。
PCT/CN2020/105213 2019-10-31 2020-07-28 活体检测方法、装置、电子设备、存储介质及程序产品 WO2021082562A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2021550213A JP2022522203A (ja) 2019-10-31 2020-07-28 生体検出方法、装置、電子機器、記憶媒体、及びプログラム製品
SG11202111482XA SG11202111482XA (en) 2019-10-31 2020-07-28 Living body detection method, apparatus, electronic device, storage medium and program product
US17/463,896 US20210397822A1 (en) 2019-10-31 2021-09-01 Living body detection method, apparatus, electronic device, storage medium and program product

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911063398.2 2019-10-31
CN201911063398.2A CN112749603A (zh) 2019-10-31 2019-10-31 活体检测方法、装置、电子设备及存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/463,896 Continuation US20210397822A1 (en) 2019-10-31 2021-09-01 Living body detection method, apparatus, electronic device, storage medium and program product

Publications (1)

Publication Number Publication Date
WO2021082562A1 true WO2021082562A1 (zh) 2021-05-06

Family

ID=75645179

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/105213 WO2021082562A1 (zh) 2019-10-31 2020-07-28 Living body detection method, apparatus, electronic device, storage medium and program product

Country Status (5)

Country Link
US (1) US20210397822A1 (zh)
JP (1) JP2022522203A (zh)
CN (1) CN112749603A (zh)
SG (1) SG11202111482XA (zh)
WO (1) WO2021082562A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049518A (zh) * 2021-11-10 2022-02-15 北京百度网讯科技有限公司 Image classification method and apparatus, electronic device and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469085B (zh) * 2021-07-08 2023-08-04 北京百度网讯科技有限公司 Face liveness detection method and apparatus, electronic device and storage medium
CN113989531A (zh) * 2021-10-29 2022-01-28 北京市商汤科技开发有限公司 Image processing method and apparatus, computer device and storage medium
CN114445898B (zh) * 2022-01-29 2023-08-29 北京百度网讯科技有限公司 Face liveness detection method, apparatus, device, storage medium and program product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006099614A (ja) * 2004-09-30 2006-04-13 Toshiba Corp Living body discrimination apparatus and living body discrimination method
CN1794264A (zh) * 2005-12-31 2006-06-28 北京中星微电子有限公司 Method and system for real-time detection and continuous tracking of human faces in a video sequence
CN108229376A (zh) * 2017-12-29 2018-06-29 百度在线网络技术(北京)有限公司 Method and apparatus for detecting eye blinking
CN110175549A (zh) * 2019-05-20 2019-08-27 腾讯科技(深圳)有限公司 Face image processing method, apparatus, device and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003178306A (ja) * 2001-12-12 2003-06-27 Toshiba Corp Personal authentication apparatus and personal authentication method
US10268911B1 (en) * 2015-09-29 2019-04-23 Morphotrust Usa, Llc System and method for liveness detection using facial landmarks
CN105260731A (zh) * 2015-11-25 2016-01-20 商汤集团有限公司 Light-pulse-based face liveness detection system and method
US10210380B2 (en) * 2016-08-09 2019-02-19 Daon Holdings Limited Methods and systems for enhancing user liveness detection
JP6849387B2 (ja) * 2016-10-24 2021-03-24 キヤノン株式会社 Image processing apparatus, image processing system, control method for image processing apparatus, and program
CN109389002A (zh) * 2017-08-02 2019-02-26 阿里巴巴集团控股有限公司 Living body detection method and apparatus
WO2019133995A1 (en) * 2017-12-29 2019-07-04 Miu Stephen System and method for liveness detection
CN110378219B (zh) * 2019-06-13 2021-11-19 北京迈格威科技有限公司 Living body detection method and apparatus, electronic device and readable storage medium

Also Published As

Publication number Publication date
CN112749603A (zh) 2021-05-04
JP2022522203A (ja) 2022-04-14
US20210397822A1 (en) 2021-12-23
SG11202111482XA (en) 2021-11-29

Similar Documents

Publication Publication Date Title
WO2021082562A1 (zh) Living body detection method, apparatus, electronic device, storage medium and program product
Chen et al. Fsrnet: End-to-end learning face super-resolution with facial priors
CN108805047B (zh) Living body detection method and apparatus, electronic device and computer-readable medium
Feng et al. Learning generalized spoof cues for face anti-spoofing
WO2022156640A1 (zh) Image gaze correction method and apparatus, electronic device, computer-readable storage medium and computer program product
CN110598019B (zh) Duplicate image recognition method and apparatus
CN111611873A (zh) Face replacement detection method and apparatus, electronic device and computer storage medium
EP4085369A1 (en) Forgery detection of face image
Singh et al. Steganalysis of digital images using deep fractal network
CN112966574A (zh) Method and apparatus for predicting three-dimensional human body keypoints, and electronic device
Alnuaim et al. Human‐Computer Interaction with Hand Gesture Recognition Using ResNet and MobileNet
WO2023124040A1 (zh) Face recognition method and apparatus
CN112561879B (zh) Blur evaluation model training method, and image blur evaluation method and apparatus
CN111985281A (zh) Method and apparatus for generating an image generation model, and image generation method and apparatus
CN104636764A (zh) Image steganalysis method and apparatus
CN112633221A (zh) Face orientation detection method and related apparatus
Xia et al. Domain fingerprints for no-reference image quality assessment
Qu et al. shallowcnn-le: A shallow cnn with laplacian embedding for face anti-spoofing
Liu et al. Face liveness detection based on enhanced local binary patterns
WO2023071180A1 (zh) Authenticity identification method and apparatus, electronic device, and storage medium
Ali et al. Deep multi view spatio temporal spectral feature embedding on skeletal sign language videos for recognition
CN114120391A (zh) Multi-pose face recognition system and method
CN112967216A (zh) Method, apparatus, device and storage medium for detecting keypoints of a face image
CN111275183A (zh) Method and apparatus for processing a visual task, and electronic system
Chen et al. FaceCat: Enhancing Face Recognition Security with a Unified Generative Model Framework

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20882073

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021550213

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20882073

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27.10.2022)
