CN111160178B - Image processing method and device, processor, electronic equipment and storage medium

Info

Publication number
CN111160178B
Authority
CN
China
Prior art keywords
position information
key point
reference points
image
point
Legal status
Active
Application number
CN201911322102.4A
Other languages
Chinese (zh)
Other versions
CN111160178A (en)
Inventor
李若岱
高哲峰
庄南庆
马堃
Current Assignee
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Application filed by Shenzhen Sensetime Technology Co Ltd
Priority to CN201911322102.4A
Publication of CN111160178A
Application granted
Publication of CN111160178B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20228Disparity calculation for image-based rendering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Abstract

The application discloses an image processing method and device, a processor, an electronic device and a storage medium. The method comprises: acquiring a binocular image and parameters of the binocular camera that captured the binocular image; obtaining, from the binocular image, depth information, horizontal position information and vertical position information of at least four reference points in the human body region; obtaining three-dimensional position information of the at least four reference points in a world coordinate system from the parameters of the binocular camera and the horizontal position information, vertical position information and depth information of the at least four reference points; and determining that the person object to be detected is a living body when the variance of the three-dimensional position information of the at least four reference points in the world coordinate system along the depth direction is greater than or equal to a first threshold. Corresponding products are also disclosed.

Description

Image processing method and device, processor, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of security technologies, and in particular, to an image processing method and apparatus, a processor, an electronic device, and a storage medium.
Background
With the development of face recognition technology, it has been widely applied in different scenarios. Confirming a person's identity through face recognition, for example for real-name authentication and identity authentication, is an important application. However, attacks on face recognition systems using "non-living" face images, such as paper photographs and electronic images, have become increasingly common. In such an attack, a "non-living" face image is presented in place of a real person's face so as to spoof the face recognition system. Effectively defending face recognition against "non-living" face images is therefore of great importance.
The traditional method determines whether the person in a collected face image is a living body based on a binocular image acquired by a binocular camera, but its detection accuracy is low.
Disclosure of Invention
The application provides an image processing method and device, a processor, an electronic device and a storage medium for detecting whether a person object to be detected is a living body.
In a first aspect, there is provided an image processing method, the method comprising:
acquiring a binocular image and parameters of the binocular camera that captured the binocular image, wherein the binocular image contains a human body region of a person object to be detected;
obtaining depth information of at least four reference points in the human body region, horizontal position information of the at least four reference points and vertical position information of the at least four reference points according to the binocular image, wherein the reference points comprise face key points, or comprise face key points and trunk key points;
obtaining three-dimensional position information of the at least four reference points under a world coordinate system according to parameters of the binocular camera, horizontal position information of the at least four reference points, vertical position information of the at least four reference points and depth information of the at least four reference points;
and under the condition that the variance of the three-dimensional position information of the at least four reference points in the world coordinate system in the depth direction is larger than or equal to a first threshold value, determining that the person object to be detected is a living body, wherein the depth direction is a direction perpendicular to the image plane of the binocular camera when the binocular camera acquires the binocular image.
In this aspect, depth information of at least four reference points on the person object to be detected is obtained from the binocular image, and three-dimensional position information of the at least four reference points in the world coordinate system is derived. From that three-dimensional position information it can be determined whether the human body region of the person object to be detected is a three-dimensional region, which effectively prevents two-dimensional attacks on face recognition. This implementation obtains the three-dimensional position information of the at least four reference points using only the hardware of a two-dimensional living body detection method; compared with that method, detection accuracy is improved without any increase in hardware cost.
In one possible implementation, the binocular image includes: a first image to be processed and a second image to be processed; the binocular camera includes: a first camera for acquiring the first image to be processed and a second camera for acquiring the second image to be processed; parameters of the binocular camera include: a distance between the first camera and the second camera, a first focal length of the first camera, and a second focal length of the second camera;
the obtaining depth information of at least four reference points in the human body region according to the binocular image includes:
obtaining a parallax image of the first to-be-processed image and the second to-be-processed image according to the two images, wherein the parallax image carries parallax information of the at least four reference points;
performing stereo correction processing on the first to-be-processed image and the second to-be-processed image to normalize the first focal length and the second focal length, obtaining a normalized focal length;
and obtaining depth information of the at least four reference points according to the parallax information of the at least four reference points, the normalized focal length and the distance.
In this possible implementation, performing stereo correction processing on the first to-be-processed image and the second to-be-processed image normalizes the first focal length of the first camera and the second focal length of the second camera and reduces the vertical displacement difference between homonymous points in the two images, yielding the normalized focal length. Obtaining the depth information of the at least four reference points from their parallax information, the distance between the first camera and the second camera, and the normalized focal length improves the accuracy of the obtained depth information.
In another possible implementation, the at least four reference points include a first reference point;
and obtaining depth information of the at least four reference points according to the parallax information of the at least four reference points, the normalized focal length and the distance, wherein the depth information comprises the following steps:
determining the product of the normalized focal length and the distance to obtain a first intermediate value;
and determining the quotient of the first intermediate value and the parallax information of the first reference point to obtain the depth information of the first reference point.
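In other words, depth follows the standard stereo triangulation relation: depth equals the product of the normalized focal length and the camera distance, divided by the disparity. A minimal Python sketch of these two steps (the names and units are illustrative assumptions, not values from the patent):

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline: float) -> float:
    """Depth of one reference point from its parallax information.

    disparity_px: parallax information of the reference point, in pixels.
    focal_px: normalized focal length, in pixels.
    baseline: distance between the two cameras (same unit as the returned depth).
    """
    first_intermediate = focal_px * baseline   # product of focal length and distance
    return first_intermediate / disparity_px   # quotient with the parallax information

# e.g. depth_from_disparity(40.0, 1000.0, 0.06) == 1.5 (metres, under these assumptions)
```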
In still another possible implementation manner, the performing stereo correction processing on the first to-be-processed image and the second to-be-processed image to normalize the first focal length and the second focal length to obtain normalized focal lengths includes:
performing stereo correction processing on the first image to be processed and the second image to be processed to normalize parameters of the first camera and parameters of the second camera to obtain normalized camera parameters;
the obtaining three-dimensional position information of the at least four reference points under a world coordinate system according to the parameters of the binocular camera, the horizontal position information of the at least four reference points, the vertical position information of the at least four reference points and the depth information of the at least four reference points includes:
And obtaining three-dimensional position information of the at least four reference points under a world coordinate system according to the normalized camera parameters, the horizontal position information of the at least four reference points, the vertical position information of the at least four reference points and the depth information of the at least four reference points.
In this possible implementation, performing stereo correction processing on the first to-be-processed image and the second to-be-processed image reduces the vertical displacement difference between homonymous points in the two images, yielding the normalized camera parameters. Using the normalized camera parameters, the horizontal position information of the at least four reference points, their vertical position information and their depth information to obtain their three-dimensional position information in the world coordinate system improves the accuracy of that three-dimensional position information.
In yet another possible implementation, the at least four reference points include a second reference point; the normalized camera parameters include: horizontal position information of the normalized center point of the camera and vertical position information of that center point, wherein the center point is the intersection of the image plane of the first camera with the optical axis of the first camera; the normalized focal length includes: horizontal position information of the normalized focal length and vertical position information of the normalized focal length; and the three-dimensional position information of the at least four reference points in the world coordinate system includes: horizontal position information, vertical position information and depth position information of the at least four reference points in the world coordinate system;
The obtaining three-dimensional position information of the at least four reference points under a world coordinate system according to the normalized camera parameters, the horizontal position information of the at least four reference points, the vertical position information of the at least four reference points and the depth information of the at least four reference points includes:
determining the difference between the horizontal position information of the second reference point and the horizontal position information of the center point to obtain a second intermediate value, and determining the quotient of the depth information of the second reference point and the horizontal position information of the normalized focal length to obtain a third intermediate value;
determining the difference between the vertical position information of the second reference point and the vertical position information of the center point to obtain a fourth intermediate value, and determining the quotient of the depth information of the second reference point and the vertical position information of the normalized focal length to obtain a fifth intermediate value;
taking the product of the second intermediate value and the third intermediate value as the horizontal position information of the second reference point in the world coordinate system, taking the product of the fourth intermediate value and the fifth intermediate value as the vertical position information of the second reference point in the world coordinate system, and taking the depth information of the second reference point as the depth position information of the second reference point in the world coordinate system.
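Combined, these steps back-project a rectified pixel into world coordinates, matching the formulas given in the detailed description below. A hedged Python sketch (the parameter names are assumptions):

```python
def pixel_to_world(x_c: float, y_c: float, depth: float,
                   f_x: float, f_y: float, u_x: float, u_y: float):
    """World coordinates of a reference point from its rectified pixel position and depth."""
    x_w = (x_c - u_x) * (depth / f_x)  # second intermediate value x third intermediate value
    y_w = (y_c - u_y) * (depth / f_y)  # fourth intermediate value x fifth intermediate value
    z_w = depth                        # depth position in the world coordinate system
    return x_w, y_w, z_w
```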
In still another possible implementation manner, the determining that the person object to be detected is a living body in the case where the variance of the three-dimensional position information of the at least four reference points in the depth direction in the world coordinate system is greater than or equal to a first threshold value includes:
constructing a matrix according to the three-dimensional position information of the at least four reference points under a world coordinate system, so that each row in the matrix contains the three-dimensional position information of one reference point to obtain a coordinate matrix;
and determining at least one singular value of the coordinate matrix, and determining that the person object to be detected is a living body under the condition that the ratio of the minimum value in the at least one singular value to the sum of the at least one singular value is greater than or equal to a second threshold value.
In this possible implementation, a coordinate matrix is constructed from the three-dimensional coordinates of the at least four reference points in the world coordinate system, and the singular values of the coordinate matrix determine whether the human body region of the person object to be detected is a three-dimensional region, and hence whether the person object is a living body. Because such a coordinate matrix can be constructed for the human body region of any person object to be detected, the method of this embodiment for determining whether the person object is a living body can be applied in any scene. The technical solution of this implementation therefore improves the universality of three-dimensional living body detection.
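A sketch of this criterion in Python with NumPy; the centering step is an assumption the patent does not spell out (without it, the smallest singular value only measures distance from planes passing through the origin):

```python
import numpy as np

def is_live_by_singular_values(points_world, second_threshold: float) -> bool:
    """Coordinate-matrix liveness test: one reference point's (x, y, z) per row."""
    M = np.asarray(points_world, dtype=np.float64)  # coordinate matrix, shape (N, 3), N >= 4
    M = M - M.mean(axis=0)                          # assumed centering step (see above)
    s = np.linalg.svd(M, compute_uv=False)          # singular values, in descending order
    return s.min() / s.sum() >= second_threshold    # planar point sets give a ratio near 0
```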
In yet another possible implementation, the human body region includes a human face region, and the at least four reference points include a first face key point;
the obtaining the horizontal position information of the at least four reference points and the vertical position information of the at least four reference points according to the binocular image includes:
performing face key point detection processing on the first to-be-processed image and the second to-be-processed image respectively, to obtain initial horizontal position information and initial vertical position information of the first face key point in the first to-be-processed image, and initial horizontal position information and initial vertical position information of the second face key point in the second to-be-processed image, wherein the first face key point and the second face key point are homonymous points;
taking the initial vertical position information of the first face key point or the initial vertical position information of the second face key point as the vertical position information of the first face key point;
obtaining a first horizontal parallax displacement between the first face key point and the second face key point according to the parallax image;
and determining the sum of the initial horizontal position information of the first face key point and the first horizontal parallax displacement as the horizontal position information of the first face key point.
In yet another possible implementation, the human body region includes a face region and a torso region, and the at least four reference points include a third face key point and a first trunk key point;
the obtaining the horizontal position information of the at least four reference points and the vertical position information of the at least four reference points according to the binocular image includes:
performing face key point detection processing and trunk key point detection processing on the first to-be-processed image and the second to-be-processed image respectively, to obtain initial horizontal position information and initial vertical position information of the third face key point and of the first trunk key point in the first to-be-processed image, and initial horizontal position information and initial vertical position information of the fourth face key point and of the second trunk key point in the second to-be-processed image, wherein the third face key point and the fourth face key point are homonymous points, and the first trunk key point and the second trunk key point are homonymous points;
taking the initial vertical position information of the third face key point or the initial vertical position information of the fourth face key point as the vertical position information of the third face key point, and taking the initial vertical position information of the first trunk key point or the initial vertical position information of the second trunk key point as the vertical position information of the first trunk key point;
obtaining, according to the parallax image, a second horizontal parallax displacement between the third face key point and the fourth face key point, and a third horizontal parallax displacement between the first trunk key point and the second trunk key point;
and determining the sum of the initial horizontal position information of the third face key point and the second horizontal parallax displacement as the horizontal position information of the third face key point, and determining the sum of the initial horizontal position information of the first trunk key point and the third horizontal parallax displacement as the horizontal position information of the first trunk key point. A sketch of this refinement follows.
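A short Python sketch of the refinement just described, assuming a rectified image pair and a dense disparity image indexed as disparity[row, column]:

```python
def refine_keypoint(x_init: float, y_init: float, disparity):
    """Position of a key point in the first image, refined with the parallax image.

    The vertical position is taken directly from the detector output; the
    horizontal position is the detector output plus the horizontal parallax
    displacement read from the disparity image at that key point.
    """
    row, col = int(round(y_init)), int(round(x_init))
    horizontal = x_init + float(disparity[row, col])
    return horizontal, y_init
```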
In a second aspect, there is provided an image processing apparatus comprising:
the device comprises an acquisition unit, a detection unit and a control unit, wherein the acquisition unit is used for acquiring binocular images and parameters of a binocular camera for acquiring the binocular images, wherein the binocular images comprise human body areas of a person object to be detected;
The first processing unit is used for obtaining depth information of at least four reference points in the human body region, horizontal position information of the at least four reference points and vertical position information of the at least four reference points according to the binocular image, wherein the reference points comprise face key points, or comprise face key points and trunk key points;
the second processing unit is used for obtaining three-dimensional position information of the at least four reference points under a world coordinate system according to parameters of the binocular camera, horizontal position information of the at least four reference points, vertical position information of the at least four reference points and depth information of the at least four reference points;
and the determining unit is used for determining that the person object to be detected is a living body under the condition that the variance of the three-dimensional position information of the at least four reference points in the depth direction under the world coordinate system is larger than or equal to a first threshold value, wherein the depth direction is a direction perpendicular to the image plane of the binocular camera when the binocular camera acquires the binocular image.
In one possible implementation, the binocular image includes: a first image to be processed and a second image to be processed; the binocular camera includes: a first camera for acquiring the first image to be processed and a second camera for acquiring the second image to be processed; parameters of the binocular camera include: a distance between the first camera and the second camera, a first focal length of the first camera, and a second focal length of the second camera;
The first processing unit is used for:
obtaining a parallax image of the first to-be-processed image and the second to-be-processed image according to the two images, wherein the parallax image carries parallax information of the at least four reference points;
performing stereo correction processing on the first to-be-processed image and the second to-be-processed image to normalize the first focal length and the second focal length, obtaining a normalized focal length;
and obtaining depth information of the at least four reference points according to the parallax information of the at least four reference points, the normalized focal length and the distance.
In another possible implementation, the at least four reference points include a first reference point;
the first processing unit is used for:
determining the product of the normalized focal length and the distance to obtain a first intermediate value;
and determining the quotient of the first intermediate value and the parallax information of the first reference point to obtain the depth information of the first reference point.
In a further possible implementation manner, the first processing unit is configured to:
performing stereo correction processing on the first image to be processed and the second image to be processed to normalize parameters of the first camera and parameters of the second camera to obtain normalized camera parameters;
The second processing unit is used for:
and obtaining three-dimensional position information of the at least four reference points under a world coordinate system according to the normalized camera parameters, the horizontal position information of the at least four reference points, the vertical position information of the at least four reference points and the depth information of the at least four reference points.
In yet another possible implementation, the at least four reference points include a second reference point; the normalized camera parameters include: horizontal position information of the normalized center point of the camera and vertical position information of that center point, wherein the center point is the intersection of the image plane of the first camera with the optical axis of the first camera; the normalized focal length includes: horizontal position information of the normalized focal length and vertical position information of the normalized focal length; and the three-dimensional position information of the at least four reference points in the world coordinate system includes: horizontal position information, vertical position information and depth position information of the at least four reference points in the world coordinate system;
The determining unit is configured to:
determining the difference between the horizontal position information of the second reference point and the horizontal position information of the center point to obtain a second intermediate value, and determining the quotient of the depth information of the second reference point and the horizontal position information of the normalized focal length to obtain a third intermediate value;
determining the difference between the vertical position information of the second reference point and the vertical position information of the center point to obtain a fourth intermediate value, and determining the quotient of the depth information of the second reference point and the vertical position information of the normalized focal length to obtain a fifth intermediate value;
taking the product of the second intermediate value and the third intermediate value as the horizontal position information of the second reference point in the world coordinate system, taking the product of the fourth intermediate value and the fifth intermediate value as the vertical position information of the second reference point in the world coordinate system, and taking the depth information of the second reference point as the depth position information of the second reference point in the world coordinate system.
In a further possible implementation manner, the determining unit is configured to:
constructing a matrix according to the three-dimensional position information of the at least four reference points under a world coordinate system, so that each row in the matrix contains the three-dimensional position information of one reference point to obtain a coordinate matrix;
And determining at least one singular value of the coordinate matrix, and determining that the person object to be detected is a living body under the condition that the ratio of the minimum value in the at least one singular value to the sum of the at least one singular value is greater than or equal to a second threshold value.
In yet another possible implementation, the human body region includes a human face region, and the at least four reference points include a first face key point;
the first processing unit is used for:
performing face key point detection processing on the first to-be-processed image and the second to-be-processed image respectively, to obtain initial horizontal position information and initial vertical position information of the first face key point in the first to-be-processed image, and initial horizontal position information and initial vertical position information of the second face key point in the second to-be-processed image, wherein the first face key point and the second face key point are homonymous points;
taking the initial vertical position information of the first face key point or the initial vertical position information of the second face key point as the vertical position information of the first face key point;
obtaining a first horizontal parallax displacement between the first face key point and the second face key point according to the parallax image;
and determining the sum of the initial horizontal position information of the first face key point and the first horizontal parallax displacement as the horizontal position information of the first face key point.
In yet another possible implementation, the human body region includes a face region and a torso region, and the at least four reference points include a third face key point and a first trunk key point;
the first processing unit is used for:
performing face key point detection processing and trunk key point detection processing on the first to-be-processed image and the second to-be-processed image respectively, to obtain initial horizontal position information and initial vertical position information of the third face key point and of the first trunk key point in the first to-be-processed image, and initial horizontal position information and initial vertical position information of the fourth face key point and of the second trunk key point in the second to-be-processed image, wherein the third face key point and the fourth face key point are homonymous points, and the first trunk key point and the second trunk key point are homonymous points;
taking the initial vertical position information of the third face key point or the initial vertical position information of the fourth face key point as the vertical position information of the third face key point, and taking the initial vertical position information of the first trunk key point or the initial vertical position information of the second trunk key point as the vertical position information of the first trunk key point;
obtaining, according to the parallax image, a second horizontal parallax displacement between the third face key point and the fourth face key point, and a third horizontal parallax displacement between the first trunk key point and the second trunk key point;
and determining the sum of the initial horizontal position information of the third face key point and the second horizontal parallax displacement as the horizontal position information of the third face key point, and determining the sum of the initial horizontal position information of the first trunk key point and the third horizontal parallax displacement as the horizontal position information of the first trunk key point.
In a third aspect, a processor is provided for performing the method of the first aspect and any one of its possible implementation manners described above.
In a fourth aspect, there is provided an electronic device comprising: a processor, a transmitting means, an input means, an output means and a memory for storing computer program code comprising computer instructions which, when executed by the processor, cause the electronic device to perform the method as described in the first aspect and any one of its possible implementation manners.
In a fifth aspect, a computer readable storage medium is provided, in which a computer program is stored, the computer program comprising program instructions which, when executed by a processor of an electronic device, cause the processor to carry out a method as in the first aspect and any one of the possible implementations thereof.
In a sixth aspect, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of the first aspect and any one of its possible implementations.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
To describe the technical solutions in the embodiments of the present application or the background more clearly, the drawings required by the embodiments or the background are described below.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the technical aspects of the disclosure.
Fig. 1 is a schematic flow chart of an image processing method according to an embodiment of the present application;
fig. 2 is a schematic diagram of homonymous points in a binocular image according to an embodiment of the present application;
fig. 3 is a schematic diagram of a face key point provided in an embodiment of the present application;
fig. 4 is a schematic diagram of a torso key point according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a distribution of reference points according to an embodiment of the present disclosure;
fig. 6 is a flowchart of another image processing method according to an embodiment of the present application;
fig. 7 is a flowchart of another image processing method according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 9 is a schematic hardware structure of an image processing apparatus according to an embodiment of the present application.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will clearly and completely describe the technical solution in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The terms "first", "second" and the like in the description, in the claims of the present application and in the above-described figures are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have", as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article or apparatus that comprises a list of steps or elements is not limited to only the listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
With the development of face recognition technology, it has been widely applied in different scenarios. Confirming a person's identity through face recognition, for example for real-name authentication and identity authentication, is an important application.
Face recognition technology performs feature extraction processing on a face image collected of a person's face region to obtain face feature data, and compares the extracted face feature data with face feature data in a database to determine the identity of the person in the face image.
However, attacks on face recognition using "non-living" face images, such as paper photographs and electronic images, have become increasingly common. In such an attack, a "non-living" face image is presented in place of a real person's face so as to cheat the face recognition system. For example, Zhang San places a photo of Li Si in front of Li Si's mobile phone to trigger face recognition unlocking. The phone captures the photo through its camera, obtains a face image containing Li Si's face region, determines the identity of the person in front of it to be Li Si, and unlocks. Zhang San has thus used Li Si's photo to spoof the phone's face recognition and unlock Li Si's mobile phone. Effectively preventing attacks on face recognition by "non-living" face images (hereinafter referred to as two-dimensional attacks) is therefore of great importance.
Performing living body detection on the face image can effectively prevent such attacks. Conventional living body detection methods can be classified into two-dimensional living body detection methods and three-dimensional living body detection methods.
In the two-dimensional living body detection method, a binocular camera collects binocular images of the face region of the person object to be detected, horizontal position information and vertical position information of face key points in that face region are obtained from the binocular images, and whether the person object to be detected is a living body is determined from the horizontal and vertical position information of the face key points.
The traditional three-dimensional living body detection method adds, on top of the two-dimensional living body detection method, hardware (such as a depth camera or a structured light camera) for obtaining depth information of the face key points in the face region of the person object to be detected. The horizontal position information, vertical position information and depth information of the face key points are then input into a trained deep learning model to determine whether the person object to be detected is a living body.
Of the two methods, the two-dimensional living body detection method uses only the horizontal and vertical position information of the face key points, so its detection accuracy is lower than that of the traditional three-dimensional living body detection method. The traditional three-dimensional method, however, needs extra hardware (such as a depth camera) to obtain the depth information of the face key points, so its hardware cost is high. In addition, the detection accuracy of the trained deep learning model used in the traditional three-dimensional method depends largely on its training data: the application scenarios covered by the training data are the scenarios to which the model is applicable. For example, if the training data contains paper photographs but no electronic photographs, a deep learning model trained on that data will have low accuracy when performing living body detection on electronic photographs. This also limits the versatility of the traditional three-dimensional living body detection method.
In view of this situation, the embodiments of the present application provide a three-dimensional living body detection method with the same hardware cost as the two-dimensional living body detection method and strong universality. The embodiments of the present application are described below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a flowchart of an image processing method according to an embodiment of the present application.
101. Acquire a binocular image and parameters of the binocular camera that captured the binocular image.
The binocular image contains a human body region of a person object to be detected.
the technical scheme provided by the embodiment of the application can be applied to a first terminal, wherein the first terminal comprises a mobile phone, a computer, a tablet personal computer, a server and the like.
The binocular image consists of two images obtained by two different imaging devices (i.e., the binocular camera) shooting the same object or scene from different positions at the same time. The imaging devices may be still cameras or video cameras: for example, the two cameras on a mobile phone, two cameras mounted on an intelligent vehicle, or two cameras on an unmanned aerial vehicle.
Since the word "homonym" will appear more than once below, the meaning of the homonym is clarified first, before proceeding with the following explanation. In the embodiment of the application, the pixel points in different images in the binocular image corresponding to the same physical point are the same name points. Fig. 2 shows two images in a binocular image, in which a pixel point a and a pixel point C are the same name points, and a pixel point B and a pixel point D are the same name points.
It should be understood that this embodiment takes two different imaging devices as an example to illustrate how three-dimensional living body detection is achieved based on binocular images. In practical applications, three or more imaging devices may shoot the same object or scene from different positions at the same time to obtain at least three images, and the technical solution provided by the embodiments of the present application can likewise achieve three-dimensional living body detection based on those images.
The binocular image may be acquired by receiving a binocular image input by a user through an input component, where the input component includes a keyboard, a mouse, a touch screen, a touch pad, an audio input device and the like. It may also be acquired by receiving a binocular image sent by a second terminal, where the second terminal includes a mobile phone, a computer, a tablet computer or a server. The manner of acquiring the binocular image is not limited in this application.
In the embodiments of the present application, the parameters of the binocular camera include the internal parameters of each imaging device and the external parameters of each imaging device. The internal parameters include the focal length and the center point, where the center point is the intersection of the optical axis of the imaging device with the image plane. The external parameters include: the distance between the center points of the two imaging devices (hereinafter referred to as the baseline length), the rotation matrix between the camera coordinate system of each imaging device and the world coordinate system, and the translation between the camera coordinate system of each imaging device and the world coordinate system.
Optionally, before the binocular camera is used to collect the binocular image, the binocular camera may be calibrated so that the two imaging devices of the binocular camera lie in the same horizontal plane and their image planes lie in one plane, where the image plane is the plane in which the imaging component of an imaging device lies. The parameters of the binocular camera can be obtained by calibrating the binocular camera, for example as sketched below.
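For illustration, one common way to derive rectified ("normalized") parameters from calibration results is OpenCV's stereo pipeline; the sketch below uses placeholder calibration values (none of them come from the patent):

```python
import cv2
import numpy as np

# Placeholder calibration results; in practice these come from calibrating the
# binocular camera, e.g. with cv2.stereoCalibrate on a calibration pattern.
K1 = K2 = np.array([[1000.0, 0.0, 320.0],
                    [0.0, 1000.0, 240.0],
                    [0.0, 0.0, 1.0]])          # intrinsic matrices of the two cameras
dist1 = dist2 = np.zeros(5)                    # lens distortion coefficients
R = np.eye(3)                                  # rotation between the two cameras
T = np.array([-0.06, 0.0, 0.0])                # 6 cm baseline along the x axis (assumed)

R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
    K1, dist1, K2, dist2, (640, 480), R, T)

fx, fy = P1[0, 0], P1[1, 1]                    # normalized focal length components
cx, cy = P1[0, 2], P1[1, 2]                    # normalized center point
```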
In the embodiments of the present application, the binocular image contains a human body region of the person object to be detected. The human body region may include only a face region; for example, the binocular camera captures the face region of the person object to be detected to obtain a face image containing only that face region. The human body region may also include a face region and a torso region; for example, the binocular camera captures the face region and torso region of the person object to be detected to obtain a human body image containing both.
102. Obtain, from the binocular image, depth information of at least four reference points in the human body region, horizontal position information of the at least four reference points and vertical position information of the at least four reference points.
The face key points include face contour key points and facial feature key points. As shown in fig. 3, the facial feature key points include key points in the eyebrow region, the eye region, the nose region, the mouth region and the ear region. The face contour key points include key points on the face contour line. It should be understood that the number and locations of the face key points shown in fig. 3 are merely an example provided by the embodiments of the present application and should not be construed as limiting the present application.
The torso key points include key points at the joints of the torso. As shown in fig. 4, the torso key points include: shoulder joint key points, elbow joint key points, wrist joint key points, hip joint key points, knee joint key points and ankle joint key points. It should be understood that the number and locations of the torso key points shown in fig. 4 are merely an example provided by the embodiments of the present application and should not be construed as limiting the present application.
In the embodiments of the present application, the reference points comprise face key points; for example, the at least four reference points include key points in the nose region, key points in the mouth region, key points in the ear region, and key points on the face contour. The reference points may instead comprise both face key points and torso key points; for example, the at least four reference points include key points in the nose region, key points in the mouth region, key points in the ear region, and shoulder key points.
In the embodiments of the present application, the face key points and their position information in the binocular image (i.e., horizontal position information and vertical position information of the face key points) may be obtained by any face key point detection algorithm, which may be one of OpenFace, multi-task cascaded convolutional networks (MTCNN), tweaked convolutional neural networks (TCNN) or tasks-constrained deep convolutional network (TCDCN); the face key point detection algorithm is not limited in this application.
If the human body region includes a torso region, the torso key points and their position information in the binocular image (i.e., horizontal position information and vertical position information of the torso key points) may be obtained by any torso key point detection algorithm, which may be one of cascaded pyramid network (CPN), mask region-based convolutional neural network (Mask R-CNN) or regional multi-person pose estimation (RMPE); the torso key point detection algorithm is not limited in this application.
In the embodiments of the present application, the binocular image includes a first to-be-processed image and a second to-be-processed image. A parallax image between the first to-be-processed image and the second to-be-processed image is obtained from the two images; the parallax image carries information about the horizontal parallax displacement between homonymous points in the first and second to-be-processed images. It should be understood that if the two imaging devices of the binocular camera were instead in the same vertical plane with their image planes in one plane, the parallax image would carry information about the vertical parallax displacement between homonymous points in the two images.
In one implementation of obtaining the parallax image, the homonymous points in the first and second to-be-processed images can be determined by performing feature matching processing on the two images, and the parallax image is obtained from the horizontal parallax displacement between those homonymous points. The feature matching processing may be implemented by one of brute-force matching, k-nearest neighbors (KNN) or the fast library for approximate nearest neighbors (FLANN); this is not limited in this application.
After the parallax image is obtained, the depth information of the at least four reference points can be obtained from the focal length of the binocular camera, the horizontal parallax information carried by the parallax image, and the baseline length. For example, suppose the at least four reference points include an eye key point, which appears as a first pixel point in the first to-be-processed image and as a second pixel point in the second to-be-processed image, i.e., the two pixel points are homonymous points. From the parallax image, the horizontal displacement difference between the first pixel point and the second pixel point can be determined as $d_1$. The baseline length $d_2$ can be determined by calibrating the binocular camera. From $d_1$, $d_2$ and the focal length, the depth information of the eye key point can be obtained.
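A hedged end-to-end sketch in Python, using OpenCV's semi-global matcher for the feature matching step; the file names, matcher settings and camera values are assumptions, not values from the patent:

```python
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # first to-be-processed image
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)  # second to-be-processed image

matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # SGBM output is fixed-point x16

f_px, d2 = 1000.0, 0.06      # focal length (pixels) and baseline length d2 (metres), assumed
u, v = 320, 240              # pixel coordinates of an eye key point, assumed
d1 = disparity[v, u]         # horizontal displacement difference d1 at that key point
depth = f_px * d2 / d1 if d1 > 0 else float("inf")  # depth of the eye key point
```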
103. Obtain three-dimensional position information of the at least four reference points in the world coordinate system from the parameters of the binocular camera, the horizontal position information of the at least four reference points, the vertical position information of the at least four reference points and the depth information of the at least four reference points.
Through the processing of step 101 and step 102, three-dimensional position information of the at least four reference points in the camera coordinate system of the binocular camera can be obtained. In order to facilitate the subsequent processing, the three-dimensional position information under the camera coordinate system needs to be converted into three-dimensional position information under the world coordinate system.
In one possible implementation (hereinafter referred to as the non-correction implementation), the three-dimensional coordinates of a reference point in the world coordinate system, and thus its three-dimensional position information in the world coordinate system, can be obtained from the parameters of the binocular camera and the three-dimensional coordinates determined from the reference point's three-dimensional position information in the camera coordinate system.
In another possible implementation (hereinafter referred to as the correction implementation), the binocular camera includes a first camera and a second camera. Before determining the three-dimensional position information of the reference points in the world coordinate system, stereo correction processing can be performed on the binocular image to normalize the parameters of the first camera and the parameters of the second camera, obtaining normalized camera parameters. The three-dimensional position information of the reference points in the world coordinate system is then obtained from the normalized parameters and the three-dimensional position information of the reference points in the camera coordinate system. For example, the normalized camera parameters obtained by stereo correction processing of the two cameras' parameters include: horizontal position information of the normalized focal length, vertical position information of the normalized focal length, horizontal position information of the normalized center point, and vertical position information of the normalized center point. Let the horizontal coordinate of a reference point in the camera coordinate system, determined by its horizontal position information, be $x_c$; let its vertical coordinate be $y_c$; and let the depth value determined by its depth information be $d$. Let the horizontal and vertical components of the normalized focal length be $f_x$ and $f_y$, and the horizontal and vertical coordinates of the normalized center point be $u_x$ and $u_y$. Then the horizontal coordinate $x_w$ of the reference point in the world coordinate system satisfies $x_w = (x_c - u_x)\,d/f_x$; the vertical coordinate $y_w$ satisfies $y_w = (y_c - u_y)\,d/f_y$; and the depth coordinate $z_w$ satisfies $z_w = d$. The depth coordinate can be understood as the coordinate of the reference point along the direction perpendicular to the image plane of the binocular camera, with the normalized center point as the origin.
104. And determining the person object to be detected as a living body under the condition that the variance of the three-dimensional position information of the at least four reference points in the world coordinate system in the depth direction is greater than or equal to a first threshold value.
In this embodiment of the present application, the depth direction is a direction perpendicular to an image plane of the binocular camera when the binocular camera collects the binocular image.
After the three-dimensional position information of the at least four reference points in the world coordinate system is obtained through the processing in step 103, the three-dimensional coordinates of the at least four reference points in the world coordinate system can be determined, and whether the person object to be detected is a living body can be determined according to these three-dimensional coordinates.
In one possible implementation, the variance of the three-dimensional coordinates of the at least four reference points in the depth direction in the world coordinate system may be used to characterize the dispersion of those coordinates in the depth direction. If this variance is greater than or equal to the first threshold, the dispersion of the at least four reference points in the depth direction is not 0; that is, in the depth direction at least one of the at least four reference points does not lie in the same plane as the other reference points, so the human body region of the person object to be detected is a three-dimensional object rather than a two-dimensional planar object, and it can further be determined that the person object to be detected is a living body. If the variance is smaller than the first threshold, the at least four reference points of the human body region of the person object to be detected lie in the same plane in the depth direction; that is, the human body region of the person object to be detected is a two-dimensional object, and it can further be determined that the person object to be detected is a non-living body. The first threshold is a positive number, and its specific value can be set according to actual use conditions.
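A minimal sketch of this decision rule follows; the threshold value below is a placeholder, since the patent leaves the first threshold to be set according to actual use conditions.

```python
import numpy as np

def is_live_by_depth_variance(points_world, first_threshold=1e-4):
    """Liveness decision from the depth-direction variance of the reference points.

    points_world: (N, 3) array of world coordinates, N >= 4; column 2 is depth.
    Returns True (living body) when the depth variance reaches the first threshold.
    """
    depth_variance = np.var(points_world[:, 2])
    return depth_variance >= first_threshold
```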
For example, as shown in fig. 5, the at least four reference points include: reference point A, reference point B, reference point C and reference point D. Reference point B, reference point C and reference point D belong to plane a, while reference point A does not belong to plane a. The dispersion of the three-dimensional coordinates of the four reference points shown in fig. 5 in the depth direction is therefore greater than 0, i.e., the variance of the three-dimensional coordinates of the four reference points in the depth direction is greater than or equal to the first threshold.
It should be understood that, since three points can determine one plane, in the embodiment of the present application, three-dimensional coordinates of at least four reference points are required to determine whether the human body region of the human object to be detected is a three-dimensional region. The greater the number of reference points, the higher the accuracy of the in-vivo detection. But an increase in the number of reference points will also result in an increase in the amount of data processing. Therefore, in practical application, the user can adjust the number of the reference points according to the requirement, and the number of the reference points is not limited in the application.
The implementation obtains depth information of at least four reference points in the person object to be detected based on the binocular image, and further obtains three-dimensional position information of the at least four reference points under a world coordinate system. According to the three-dimensional position information of the at least four reference points under the world coordinate system, whether the human body area of the person object to be detected is a three-dimensional area can be determined, and thus two-dimensional attacks on face recognition technology can be effectively prevented. The implementation obtains the three-dimensional position information of the at least four reference points in the person object to be detected on the basis of the hardware of the two-dimensional living body detection method; compared with the two-dimensional living body detection method, the hardware cost is not increased, but the detection accuracy is improved.
In step 103, two implementations are provided for converting the three-dimensional position information of a reference point in the camera coordinate system into three-dimensional position information in the world coordinate system. The non-correction implementation has a larger data processing amount and a slower processing speed than the correction implementation. To reduce the data processing amount of living body detection and improve the processing speed, the three-dimensional position information of the reference point under the camera coordinate system can optionally be converted into three-dimensional position information under the world coordinate system through the correction implementation. Referring to fig. 6, fig. 6 is a flow chart illustrating a possible implementation of step 103 according to the second embodiment of the present application.
601. Performing stereo correction processing on the first to-be-processed image and the second to-be-processed image, so as to normalize the parameters of the first camera and the parameters of the second camera and obtain normalized camera parameters.
In this embodiment, the binocular image includes: a first image to be processed and a second image to be processed. Although the binocular camera is calibrated before it is used to collect the binocular image of the human body area of the person object to be detected, so that the two imaging devices lie on the same horizontal plane and their image planes lie in the same plane, a vertical displacement difference still exists between same-name points in the first to-be-processed image and the second to-be-processed image due to factors such as calibration errors and lens distortion of the imaging devices. If such a vertical displacement difference exists between same-name points in the first to-be-processed image and the second to-be-processed image, the accuracy of the parallax image between the two images is reduced, which in turn affects the accuracy of living body detection. Therefore, the present embodiment performs stereo correction processing on the first to-be-processed image and the second to-be-processed image to reduce the vertical displacement difference of same-name points in the two images.
In one implementation of performing stereo correction processing on the first to-be-processed image and the second to-be-processed image, distortion parameters of the first camera can be obtained according to the parameters of the first camera that collects the first to-be-processed image, and distortion parameters of the second camera can be obtained according to the parameters of the second camera that collects the second to-be-processed image. The first to-be-processed image is adjusted based on the distortion parameters of the first camera to obtain a first to-be-processed image after distortion elimination, and the second to-be-processed image is adjusted based on the distortion parameters of the second camera to obtain a second to-be-processed image after distortion elimination. A rotation matrix and a translation amount that align the epipolar lines of the first camera with the epipolar lines of the second camera (referred to hereinafter as the epipolar rotation matrix and the epipolar translation amount) are obtained according to the parameters of the first camera and the parameters of the second camera. An epipolar line of the first camera is any intersection between a plane containing the baseline between the first camera and the second camera and the image plane of the first camera, and an epipolar line of the second camera is any intersection between such a plane and the image plane of the second camera, where the baseline between the first camera and the second camera is the straight line passing through the center point of the first camera and the center point of the second camera. The first to-be-processed image after distortion elimination and the second to-be-processed image after distortion elimination are then adjusted based on the epipolar rotation matrix and the epipolar translation amount, respectively, to obtain an aligned first to-be-processed image and an aligned second to-be-processed image. Normalized camera parameters are obtained according to the aligned first to-be-processed image and the aligned second to-be-processed image, normalizing the parameters of the first camera and the parameters of the second camera. Optionally, after the aligned first to-be-processed image and the aligned second to-be-processed image are obtained, clipping processing may be performed on them to remove irregular corner areas, obtaining a clipped first to-be-processed image and a clipped second to-be-processed image; the normalized camera parameters are then obtained according to the clipped first to-be-processed image and the clipped second to-be-processed image.
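The procedure above matches the standard stereo rectification routine; a sketch using OpenCV is shown below. It assumes the intrinsic matrices K1 and K2, distortion coefficients D1 and D2, and the rotation R and translation T between the two cameras are available from calibration; it is one possible realization, not the patent's own implementation.

```python
import cv2

def rectify_pair(left, right, K1, D1, K2, D2, R, T):
    """Stereo-rectify a binocular image pair so that same-name points share a row.

    K1, K2: 3x3 intrinsic matrices; D1, D2: distortion coefficients;
    R, T: rotation and translation from the first camera to the second.
    Returns the aligned images and the projection matrices P1, P2, which hold
    the normalized camera parameters (normalized focal length and center point).
    """
    size = (left.shape[1], left.shape[0])  # (width, height)
    # Compute the rectification transforms for both cameras.
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)
    # Build per-pixel maps that undistort and align each image, then remap.
    m1x, m1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
    m2x, m2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)
    left_rect = cv2.remap(left, m1x, m1y, cv2.INTER_LINEAR)
    right_rect = cv2.remap(right, m2x, m2y, cv2.INTER_LINEAR)
    return left_rect, right_rect, P1, P2
```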
Alternatively, after the aligned first to-be-processed image and the aligned second to-be-processed image are obtained by performing correction processing on the first to-be-processed image and the second to-be-processed image, a parallax image between the first to-be-processed image and the second to-be-processed image may be obtained based on the aligned first to-be-processed image and the aligned second to-be-processed image. After the normalized camera parameters are obtained, the depth information of the reference point can be obtained based on the parallax information of the reference point in the parallax image, the normalized focal length and the baseline length. For example, the at least four reference points include a first reference point. The depth coordinate z of the first reference point in the camera coordinate system satisfies the following formula: z = (f * b) / d, where f is the normalized focal length, b is the baseline length, and d is the horizontal parallax displacement of the first reference point determined from the parallax image.
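A minimal sketch of this depth computation (all names illustrative):

```python
import numpy as np

def depth_from_disparity(disparity, focal_length, baseline):
    """Convert horizontal parallax displacement d into depth via z = f * b / d.

    disparity:    array of per-point horizontal parallax displacements (pixels).
    focal_length: normalized focal length f (pixels).
    baseline:     baseline length b (same unit as the desired depth).
    Points with zero disparity map to infinite depth.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    with np.errstate(divide="ignore"):
        return focal_length * baseline / disparity
```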
602. And obtaining three-dimensional position information of the at least four reference points under a world coordinate system according to the normalized camera parameters, the horizontal position information of the at least four reference points, the vertical position information of the at least four reference points and the depth information of the at least four reference points.
For the implementation of this step, reference may be made to the correction implementation in step 103, which will not be described again here.
According to this embodiment, stereo correction processing is performed on the first to-be-processed image and the second to-be-processed image, so that the vertical displacement difference between same-name points in the two images is reduced and the lens distortion of the first camera and of the second camera is eliminated. This improves the precision of the subsequently obtained parallax image, further improves the precision of the obtained key point depth information, and thereby improves the accuracy of living body detection.
According to the three-dimensional position information of the at least four reference points in the world coordinate system, the three-dimensional coordinates of the at least four reference points in the world coordinate system can be determined, and whether the human body area of the person object to be detected is a three-dimensional area can be further determined.
Whether the at least four reference points lie in the same plane in the depth direction can also be determined through the singular values of a coordinate matrix constructed from the three-dimensional coordinates of the at least four reference points in the world coordinate system, thereby determining whether the human body area of the person object to be detected is a three-dimensional area. To this end, the embodiment of the application provides a technical scheme for determining whether the person object to be detected is a living body based on the singular values of a coordinate matrix constructed from the three-dimensional coordinates of the at least four reference points in the world coordinate system.
Referring to fig. 7, fig. 7 is a flow chart illustrating a possible implementation of step 104 according to the third embodiment of the present application.
701. Constructing a matrix according to the three-dimensional position information of the at least four reference points in the world coordinate system, so that each row in the matrix contains the three-dimensional position information of one reference point, and obtaining a coordinate matrix.
In this embodiment, each row in the coordinate matrix contains the three-dimensional coordinates of exactly one reference point. For example, the at least four reference points include: a first reference point, a second reference point, a third reference point, and a fourth reference point. The three-dimensional coordinates of the first reference point in the world coordinate system are (x_1, y_1, z_1), those of the second reference point are (x_2, y_2, z_2), those of the third reference point are (x_3, y_3, z_3), and those of the fourth reference point are (x_4, y_4, z_4). The coordinate matrix constructed from the three-dimensional coordinates of the first reference point, the second reference point, the third reference point, and the fourth reference point may be

[ x_1  y_1  z_1 ]
[ x_2  y_2  z_2 ]
[ x_3  y_3  z_3 ]
[ x_4  y_4  z_4 ]

or any matrix obtained by permuting its rows, since each ordering still places the three-dimensional position information of exactly one reference point in each row.
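As a sketch of step 701, the coordinate matrix can be assembled directly, one reference point per row; the numeric values below are purely illustrative.

```python
import numpy as np

# Each row holds the (x_w, y_w, z) world coordinates of one reference point;
# the values are illustrative only.
coordinate_matrix = np.array([
    [0.12, 0.45, 0.98],  # first reference point
    [0.30, 0.41, 1.02],  # second reference point
    [0.21, 0.60, 1.10],  # third reference point
    [0.18, 0.52, 0.95],  # fourth reference point
])
# Any permutation of the rows yields an equally valid coordinate matrix.
```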
702. And determining at least one singular value of the coordinate matrix, and determining that the person object to be detected is a living body when the ratio of the minimum value of the at least one singular value to the sum of the at least one singular value is greater than or equal to a second threshold value.
At least one singular value of the coordinate matrix can be calculated through the singular value decomposition theorem. In the embodiment of the application, 3 singular values are obtained by singular value decomposition of the coordinate matrix constructed from the three-dimensional coordinates of the at least four reference points; they are used to characterize, respectively, the dispersion of the at least four reference points in the horizontal direction, the dispersion of the at least four reference points in the vertical direction, and the dispersion of the at least four reference points in the depth direction. When a singular value is greater than 0, the dispersion of the at least four reference points in the direction corresponding to that singular value is not 0; that is, in the corresponding direction, at least one of the at least four reference points does not lie in the same plane as the other reference points. For example, the at least four reference points include: a first reference point, a second reference point, a third reference point and a fourth reference point. Singular values A, B and C may be obtained by singular value decomposition of a matrix constructed from the coordinates of the first reference point, the second reference point, the third reference point and the fourth reference point. The singular value A is used to characterize the dispersion of the four reference points in the horizontal direction, the singular value B their dispersion in the vertical direction, and the singular value C their dispersion in the depth direction. If the singular value A is greater than 0, the dispersion of the first reference point, the second reference point, the third reference point and the fourth reference point in the horizontal direction is not 0. If the singular value B is greater than 0, their dispersion in the vertical direction is not 0. If the singular value C is greater than 0, their dispersion in the depth direction is not 0.
If the human body area of the person object to be detected is a three-dimensional area, the dispersion of the at least four reference points in all three directions (the horizontal direction, the vertical direction and the depth direction) is not 0. Since the human body area of the person object to be detected is either a two-dimensional area (such as a paper photograph or an electronic picture) or a three-dimensional area, at least two of the three singular values of the coordinate matrix are greater than 0. Therefore, to determine whether the human body area of the person object to be detected is a three-dimensional area, it is only necessary to determine whether the minimum of the three singular values of the coordinate matrix is greater than 0.
In order to reduce errors, the embodiment of the application determines whether the human body area of the person object to be detected is a three-dimensional area by determining whether the ratio of the minimum of the at least one singular value to the sum of the at least one singular value is greater than or equal to a second threshold. The second threshold is a very small positive number, with a value range greater than 0 and less than 1.
And under the condition that the ratio of the minimum value of the three singular values of the coordinate matrix to the sum of the three singular values is greater than or equal to a second threshold value, determining the human body area of the person object to be detected as a three-dimensional area, and further determining the person object to be detected as a living body. And under the condition that the ratio of the minimum value of the three singular values of the coordinate matrix to the sum of the three singular values is smaller than a second threshold value, determining the human body area of the person object to be detected as a two-dimensional area, and further determining that the person object to be detected is not a living body.
For example, the three singular values of the coordinate matrix are respectively: a singular value a of size 3, a singular value B of size 4, and a singular value C of size 1. The minimum value of the three singular values is 1, the sum of the three singular values is 8, and the ratio of the minimum value to the sum of the three singular values is 0.125. Assuming that the second threshold is 0.08, since 0.125 is greater than 0.08, it can be determined that the human subject to be detected is a living body.
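A sketch of step 702 combined with the worked example above; the second threshold of 0.08 is taken from the example, not a recommended value.

```python
import numpy as np

def is_live_by_singular_values(coordinate_matrix, second_threshold=0.08):
    """Decide liveness from the singular values of the coordinate matrix.

    Returns True (living body) when the ratio of the smallest singular value
    to the sum of all singular values reaches the second threshold.
    """
    singular_values = np.linalg.svd(coordinate_matrix, compute_uv=False)
    return singular_values.min() / singular_values.sum() >= second_threshold

# Reproducing the worked example: singular values 3, 4 and 1 give a ratio of
# 1 / (3 + 4 + 1) = 0.125 >= 0.08, so the object is judged to be a living body.
print(is_live_by_singular_values(np.diag([3.0, 4.0, 1.0])))  # True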
In the embodiment, a coordinate matrix is constructed according to three-dimensional coordinates of at least four reference points in a world coordinate system, whether a human body area of a person object to be detected is a three-dimensional area or not is determined according to singular values of the coordinate matrix, and whether the person object to be detected is a living body or not is further determined. Because the coordinate matrix can be constructed according to the three-dimensional coordinates of at least four reference points in the world coordinate system for the human body region of any one person object to be detected, the method for determining whether the person object to be detected is a living body or not provided in the embodiment can be applied to any scene. The technical scheme provided by the implementation can improve the universality of the three-dimensional living body detection method.
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
The foregoing details the method of embodiments of the present application, and the apparatus of embodiments of the present application is provided below.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application, where the apparatus 1 includes: a first acquisition unit 11, a first processing unit 12, a second processing unit 13, and a determination unit 14, wherein:
a first obtaining unit 11, configured to obtain a binocular image and parameters of a binocular camera that collects the binocular image, where the binocular image includes a human body area of a person object to be detected;
a first processing unit 12, configured to obtain depth information of at least four reference points in the human body region, horizontal position information of the at least four reference points, and vertical position information of the at least four reference points according to the binocular image, where the reference points comprise face key points, or the reference points comprise face key points and trunk key points;
a second processing unit 13, configured to obtain three-dimensional position information of the at least four reference points under a world coordinate system according to parameters of the binocular camera, horizontal position information of the at least four reference points, vertical position information of the at least four reference points, and depth information of the at least four reference points;
A determining unit 14, configured to determine that the person object to be detected is a living body if a variance of three-dimensional position information of the at least four reference points in a depth direction in a world coordinate system is greater than or equal to a first threshold, where the depth direction is a direction perpendicular to an image plane of the binocular camera when the binocular camera acquires the binocular image.
In one possible implementation, the binocular image includes: a first image to be processed and a second image to be processed; the binocular camera includes: a first camera for acquiring the first image to be processed and a second camera for acquiring the second image to be processed; parameters of the binocular camera include: a distance between the first camera and the second camera, a first focal length of the first camera, and a second focal length of the second camera;
the first processing unit 12 is configured to:
obtaining parallax images of the first to-be-processed image and the second to-be-processed image according to the first to-be-processed image and the second to-be-processed image, wherein the parallax images carry parallax information of the at least four reference points;
performing three-dimensional correction processing on the first image to be processed and the second image to be processed to normalize the first focal length and the second focal length to obtain normalized focal lengths;
And obtaining depth information of the at least four reference points according to the parallax information of the at least four reference points, the normalized focal length and the distance.
In another possible implementation, the at least four reference points include a first reference point;
the first processing unit 12 is configured to:
determining the product of the normalized focal length and the distance to obtain a first intermediate value;
and determining the quotient of the first intermediate value and the parallax information of the first reference point to obtain the depth information of the first reference point.
In a further possible implementation, the first processing unit 12 is configured to:
performing stereo correction processing on the first image to be processed and the second image to be processed to normalize parameters of the first camera and parameters of the second camera to obtain normalized camera parameters;
the second processing unit 13 is configured to:
and obtaining three-dimensional position information of the at least four reference points under a world coordinate system according to the normalized camera parameters, the horizontal position information of the at least four reference points, the vertical position information of the at least four reference points and the depth information of the at least four reference points.
In yet another possible implementation, the at least four reference points include a second reference point; the normalized camera parameters include: the normalized horizontal position information of the center point of the camera and the vertical position information of the center point, wherein the center point is an intersection point of the image plane of the first camera and the optical axis of the first camera; the normalized focal length includes: the horizontal position information of the normalized focal length and the vertical position information of the normalized focal length; the three-dimensional position information of the at least four reference points in the world coordinate system comprises: horizontal position information of the at least four reference points in a world coordinate system, vertical position information of the at least four reference points in the world coordinate system and depth position information of the at least four reference points in the world coordinate system;
the determining unit 14 is configured to:
determining a difference between the horizontal position information of the second reference point and the horizontal position information of the central point to obtain a second intermediate value, and determining a quotient of the parallax information of the second reference point and the horizontal position information of the normalized focal length to obtain a third intermediate value;
Determining the difference between the vertical position information of the second reference point and the vertical position information of the central point to obtain a fourth intermediate value, and determining the quotient of the parallax information of the second reference point and the vertical position information of the normalized focal length to obtain a fifth intermediate value;
taking the product of the second intermediate value and the third intermediate value as horizontal position information of the second reference point in a world coordinate system, taking the product of the fourth intermediate value and the fifth intermediate value as vertical position information of the second reference point in the world coordinate system, and taking parallax information of the second reference point as depth position information of the second reference point in the world coordinate system.
In a further possible implementation manner, the determining unit 14 is configured to:
constructing a matrix according to the three-dimensional position information of the at least four reference points under a world coordinate system, so that each row in the matrix contains the three-dimensional position information of one reference point to obtain a coordinate matrix;
and determining at least one singular value of the coordinate matrix, and determining that the person object to be detected is a living body under the condition that the ratio of the minimum value in the at least one singular value to the sum of the at least one singular value is greater than or equal to a second threshold value.
In yet another possible implementation, the human body region includes a human face region; the at least four key points comprise a first face key point;
the first processing unit 12 is configured to:
performing face key point detection processing on the first to-be-processed image and the second to-be-processed image respectively, to obtain initial horizontal position information and initial vertical position information of the first face key point in the first to-be-processed image, and initial horizontal position information and initial vertical position information of the second face key point in the second to-be-processed image, wherein the first face key point and the second face key point are same-name points;
taking the initial vertical position information of the first face key point or the initial vertical position information of the second face key point as the vertical position information of the first face key point;
obtaining a first horizontal parallax displacement between the first face key point and the second face key point according to the parallax image;
and determining the sum between the initial horizontal position information of the first face key point and the first horizontal parallax displacement as the horizontal position information of the first face key point.
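A hedged sketch of this keypoint-position computation is given below; detect_face_keypoints is an assumed helper standing in for any face key point detector, and the parallax image is assumed to be dense so it can be sampled at each keypoint location.

```python
import numpy as np

def keypoint_positions(first_image, disparity_image, detect_face_keypoints):
    """Combine detected keypoint positions with the parallax image.

    detect_face_keypoints: assumed callable returning an (N, 2) array of
    (x, y) pixel positions in the first to-be-processed image.
    Returns per-keypoint (horizontal position, vertical position, disparity).
    """
    keypoints = detect_face_keypoints(first_image)  # initial positions
    results = []
    for x0, y0 in keypoints:
        # Horizontal parallax displacement sampled at the keypoint location.
        d = disparity_image[int(round(y0)), int(round(x0))]
        # Horizontal position = initial horizontal position + parallax displacement;
        # vertical position is taken directly from the initial detection.
        results.append((x0 + d, y0, d))
    return results
```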
In yet another possible implementation, the body region includes a face region and a torso region; the at least four key points include: a third face key point and a first trunk key point;
the first processing unit 12 is configured to:
performing face key point detection processing and trunk key point detection processing on the first to-be-processed image and the second to-be-processed image respectively, to obtain initial horizontal position information and initial vertical position information of the third face key point and of the first trunk key point in the first to-be-processed image, and initial horizontal position information and initial vertical position information of the fourth face key point and of the second trunk key point in the second to-be-processed image, wherein the third face key point and the fourth face key point are same-name points, and the first trunk key point and the second trunk key point are same-name points;
taking the initial vertical position information of the third face key point or the initial vertical position information of the fourth face key point as the vertical position information of the third face key point, and taking the initial vertical position information of the first trunk key point or the initial vertical position information of the second trunk key point as the vertical position information of the first trunk key point;
obtaining, according to the parallax image, a second horizontal parallax displacement between the third face key point and the fourth face key point, and a third horizontal parallax displacement between the first trunk key point and the second trunk key point;
and determining the sum of the initial horizontal position information of the third face key point and the second horizontal parallax displacement as the horizontal position information of the third face key point, and determining the sum of the initial horizontal position information of the first trunk key point and the third horizontal parallax displacement as the horizontal position information of the first trunk key point.
The implementation obtains depth information of at least four reference points in the person object to be detected based on the binocular image, and further obtains three-dimensional position information of the at least four reference points under a world coordinate system. According to the three-dimensional position information of the at least four reference points under the world coordinate system, whether the human body area of the person object to be detected is a three-dimensional area can be determined, and thus two-dimensional attacks on face recognition technology can be effectively prevented. The implementation obtains the three-dimensional position information of the at least four reference points in the person object to be detected on the basis of the hardware of the two-dimensional living body detection method; compared with the two-dimensional living body detection method, the hardware cost is not increased, but the detection accuracy is improved.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
Fig. 9 is a schematic hardware structure of an image processing apparatus according to an embodiment of the present application. The image processing device 2 comprises a processor 21, a memory 22, an input device 23 and an output device 24. The processor 21, memory 22, input device 23, and output device 24 are coupled through connectors, which include various interfaces, transmission lines, buses, and the like; these are not limited in this application. It should be understood that, in various embodiments of the present application, coupling means interconnection in a particular manner, including direct connection or indirect connection through other devices, for example through various interfaces, transmission lines, buses, and the like.
The processor 21 may be one or more graphics processors (graphics processing unit, GPUs), which may be single-core GPUs or multi-core GPUs in the case where the processor 21 is a GPU. Alternatively, the processor 21 may be a processor group formed by a plurality of GPUs, and the plurality of processors are coupled to each other through one or more buses. In the alternative, the processor may be another type of processor, and the embodiment of the present application is not limited.
Memory 22 may be used to store computer program instructions as well as various types of computer program code for performing aspects of the present application. Optionally, the memory includes, but is not limited to, a random access memory (random access memory, RAM), a read-only memory (ROM), an erasable programmable read-only memory (erasable programmable read only memory, EPROM), or a portable read-only memory (compact disc read-only memory, CD-ROM) for associated instructions and data.
The input means 23 are for inputting data and/or signals and the output means 24 are for outputting data and/or signals. The input device 23 and the output device 24 may be separate devices or may be an integral device.
It will be appreciated that in the embodiment of the present application, the memory 22 may be used to store not only related instructions, but also related data, for example, the memory 22 may be used to store a binocular image acquired through the input device 23, or the memory 22 may be further used to store depth information obtained through the processor 21, etc., and the embodiment of the present application is not limited to the data specifically stored in the memory.
It will be appreciated that fig. 9 shows only a simplified design of an image processing apparatus. In practical applications, the image processing apparatus may also include other necessary elements, including but not limited to any number of input/output devices, processors, memories, etc., and all image processing apparatuses capable of implementing the embodiments of the present application are within the scope of protection of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein. It will be further apparent to those skilled in the art that the descriptions of the various embodiments herein are provided with emphasis, and that the same or similar parts may not be explicitly described in different embodiments for the sake of convenience and brevity of description, and thus, parts not described in one embodiment or in detail may be referred to in the description of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present application, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted across a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line (digital subscriber line, DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital versatile disk (digital versatile disc, DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
Those of ordinary skill in the art will appreciate that implementing all or part of the above-described method embodiments may be accomplished by a computer program to instruct related hardware, the program may be stored in a computer readable storage medium, and the program may include the above-described method embodiments when executed. And the aforementioned storage medium includes: a read-only memory (ROM) or a random access memory (random access memory, RAM), a magnetic disk or an optical disk, or the like.

Claims (18)

1. An image processing method, the method comprising:
acquiring a binocular image and parameters of a binocular camera for acquiring the binocular image, wherein the binocular image comprises a human body area of a person object to be detected;
obtaining depth information of at least four reference points in the human body region, horizontal position information of the at least four reference points and vertical position information of the at least four reference points according to the binocular image, wherein the reference points comprise face key points, or the reference points comprise face key points and trunk key points;
obtaining three-dimensional position information of the at least four reference points under a world coordinate system according to parameters of the binocular camera, horizontal position information of the at least four reference points, vertical position information of the at least four reference points and depth information of the at least four reference points;
Determining that the person object to be detected is a living body under the condition that the variance of the three-dimensional position information of the at least four reference points in the world coordinate system in the depth direction is larger than or equal to a first threshold, wherein the depth direction is a direction perpendicular to an image plane of the binocular camera when the binocular camera collects the binocular image; the determining that the person object to be detected is a living body in the case that the variance of the three-dimensional position information of the at least four reference points in the depth direction under the world coordinate system is greater than or equal to a first threshold value includes: constructing a matrix according to the three-dimensional position information of the at least four reference points under a world coordinate system, so that each row in the matrix contains the three-dimensional position information of one reference point to obtain a coordinate matrix; and determining at least one singular value of the coordinate matrix, and determining that the person object to be detected is a living body under the condition that the ratio of the minimum value in the at least one singular value to the sum of the at least one singular value is greater than or equal to a second threshold value.
2. The method of claim 1, wherein the binocular image comprises: a first image to be processed and a second image to be processed; the binocular camera includes: a first camera for acquiring the first image to be processed and a second camera for acquiring the second image to be processed; parameters of the binocular camera include: a distance between the first camera and the second camera, a first focal length of the first camera, and a second focal length of the second camera;
The obtaining depth information of at least four reference points in the human body region according to the binocular image includes:
obtaining parallax images of the first to-be-processed image and the second to-be-processed image according to the first to-be-processed image and the second to-be-processed image, wherein the parallax images carry parallax information of the at least four reference points;
performing three-dimensional correction processing on the first image to be processed and the second image to be processed to normalize the first focal length and the second focal length to obtain normalized focal lengths;
and obtaining depth information of the at least four reference points according to the parallax information of the at least four reference points, the normalized focal length and the distance.
3. The method of claim 2, wherein the at least four reference points comprise a first reference point;
and obtaining depth information of the at least four reference points according to the parallax information of the at least four reference points, the normalized focal length and the distance, wherein the depth information comprises the following steps:
determining the product of the normalized focal length and the distance to obtain a first intermediate value;
and determining the quotient of the first intermediate value and the parallax information of the first reference point to obtain the depth information of the first reference point.
4. The method of claim 2, wherein the performing the stereo correction process on the first to-be-processed image and the second to-be-processed image to normalize the first focal length and the second focal length to obtain normalized focal lengths includes:
performing stereo correction processing on the first image to be processed and the second image to be processed to normalize parameters of the first camera and parameters of the second camera to obtain normalized camera parameters;
the obtaining three-dimensional position information of the at least four reference points under a world coordinate system according to the parameters of the binocular camera, the horizontal position information of the at least four reference points, the vertical position information of the at least four reference points and the depth information of the at least four reference points includes:
and obtaining three-dimensional position information of the at least four reference points under a world coordinate system according to the normalized camera parameters, the horizontal position information of the at least four reference points, the vertical position information of the at least four reference points and the depth information of the at least four reference points.
5. The method of claim 4, wherein the at least four reference points comprise a second reference point; the normalized camera parameters include: the normalized horizontal position information of the center point of the camera and the vertical position information of the center point, wherein the center point is an intersection point of the image plane of the first camera and the optical axis of the first camera; the normalized focal length includes: the horizontal position information of the normalized focal length and the vertical position information of the normalized focal length; the three-dimensional position information of the at least four reference points in the world coordinate system comprises: horizontal position information of the at least four reference points in a world coordinate system, vertical position information of the at least four reference points in the world coordinate system and depth position information of the at least four reference points in the world coordinate system;
The obtaining three-dimensional position information of the at least four reference points under a world coordinate system according to the normalized camera parameters, the horizontal position information of the at least four reference points, the vertical position information of the at least four reference points and the depth information of the at least four reference points includes:
determining a difference between the horizontal position information of the second reference point and the horizontal position information of the central point to obtain a second intermediate value, and determining a quotient of the parallax information of the second reference point and the horizontal position information of the normalized focal length to obtain a third intermediate value;
determining the difference between the vertical position information of the second reference point and the vertical position information of the central point to obtain a fourth intermediate value, and determining the quotient of the parallax information of the second reference point and the vertical position information of the normalized focal length to obtain a fifth intermediate value;
taking the product of the second intermediate value and the third intermediate value as horizontal position information of the second reference point in a world coordinate system, taking the product of the fourth intermediate value and the fifth intermediate value as vertical position information of the second reference point in the world coordinate system, and taking parallax information of the second reference point as depth position information of the second reference point in the world coordinate system.
6. The method of claim 2, wherein the human body region comprises a human face region; the at least four key points comprise first face key points;
the obtaining the horizontal position information of the at least four reference points and the vertical position information of the at least four reference points according to the binocular image includes:
performing face key point detection processing on the first to-be-processed image and the second to-be-processed image respectively, to obtain initial horizontal position information and initial vertical position information of the first face key point in the first to-be-processed image, and initial horizontal position information and initial vertical position information of the second face key point in the second to-be-processed image, wherein the first face key point and the second face key point are same-name points;
taking the initial vertical position information of the first face key point or the initial vertical position information of the second face key point as the vertical position information of the first face key point;
obtaining a first horizontal parallax displacement between the first face key point and the second face key point according to the parallax image;
And determining the sum between the initial horizontal position information of the first face key point and the first horizontal parallax displacement as the horizontal position information of the first face key point.
7. The method of claim 2, wherein the body region comprises a face region and a torso region; the at least four key points include: the third face key point and the first trunk key point;
the obtaining the horizontal position information of the at least four reference points and the vertical position information of the at least four reference points according to the binocular image includes:
performing face key point detection processing and trunk key point detection processing on the first to-be-processed image and the second to-be-processed image respectively, to obtain initial horizontal position information and initial vertical position information of the third face key point and of the first trunk key point in the first to-be-processed image, and initial horizontal position information and initial vertical position information of the fourth face key point and of the second trunk key point in the second to-be-processed image, wherein the third face key point and the fourth face key point are same-name points, and the first trunk key point and the second trunk key point are same-name points;
taking the initial vertical position information of the third face key point or the initial vertical position information of the fourth face key point as the vertical position information of the third face key point, and taking the initial vertical position information of the first trunk key point or the initial vertical position information of the second trunk key point as the vertical position information of the first trunk key point;
obtaining, according to the parallax image, a second horizontal parallax displacement between the third face key point and the fourth face key point, and a third horizontal parallax displacement between the first trunk key point and the second trunk key point;
and determining the sum of the initial horizontal position information of the third face key point and the second horizontal parallax displacement as the horizontal position information of the third face key point, and determining the sum of the initial horizontal position information of the first trunk key point and the third horizontal parallax displacement as the horizontal position information of the first trunk key point.
8. An image processing apparatus, characterized in that the apparatus comprises:
the device comprises an acquisition unit, a detection unit and a control unit, wherein the acquisition unit is used for acquiring binocular images and parameters of a binocular camera for acquiring the binocular images, wherein the binocular images comprise human body areas of a person object to be detected;
The first processing unit is used for obtaining depth information of at least four reference points in the human body area, horizontal position information of the at least four reference points and vertical position information of the at least four reference points according to the binocular image, wherein the reference points comprise face key points, or the reference points comprise face key points and trunk key points;
the second processing unit is used for obtaining three-dimensional position information of the at least four reference points under a world coordinate system according to parameters of the binocular camera, horizontal position information of the at least four reference points, vertical position information of the at least four reference points and depth information of the at least four reference points;
a determining unit, configured to determine that the person object to be detected is a living body if a variance of three-dimensional position information of the at least four reference points in a depth direction in a world coordinate system is greater than or equal to a first threshold, where the depth direction is a direction perpendicular to an image plane of the binocular camera when the binocular camera collects the binocular image; the determining that the person object to be detected is a living body in the case that the variance of the three-dimensional position information of the at least four reference points in the depth direction under the world coordinate system is greater than or equal to a first threshold value includes: constructing a matrix by using three-dimensional position information of at least four reference points under a world coordinate system, so that each row in the matrix contains three-dimensional position information of one reference point to obtain a coordinate matrix; and determining at least one singular value of the coordinate matrix, and determining that the person object to be detected is a living body under the condition that the ratio of the minimum value in the at least one singular value to the sum of the at least one singular value is greater than or equal to a second threshold value.
9. The apparatus of claim 8, wherein the binocular image comprises: a first image to be processed and a second image to be processed; the binocular camera includes: a first camera for acquiring the first image to be processed and a second camera for acquiring the second image to be processed; parameters of the binocular camera include: a distance between the first camera and the second camera, a first focal length of the first camera, and a second focal length of the second camera;
the first processing unit is used for:
obtaining a parallax image according to the first to-be-processed image and the second to-be-processed image, wherein the parallax image carries the parallax information of the at least four reference points;
performing stereo correction processing on the first to-be-processed image and the second to-be-processed image to normalize the first focal length and the second focal length, to obtain a normalized focal length;
and obtaining the depth information of the at least four reference points according to the parallax information of the at least four reference points, the normalized focal length and the distance.
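As one possible reading of this pipeline, the sketch below uses OpenCV's stereo rectification and semi-global matching; the calibration inputs (K1, D1, K2, D2, R, T) are assumed to come from a prior stereo calibration and are not specified by the claim:

```python
import cv2
import numpy as np

def rectify_and_match(left_gray, right_gray, K1, D1, K2, D2, R, T):
    """Stereo correction plus disparity estimation (grayscale inputs assumed)."""
    h, w = left_gray.shape[:2]

    # Rectification: afterwards both views share a single normalized focal
    # length and row-aligned epipolar lines.
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, (w, h), R, T)
    m1x, m1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, (w, h), cv2.CV_32FC1)
    m2x, m2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, (w, h), cv2.CV_32FC1)
    left_r = cv2.remap(left_gray, m1x, m1y, cv2.INTER_LINEAR)
    right_r = cv2.remap(right_gray, m2x, m2y, cv2.INTER_LINEAR)

    # Semi-global block matching; OpenCV returns disparity in 1/16-pixel units.
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
    disparity = matcher.compute(left_r, right_r).astype(np.float32) / 16.0
    return disparity, P1  # P1[0, 0] carries the normalized focal length
```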
10. The apparatus of claim 9, wherein the at least four reference points comprise a first reference point;
The first processing unit is used for:
determining the product of the normalized focal length and the distance to obtain a first intermediate value;
and determining the quotient of the first intermediate value and the parallax information of the first reference point to obtain the depth information of the first reference point.
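Claims 9 and 10 together are the standard stereo triangulation identity Z = f·b/d. A minimal sketch, with illustrative names:

```python
def depth_from_disparity(parallax, normalized_focal_length, distance):
    """Depth of one reference point (claim 10): the first intermediate value
    is the product f * b; the depth is its quotient by the parallax d."""
    first_intermediate = normalized_focal_length * distance  # f * b
    return first_intermediate / parallax                     # Z = f * b / d
```

Disparity shrinks as depth grows, so near points (large parallax) map to small depth values and distant points to large ones.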
11. The apparatus of claim 9, wherein the first processing unit is configured to:
performing stereo correction processing on the first image to be processed and the second image to be processed to normalize parameters of the first camera and parameters of the second camera to obtain normalized camera parameters;
the second processing unit is used for:
and obtaining the three-dimensional position information of the at least four reference points in the world coordinate system according to the normalized camera parameters, the horizontal position information of the at least four reference points, the vertical position information of the at least four reference points and the depth information of the at least four reference points.
12. The apparatus of claim 11, wherein the at least four reference points comprise a second reference point; the normalized camera parameters comprise horizontal position information and vertical position information of the center point of the normalized camera, the center point being the intersection of the image plane of the first camera and the optical axis of the first camera; the normalized focal length comprises a horizontal component and a vertical component; and the three-dimensional position information of the at least four reference points in the world coordinate system comprises horizontal position information, vertical position information and depth position information of the at least four reference points in the world coordinate system;
the second processing unit is further used for:
determining the difference between the horizontal position information of the second reference point and the horizontal position information of the center point to obtain a second intermediate value, and determining the quotient of the depth information of the second reference point and the horizontal component of the normalized focal length to obtain a third intermediate value;
determining the difference between the vertical position information of the second reference point and the vertical position information of the center point to obtain a fourth intermediate value, and determining the quotient of the depth information of the second reference point and the vertical component of the normalized focal length to obtain a fifth intermediate value;
and taking the product of the second intermediate value and the third intermediate value as the horizontal position information of the second reference point in the world coordinate system, taking the product of the fourth intermediate value and the fifth intermediate value as the vertical position information of the second reference point in the world coordinate system, and taking the depth information of the second reference point as the depth position information of the second reference point in the world coordinate system.
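The intermediate values of claim 12 compose the standard pinhole back-projection X = (u − cx)·Z/fx, Y = (v − cy)·Z/fy, with Z passing through unchanged. A sketch with illustrative names:

```python
def back_project(u, v, depth, cx, cy, fx, fy):
    """World coordinates of the second reference point per claim 12.

    (u, v): rectified horizontal/vertical position of the point;
    (cx, cy): center point of the normalized camera;
    (fx, fy): horizontal/vertical components of the normalized focal length.
    """
    second_intermediate = u - cx       # horizontal offset from the center point
    third_intermediate = depth / fx    # quotient of depth and horizontal focal component
    fourth_intermediate = v - cy       # vertical offset from the center point
    fifth_intermediate = depth / fy    # quotient of depth and vertical focal component

    x_world = second_intermediate * third_intermediate   # X = (u - cx) * Z / fx
    y_world = fourth_intermediate * fifth_intermediate   # Y = (v - cy) * Z / fy
    return x_world, y_world, depth                       # depth itself is the Z coordinate
```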
13. The apparatus of claim 9, wherein the human body region comprises a face region, and the at least four reference points comprise a first face key point;
The first processing unit is used for:
performing face key point detection processing on the first to-be-processed image and the second to-be-processed image respectively, to obtain initial horizontal position information and initial vertical position information of the first face key point in the first to-be-processed image, and initial horizontal position information and initial vertical position information of the second face key point in the second to-be-processed image, wherein the first face key point and the second face key point are corresponding points, i.e. projections of the same physical point;
taking the initial vertical position information of the first face key point or the initial vertical position information of the second face key point as the vertical position information of the first face key point;
obtaining a first horizontal parallax displacement between the first face key point and the second face key point according to the parallax image;
and determining the sum of the initial horizontal position information of the first face key point and the first horizontal parallax displacement as the horizontal position information of the first face key point.
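In code, the assembly in claim 13 is a lookup-and-shift: the vertical coordinate comes straight from the detector, and the horizontal coordinate is the detected one plus the parallax displacement read from the parallax image at that point (claim 14 below applies the same pattern to torso key points). A sketch, assuming the parallax image is indexed as [row, column]:

```python
def keypoint_positions(kp_left, parallax_image):
    """Horizontal and vertical position information of a key point.

    kp_left: (u, v) position of the key point detected in the first
    to-be-processed image.
    """
    u, v = kp_left
    horizontal_parallax = parallax_image[int(v), int(u)]  # horizontal parallax displacement
    return u + horizontal_parallax, v  # (horizontal, vertical) position information
```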
14. The apparatus of claim 9, wherein the human body region comprises a face region and a torso region, and the at least four reference points include a third face key point and a first torso key point;
The first processing unit is used for:
performing face key point detection processing and torso key point detection processing on the first to-be-processed image and the second to-be-processed image respectively, to obtain initial horizontal position information and initial vertical position information of the third face key point and of the first torso key point in the first to-be-processed image, and initial horizontal position information and initial vertical position information of the fourth face key point and of the second torso key point in the second to-be-processed image, wherein the third face key point and the fourth face key point are corresponding points, as are the first torso key point and the second torso key point;
taking the initial vertical position information of the third face key point or the initial vertical position information of the fourth face key point as the vertical position information of the third face key point, and taking the initial vertical position information of the first torso key point or the initial vertical position information of the second torso key point as the vertical position information of the first torso key point;
obtaining, according to the parallax image, a second horizontal parallax displacement between the third face key point and the fourth face key point, and a third horizontal parallax displacement between the first torso key point and the second torso key point;
and determining the sum of the initial horizontal position information of the third face key point and the second horizontal parallax displacement as the horizontal position information of the third face key point, and the sum of the initial horizontal position information of the first torso key point and the third horizontal parallax displacement as the horizontal position information of the first torso key point.
15. A processor for performing the method of any one of claims 1 to 7.
16. An electronic device, comprising: a processor, transmission means, input means, output means and memory for storing computer program code comprising computer instructions which, when executed by the processor, cause the electronic device to perform the method of any one of claims 1 to 7.
17. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program comprising program instructions which, when executed by a processor of an electronic device, cause the processor to perform the method of any of claims 1 to 7.
18. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any of claims 1 to 7.
CN201911322102.4A 2019-12-19 2019-12-19 Image processing method and device, processor, electronic equipment and storage medium Active CN111160178B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911322102.4A CN111160178B (en) 2019-12-19 2019-12-19 Image processing method and device, processor, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111160178A CN111160178A (en) 2020-05-15
CN111160178B 2024-01-12

Family

ID=70557488

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911322102.4A Active CN111160178B (en) 2019-12-19 2019-12-19 Image processing method and device, processor, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111160178B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111739086A (en) * 2020-06-30 2020-10-02 上海商汤智能科技有限公司 Method and device for measuring area, electronic equipment and storage medium
CN111898553B (en) * 2020-07-31 2022-08-09 成都新潮传媒集团有限公司 Method and device for distinguishing virtual image personnel and computer equipment
CN111898552B (en) * 2020-07-31 2022-12-27 成都新潮传媒集团有限公司 Method and device for distinguishing person attention target object and computer equipment
CN112150527B (en) * 2020-08-31 2024-05-17 深圳市慧鲤科技有限公司 Measurement method and device, electronic equipment and storage medium
CN113159161A (en) * 2021-04-16 2021-07-23 深圳市商汤科技有限公司 Target matching method and device, equipment and storage medium
WO2022246605A1 (en) * 2021-05-24 2022-12-01 华为技术有限公司 Key point calibration method and apparatus
CN113568595B (en) * 2021-07-14 2024-05-17 上海炬佑智能科技有限公司 Control method, device, equipment and medium of display assembly based on ToF camera
CN113504890A (en) * 2021-07-14 2021-10-15 炬佑智能科技(苏州)有限公司 ToF camera-based speaker assembly control method, apparatus, device, and medium
CN113689484B (en) * 2021-08-25 2022-07-15 北京三快在线科技有限公司 Method and device for determining depth information, terminal and storage medium
CN113705428B (en) * 2021-08-26 2024-07-19 北京市商汤科技开发有限公司 Living body detection method and device, electronic equipment and computer readable storage medium
CN114963025B (en) * 2022-04-19 2024-03-26 深圳市城市公共安全技术研究院有限公司 Leakage point positioning method and device, electronic equipment and readable storage medium
CN116503570B (en) * 2023-06-29 2023-11-24 聚时科技(深圳)有限公司 Three-dimensional reconstruction method and related device for image

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105023010A (en) * 2015-08-17 2015-11-04 中国科学院半导体研究所 Face living body detection method and system
CN109558764A (en) * 2017-09-25 2019-04-02 杭州海康威视数字技术股份有限公司 Face identification method and device, computer equipment
CN108764091A (en) * 2018-05-18 2018-11-06 北京市商汤科技开发有限公司 Biopsy method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111160178A (en) 2020-05-15

Similar Documents

Publication Publication Date Title
CN111160178B (en) Image processing method and device, processor, electronic equipment and storage medium
CN110942032B (en) Living body detection method and device, and storage medium
CN104933389B (en) Identity recognition method and device based on finger veins
WO2017181899A1 (en) Facial in-vivo verification method and device
CN105740778B (en) Improved three-dimensional human face in-vivo detection method and device
CN110276308B (en) Image processing method and device
CN111563924B (en) Image depth determination method, living body identification method, circuit, device, and medium
CN112435193B (en) Method and device for denoising point cloud data, storage medium and electronic equipment
CN110008943B (en) Image processing method and device, computing equipment and storage medium
CN111091075A (en) Face recognition method and device, electronic equipment and storage medium
CN109389018B (en) Face angle recognition method, device and equipment
CN112423191B (en) Video call device and audio gain method
CN112270709B (en) Map construction method and device, computer readable storage medium and electronic equipment
CN115526983B (en) Three-dimensional reconstruction method and related equipment
WO2020164266A1 (en) Living body detection method and system, and terminal device
CN110554356A (en) Equipment positioning method and system in visible light communication
CN110032941B (en) Face image detection method, face image detection device and terminal equipment
CN109191522B (en) Robot displacement correction method and system based on three-dimensional modeling
CN114511608A (en) Method, device, terminal, imaging system and medium for acquiring depth image
CN114677350A (en) Connection point extraction method and device, computer equipment and storage medium
CN114608521A (en) Monocular distance measuring method and device, electronic equipment and storage medium
CN109816628A (en) Face evaluation method and Related product
CN113591526A (en) Face living body detection method, device, equipment and computer readable storage medium
CN112365530A (en) Augmented reality processing method and device, storage medium and electronic equipment
CN111160233A (en) Human face in-vivo detection method, medium and system based on three-dimensional imaging assistance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant