CN113095274B - Sight estimation method, system, device and storage medium - Google Patents
Sight estimation method, system, device and storage medium
- Publication number: CN113095274B
- Application number: CN202110450755.1A
- Authority
- CN
- China
- Prior art keywords
- human eye
- sight
- eye image
- vector
- estimation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a sight line estimation method, system, device and storage medium. The method comprises the following steps: acquiring a face image, and performing key point detection and 3D model fitting to obtain a human eye image and a 3D head rotation vector; performing data regularization on the human eye image and the 3D head rotation vector to obtain a regularized human eye image and a head pose estimation vector; and inputting the regularized human eye image and the head pose estimation vector into a pre-trained CNN network and converting the network output into a 3D gaze direction vector. The system comprises an image preprocessing module, a data regularization module and a result output module. The apparatus comprises a memory and a processor for performing the above sight line estimation method. With the method and apparatus, a high-precision sight line estimation result can be obtained. The method, system, device and storage medium can be widely applied in the field of sight line estimation.
Description
Technical Field
The present invention relates to the field of line of sight estimation, and in particular, to a line of sight estimation method, system, apparatus, and storage medium.
Background
Sight line estimation is the technology of accurately tracking the direction of human gaze and visual attention. It has wide application scenarios and great practical value: it can be applied in cognitive science, psychology, medical research, automobile driving, entertainment, advertising and marketing research, bringing convenience to daily life and raising the overall level of science and technology. With the continuous improvement of optical imaging and image processing capability, and in particular the development of computer vision, image-based sight line estimation methods have become dominant. Current methods fall into model-based and appearance-based approaches, but existing methods suffer from low estimation accuracy, slow estimation speed, strong scene dependence, complicated experimental procedures and poor user experience.
Disclosure of Invention
In order to solve the above technical problems, the invention aims to provide a sight line estimation method, system, device and storage medium which achieve high precision, require no calibration and are simple to operate.
The first technical solution adopted by the invention is as follows: a line-of-sight estimation method, comprising the steps of:
acquiring a human face image, and performing key point detection and 3D model fitting processing to obtain a human eye image and a 3D head rotation vector;
carrying out data regularization on the human eye image and the 3D head rotation vector to obtain a regularized human eye image and a head posture estimation vector;
the regularized eye image and the head pose estimation vector are input to a pre-trained CNN network and the network output is converted into a 3D gaze direction vector.
Further, the step of obtaining a human face image and performing key point detection and 3D model fitting processing to obtain a human eye image and a 3D head rotation vector specifically includes:
acquiring a complete face image;
2D face alignment is carried out based on dlib face detection and 68 face key point detection, and two-dimensional coordinates of the face key points corresponding to the image are obtained;
acquiring a human eye image according to the positions of eye key points in the two-dimensional coordinates of the human face key points;
acquiring a 3D face key point model;
and fitting the two-dimensional coordinates of the face key points and the 3D face key point model based on an EPnP algorithm to obtain a 3D head rotation vector.
Further, before regularization of the data of the human eye image, the method further comprises the steps of blink detection and screening of the human eye image, and specifically comprises the following steps:
obtaining a horizontal line and a vertical line passing through eyes according to the left eye key point information and the right eye key point information in the human eye image;
calculating the ratio of the vertical line to the corresponding horizontal line;
if the ratio is larger than a preset threshold, determining that the human eye image is in an eye-open state and performing sight line estimation;
if the ratio is smaller than the preset threshold, determining that the human eye image is in an eye-closed state and not performing sight line estimation.
Further, the formula for data regularization is as follows:
M=S*R
in the above equation, R represents the inverse of the camera rotation matrix, and S represents the scaling matrix.
Further, the step of regularizing the data of the human eye image and the 3D head rotation vector to obtain a regularized human eye image and a head posture estimation vector specifically includes:
processing the human eye image and the 3D head rotation vector based on the transformation matrix;
rotating the camera coordinate system by an R rotation matrix;
scaling the camera coordinate system by an S scaling matrix;
and finally obtaining regularized human eye images and head gesture estimation vectors through perspective transformation.
Further, the training step of the pre-trained CNN network specifically includes:
acquiring human eye images with real gaze angle labels and head pose estimation vectors, and inputting them into a CNN (convolutional neural network) to obtain the network output;
calculating the error between the network output and the real sight angle label based on the loss function of the mean square error to obtain an error result;
and adjusting network parameters according to the error result to obtain the trained sight estimation model.
Further, the step of inputting the regularized eye image and the head pose estimation vector into a pre-trained CNN network and converting the network output into a 3D gaze direction vector specifically includes:
inputting regularized eye images and head pose estimation vectors into a pre-trained CNN network;
eye features are extracted through convolution by the convolutional layers and pooling by the pooling layers;
splicing the head posture estimation vector with the extracted eye characteristics through the full connection layer, and outputting a 2D sight angle;
and geometrically converting the 2D sight angle to obtain a 3D sight direction vector.
The second technical solution adopted by the invention is as follows: a gaze estimation system, comprising:
the image preprocessing module is used for acquiring a human face image, performing key point detection and 3D model fitting processing to obtain a human eye image and a 3D head rotation vector;
the data regularization module is used for carrying out data regularization on the human eye image and the 3D head rotation vector to obtain a regularized human eye image and a head posture estimation vector;
and the result output module is used for inputting the regularized human eye image and the head posture estimation vector into a pre-trained CNN network and converting network output into a 3D sight direction vector.
The third technical solution adopted by the invention is as follows: a line-of-sight estimation apparatus comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement a gaze estimation method as described above.
The fourth technical solution adopted by the invention is as follows: a storage medium storing processor-executable instructions, wherein the instructions, when executed by a processor, are used to implement the sight line estimation method described above.
Beneficial effects of the method, system, device and storage medium: the invention first determines whether a face is present, then performs human eye detection by locating several eye key points, and finally crops the eye image and inputs it into the CNN network to realize sight line estimation; closed-eye pictures are screened out through blink detection, making the estimated gaze direction more reasonable and accurate.
Drawings
FIG. 1 is a flow chart of the steps of a line-of-sight estimation method of the present invention;
FIG. 2 is a schematic diagram of a line-of-sight estimation method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of data regularization of a human eye image in accordance with an embodiment of the invention;
FIG. 4 is a schematic illustration of regularized human eye (left and right eye) images in accordance with an embodiment of the present invention;
FIG. 5 is a schematic diagram of 68 face keypoints according to an embodiment of the invention;
fig. 6 is a block diagram of the structure of a line-of-sight estimating system of the present invention.
Detailed Description
The invention will now be described in further detail with reference to the drawings and to specific examples. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.
Referring to fig. 1 and 2, the present invention provides a line of sight estimation method, comprising the steps of:
s1, acquiring a human face image, and performing key point detection and 3D model fitting processing to obtain a human eye image and a 3D head rotation vector;
s2, carrying out data regularization on the human eye image and the 3D head rotation vector to obtain a regularized human eye image and a head posture estimation vector;
s3, inputting regularized human eye images and head posture estimation vectors into a pre-trained CNN network, and converting network output into 3D sight direction vectors.
Further as a preferred embodiment of the method, the step of obtaining a human face image and performing key point detection and 3D model fitting processing to obtain a human eye image and a 3D head rotation vector specifically includes:
acquiring a complete face image;
2D face alignment is carried out based on dlib face detection and 68 face key point detection, and two-dimensional coordinates of the face key points corresponding to the image are obtained;
acquiring a human eye image according to the positions of eye key points in the two-dimensional coordinates of the human face key points;
acquiring a 3D face key point model;
specifically, a 3D-FAN network is adopted to carry out fine adjustment on data sets of 300W, 300W-LP-3D and the like, so as to obtain 68 face key point models (namely average face models) required by the method.
And fitting the two-dimensional coordinates of the face key points and the 3D face key point model based on an EPnP algorithm to obtain a 3D head rotation vector.
The EPnP algorithm expresses the n three-dimensional space points as weighted sums of 4 virtual control points. The coordinates of these 4 control points in the camera coordinate system are then estimated: they can be written as a weighted sum of the null-space eigenvectors of a 12x12 matrix, and the correct weights are selected by solving a small, constant number of quadratic equations. Finally, from the Euclidean motion between the camera coordinate system and the world coordinate system, the translation vector and rotation matrix between the coordinate systems can be solved.
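The control-point representation at the heart of EPnP can be sketched in a few lines of NumPy. This is a simplified illustration of the first step only (choosing the control points from the centroid and principal directions, as in the original EPnP paper); all function and variable names are ours, not the patent's:

```python
import numpy as np

def control_point_weights(points):
    """Express each 3D point as a weighted sum of 4 virtual control points,
    as in the first step of EPnP (a sketch of the idea only).

    points: (n, 3) world coordinates, assumed not all coplanar.
    Returns (control_points (4, 3), weights (n, 4)) such that
    points[i] == weights[i] @ control_points and each weight row sums to 1.
    """
    centroid = points.mean(axis=0)
    centered = points - centroid
    # The remaining 3 control points lie along the principal directions
    # of the point cloud, scaled by the singular values.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    ctrl = np.vstack([centroid,
                      centroid + vt * (s / np.sqrt(len(points)))[:, None]])
    # Solve the linear system [ctrl^T; 1 1 1 1] w = [p; 1] for the
    # barycentric weights of every point at once.
    A = np.vstack([ctrl.T, np.ones(4)])               # (4, 4)
    b = np.vstack([points.T, np.ones(len(points))])   # (4, n)
    weights = np.linalg.solve(A, b).T                 # (n, 4)
    return ctrl, weights
```

In the full algorithm these weights stay fixed while the control points' camera-frame coordinates are estimated; here they can at least be checked for exact reconstruction of the input points.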
Referring to fig. 3, the head coordinate system (X_h, Y_h, Z_h) is defined as follows: the origin is at the tip of the nose; the Z_h axis is perpendicular to the plane formed by the three midpoints of the eyes and mouth; the X_h axis is parallel to the line passing through the midpoints of the eyes; the Y_h axis is perpendicular to both the Z_h axis and the X_h axis. Coordinates are in meters, and the distance between the outer eye corners of the model is set to 90 mm. The triangular area is the plane formed by the three midpoints of the eyes and mouth; the marked points, from top to bottom and from left to right, are the left and right outer eye corners, the nose tip and the two mouth key points.
Further as a preferred embodiment of the method, before regularizing the data of the human eye image, the method further includes a step of blink detection and screening of the human eye image, specifically including:
obtaining a horizontal line and a vertical line passing through eyes according to the left eye key point information and the right eye key point information in the human eye image;
calculating the ratio of the vertical line to the corresponding horizontal line;
if the ratio is larger than a preset threshold, the human eye image is determined to be in an eye-open state and sight line estimation is performed;
if the ratio is smaller than the preset threshold, the human eye image is determined to be in an eye-closed state and sight line estimation is not performed.
Specifically, referring to fig. 5, based on face key point detection we can determine 68 specific face key points, each with a specific index. The key point indices of the left and right eyes are (36, 37, 38, 39, 40, 41) and (42, 43, 44, 45, 46, 47), respectively. As the eyes open and close, the length of the horizontal line is almost unchanged while the vertical line varies: when the eye is open, the vertical line is much longer than when it is closed, and when the eye is closed its length is almost zero.
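The blink check above can be sketched with the common eye-aspect-ratio formulation over the dlib eye indices 36-41 and 42-47. The exact ratio and threshold used by the patent are not specified, so the formula and the value 0.2 below are assumptions:

```python
import numpy as np

# dlib 68-landmark indices for the left and right eyes, as in the text above.
LEFT_EYE = list(range(36, 42))
RIGHT_EYE = list(range(42, 48))

def eye_aspect_ratio(eye):
    """Ratio of the vertical eyelid distances to the horizontal eye width
    for the 6 landmarks of one eye (a common blink measure; the exact
    ratio used by the patent is not given)."""
    eye = np.asarray(eye, dtype=float)
    v1 = np.linalg.norm(eye[1] - eye[5])   # upper/lower lid, inner pair
    v2 = np.linalg.norm(eye[2] - eye[4])   # upper/lower lid, outer pair
    h = np.linalg.norm(eye[0] - eye[3])    # eye-corner to eye-corner
    return (v1 + v2) / (2.0 * h)

def is_eye_open(landmarks, threshold=0.2):
    """landmarks: (68, 2) dlib face key points; threshold is an assumed value.
    Returns True if the averaged ratio exceeds the threshold (eyes open)."""
    left = eye_aspect_ratio(landmarks[LEFT_EYE])
    right = eye_aspect_ratio(landmarks[RIGHT_EYE])
    return (left + right) / 2.0 > threshold
```

Frames for which `is_eye_open` returns False would be screened out before data regularization.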
Further as a preferred embodiment of the method, the formula for data regularization is as follows:
M=S*R
in the above equation, R represents the inverse of the camera rotation matrix: it makes the x-axis of the head coordinate system perpendicular to the y-axis of the camera coordinate system and points the camera z-axis toward the eye position. S represents the scaling matrix, which keeps the distance from the eye to the camera fixed.
Further as a preferred embodiment of the method, the step of regularizing the data of the human eye image and the 3D head rotation vector to obtain a regularized human eye image and a head pose estimation vector specifically includes:
processing the human eye image and the 3D head rotation vector based on the transformation matrix;
rotating the camera coordinate system by an R rotation matrix;
scaling the camera coordinate system by an S scaling matrix;
and finally obtaining regularized human eye images and head gesture estimation vectors through perspective transformation.
Specifically, in order to realize high-precision sight line estimation under different camera parameters, data regularization is needed: the input image is normalized so that the distance from the camera to the human eye is fixed, the x-axis of the head coordinate system is perpendicular to the y-axis of the camera coordinate system, and the z-axis of the camera faces the eyes.
A schematic of the image regularization steps is shown in fig. 3 and 4: (a) start from the head coordinate system centered on the nose tip (top) and the camera coordinate system (bottom); (b) rotate the camera coordinate system by the R rotation matrix; (c) scale the camera coordinate system by the S scaling matrix; (d) finally obtain the regularized eye image through perspective transformation.
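The construction of the transformation M = S * R can be sketched as follows, assuming the conventions of standard gaze-data normalization (the normalized eye-camera distance of 0.6 m is an assumed value, not taken from the patent):

```python
import numpy as np

def normalization_matrix(eye_center, head_rot, d_norm=0.6):
    """Build the regularization transform M = S @ R.

    eye_center: (3,) eye position in camera coordinates (meters).
    head_rot:   (3, 3) head rotation matrix in camera coordinates.
    d_norm:     fixed eye-camera distance after scaling (assumed value).

    R rotates the camera so its z-axis points at the eye and its x-axis
    stays aligned with the head's x-axis; S rescales so the eye ends up
    at the fixed distance d_norm on the z-axis.
    """
    eye_center = np.asarray(eye_center, dtype=float)
    d = np.linalg.norm(eye_center)
    z = eye_center / d                      # new camera z-axis: toward the eye
    head_x = head_rot[:, 0]                 # head coordinate system x-axis
    y = np.cross(z, head_x)
    y /= np.linalg.norm(y)                  # new y-axis, perpendicular to z
    x = np.cross(y, z)                      # completes the right-handed frame
    R = np.stack([x, y, z])                 # rows are the new camera axes
    S = np.diag([1.0, 1.0, d_norm / d])     # fix the scaled eye distance
    return S @ R
```

Applying M to the eye position maps it onto the z-axis at the fixed distance, which is exactly the invariance the regularization step is after.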
Further as a preferred embodiment of the method, the training step of the pre-trained CNN network specifically includes:
acquiring human eye images with real gaze angle labels and head pose estimation vectors, and inputting them into a CNN (convolutional neural network) to obtain the network output;
calculating the error between the network output and the real sight angle label based on the loss function of the mean square error to obtain an error result;
and adjusting network parameters according to the error result to obtain the trained sight estimation model.
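Since the patent specifies only the MSE objective and not the CNN architecture, the training loop can be illustrated with a linear regressor as a stand-in for the network (all names and hyperparameters below are ours, chosen for illustration):

```python
import numpy as np

def train_gaze_regressor(features, labels, lr=0.1, epochs=300):
    """Minimal stand-in for the CNN training step: a linear model fitted
    with the mean-squared-error loss described above.

    features: (n, d) inputs; in the patent these would be the eye features
              concatenated with the head pose estimation vector.
    labels:   (n, 2) true 2D gaze angles (pitch, yaw).
    """
    n = len(features)
    W = np.zeros((features.shape[1], labels.shape[1]))
    loss = None
    for _ in range(epochs):
        pred = features @ W                 # network output (stand-in)
        err = pred - labels                 # output minus true label
        loss = float(np.mean(err ** 2))     # MSE loss function
        grad = 2.0 * features.T @ err / n   # gradient of the MSE loss
        W -= lr * grad                      # adjust parameters by the error
    return W, loss
```

The gradient step on the MSE loss mirrors the "adjust network parameters according to the error result" step; a real implementation would swap the linear map for the conv/pool/fc network and use a deep learning framework's optimizer.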
Further as a preferred embodiment of the method, the step of inputting regularized eye images and head pose estimation vectors into a pre-trained CNN network and converting the network output into 3D gaze direction vectors specifically comprises:
inputting regularized eye images and head pose estimation vectors into a pre-trained CNN network;
eye features are extracted through convolution by the convolutional layers and pooling by the pooling layers;
Specifically, the convolutional layers perform convolution operations to extract eye features, and the pooling layers compress the input features and retain the main ones, reducing the computational complexity of the network.
Splicing the head posture estimation vector with the extracted eye characteristics through the full connection layer, and outputting a 2D sight angle;
and geometrically converting the 2D sight angle to obtain a 3D sight direction vector.
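The final geometric conversion from the network's 2D gaze angles to a 3D unit direction vector can be sketched as follows. The sign convention (pitch and yaw of (0, 0) mapping to a gaze straight along the negative z-axis, i.e. toward the camera in normalized coordinates) is an assumption, since the patent does not state one:

```python
import numpy as np

def angles_to_vector(pitch, yaw):
    """Convert the 2D gaze angles (pitch, yaw, in radians) output by the
    network into a 3D unit gaze direction vector.

    Assumed convention: (0, 0) maps to (0, 0, -1); positive pitch looks
    up (negative y), positive yaw looks toward negative x.
    """
    x = -np.cos(pitch) * np.sin(yaw)
    y = -np.sin(pitch)
    z = -np.cos(pitch) * np.cos(yaw)
    return np.array([x, y, z])
```

Because the three components come from a single (pitch, yaw) pair on the unit sphere, the result is always a unit vector, which is what downstream consumers of the 3D gaze direction expect.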
As shown in fig. 6, a line-of-sight estimation system includes:
the image preprocessing module is used for acquiring a human face image, performing key point detection and 3D model fitting processing to obtain a human eye image and a 3D head rotation vector;
the data regularization module is used for carrying out data regularization on the human eye image and the 3D head rotation vector to obtain a regularized human eye image and a head posture estimation vector;
and the result output module is used for inputting the regularized human eye image and the head posture estimation vector into a pre-trained CNN network and converting network output into a 3D sight direction vector.
The content in the method embodiment is applicable to the system embodiment, the functions specifically realized by the system embodiment are the same as those of the method embodiment, and the achieved beneficial effects are the same as those of the method embodiment.
A line-of-sight estimating apparatus comprises:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement a gaze estimation method as described above.
The content in the method embodiment is applicable to the embodiment of the device, and the functions specifically realized by the embodiment of the device are the same as those of the method embodiment, and the obtained beneficial effects are the same as those of the method embodiment.
A storage medium having stored therein instructions executable by a processor, characterized by: the processor executable instructions when executed by the processor are for implementing a gaze estimation method as described above.
The content in the method embodiment is applicable to the storage medium embodiment, and functions specifically implemented by the storage medium embodiment are the same as those of the method embodiment, and the achieved beneficial effects are the same as those of the method embodiment.
While the preferred embodiment of the present invention has been described in detail, the invention is not limited to the embodiment, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the invention, and these modifications and substitutions are intended to be included in the scope of the present invention as defined in the appended claims.
Claims (8)
1. A line-of-sight estimation method, comprising the steps of:
acquiring a human face image, and performing key point detection and 3D model fitting processing to obtain a human eye image and a 3D head rotation vector;
carrying out data regularization on the human eye image and the 3D head rotation vector to obtain a regularized human eye image and a head posture estimation vector;
inputting regularized human eye images and head pose estimation vectors into a pre-trained CNN network, and converting network output into 3D gaze direction vectors;
the step of obtaining a human face image, performing key point detection and 3D model fitting processing to obtain a human eye image and a 3D head rotation vector specifically comprises the following steps:
acquiring a complete face image;
2D face alignment is carried out based on dlib face detection and 68 face key point detection, and two-dimensional coordinates of the face key points corresponding to the image are obtained;
acquiring a human eye image according to the positions of eye key points in the two-dimensional coordinates of the human face key points;
acquiring a 3D face key point model;
fitting the two-dimensional coordinates of the key points of the human face and the key point model of the 3D human face based on an EPnP algorithm to obtain a 3D head rotation vector;
the EPnP algorithm represents the n three-dimensional space points as weighted sums of 4 virtual control points;
before regularization of the data of the human eye image, the method further comprises the steps of blink detection and screening of the human eye image, and specifically comprises the following steps:
obtaining a horizontal line and a vertical line passing through eyes according to the left eye key point information and the right eye key point information in the human eye image;
calculating the ratio of the vertical line to the corresponding horizontal line;
if the ratio is larger than a preset threshold, determining that the human eye image is in an eye-open state and performing sight line estimation;
if the ratio is smaller than the preset threshold, determining that the human eye image is in an eye-closed state and not performing sight line estimation.
2. The line-of-sight estimation method of claim 1, wherein the data regularization formula is as follows:
M=S*R
in the above equation, R represents the inverse of the camera rotation matrix, and S represents the scaling matrix.
3. The line-of-sight estimating method according to claim 2, wherein the step of regularizing the human eye image and the 3D head rotation vector to obtain a regularized human eye image and a head pose estimating vector specifically comprises:
processing the human eye image and the 3D head rotation vector based on the transformation matrix;
rotating the camera coordinate system by an R rotation matrix;
scaling the camera coordinate system by an S scaling matrix;
and finally obtaining regularized human eye images and head gesture estimation vectors through perspective transformation.
4. A line-of-sight estimation method according to claim 3, wherein the training step of the pre-trained CNN network specifically comprises:
acquiring human eye images with real gaze angle labels and head pose estimation vectors, and inputting them into a CNN (convolutional neural network) to obtain the network output;
calculating the error between the network output and the real sight angle label based on the loss function of the mean square error to obtain an error result;
and adjusting network parameters according to the error result to obtain the trained sight estimation model.
5. The line-of-sight estimating method according to claim 4, wherein the step of inputting regularized eye images and head pose estimating vectors to a pre-trained CNN network and converting network outputs into 3D line-of-sight direction vectors, specifically comprises:
inputting regularized eye images and head pose estimation vectors into a pre-trained CNN network;
eye features are extracted through convolution by the convolutional layers and pooling by the pooling layers;
splicing the head posture estimation vector with the extracted eye characteristics through the full connection layer, and outputting a 2D sight angle;
and geometrically converting the 2D sight angle to obtain a 3D sight direction vector.
6. A line of sight estimation system for performing the line of sight estimation method of claim 1, comprising:
the image preprocessing module is used for acquiring a human face image, performing key point detection and 3D model fitting processing to obtain a human eye image and a 3D head rotation vector;
the data regularization module is used for carrying out data regularization on the human eye image and the 3D head rotation vector to obtain a regularized human eye image and a head posture estimation vector;
and the result output module is used for inputting the regularized human eye image and the head posture estimation vector into a pre-trained CNN network and converting network output into a 3D sight direction vector.
7. A line-of-sight estimating apparatus, comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement a gaze estimation method as recited in any of claims 1-5.
8. A storage medium having stored therein instructions executable by a processor, characterized by: the processor executable instructions when executed by a processor are for implementing a line of sight estimation method as claimed in any one of claims 1 to 5.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110450755.1A | 2021-04-26 | 2021-04-26 | Sight estimation method, system, device and storage medium |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110450755.1A | 2021-04-26 | 2021-04-26 | Sight estimation method, system, device and storage medium |

Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN113095274A | 2021-07-09 |
| CN113095274B | 2024-02-09 |

Family
- ID=76680139

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110450755.1A (Active) | Sight estimation method, system, device and storage medium | 2021-04-26 | 2021-04-26 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN113095274B (en) |
Families Citing this family (4)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113822174B | 2021-09-02 | 2022-12-16 | 合肥的卢深视科技有限公司 | Sight line estimation method, electronic device and storage medium |
| CN114879843B | 2022-05-12 | 2024-07-02 | 平安科技(深圳)有限公司 | Sight redirection method based on artificial intelligence and related equipment |
| CN114967935B | 2022-06-29 | 2023-04-07 | 深圳职业技术学院 | Interaction method and device based on sight estimation, terminal equipment and storage medium |
| CN118314197A | 2023-01-06 | 2024-07-09 | 京东方科技集团股份有限公司 | Human eye positioning method, device, equipment and medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108171152A (en) * | 2017-12-26 | 2018-06-15 | 深圳大学 | Deep learning human eye sight estimation method, device, system and readable storage medium
CN108171218A (en) * | 2018-01-29 | 2018-06-15 | 深圳市唯特视科技有限公司 | Appearance-based gaze estimation method using a deep gaze network
CN108875524A (en) * | 2018-01-02 | 2018-11-23 | 北京旷视科技有限公司 | Gaze estimation method, device, system and storage medium
CN110458001A (en) * | 2019-06-28 | 2019-11-15 | 南昌大学 | Convolutional neural network gaze estimation method and system based on an attention mechanism
WO2020228224A1 (en) * | 2019-05-11 | 2020-11-19 | 初速度(苏州)科技有限公司 | Face part distance measurement method and apparatus, and vehicle-mounted terminal
CN111985403A (en) * | 2020-08-20 | 2020-11-24 | 中再云图技术有限公司 | Distracted driving detection method based on face pose estimation and sight line deviation
CN112257696A (en) * | 2020-12-23 | 2021-01-22 | 北京万里红科技股份有限公司 | Sight estimation method and computing equipment
CN112488067A (en) * | 2020-12-18 | 2021-03-12 | 北京的卢深视科技有限公司 | Face pose estimation method and device, electronic equipment and storage medium
Non-Patent Citations (1)
Title |
---|
Fatigue driving detection based on cascaded broad learning; Zhu Yubin; Yan Xiangjun; Shen Xuqi; Lu Zhaolin; Computer Engineering and Design (02); pp. 245-249 *
Also Published As
Publication number | Publication date |
---|---|
CN113095274A (en) | 2021-07-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113095274B (en) | Sight estimation method, system, device and storage medium | |
US10706577B2 (en) | Facial features tracker with advanced training for natural rendering of human faces in real-time | |
CN111325823B (en) | Method, device and equipment for acquiring face texture image and storage medium | |
US9798384B2 (en) | Eye gaze tracking method and apparatus and computer-readable recording medium | |
US9361723B2 (en) | Method for real-time face animation based on single video camera | |
CN112614213A (en) | Facial expression determination method, expression parameter determination model, medium and device | |
CN113366491B (en) | Eyeball tracking method, device and storage medium | |
CN108805979A (en) | Dynamic model three-dimensional reconstruction method, device, equipment and storage medium | |
CN111046734A (en) | Multi-modal fusion sight line estimation method based on expansion convolution | |
CN111815768B (en) | Three-dimensional face reconstruction method and device | |
CN115661246A (en) | Attitude estimation method based on self-supervision learning | |
CN112749611A (en) | Face point cloud model generation method and device, storage medium and electronic equipment | |
CN117218246A (en) | Training method and device for image generation model, electronic equipment and storage medium | |
Kang et al. | Real-time eye tracking for bare and sunglasses-wearing faces for augmented reality 3D head-up displays | |
CN114029952A (en) | Robot operation control method, device and system | |
CN110008873B (en) | Facial expression capturing method, system and equipment | |
CN116681579A (en) | Real-time video face replacement method, medium and system | |
CN113822174B (en) | Sight line estimation method, electronic device and storage medium | |
CN114120443A (en) | Classroom teaching gesture recognition method and system based on 3D human body posture estimation | |
CN110097644B (en) | Expression migration method, device and system based on mixed reality and processor | |
Somepalli et al. | Implementation of single camera markerless facial motion capture using blendshapes | |
Jian et al. | Realistic face animation generation from videos | |
Hu et al. | Semi-supervised Multitask Learning using Gaze Focus for Gaze Estimation | |
Will et al. | An Optimized Marker Layout for 3D Facial Motion Capture. | |
CN109671108B (en) | Single multi-view face image attitude estimation method capable of rotating randomly in plane |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||