CN113366491A - Eyeball tracking method, device and storage medium - Google Patents

Eyeball tracking method, device and storage medium

Info

Publication number
CN113366491A
CN113366491A (application CN202180001560.7A)
Authority
CN
China
Prior art keywords
target
user
sample
face
depth image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202180001560.7A
Other languages
Chinese (zh)
Other versions
CN113366491B (en)
Inventor
袁麓
张国华
张代齐
郑爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN113366491A
Application granted
Publication of CN113366491B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013Eye tracking input arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

An embodiment of the present application provides an eyeball tracking method, device and storage medium, including: preprocessing a grayscale image and a depth image to obtain a gray-depth image of a target in a preset coordinate system; performing head detection on the gray-depth image of the target to obtain a gray-depth image of the head of the target; performing face reconstruction processing on the gray-depth image of the head of the target to obtain face information of the target; and obtaining the pupil position of the target according to the face information. In this scheme, a point cloud of the target is obtained based on the grayscale image and the depth image of the target, head detection yields the point cloud of the head of the target, and face reconstruction processing on the point cloud of the head yields the pupil position of the target. With this method, the target's face is reconstructed from the two modalities of grayscale and depth information, and an accurate sight line starting point can be obtained in real time.

Description

Eyeball tracking method, device and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an eyeball tracking method, an eyeball tracking device, and a storage medium.
Background
Sight line (gaze) estimation is an important technology for understanding human intention in human-computer interaction, and can be applied to scenarios such as game interaction, medical diagnosis (psychological disorders), and analysis of driver intention in a cockpit. The sight line starting point (namely the eyeball position) and the sight line direction are the two key modules of sight line estimation; combined with three-dimensional modeling of the scene environment, they yield the point of regard (PoR) of the user's sight line, so that the user's intention can be understood more accurately and the interaction completed.
Currently, when the eyeball position is determined with a monocular camera, the position of the sight line starting point in three-dimensional space is estimated by using priors and a camera imaging model to infer the distance between the human eye and the camera. With this technique, the depth error at a normal driving distance is 2-3 centimeters (cm), which cannot meet scenarios with higher precision requirements, such as waking the central control screen in a vehicle-mounted scenario. Moreover, a 2-3 cm error in the starting point causes a large error of the predicted PoR in the corresponding direction; in particular, the farther the gazed object is from the user, the larger the deviation between the intersection of the sight line direction with the object and the true value, so the requirement of interaction between the sight line and objects outside the vehicle cannot be met.
Another current approach determines the eyeball position with a depth sensor: an optimization-based face reconstruction is first performed offline using depth data, and at deployment time an iterative closest point (ICP) algorithm registers the reconstructed face model against point cloud data acquired in real time to obtain the current 6-degree-of-freedom pose of the face and thus the three-dimensional position of the eyeball. With this technique, offline registration is needed to obtain the face mesh information of the user, and the ICP registration error grows as the amplitude of the facial expression change increases. Therefore, the prior art cannot cope with open environments or actual vehicle-mounted scenarios.
Disclosure of Invention
The embodiment of the application provides an eyeball tracking method, an eyeball tracking device and a storage medium, so that the eyeball tracking precision is improved.
In a first aspect, an embodiment of the present application provides an eyeball tracking method, including: preprocessing a gray image and a depth image to obtain a gray-depth image of a target under a preset coordinate system, wherein the gray image and the depth image both contain head information of the target; performing human head detection on the gray-depth image of the target to obtain a gray-depth image of the head of the target; carrying out face reconstruction processing on the gray level-depth image of the head of the target to obtain face information of the target; and obtaining the pupil position of the target according to the face information.
In the embodiments of the present application, the gray-depth image of the target is obtained based on the grayscale image and the depth image of the target, head detection yields the gray-depth image of the head of the target, and face reconstruction processing is performed on the gray-depth image of the head to obtain the pupil position of the target. With this method, the target's face is reconstructed from the two modalities of grayscale and depth information, and an accurate sight line starting point can be obtained in real time.
As an optional implementation manner, the performing face reconstruction processing on the gray-level depth image of the head of the target to obtain face information of the target includes: performing feature extraction on the gray level-depth image of the head of the target to obtain a gray level feature and a depth feature of the target; fusing the gray level feature and the depth feature of the target to obtain a human face model parameter of the target; and obtaining the face information of the target according to the face model parameters of the target.
The face model parameters of the target are obtained by fusing the grayscale feature and the depth feature of the target, and the face information of the target is then obtained from these parameters. Because the face model parameters fuse both grayscale and depth features, they are more comprehensive than prior-art parameters derived from grayscale features alone, which can effectively improve the eyeball tracking precision.
As an alternative implementation, the face reconstruction processing on the gray-scale depth image of the head of the target is processed through a face reconstruction network model.
As an optional implementation manner, the face reconstruction network model is obtained by training as follows: respectively extracting the characteristics of a user gray level image sample and a user depth image sample which are input into a face reconstruction network model to obtain the gray level characteristics and the depth characteristics of the user; fusing the gray level features and the depth features of the user to obtain face model parameters of the user, wherein the face model parameters comprise identity parameters, expression parameters, texture parameters, rotation parameters and displacement parameters; obtaining face information according to the face model parameters of the user; and obtaining a loss value according to the face information, if the loss value does not reach a stop condition, adjusting parameters of the face reconstruction network model, and repeatedly executing the steps until the stop condition is reached to obtain the trained face reconstruction network model, wherein the weight of the user eyes in a first loss function corresponding to the loss value is not less than a preset threshold value. The stop condition may be that the loss value is not greater than a preset value.
As another optional implementation, the method further includes: acquiring a first point cloud sample of the user, and a point cloud sample and a texture sample of an occluder; superimposing the point cloud sample of the occluder on the first point cloud sample of the user to obtain a second point cloud sample of the user; blanking the second point cloud sample of the user to obtain a third point cloud sample of the user; rendering the third point cloud sample and the texture sample of the occluder to obtain a two-dimensional image sample of the user; and performing noise-adding enhancement processing on the two-dimensional image sample of the user and on the third point cloud sample, respectively, to obtain an enhanced two-dimensional image sample and an enhanced depth image sample of the user, where the enhanced two-dimensional image sample and the enhanced depth image sample of the user are respectively the user grayscale image sample and the user depth image sample input to the face reconstruction network model.
In the embodiments of the present application, the point cloud sample of the user and the point cloud sample and texture sample of the occluder are acquired, and the presence of an occluder is simulated, so that a face reconstruction network model that can adapt to occluders is obtained through training. With this scheme, stronger robustness to objects occluding the eyes is achieved, and the data of the eye region is enhanced, so the reconstruction precision of the eye region is higher. In this way, situations that can occur in various real scenes can be simulated and the corresponding enhanced two-dimensional and three-dimensional images obtained, which improves the robustness of the algorithm.
In a second aspect, an embodiment of the present application provides an eyeball tracking device, including a preprocessing module, a detection module, a reconstruction processing module and an acquisition module, wherein the preprocessing module is configured to preprocess a grayscale image and a depth image to obtain a gray-depth image of a target in a preset coordinate system, the grayscale image and the depth image both containing head information of the target; the detection module is configured to perform head detection on the gray-depth image of the target to obtain a gray-depth image of the head of the target; the reconstruction processing module is configured to perform face reconstruction processing on the gray-depth image of the head of the target to obtain face information of the target; and the acquisition module is configured to obtain the pupil position of the target according to the face information.
As an optional implementation manner, the reconstruction processing module is configured to: performing feature extraction on the gray level-depth image of the head of the target to obtain a gray level feature and a depth feature of the target; fusing the gray level feature and the depth feature of the target to obtain a human face model parameter of the target; and obtaining the face information of the target according to the face model parameters of the target.
As an alternative implementation, the face reconstruction processing on the gray-scale depth image of the head of the target is processed through a face reconstruction network model.
As an optional implementation manner, the face reconstruction network model is obtained by training as follows: respectively extracting the characteristics of a user gray level image sample and a user depth image sample which are input into a face reconstruction network model to obtain the gray level characteristics and the depth characteristics of the user; fusing the gray level features and the depth features of the user to obtain face model parameters of the user, wherein the face model parameters comprise identity parameters, expression parameters, texture parameters, rotation parameters and displacement parameters; obtaining face information according to the face model parameters of the user; and obtaining a loss value according to the face information, if the loss value does not reach a stop condition, adjusting parameters of the face reconstruction network model, and repeatedly executing the steps until the stop condition is reached to obtain the trained face reconstruction network model, wherein the weight of the user eyes in a first loss function corresponding to the loss value is not less than a preset threshold value.
As another optional implementation, the apparatus is further configured to: acquire a first point cloud sample of the user, and a point cloud sample and a texture sample of an occluder; superimpose the point cloud sample of the occluder on the first point cloud sample of the user to obtain a second point cloud sample of the user; blank the second point cloud sample of the user to obtain a third point cloud sample of the user; render the third point cloud sample and the texture sample of the occluder to obtain a two-dimensional image sample of the user; and perform noise-adding enhancement processing on the two-dimensional image sample of the user and on the third point cloud sample, respectively, to obtain an enhanced two-dimensional image sample and an enhanced depth image sample of the user, where the enhanced two-dimensional image sample and the enhanced depth image sample of the user are respectively the user grayscale image sample and the user depth image sample input to the face reconstruction network model.
In a third aspect, the present application provides a computer storage medium comprising computer instructions that, when executed on an electronic device, cause the electronic device to perform the method as provided in any one of the possible embodiments of the first aspect.
In a fourth aspect, the embodiments of the present application provide a computer program product, which when run on a computer, causes the computer to execute the method as provided in any one of the possible embodiments of the first aspect.
In a fifth aspect, embodiments of the present application provide an eye tracking device, including a processor and a memory; wherein the memory is configured to store program code, and the processor is configured to call the program code to perform the method as provided in any one of the possible embodiments of the first aspect.
In a sixth aspect, an embodiment of the present application provides a server, which includes a processor, a memory, and a bus, where: the processor and the memory are connected through the bus; the memory is used for storing a computer program; the processor is configured to control the memory and execute the program stored in the memory to implement the method according to any one of the possible embodiments of the first aspect.
It is to be understood that the apparatus of the second aspect, the computer storage medium of the third aspect, the computer program product of the fourth aspect, the apparatus of the fifth aspect, and the server of the sixth aspect provided above are all configured to perform the method provided in any implementation of the first aspect. Therefore, for the beneficial effects they achieve, reference may be made to the beneficial effects of the corresponding method, which are not repeated here.
Drawings
Fig. 1 is a schematic flowchart of an eyeball tracking method according to an embodiment of the present application;
fig. 2 is a schematic diagram of an image preprocessing method according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of a face model reconstruction method according to an embodiment of the present application;
fig. 4 is a schematic diagram of a training method for reconstructing a face model according to an embodiment of the present application;
fig. 5 is a schematic flowchart of another eye tracking method according to an embodiment of the present disclosure;
fig. 6a is a schematic diagram of an image before being processed according to an embodiment of the present application;
fig. 6b is a schematic diagram of an image after being processed according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an eyeball tracking device provided by the embodiment of the application;
fig. 8 is a schematic structural diagram of another eye tracking device according to an embodiment of the present disclosure.
Detailed Description
The embodiments of the present application are applicable to technologies or scenes such as gaze estimation and gaze tracking in vehicle-mounted scenes, game interaction, and the like.
Fig. 1 is a schematic flow chart of an eyeball tracking method according to an embodiment of the present disclosure. The eyeball tracking method provided in the embodiment of the application can be executed by a vehicle-mounted apparatus (such as an in-vehicle head unit), or by a terminal device such as a mobile phone or a computer; the present solution does not particularly limit this. As shown in fig. 1, the method may include steps 101-104, as follows:
101. preprocessing a gray image and a depth image to obtain a gray-depth image of a target under a preset coordinate system, wherein the gray image and the depth image both contain head information of the target;
the target may be a user, a robot, or the like, and this is not particularly limited in the embodiment of the present application.
As an optional implementation, as shown in fig. 2, the grayscale image and the depth image are preprocessed as follows: a high-resolution grayscale image of the target is acquired by an infrared (IR) sensor, and a low-resolution depth image of the target is acquired by a depth camera; the low-resolution depth image and the high-resolution grayscale image are then aligned, interpolated and fused to obtain a high-resolution point cloud in the coordinate system of the infrared sensor.
Specifically, the infrared sensor and the depth sensor are calibrated to obtain the transformation between their coordinate systems, the depth measured by the depth sensor is then converted into the infrared sensor coordinate system, and finally the aligned infrared-depth (IR-Depth) data, namely the gray-depth image of the target, is output.
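As an illustration only (not the patented implementation), this alignment step can be sketched in Python as follows; the intrinsic matrices and the rotation/translation between the two sensors are assumed to come from the offline calibration mentioned above:

```python
import numpy as np

def align_depth_to_ir(depth, K_depth, K_ir, R, t, ir_shape):
    """Reproject a low-resolution depth map into the IR sensor's image plane.

    depth    : (H, W) depth map from the depth/TOF sensor
    K_depth  : 3x3 intrinsics of the depth sensor
    K_ir     : 3x3 intrinsics of the IR sensor
    R, t     : rotation (3x3) and translation (3,) from depth to IR coordinates
    ir_shape : (H_ir, W_ir) resolution of the IR image
    Returns an IR-aligned depth map (zeros where no depth point projects).
    """
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    valid = z > 0
    pix = np.stack([us.reshape(-1), vs.reshape(-1), np.ones(h * w)], axis=0)

    # Back-project depth pixels to 3D points in the depth-sensor frame.
    pts_depth = np.linalg.inv(K_depth) @ pix * z
    # Transform the points into the IR-sensor coordinate system.
    pts_ir = R @ pts_depth[:, valid] + t.reshape(3, 1)
    # Project into the IR image plane.
    proj = K_ir @ pts_ir
    u_ir = np.round(proj[0] / proj[2]).astype(int)
    v_ir = np.round(proj[1] / proj[2]).astype(int)

    aligned = np.zeros(ir_shape, dtype=np.float32)
    inside = (u_ir >= 0) & (u_ir < ir_shape[1]) & (v_ir >= 0) & (v_ir < ir_shape[0])
    aligned[v_ir[inside], u_ir[inside]] = pts_ir[2, inside]
    return aligned
```

The interpolation and fusion needed to fill the remaining holes and reach the full IR resolution are omitted from this sketch.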
102. Performing human head detection on the gray-depth image of the target to obtain a gray-depth image of the head of the target;
As an alternative implementation, head detection is performed on the gray-depth image of the target using a detection algorithm, for example a common deep-learning-based head detection algorithm.
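As a minimal sketch, assuming some head detector that returns a bounding box (the scheme does not prescribe a specific detector, so `detect_head` below is a hypothetical callable), cropping the aligned gray-depth data to the head region could look like this:

```python
import numpy as np

def crop_head(gray, aligned_depth, detect_head):
    """Crop the gray-depth data to the detected head region.

    gray          : (H, W) IR grayscale image
    aligned_depth : (H, W) depth map aligned to the IR image (same shape as gray)
    detect_head   : callable returning a head box (x, y, w, h) on the gray image
                    (hypothetical deep-learning detector)
    """
    x, y, w, h = detect_head(gray)
    head_gray = gray[y:y + h, x:x + w]
    head_depth = aligned_depth[y:y + h, x:x + w]
    # Stack into a single two-channel "gray-depth image of the head".
    return np.stack([head_gray, head_depth], axis=-1)
```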
103. Carrying out face reconstruction processing on the gray level-depth image of the head of the target to obtain face information of the target;
as an alternative implementation manner, as shown in fig. 3, a schematic diagram of a face model reconstruction method provided in the embodiment of the present application is shown. Performing feature extraction on the gray level-depth image of the head of the target to obtain a gray level feature and a depth feature of the target; and carrying out fusion processing on the gray level feature and the depth feature of the target to obtain the human face model parameter of the target.
Optionally, the face model parameters include an identity parameter, an expression parameter, a texture parameter, a rotation parameter, a displacement parameter, and a spherical harmonic parameter. The identity parameter refers to the identity information of the user; the expression parameter refers to the expression information of the user; the texture parameter refers to the principal component coefficients of the user's albedo; the rotation parameter refers to the rotation vector that converts the user's head from the world coordinate system to the camera coordinate system; the displacement parameter refers to the corresponding translation vector; and the spherical harmonic parameters are parameters of the illumination model, used for modeling the illumination.
The face information of the target is then obtained based on these face model parameters of the target.
As another optional implementation manner, the gray-depth image of the head of the target is input to a face reconstruction network model for processing, so as to obtain the face information of the target. The human face reconstruction network model obtains the gray characteristic and the depth characteristic of the target by extracting the characteristic of the gray-depth image of the head of the target; performing fusion processing on the gray level feature and the depth feature of the target to obtain a human face model parameter of the target; and further obtaining the face information of the target according to the face model parameters of the target. That is to say, the face model parameters are regressed through the face reconstruction network model, and face mesh information, namely face information, under a preset coordinate system is further acquired.
Specifically, the gray-depth image of the head of the target is input into a first feature extraction layer of a face reconstruction network model for gray feature extraction, the gray-depth image of the head of the target is input into a second feature extraction layer for depth feature extraction, then the features extracted by the first feature extraction layer and the second feature extraction layer are input into a feature fusion layer for fusion processing, and finally, face model parameters obtained by face reconstruction network model regression are output.
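A minimal PyTorch-style sketch of such a two-branch network is given below; the layer sizes, parameter dimensions and the exact split of the regressed vector are illustrative assumptions rather than the patented architecture:

```python
import torch
import torch.nn as nn

class FaceReconNet(nn.Module):
    """Two feature-extraction branches (gray / depth) followed by a fusion layer
    that regresses the face model parameters (identity, expression, texture,
    rotation, translation, spherical harmonics)."""

    def __init__(self, n_id=80, n_exp=64, n_tex=80, n_sh=27):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.gray_branch = branch()    # first feature extraction layer
        self.depth_branch = branch()   # second feature extraction layer
        self.fusion = nn.Sequential(   # feature fusion layer + regression head
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, n_id + n_exp + n_tex + 3 + 3 + n_sh))
        self.sizes = [n_id, n_exp, n_tex, 3, 3, n_sh]

    def forward(self, gray, depth):
        # gray, depth: (B, 1, H, W) crops of the head region
        f = torch.cat([self.gray_branch(gray), self.depth_branch(depth)], dim=1)
        params = self.fusion(f)
        # Split the regressed vector into the individual face model parameters.
        return torch.split(params, self.sizes, dim=1)
```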
The face reconstruction network model can be obtained by adopting convolutional neural network training. Specifically, as shown in fig. 4, feature extraction is performed on a gray level image sample of a user and a depth image sample of the user, which are input into a face reconstruction network model, respectively, so as to obtain a gray level feature and a depth feature of the user; then, carrying out fusion processing on the gray level feature and the depth feature of the user to obtain the face model parameters of the user, wherein the face model parameters comprise identity parameters, expression parameters, texture parameters, rotation parameters, displacement parameters and spherical harmonic parameters; obtaining face information according to the face model parameters of the user; obtaining a loss value according to the face information, the user gray level image sample and the user depth image sample, if the loss value does not reach a stop condition, adjusting parameters of the face reconstruction network model, and repeatedly executing the steps until the stop condition is reached to obtain the trained face reconstruction network model, wherein the weight of the user eyes in a first loss function corresponding to the loss value is not less than a preset threshold value. The first loss function may be a geometric loss function.
As an alternative implementation, the convolutional neural network is trained in a self-supervised manner, using the following three loss functions:
1) Geometric loss E_geo(X), used to calculate the error between the face point cloud and the depth image point cloud:
E_geo(X) = w_pp * E_pp(X) + w_ps * E_ps(X);
where E_pp(X) is the point-to-point loss; E_ps(X) is the loss from points to the surface of the face model; w_pp is the point-to-point weight; and w_ps is the point-to-surface weight.
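As an illustrative sketch (assuming the face-model vertices have already been matched to corresponding depth points and that surface normals of the face model are available; the correspondence search itself is not shown), the two terms could be computed as:

```python
import torch

def geometric_loss(face_pts, scan_pts, normals, w_pp=1.0, w_ps=1.0):
    """E_geo = w_pp * E_pp + w_ps * E_ps (sketch under the stated assumptions).

    face_pts : (N, 3) vertices of the reconstructed face model
    scan_pts : (N, 3) matched points from the depth image point cloud
    normals  : (N, 3) unit surface normals of the face model at face_pts
    """
    diff = face_pts - scan_pts
    e_pp = (diff ** 2).sum(dim=1).mean()                  # point-to-point term
    e_ps = ((diff * normals).sum(dim=1) ** 2).mean()      # point-to-surface term
    return w_pp * e_pp + w_ps * e_ps
```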
2) Face key point loss E_lan(X), used to calculate the projection error of the three-dimensional key points of the face model:
E_lan(X) = Σ_{i∈L} ||q_i − Π(R·p_i + t)||^2 + Σ_{(i,j)∈LP} ||(q_i − q_j) − (Π(R·p_i + t) − Π(R·p_j + t))||^2;
where L is the set of visible face key points; LP is the set of visible eye key points; q_i is the i-th face key point in the image; p_i is the i-th three-dimensional (3D) key point on the face model; R is the rotation matrix; t is the displacement vector; Π(·) denotes the camera projection; ||·||^2 denotes the squared Euclidean norm; and i, j are positive integers.
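A sketch of this landmark loss, following the formula above (the projection, the visible-landmark indices and the eye key-point pairs are passed in as assumptions):

```python
import torch

def landmark_loss(q, p, R, t, K, vis_idx, eye_pairs):
    """E_lan: reprojection error of 3D face-model key points plus a relative
    term for visible eye key-point pairs.

    q         : (M, 2) detected 2D key points in the image (tensor)
    p         : (M, 3) corresponding 3D key points on the face model (tensor)
    R, t, K   : rotation (3,3), translation (3,), camera intrinsics (3,3)
    vis_idx   : indices of visible face key points (the set L)
    eye_pairs : list of (i, j) index pairs of visible eye key points (the set LP)
    """
    cam = (R @ p.T).T + t               # transform model points to camera frame
    proj = (K @ cam.T).T
    proj = proj[:, :2] / proj[:, 2:3]   # perspective projection Pi(R p + t)

    e_abs = ((q[vis_idx] - proj[vis_idx]) ** 2).sum()
    e_rel = sum((((q[i] - q[j]) - (proj[i] - proj[j])) ** 2).sum()
                for i, j in eye_pairs)
    return e_abs + e_rel
```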
3) Pixel loss E_col(X), used to calculate the gray difference between the rendered gray values of the face model and the IR gray image:
E_col(X) = (1/|F|) * Σ_{p∈F} ||I_syn(p) − I_real(p)||^2;
where F is the set of pixels in which the face model is visible; I_syn(p) is the synthetically rendered pixel value; and I_real(p) is the pixel value in the actual image.
The convolutional neural network adopts the following face model regularization loss E_reg(X) as a face constraint:
E_reg(X) = σ_id * ||α_id||^2 + σ_alb * ||α_alb||^2 + σ_exp * ||α_exp||^2;
where α_id are the face identity coefficients; α_alb are the face albedo coefficients; α_exp are the facial expression coefficients; σ_id is the identity coefficient weight; σ_alb is the albedo coefficient weight; and σ_exp is the expression coefficient weight.
Because the human eyes are the key region in the eyeball tracking process, the present scheme can appropriately increase the weight of the human eyes in the geometric loss E_geo(X) used to calculate the error between the face point cloud and the depth image point cloud:
E_geo(X) = w_1 * E_eye(X) + w_2 * E_nose(X) + w_3 * E_mouth(X) + w_4 * E_other(X);
where E_eye(X) is the vertex loss of the eye region in the face model; E_nose(X) is the vertex loss of the nose region; E_mouth(X) is the vertex loss of the mouth region; E_other(X) is the vertex loss of the other regions; and w_1, w_2, w_3 and w_4 are the coefficients of the eye, nose, mouth and other regions, respectively.
The coefficient w_1 of the eye region in the face model satisfies the condition of being not less than a preset threshold, which may be any value. For example, w_1 satisfies: w_1 ≥ w_2, w_1 ≥ w_3 and w_1 ≥ w_4.
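A sketch of this region-weighted geometric loss, assuming every face-model vertex carries a region label and the eye weight is chosen no smaller than the other weights, as required above (the weight values themselves are illustrative):

```python
import torch

def region_weighted_geo_loss(face_pts, scan_pts, region, w=(2.0, 1.0, 1.0, 1.0)):
    """E_geo = w1*E_eye + w2*E_nose + w3*E_mouth + w4*E_other, with w1 >= w2, w3, w4.

    face_pts : (N, 3) reconstructed face-model vertices
    scan_pts : (N, 3) matched depth-image points
    region   : (N,) integer labels, 0 = eye, 1 = nose, 2 = mouth, 3 = other
    w        : per-region weights (illustrative values; w[0] is the eye weight)
    """
    per_vertex = ((face_pts - scan_pts) ** 2).sum(dim=1)   # squared vertex error
    loss = face_pts.new_zeros(())
    for r, w_r in enumerate(w):
        mask = region == r
        if mask.any():
            loss = loss + w_r * per_vertex[mask].mean()
    return loss
```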
By enhancing the loss weight of the eye region in this way, this embodiment achieves a higher reconstruction precision for the eye region.
A geometric loss value, a face key point loss value and a pixel loss value are calculated based on the above three loss functions. If the geometric loss value is not greater than a preset geometric loss threshold, the face key point loss value is not greater than a preset key point loss threshold, and the pixel loss value is not greater than a preset pixel loss threshold, training is stopped and the trained face reconstruction network model is obtained. If the loss values do not meet these conditions, the network parameters are adjusted and the training process is repeated until the stop condition is reached.
The stop condition in the above embodiment is explained by taking an example in which the loss value is not greater than the preset loss threshold value. The stopping condition may also be that the number of iterations reaches a preset number, and the like, and this is not specifically limited by the present scheme.
The above description is given by taking three loss functions as examples. Other loss functions may also be used, and this is not specifically limited in this embodiment.
104. And obtaining the pupil position of the target according to the face information.
As an optional implementation, the coordinates of the pupils of the two eyes can be further obtained from the eye-region key points of the three-dimensional face. Specifically, the pupil position of the target is obtained by solving from the position information of preset key points on the face, such as the eyelids and the eye corners. The pupil position is the starting point of the line of sight.
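The scheme does not restrict how this solving is done; as one simple illustration, the pupil position of each eye can be taken as the centroid of that eye's preset eyelid and eye-corner key points on the reconstructed mesh:

```python
import numpy as np

def pupil_positions(face_vertices, left_eye_idx, right_eye_idx):
    """Estimate the 3D pupil positions (sight-line starting points) from the
    eye-region key points of the reconstructed face. Illustrative only: the
    centroid of each eye's eyelid/eye-corner key points is used as the pupil.

    face_vertices                : (N, 3) reconstructed face mesh vertices
    left_eye_idx, right_eye_idx  : index lists of the preset eye-region key points
    """
    left_pupil = face_vertices[left_eye_idx].mean(axis=0)
    right_pupil = face_vertices[right_eye_idx].mean(axis=0)
    return left_pupil, right_pupil
```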
The embodiments of the present application are described only by taking eye tracking as an example. By adopting the above method, the position of the mouth, the position of the nose, the position of the ear, and the like of the target can be obtained, and the scheme is not particularly limited.
In the embodiments of the present application, the gray-depth image of the target is obtained based on the grayscale image and the depth image of the target, head detection yields the gray-depth image of the head of the target, and face reconstruction processing is performed on the gray-depth image of the head to obtain the pupil position of the target. With this method, the target's face is reconstructed from the two modalities of grayscale and depth information, and an accurate sight line starting point can be obtained in real time.
The sight line starting point depends on the accuracy of the eye region, and the eyeball tracking result is affected when the eyes of the target are occluded by hands, glasses, a hat and the like, or when the image changes because of lighting changes, depth errors in the depth image, and so on. In order to simulate the situations that can occur in various real scenes and enable the face reconstruction network model to cope with various complex scenarios, the present scheme further provides an eyeball tracking method that performs eyeball tracking based on enhanced two-dimensional images and three-dimensional point cloud images of the key regions of the target, thereby improving the robustness of the algorithm.
Fig. 5 is a schematic flowchart of another eyeball tracking method according to an embodiment of the present disclosure. The eyeball tracking method provided in the embodiment of the application can be executed by a vehicle-mounted apparatus (such as an in-vehicle head unit), or by a terminal device such as a mobile phone or a computer; the present solution does not particularly limit this. As shown in fig. 5, the method may include steps 501-504, as follows:
501. preprocessing a gray image and a depth image to obtain a gray-depth image of a target under a preset coordinate system, wherein the gray image and the depth image both contain head information of the target;
the target may be a user, a robot, or the like, and this is not particularly limited in the embodiment of the present application.
As an optional implementation, as shown in fig. 2, the grayscale image and the depth image are preprocessed as follows: a high-resolution grayscale image of the target is acquired by an infrared (IR) sensor, and a low-resolution depth image of the target is acquired by a depth camera; the low-resolution depth image and the high-resolution grayscale image are then aligned, interpolated and fused to obtain a high-resolution point cloud in the coordinate system of the infrared sensor.
Specifically, the infrared sensor and the depth sensor are calibrated to obtain the transformation between their coordinate systems, the depth measured by the depth sensor is then converted into the infrared sensor coordinate system, and finally the aligned IR-Depth data, namely the gray-depth image of the target, is output.
502. Performing human head detection on the gray-depth image of the target to obtain a gray-depth image of the head of the target;
As an alternative implementation, head detection is performed on the gray-depth image of the target using a detection algorithm, for example a common deep-learning-based head detection algorithm.
503. Carrying out face reconstruction processing on the gray level-depth image of the head of the target to obtain face information of the target;
the face reconstruction network model can be obtained by training based on steps 5031 and 5039, and the details are as follows:
5031. acquiring a first point cloud sample of a user, and a point cloud sample and a texture sample of an occluder;
the first point cloud sample may be an original point cloud sample of the user, i.e., the point cloud sample of the user without the obstruction.
The shelter is a shelter for the eyes, such as hands, glasses, a hat and the like, or other influences of light change and the like.
5032. Overlaying the point cloud sample of the occluder on the first point cloud sample of the user to obtain a second point cloud sample of the user;
and superposing the point cloud sample of the shielding object in front of the visual angle of the first point cloud sample camera of the user (namely on a camera coordinate system) to obtain a second point cloud sample of the user.
5033. Blanking the second point cloud sample of the user to obtain a third point cloud sample of the user;
in the process of drawing the realistic graphics, the depth information is lost due to projection transformation, which often results in the ambiguity of the graphics. To remove such ambiguities, it is necessary to remove the hidden invisible lines or surfaces during rendering, which is conventionally referred to as removing hidden lines and hidden surfaces, or simply blanking.
Blanking is applied to the invisible points behind the occluder, for example by removing the point cloud behind the occluder with a blanking algorithm (e.g., the Z-buffer algorithm), to obtain the blanked third point cloud sample of the user.
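A minimal z-buffer style blanking sketch: all points are projected into the camera, the occluder fills a depth buffer, and user points that fall behind it are discarded (the pixel-grid resolution and nearest-pixel projection are simplifications):

```python
import numpy as np

def zbuffer_blank(user_pts, occluder_pts, K, image_shape):
    """Remove user points hidden behind the occluder from the camera's view.

    user_pts, occluder_pts : (N, 3) / (M, 3) points in camera coordinates
    K                      : 3x3 camera intrinsics
    image_shape            : (H, W) of the projection grid used for the z-buffer
    Returns the blanked user point cloud (the 'third point cloud sample').
    """
    h, w = image_shape
    zbuf = np.full((h, w), np.inf)

    def project(pts):
        proj = (K @ pts.T).T
        u = np.clip(np.round(proj[:, 0] / proj[:, 2]).astype(int), 0, w - 1)
        v = np.clip(np.round(proj[:, 1] / proj[:, 2]).astype(int), 0, h - 1)
        return u, v, pts[:, 2]

    # Fill the z-buffer with the occluder's nearest depth per pixel.
    u, v, z = project(occluder_pts)
    for ui, vi, zi in zip(u, v, z):
        zbuf[vi, ui] = min(zbuf[vi, ui], zi)

    # Keep only user points that are closer to the camera than the occluder.
    u, v, z = project(user_pts)
    visible = z < zbuf[v, u]
    return user_pts[visible]
```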
5034. Rendering the third point cloud sample of the user and the texture sample of the occluder to obtain a two-dimensional image sample of the user;
The texture sample of the occluder located in front of the user is rendered so that it covers the user's texture behind it, yielding the two-dimensional image sample of the user.
5035. Respectively performing enhancement processing of adding noise on the two-dimensional image sample of the user and the third point cloud sample to obtain an enhanced two-dimensional image sample and an enhanced depth image sample of the user, wherein the enhanced two-dimensional image sample and the enhanced depth image sample of the user are respectively a user gray level image sample and a user depth image sample of the input face reconstruction network model;
two-dimensional images and three-dimensional point clouds are obtained after shielding enhancement is carried out, and blocks in various shapes can be superposed to serve as noise. The pixel values or point cloud coordinate values within such a block may conform to a predetermined distribution (e.g., the pixel value distribution satisfies a gaussian distribution with a mean value of 10 and a standard deviation of 0.1, and the point cloud coordinate is assigned a value of zero). To be more realistic, illumination noise, Time of flight (TOF) sensor noise data may also be simulated. For example, blocks of 25 × 25 pixel size, 50 × 50 pixel size, and 100 × 100 pixel size are randomly generated on an IR image and a TOF point cloud, where the gray values of the gray blocks in the two-dimensional image satisfy a gaussian distribution, the mean value of the distribution is the pixel mean value of the corresponding block in the original image, and the standard deviation is 0.01. The block in the point cloud picture can simulate noise such as holes, and the setting depth is zero at the moment. The effect is shown in fig. 6b, where fig. 6a is an effect diagram without superimposed noise.
As an alternative implementation, an original two-dimensional image and three-dimensional point cloud of the user in the cabin are acquired, and a scanner is used to acquire the three-dimensional scanned point cloud and texture information of the occluder. The occluder point cloud is superimposed on the user's three-dimensional point cloud, the points behind the occluder are removed with the z-buffer algorithm to obtain the processed point cloud of the user, and the processed point cloud is rendered with the scanned occluder texture to generate the processed two-dimensional image of the user.
Taking the hand occlusion as an example, in order to obtain data of the hand occlusion at various different positions, a scanner may be used to scan the hand first, and three-dimensional point cloud and texture information of the hand are obtained. In the original image, the position of a face key point in a two-dimensional image is obtained by using a face key point algorithm, and the position of the key point in a camera coordinate system can be found in a depth image or a three-dimensional point cloud image according to the position in the image. Then, the three-dimensional model of the hand obtained by scanning before can be put at the corresponding position through the coordinate information of the key point on the face. The occlusion is now in front, and from the sensor perspective, some face regions that were not occluded are now occluded by the hand, and the cloud of face points behind the hand can be eliminated using a blanking algorithm (e.g., z-buffer algorithm). Thus, a complete composite point cloud data can be obtained.
After the point cloud data is acquired, texture information can be acquired according to the point cloud data, and a two-dimensional image under the camera view angle can be rendered, so that an enhanced two-dimensional image and a three-dimensional depth image are acquired.
The above is only an example; data for reflective glasses, opaque sunglasses and other accessories that may cause occlusion can also be synthesized. The reconstruction data of the 3D object is obtained with the scanner, the rotation matrix R and the displacement vector T of the human eyes relative to the camera are roughly estimated by an algorithm, the 3D object is moved to the corresponding position using R and T and superimposed on the time-of-flight (TOF) point cloud data with a blanking algorithm, the mesh gray information is superimposed on the IR image through perspective projection, and the data synthesis is thus completed.
5036. Inputting the user gray level image sample and the user depth image sample into a face reconstruction network model to obtain the gray level feature and the depth feature of the user;
the user gray level image sample here is the enhanced two-dimensional image sample of the user, and the user depth image sample here is the enhanced depth image sample.
5037. Fusing the gray level features and the depth features of the user to obtain face model parameters of the user;
5038. obtaining face information according to the face model parameters of the user;
5039. obtaining a loss value according to the face information, a first gray image sample and a first depth image sample of the user, if the loss value does not reach a stop condition, adjusting parameters of the face reconstruction network model, and repeatedly executing the steps until the stop condition is reached to obtain the trained face reconstruction network model, wherein the weight of the user eyes in a first loss function corresponding to the loss value is not less than a preset threshold value;
the first grayscale image sample of the user is an original grayscale image sample of the user, that is, the grayscale image sample of the user when there is no obstruction. The first depth image sample of the user is an original depth image sample of the user, that is, a depth image sample of the user without an obstruction.
For the related description of steps 5036 to 5039, reference may be made to the foregoing embodiments, which are not repeated here.
504. And obtaining the pupil position of the target according to the face information.
In the embodiments of the present application, the point cloud sample of the user and the point cloud sample and texture sample of the occluder are acquired, and the presence of an occluder is simulated, so that a face reconstruction network model that can adapt to occluders is obtained through training. With this scheme, the data of the eye region is enhanced, so the reconstruction precision of the eye region is higher; and situations that can occur in various real scenes can be simulated and the corresponding enhanced two-dimensional and three-dimensional images obtained, which improves the robustness of the algorithm.
It should be noted that the eyeball tracking method provided by the application can be executed locally, or can be executed by uploading the grayscale image and the depth image of the target to the cloud. The cloud can be implemented by a server, which may be a virtual server, a physical server or the like, or by other devices; the present solution does not particularly limit this.
Referring to fig. 7, an eyeball tracking apparatus is provided for an embodiment of the present application. The apparatus may be a vehicle-mounted apparatus (e.g., an in-vehicle head unit), or a terminal device such as a mobile phone or a computer. The apparatus comprises a preprocessing module 701, a detection module 702, a reconstruction processing module 703 and an acquisition module 704, detailed as follows:
the preprocessing module 701 is configured to preprocess the grayscale image and the depth image to obtain a gray-depth image of the target in a preset coordinate system, where the grayscale image and the depth image both include head information of the target;
a detection module 702, configured to perform human head detection on the gray-level depth image of the target to obtain a gray-level depth image of the head of the target;
a reconstruction processing module 703, configured to perform face reconstruction processing on the gray-level depth image of the head of the target to obtain face information of the target;
an obtaining module 704, configured to obtain a pupil position of the target according to the face information.
In the embodiments of the present application, the gray-depth image of the target is obtained based on the grayscale image and the depth image of the target, head detection yields the gray-depth image of the head of the target, and face reconstruction processing is performed on the gray-depth image of the head to obtain the pupil position of the target. With this method, the target's face is reconstructed from the two modalities of grayscale and depth information, and an accurate sight line starting point can be obtained in real time.
As an optional implementation manner, the reconstruction processing module 703 is configured to:
performing feature extraction on the gray level-depth image of the head of the target to obtain a gray level feature and a depth feature of the target;
fusing the gray level feature and the depth feature of the target to obtain a human face model parameter of the target;
and obtaining the face information of the target according to the face model parameters of the target.
The face model parameters of the target are obtained by fusing the grayscale feature and the depth feature of the target, and the face information of the target is then obtained from these parameters. Because the face model parameters fuse both grayscale and depth features, they are more comprehensive than prior-art parameters derived from grayscale features alone, which can effectively improve the eyeball tracking precision.
As an alternative implementation, the face reconstruction processing on the gray-scale depth image of the head of the target is processed through a face reconstruction network model.
As an optional implementation manner, the face reconstruction network model is obtained by training as follows:
respectively extracting the characteristics of a user gray level image sample and a user depth image sample which are input into a face reconstruction network model to obtain the gray level characteristics and the depth characteristics of the user;
fusing the gray level features and the depth features of the user to obtain face model parameters of the user, wherein the face model parameters comprise identity parameters, expression parameters, texture parameters, rotation parameters and displacement parameters;
obtaining face information according to the face model parameters of the user;
and obtaining a loss value according to the face information, if the loss value does not reach a stop condition, adjusting parameters of the face reconstruction network model, and repeatedly executing the steps until the stop condition is reached to obtain the trained face reconstruction network model, wherein the weight of the user eyes in a first loss function corresponding to the loss value is not less than a preset threshold value.
As another optional implementation, the apparatus is further configured to: acquire a first point cloud sample of the user, and a point cloud sample and a texture sample of an occluder; superimpose the point cloud sample of the occluder on the first point cloud sample of the user to obtain a second point cloud sample of the user; blank the second point cloud sample of the user to obtain a third point cloud sample of the user; render the third point cloud sample and the texture sample of the occluder to obtain a two-dimensional image sample of the user; and perform noise-adding enhancement processing on the two-dimensional image sample of the user and on the third point cloud sample, respectively, to obtain an enhanced two-dimensional image sample and an enhanced depth image sample of the user, where the enhanced two-dimensional image sample and the enhanced depth image sample of the user are respectively the user grayscale image sample and the user depth image sample input to the face reconstruction network model.
It should be noted that the preprocessing module 701, the detecting module 702, the reconstructing processing module 703 and the obtaining module 704 are configured to execute relevant steps of the foregoing method. For example, the preprocessing module 701 is configured to execute the relevant content of step 101 and/or step 501, the detection module 702 is configured to execute the relevant content of step 102 and/or step 502, the reconstruction processing module 703 is configured to execute the relevant content of step 103 and/or step 503, and the acquisition module 704 is configured to execute the relevant content of step 104 and/or step 504.
In the embodiments of the present application, the point cloud sample of the user and the point cloud sample and texture sample of the occluder are acquired, and the presence of an occluder is simulated, so that a face reconstruction network model that can adapt to occluders is obtained through training. With this scheme, the data of the eye region is enhanced, so the reconstruction precision of the eye region is higher; and situations that can occur in various real scenes can be simulated and the corresponding enhanced two-dimensional image and enhanced three-dimensional point cloud image obtained, which improves the robustness of the algorithm.
In this embodiment, the eyeball tracking device is represented in a module form. A "module" herein may refer to an application-specific integrated circuit (ASIC), a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or other devices that may provide the described functionality. Further, the above preprocessing module 701, the detection module 702, the reconstruction processing module 703 and the acquisition module 704 may be implemented by the processor 801 of the eye tracking apparatus shown in fig. 8.
Fig. 8 is a schematic structural diagram of another eyeball tracking device according to an embodiment of the present application. As shown in fig. 8, the eye tracking apparatus 800 comprises at least one processor 801, at least one memory 802 and at least one communication interface 803. The processor 801, the memory 802 and the communication interface 803 are connected through the communication bus and perform communication with each other.
The processor 801 may be a general purpose Central Processing Unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs according to the above schemes.
The communication interface 803 is used for communicating with other devices or communication networks, such as Ethernet, a radio access network (RAN), a wireless local area network (WLAN), and the like.
The memory 802 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may be self-contained and coupled to the processor via a bus, or it may be integrated with the processor.
The memory 802 is used for storing application program codes for executing the above schemes, and is controlled by the processor 801 to execute. The processor 801 is used to execute application program code stored in the memory 802.
The memory 802 stores code that may perform one of the eye tracking methods provided above.
It should be noted that although the eye tracking apparatus 800 shown in fig. 8 only shows a memory, a processor and a communication interface, in the specific implementation process, those skilled in the art will understand that the eye tracking apparatus 800 also includes other devices necessary for normal operation. Also, as may be appreciated by those skilled in the art, the eye tracking apparatus 800 may also include hardware components for performing other additional functions, according to particular needs. Furthermore, those skilled in the art will appreciate that the eye tracking apparatus 800 may also include only those components necessary to implement the embodiments of the present application, and need not include all of the components shown in FIG. 8.
The embodiment of the application also provides a chip system, and the chip system is applied to the electronic equipment; the chip system comprises one or more interface circuits, and one or more processors; the interface circuit and the processor are interconnected through a line; the interface circuit is to receive a signal from a memory of the electronic device and to send the signal to the processor, the signal comprising computer instructions stored in the memory; the electronic device performs the method when the processor executes the computer instructions.
Embodiments of the present application also provide a computer-readable storage medium having stored therein instructions, which when executed on a computer or processor, cause the computer or processor to perform one or more steps of any one of the methods described above.
The embodiment of the application also provides a computer program product containing instructions. The computer program product, when run on a computer or processor, causes the computer or processor to perform one or more steps of any of the methods described above.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
It should be understood that in the description of the present application, unless otherwise indicated, "/" indicates a relationship where the objects associated before and after are an "or", e.g., a/B may indicate a or B; wherein A and B can be singular or plural. Also, in the description of the present application, "a plurality" means two or more than two unless otherwise specified. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of the singular or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or multiple. In addition, in order to facilitate clear description of technical solutions of the embodiments of the present application, in the embodiments of the present application, terms such as "first" and "second" are used to distinguish the same items or similar items having substantially the same functions and actions. Those skilled in the art will appreciate that the terms "first," "second," etc. do not denote any order or quantity, nor do the terms "first," "second," etc. denote any order or importance. Also, in the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as examples, illustrations or illustrations. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present relevant concepts in a concrete fashion for ease of understanding.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the described division into units is merely a logical functional division; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When software is used, the implementation may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted by means of such a medium. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or a data center integrating one or more usable media. The usable medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape, or a magnetic disk), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).
The above description is only a specific implementation of the embodiments of the present application, but the scope of the embodiments of the present application is not limited thereto; any change or substitution within the technical scope disclosed in the embodiments of the present application shall fall within the scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. An eye tracking method, comprising:
preprocessing a grayscale image and a depth image to obtain a grayscale-depth image of a target in a preset coordinate system, wherein both the grayscale image and the depth image contain head information of the target;
performing head detection on the grayscale-depth image of the target to obtain a grayscale-depth image of the head of the target;
performing face reconstruction processing on the grayscale-depth image of the head of the target to obtain face information of the target; and
obtaining a pupil position of the target according to the face information.
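For illustration only (not part of the claimed subject matter), the following Python sketch shows one way the steps of claim 1 could be organized. The back-projection into a common camera coordinate system is standard pinhole geometry; the helper callables detect_head() and reconstruct_face(), the intrinsics tuple and the "pupil_3d" key are hypothetical placeholders rather than details disclosed by the claim.

```python
# Minimal sketch of the claimed pipeline (assumptions noted in the lead-in).
import numpy as np

def back_project(depth, fx, fy, cx, cy):
    """Convert a depth map (units as delivered by the sensor) to an HxWx3 point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def track_pupil(gray, depth, intrinsics, detect_head, reconstruct_face):
    # 1. Preprocess: put grayscale and depth into one gray-depth representation
    #    under a common (camera) coordinate system.
    points = back_project(depth, *intrinsics)                         # H x W x 3
    gray_depth = np.concatenate([gray[..., None], points], axis=-1)   # H x W x 4

    # 2. Head detection: crop the gray-depth image to the head region.
    x0, y0, x1, y1 = detect_head(gray_depth)
    head = gray_depth[y0:y1, x0:x1]

    # 3. Face reconstruction: recover 3D face information from the crop.
    face_info = reconstruct_face(head)

    # 4. Pupil position: read the eye location off the reconstructed face.
    return face_info["pupil_3d"]
```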
2. The method according to claim 1, wherein the performing face reconstruction processing on the grayscale-depth image of the head of the target to obtain the face information of the target comprises:
performing feature extraction on the grayscale-depth image of the head of the target to obtain a grayscale feature and a depth feature of the target;
fusing the grayscale feature and the depth feature of the target to obtain face model parameters of the target; and
obtaining the face information of the target according to the face model parameters of the target.
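A minimal sketch of one possible feature-extraction-and-fusion architecture for claim 2, written in PyTorch. The layer types, sizes and the parameter count of 257 are assumptions for illustration; the claim only requires that separate grayscale and depth features be extracted and fused into face model parameters.

```python
# Illustrative two-branch fusion network (architecture is an assumption).
import torch
import torch.nn as nn

class FaceReconNet(nn.Module):
    def __init__(self, n_params=257):  # number of face model parameters (assumed)
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.gray_branch = branch()    # grayscale feature extractor
        self.depth_branch = branch()   # depth feature extractor
        self.fuse = nn.Sequential(     # fuse the two features into face model parameters
            nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, n_params))

    def forward(self, gray, depth):
        f_gray = self.gray_branch(gray)
        f_depth = self.depth_branch(depth)
        return self.fuse(torch.cat([f_gray, f_depth], dim=1))
```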
3. The method according to claim 2, wherein the face reconstruction processing on the grayscale-depth image of the head of the target is performed by a face reconstruction network model.
4. The method according to claim 3, wherein the face reconstruction network model is trained by:
separately performing feature extraction on a user grayscale image sample and a user depth image sample that are input into the face reconstruction network model, to obtain a grayscale feature and a depth feature of the user;
fusing the grayscale feature and the depth feature of the user to obtain face model parameters of the user, wherein the face model parameters comprise an identity parameter, an expression parameter, a texture parameter, a rotation parameter and a displacement parameter;
obtaining face information according to the face model parameters of the user; and
obtaining a loss value according to the face information, and, if the loss value does not satisfy a stop condition, adjusting parameters of the face reconstruction network model and repeating the foregoing steps until the stop condition is satisfied, to obtain the trained face reconstruction network model, wherein a weight of the user's eyes in a first loss function corresponding to the loss value is not less than a preset threshold.
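The training procedure of claim 4 could look like the sketch below, assuming a landmark-style loss as the "first loss function". The decode_landmarks callable, the eye weight of 5.0 and the loss-threshold stop condition are illustrative assumptions; the claim only fixes that the weight given to the eyes is not below a preset threshold.

```python
# Hedged sketch of the training loop in claim 4 (PyTorch).
import torch

def eye_weighted_loss(pred_lmk, gt_lmk, eye_idx, eye_weight=5.0):
    """Per-landmark L2 loss in which eye landmarks get a weight no smaller
    than the preset threshold (here represented by eye_weight)."""
    w = torch.ones(gt_lmk.shape[1], device=gt_lmk.device)
    w[eye_idx] = eye_weight
    return (w[None, :, None] * (pred_lmk - gt_lmk) ** 2).mean()

def train(model, loader, optimizer, decode_landmarks, eye_idx,
          loss_threshold=1e-3, max_epochs=100):
    for epoch in range(max_epochs):
        for gray, depth, gt_lmk in loader:
            params = model(gray, depth)            # face model parameters
            pred_lmk = decode_landmarks(params)    # hypothetical decoder: parameters -> face info
            loss = eye_weighted_loss(pred_lmk, gt_lmk, eye_idx)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if loss.item() < loss_threshold:           # stop condition reached
            break
    return model
```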
5. The method according to claim 4, further comprising:
acquiring a first point cloud sample of the user, and a point cloud sample and a texture sample of an occluder;
overlaying the point cloud sample of the occluder on the first point cloud sample of the user to obtain a second point cloud sample of the user;
performing blanking (hidden-surface removal) on the second point cloud sample of the user to obtain a third point cloud sample of the user;
rendering the third point cloud sample and the texture sample of the occluder to obtain a two-dimensional image sample of the user; and
separately performing noise-adding enhancement processing on the two-dimensional image sample of the user and on the third point cloud sample to obtain an enhanced two-dimensional image sample and an enhanced depth image sample of the user, wherein the enhanced two-dimensional image sample and the enhanced depth image sample of the user are respectively the user grayscale image sample and the user depth image sample input into the face reconstruction network model.
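The sample-synthesis steps of claim 5 can be sketched as follows. Here remove_hidden_points() and render_to_image() are hypothetical placeholders for a blanking (hidden-surface removal) routine and a textured point-cloud renderer, and the Gaussian noise model is an assumption; the projection of the noisy point cloud back to a depth image is omitted.

```python
# Sketch of occlusion overlay + blanking + rendering + noise augmentation.
import numpy as np

def synthesize_training_sample(user_points, occluder_points, occluder_texture,
                               remove_hidden_points, render_to_image,
                               noise_sigma=0.01):
    # Overlay the occluder point cloud on the user's point cloud (second sample).
    overlaid = np.concatenate([user_points, occluder_points], axis=0)

    # Blanking: drop points hidden behind the occluder along the viewing
    # direction (third sample).
    visible = remove_hidden_points(overlaid)

    # Render the visible points together with the occluder texture into a
    # two-dimensional (grayscale) image sample.
    image = render_to_image(visible, occluder_texture)

    # Noise augmentation of both the image sample and the point cloud sample.
    gray_sample = image + np.random.normal(0.0, noise_sigma, image.shape)
    noisy_points = visible + np.random.normal(0.0, noise_sigma, visible.shape)
    return gray_sample, noisy_points
```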
6. An eye tracking apparatus, comprising:
a preprocessing module, configured to preprocess a grayscale image and a depth image to obtain a grayscale-depth image of a target in a preset coordinate system, wherein both the grayscale image and the depth image contain head information of the target;
a detection module, configured to perform head detection on the grayscale-depth image of the target to obtain a grayscale-depth image of the head of the target;
a reconstruction processing module, configured to perform face reconstruction processing on the grayscale-depth image of the head of the target to obtain face information of the target; and
an acquisition module, configured to obtain a pupil position of the target according to the face information.
7. The apparatus according to claim 6, wherein the reconstruction processing module is configured to:
perform feature extraction on the grayscale-depth image of the head of the target to obtain a grayscale feature and a depth feature of the target;
fuse the grayscale feature and the depth feature of the target to obtain face model parameters of the target; and
obtain the face information of the target according to the face model parameters of the target.
8. The apparatus according to claim 7, wherein the face reconstruction processing on the grayscale-depth image of the head of the target is performed by a face reconstruction network model.
9. The apparatus according to claim 8, wherein the face reconstruction network model is trained by:
separately performing feature extraction on a user grayscale image sample and a user depth image sample that are input into the face reconstruction network model, to obtain a grayscale feature and a depth feature of the user;
fusing the grayscale feature and the depth feature of the user to obtain face model parameters of the user, wherein the face model parameters comprise an identity parameter, an expression parameter, a texture parameter, a rotation parameter and a displacement parameter;
obtaining face information according to the face model parameters of the user; and
obtaining a loss value according to the face information, and, if the loss value does not satisfy a stop condition, adjusting parameters of the face reconstruction network model and repeating the foregoing steps until the stop condition is satisfied, to obtain the trained face reconstruction network model, wherein a weight of the user's eyes in a first loss function corresponding to the loss value is not less than a preset threshold.
10. The apparatus according to claim 9, wherein the apparatus is further configured to:
acquire a first point cloud sample of the user, and a point cloud sample and a texture sample of an occluder;
overlay the point cloud sample of the occluder on the first point cloud sample of the user to obtain a second point cloud sample of the user;
perform blanking (hidden-surface removal) on the second point cloud sample of the user to obtain a third point cloud sample of the user;
render the third point cloud sample and the texture sample of the occluder to obtain a two-dimensional image sample of the user; and
separately perform noise-adding enhancement processing on the two-dimensional image sample of the user and on the third point cloud sample to obtain an enhanced two-dimensional image sample and an enhanced depth image sample of the user, wherein the enhanced two-dimensional image sample and the enhanced depth image sample of the user are respectively the user grayscale image sample and the user depth image sample input into the face reconstruction network model.
11. An eye tracking device, comprising a processor and a memory, wherein the memory is configured to store program code, and the processor is configured to invoke the program code to perform the method according to any one of claims 1 to 5.
12. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program that, when executed by a processor, implements the method according to any one of claims 1 to 5.
13. A computer program product, wherein when the computer program product runs on a computer, the computer is caused to perform the method according to any one of claims 1 to 5.
14. A server, comprising a processor, a memory, and a bus, wherein:
the processor and the memory are connected through the bus;
the memory is configured to store a computer program; and
the processor is configured to control the memory and execute the program stored in the memory, to implement the method according to any one of claims 1 to 5.
CN202180001560.7A 2021-04-26 2021-04-26 Eyeball tracking method, device and storage medium Active CN113366491B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/090064 WO2022226747A1 (en) 2021-04-26 2021-04-26 Eyeball tracking method and apparatus and storage medium

Publications (2)

Publication Number Publication Date
CN113366491A (en) 2021-09-07
CN113366491B CN113366491B (en) 2022-07-22

Family

ID=77523064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180001560.7A Active CN113366491B (en) 2021-04-26 2021-04-26 Eyeball tracking method, device and storage medium

Country Status (2)

Country Link
CN (1) CN113366491B (en)
WO (1) WO2022226747A1 (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100682889B1 (en) * 2003-08-29 2007-02-15 삼성전자주식회사 Method and Apparatus for image-based photorealistic 3D face modeling
CN108549886A (en) * 2018-06-29 2018-09-18 汉王科技股份有限公司 A kind of human face in-vivo detection method and device

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440476A (en) * 2013-08-26 2013-12-11 大连理工大学 Locating method for pupil in face video
CN103810472A (en) * 2013-11-29 2014-05-21 南京大学 Method for pupil position filtering based on movement correlation
CN103810491A (en) * 2014-02-19 2014-05-21 北京工业大学 Head posture estimation interest point detection method fusing depth and gray scale image characteristic points
CN104143086A (en) * 2014-07-18 2014-11-12 吴建忠 Application technology of portrait comparison to mobile terminal operating system
CN104778441A (en) * 2015-01-07 2015-07-15 深圳市唯特视科技有限公司 Multi-mode face identification device and method fusing grey information and depth information
CN109643366A (en) * 2016-07-21 2019-04-16 戈斯蒂冈有限责任公司 For monitoring the method and system of the situation of vehicle driver
CN106469465A (en) * 2016-08-31 2017-03-01 深圳市唯特视科技有限公司 A kind of three-dimensional facial reconstruction method based on gray scale and depth information
CN110363133A (en) * 2019-07-10 2019-10-22 广州市百果园信息技术有限公司 A kind of method, apparatus, equipment and the storage medium of line-of-sight detection and video processing
CN110619303A (en) * 2019-09-16 2019-12-27 Oppo广东移动通信有限公司 Method, device and terminal for tracking point of regard and computer readable storage medium
CN111222468A (en) * 2020-01-08 2020-06-02 浙江光珀智能科技有限公司 People stream detection method and system based on deep learning
CN112560584A (en) * 2020-11-27 2021-03-26 北京芯翌智能信息技术有限公司 Face detection method and device, storage medium and terminal

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113837153A (en) * 2021-11-25 2021-12-24 之江实验室 Real-time emotion recognition method and system integrating pupil data and facial expressions
CN114155557A (en) * 2021-12-07 2022-03-08 美的集团(上海)有限公司 Positioning method, positioning device, robot and computer-readable storage medium
CN114155557B (en) * 2021-12-07 2022-12-23 美的集团(上海)有限公司 Positioning method, positioning device, robot and computer-readable storage medium
CN114274514A (en) * 2021-12-22 2022-04-05 深圳市创必得科技有限公司 Model printing annular texture full blanking method, device, equipment and storage medium
CN114782864A (en) * 2022-04-08 2022-07-22 马上消费金融股份有限公司 Information processing method and device, computer equipment and storage medium
CN114782864B (en) * 2022-04-08 2023-07-21 马上消费金融股份有限公司 Information processing method, device, computer equipment and storage medium
CN115953813A (en) * 2022-12-19 2023-04-11 北京字跳网络技术有限公司 Expression driving method, device, equipment and storage medium
CN115953813B (en) * 2022-12-19 2024-01-30 北京字跳网络技术有限公司 Expression driving method, device, equipment and storage medium
CN116822260A (en) * 2023-08-31 2023-09-29 天河超级计算淮海分中心 Eyeball simulation method based on numerical conversion, electronic equipment and storage medium
CN116822260B (en) * 2023-08-31 2023-11-17 天河超级计算淮海分中心 Eyeball simulation method based on numerical conversion, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113366491B (en) 2022-07-22
WO2022226747A1 (en) 2022-11-03

Similar Documents

Publication Publication Date Title
CN113366491B (en) Eyeball tracking method, device and storage medium
CN110874864B (en) Method, device, electronic equipment and system for obtaining three-dimensional model of object
CN110889890B (en) Image processing method and device, processor, electronic equipment and storage medium
CN109003325B (en) Three-dimensional reconstruction method, medium, device and computing equipment
US10977818B2 (en) Machine learning based model localization system
CN107111753B (en) Gaze detection offset for gaze tracking models
CN104380338B (en) Information processor and information processing method
CN107004275B (en) Method and system for determining spatial coordinates of a 3D reconstruction of at least a part of a physical object
KR101608253B1 (en) Image-based multi-view 3d face generation
CN109660783B (en) Virtual reality parallax correction
EP4383193A1 (en) Line-of-sight direction tracking method and apparatus
Shen et al. Virtual mirror rendering with stationary rgb-d cameras and stored 3-d background
WO2019140945A1 (en) Mixed reality method applied to flight simulator
US11170521B1 (en) Position estimation based on eye gaze
CN113610889B (en) Human body three-dimensional model acquisition method and device, intelligent terminal and storage medium
CN110913751A (en) Wearable eye tracking system with slip detection and correction functions
EP4307233A1 (en) Data processing method and apparatus, and electronic device and computer-readable storage medium
JP2016522485A (en) Hidden reality effect and intermediary reality effect from reconstruction
JP2014106543A (en) Image processor, image processing method and program
CN113012293A (en) Stone carving model construction method, device, equipment and storage medium
CN110648274B (en) Method and device for generating fisheye image
JP7459051B2 (en) Method and apparatus for angle detection
US20210082176A1 (en) Passthrough visualization
CN115496864B (en) Model construction method, model reconstruction device, electronic equipment and storage medium
CN108734772A (en) High accuracy depth image acquisition methods based on Kinect fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant