CN111796681A - Self-adaptive sight estimation method and medium based on differential convolution in man-machine interaction - Google Patents

Self-adaptive sight estimation method and medium based on differential convolution in man-machine interaction

Info

Publication number
CN111796681A
CN111796681A (application CN202010647088.1A)
Authority
CN
China
Prior art keywords
convolution
sight line
differential
gaze
human
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010647088.1A
Other languages
Chinese (zh)
Inventor
罗元 (Luo Yuan)
陈旭 (Chen Xu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202010647088.1A
Publication of CN111796681A
Legal status: Pending (current)

Classifications

    • G06F 3/013: Eye tracking input arrangements (G06F 3/01: input arrangements for interaction between user and computer)
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045: Combinations of networks (G06N 3/04: neural network architectures)
    • G06N 3/048: Activation functions
    • G06N 3/08: Learning methods (G06N 3/02: neural networks)
    • G06V 40/19: Sensors for eye characteristics, e.g. of the iris (G06V 40/18)
    • G06V 40/193: Preprocessing; Feature extraction (G06V 40/18: eye characteristics)
    • G06V 40/197: Matching; Classification (G06V 40/18: eye characteristics)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Ophthalmology & Optometry (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)
  • Eye Examination Apparatus (AREA)

Abstract

The invention claims an adaptive sight line estimation method and medium based on differential convolution for human-computer interaction. The method comprises the following steps: S1, preprocess the face image, detect the face and locate the eye region with the MTCNN algorithm, and extract human eye feature information; S2, estimate the head pose directly from the face image; S3, automatically fuse the head pose and the eye feature map in the fully connected layer of a convolutional neural network and perform a preliminary gaze estimation; S4, train a differential convolutional network to predict the gaze difference between eye images; and S5, calibrate the preliminary gaze estimation result with the gaze difference and output the final gaze estimation result. Verification on the public Eyediap data set, and comparison with well-performing gaze estimation models from recent years, shows that the proposed model estimates the gaze direction more accurately under free head movement.

Description

Self-adaptive sight estimation method and medium based on differential convolution in man-machine interaction
Technical Field
The invention belongs to the field of image processing and pattern recognition, and particularly relates to an adaptive sight line estimation method based on differential convolution.
Background
With the rapid development of computer vision, artificial intelligence, and related fields, gaze estimation technology has attracted extensive research attention. Gaze is a very important non-verbal cue for analyzing human behavior and psychological state; it is one expression of human attention and interest, and gaze information helps to infer a person's internal state or intention, enabling a better understanding of interactions between individuals. Gaze estimation therefore plays an important role in many research areas, such as human-computer interaction, virtual reality, social interaction analysis, and medical applications.
Gaze estimation in the broad sense covers research on eyeballs, eye movements, lines of sight, and related topics. Generally speaking, gaze estimation methods fall into two broad categories: model-based methods and appearance-based methods. The basic idea of model-based methods is to estimate the gaze direction from features such as corneal reflections, combined with prior knowledge of the 3D eyeball. Appearance-based methods directly extract visual features of the eyes and train a regression model that maps appearance to gaze direction. Comparative analysis shows that model-based methods achieve higher accuracy, but they also place higher demands on image quality and resolution; meeting these demands generally requires special hardware and greatly restricts the mobility of the user's head. Appearance-based methods perform better on low-resolution, high-noise images, but training the model requires a large amount of data and is prone to overfitting. With the rise of deep learning and the release of large public data sets, appearance-based methods are receiving increasing attention.
At present, although research on gaze estimation has advanced considerably, differences in eye shape and intraocular structure between individuals limit the accuracy attainable by a universal model, and large head movements further degrade the experimental results and reduce recognition accuracy.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art by providing an adaptive sight line estimation method and medium based on differential convolution in human-computer interaction. The technical scheme of the invention is as follows:
a self-adaptive sight line estimation method based on differential convolution in human-computer interaction comprises the following steps:
s1, preprocessing the face image by utilizing a bilinear difference method to carry out multi-scale scaling, detecting the face by utilizing an optimized multi-task cascade convolution neural network algorithm, realizing pupil center positioning, and extracting human eye characteristic information;
s2, directly estimating the head pose by using the face image;
s3, automatically fusing the head pose in the step S1 and the human eye feature map in the step S2 by utilizing the full connection layer of the convolutional neural network to carry out preliminary sight estimation;
s4, predicting the gaze difference of the eyes by training by utilizing a differential convolution network;
and S5, calibrating the preliminary gaze estimation result with the obtained gaze difference, and outputting the final gaze estimation result.
Further, in step S1, the optimized multi-task cascaded convolutional neural network algorithm outputs 5 facial feature points, so that pupil center localization is completed at the same time as face detection; the output of the algorithm includes the pupil center positions.
Further, the step S2 of estimating the head pose directly from the face image specifically includes: locating the head position and orientation with a real-time head pose estimation system based on random regression forests; using T_t = [T_x, T_y, T_z] to denote the head position information at time t and R_t = [R_y, R_p, R_r] the head rotation angle information at time t, the head deflection parameters at time t can be written as h_t = (T_t, R_t).
Further, the step S3 automatically fuses the head pose and the eye feature map by using the full connection layer of the convolutional neural network to perform the preliminary gaze estimation, which specifically includes:
using a convolutional neural network based approach, we map 3@48 x 72 eyesTaking an image I as input, wherein 3 represents the number of channels of an eye image, 48 multiplied by 72 represents the size of the eye image, preprocessing the image, applying the preprocessed image to a convolutional layer, inputting an obtained characteristic map into a full-link layer, and finally obtaining a primary sight direction g in the full-link layer by training a linear regressionp(I) The loss function is:
Figure BDA0002573535300000021
wherein, ggt(I) For the true gaze direction, D is the training data set and | is the cardinality computation graph.
Further, the step S4 of predicting the gaze difference of the eyes by training a differential convolutional network specifically includes:
differential convolution analyzes the pattern direction of a sample relative to its neighboring samples; the differential computation reflects the change across consecutive samples by computing the difference between sample activations;
the differential convolutional network adopts a parallel structure in which each branch consists of three convolutional layers, each followed by batch normalization and a ReLU unit; max pooling is applied after the first and second layers to reduce the image size; after the third layer, the feature maps of the two input images are normalized and concatenated into a new tensor, and two fully connected layers are then applied to this tensor to predict the gaze difference of the two input images.
Further, the differential convolutional network selects a ReLU function as an activation function of the convolutional layer and the fully-connected layer, and the formula is as follows:
f(x)=max(0,x) (10)
where x is the input, f (x) is the output after the ReLU unit;
training a gaze estimation model using a loss function, using dp(I, J) represents the predicted gaze difference of the difference network, then the loss function LdComprises the following steps:
Figure BDA0002573535300000031
wherein I is a test image, F is a reference image, DkTo a subset of the training set D, only images of one eye of the kth individual are included.
Further, the step S5 of calibrating the preliminary gaze estimation result with the obtained gaze difference and outputting the final gaze estimation result is specifically: the differential convolutional network predicts the difference d_p(I, F) between a test image I and a reference image F, which is combined with the true gaze value g_gt(F) to predict the final gaze direction g_gt(F) + d_p(I, F), by the formula:
g(I) = Σ_(F∈D_c) w(I, F)·(g_gt(F) + d_p(I, F)) / Σ_(F∈D_c) w(I, F)    (12)
where D_c is the calibration set of reference images and w(·) weights the importance of each prediction.
A storage medium, the storage medium being a computer readable storage medium storing one or more programs that, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the method of any of the above.
The invention has the following advantages and beneficial effects:
At present, most appearance-based gaze estimation methods regress the gaze direction directly from a single face or eye image. However, due to differences in eye shape and intraocular structure between individuals, the accuracy of a universal model is limited, and its output often exhibits high variability and subject-dependent bias. Meanwhile, when the head deflection angle is too large, the gaze estimation result is also strongly affected. The present disclosure therefore provides an adaptive gaze estimation method based on differential convolution to solve the above problems: differential convolution is introduced, a differential convolutional neural network is trained directly to predict the gaze difference between two eye input images of the same subject, and the gaze difference is then used to calibrate the initial gaze estimation result. In addition, head pose information is fused into the model to improve the robustness of the gaze estimation system.
Tests on the public data set Eyediap show that the gaze estimation error is smallest when head pose information is fused in and the differential network is used for calibration. It can be seen that introducing differential convolution effectively calibrates the gaze estimation result and reduces the gaze estimation error, while fusing head pose information makes the system more robust to changes in head pose. To compare the gaze estimation performance of different models more clearly, the proposed algorithm model was compared with other gaze estimation methods based on convolutional neural networks; the proposed model achieves a smaller gaze estimation error and excellent performance.
Drawings
FIG. 1 is a diagram of a preferred embodiment of a line-of-sight estimation framework based on a differential convolutional network (DNet) according to the present invention;
fig. 2 is a diagram of a differential convolutional network structure.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical solution of the invention for solving the above technical problems is as follows:
s1, preprocessing the face image by using a bilinear difference method to perform multi-scale scaling, detecting the face by using an optimized multi-task cascade convolution neural network algorithm (in the invention, the existing algorithm is adopted, so the invention is abbreviated), realizing pupil center positioning at the same time, and extracting human eye characteristic information;
And S2, estimating the head pose directly from the face image. A real-time head pose estimation system based on random regression forests is employed to locate the head position and orientation. Using T_t = [T_x, T_y, T_z] to denote the head position information at time t and R_t = [R_y, R_p, R_r] the head rotation angle information at time t, the head deflection parameters at time t can be written as h_t = (T_t, R_t).
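The pose parameterization of step S2 can be packed as below. This is only a sketch of the data layout: the random-regression-forest estimator itself is left as a stub, since the patent references it as an existing system rather than defining it.

```python
# Sketch of the head pose parameters h_t = (T_t, R_t) from step S2.
from dataclasses import dataclass

@dataclass
class HeadPose:
    T: tuple  # T_t = (T_x, T_y, T_z): head position at time t
    R: tuple  # R_t = (R_y, R_p, R_r): head rotation angles at time t

def estimate_head_pose(face_img) -> HeadPose:
    # Stub for the random-regression-forest system; a real implementation
    # would regress the six pose values from the input frame.
    Tx = Ty = Tz = Ry = Rp = Rr = 0.0  # placeholder values
    return HeadPose(T=(Tx, Ty, Tz), R=(Ry, Rp, Rr))  # h_t = (T_t, R_t)
```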
S3, automatically fusing the head pose and the human eye feature map in the fully connected layer of a convolutional neural network to perform a preliminary gaze estimation. A convolutional neural network based approach is adopted: an eye image I of size 3@48×72 is taken as input, where 3 is the number of channels of the eye image and 48×72 is its size. The image is preprocessed and passed through the convolutional layers, the obtained feature map is fed into the fully connected layers, and a preliminary gaze direction g_p(I) is finally obtained in the fully connected layer by training a linear regression. The loss function is:
L_p = (1/|D|) Σ_(I∈D) ||g_p(I) - g_gt(I)||    (9)
where g_gt(I) is the true gaze direction, D is the training data set, and |·| denotes the cardinality of a set.
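A minimal PyTorch sketch of this step is given below. The patent fixes only the 3@48×72 input and the fusion of head pose with the eye feature map at the fully connected layer; the channel counts, kernel sizes, and six-dimensional pose vector used here are illustrative assumptions.

```python
# Sketch of step S3: CNN over the eye image, head pose fused in the FC layer.
import torch
import torch.nn as nn

class GazeNet(nn.Module):
    def __init__(self, pose_dim: int = 6):  # pose_dim assumes h_t = (T_t, R_t)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            # 48x72 input pooled twice -> 12x18 feature map, concatenated with h_t
            nn.Linear(128 * 12 * 18 + pose_dim, 256), nn.ReLU(),
            nn.Linear(256, 2),  # preliminary gaze direction g_p(I): (yaw, pitch)
        )

    def forward(self, eye: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        f = self.features(eye).flatten(1)
        return self.fc(torch.cat([f, pose], dim=1))

def loss_lp(g_pred: torch.Tensor, g_true: torch.Tensor) -> torch.Tensor:
    # L_p: mean distance between predicted and true gaze over the batch.
    return (g_pred - g_true).norm(dim=1).mean()
```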
And S4, predicting the gaze difference of the eyes by training a differential convolutional network. Differential convolution analyzes the pattern direction of a sample relative to its neighboring samples; the differential computation reflects the change across consecutive samples by computing the difference between sample activations.
The differential convolutional network adopts a parallel structure in which each branch consists of three convolutional layers, each followed by batch normalization and a ReLU unit. Max pooling is applied after the first and second layers to reduce the image size. After the third layer, the feature maps of the two input images are normalized and concatenated into a new tensor. Two fully connected layers are then applied to this tensor to predict the gaze difference of the two input images.
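The branch structure just described can be sketched in PyTorch as follows, with the two branches sharing weights (a common Siamese arrangement that the parallel-branch description is consistent with, though the patent does not state weight sharing explicitly); channel counts are again illustrative.

```python
# Sketch of the differential network (step S4, Fig. 2): two parallel branches
# of three conv layers (each with batch norm + ReLU), max pooling after
# layers 1 and 2, normalized feature concatenation, then two FC layers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffNet(nn.Module):
    def __init__(self):
        super().__init__()
        def block(cin, cout, pool):
            layers = [nn.Conv2d(cin, cout, 3, padding=1),
                      nn.BatchNorm2d(cout), nn.ReLU()]
            if pool:
                layers.append(nn.MaxPool2d(2))  # after layers 1 and 2 only
            return nn.Sequential(*layers)
        self.branch = nn.Sequential(block(3, 32, True),
                                    block(32, 64, True),
                                    block(64, 128, False))
        self.fc = nn.Sequential(
            nn.Linear(2 * 128 * 12 * 18, 256), nn.ReLU(),
            nn.Linear(256, 2),  # predicted gaze difference d_p(I, F)
        )

    def forward(self, img_i: torch.Tensor, img_f: torch.Tensor) -> torch.Tensor:
        fi = F.normalize(self.branch(img_i).flatten(1), dim=1)
        ff = F.normalize(self.branch(img_f).flatten(1), dim=1)
        return self.fc(torch.cat([fi, ff], dim=1))  # spliced into one tensor
```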
The differential convolutional network selects a ReLU function as an activation function of the convolutional layer and the fully-connected layer, and the formula is as follows:
f(x)=max(0,x) (10)
where x is the input and f (x) is the output after the ReLU unit.
A gaze estimation model is trained using a loss function; with d_p(I, F) denoting the gaze difference predicted by the differential network, the loss function L_d is:
L_d = (1/|D|) Σ_k Σ_(I,F∈D_k) ||d_p(I, F) - (g_gt(I) - g_gt(F))||    (11)
where I is a test image, F is a reference image, and D_k is the subset of the training set D containing only the images of one eye of the k-th individual.
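Under the reconstruction of equation (11) above, the loss reduces to a simple batch computation over same-subject image pairs; the sketch below assumes batched tensors of predicted differences and ground-truth gaze values.

```python
# Sketch of the differential loss L_d for a batch of same-subject pairs.
import torch

def loss_ld(d_pred: torch.Tensor, g_i: torch.Tensor, g_f: torch.Tensor) -> torch.Tensor:
    """d_pred: predicted differences d_p(I, F); g_i, g_f: true gaze of the
    two images in each pair, drawn from the same subject's subset D_k."""
    return (d_pred - (g_i - g_f)).norm(dim=1).mean()
```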
And S5, calibrating the preliminary gaze estimation result with the obtained gaze difference, and outputting the final gaze estimation result. The differential convolutional network predicts the difference d_p(I, F) between a test image I and a reference image F, which is combined with the true gaze value g_gt(F) to predict the final gaze direction g_gt(F) + d_p(I, F), by the formula:
g(I) = Σ_(F∈D_c) w(I, F)·(g_gt(F) + d_p(I, F)) / Σ_(F∈D_c) w(I, F)    (12)
where D_c is the calibration set of reference images and w(·) weights the importance of each prediction.
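Combining the pieces, the calibration of step S5 can be sketched as below. The patent does not specify the weighting function w(·), so a softmax over the negative magnitudes of the predicted differences is assumed here (references predicted to lie closest to the test gaze count most); that choice is an illustration, not the claimed method.

```python
# Sketch of step S5: calibrate the test image's gaze against the set D_c.
import torch

def calibrated_gaze(diff_net, test_img, ref_imgs, ref_gazes):
    """test_img: (3, 48, 72); ref_imgs: (N, 3, 48, 72) calibration images with
    known gaze ref_gazes: (N, 2). diff_net is assumed to be in eval() mode.
    Returns the final gaze estimate following the reconstructed eq. (12)."""
    n = ref_imgs.shape[0]
    test_batch = test_img.unsqueeze(0).expand(n, -1, -1, -1)
    d = diff_net(test_batch, ref_imgs)        # d_p(I, F) for each reference F
    votes = ref_gazes + d                     # g_gt(F) + d_p(I, F)
    w = torch.softmax(-d.norm(dim=1), dim=0)  # assumed weighting w(I, F)
    return (w.unsqueeze(1) * votes).sum(dim=0)
```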
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media do not include transitory computer readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims (8)

1. A self-adaptive sight line estimation method based on differential convolution in human-computer interaction is characterized by comprising the following steps:
S1, preprocessing the face image with bilinear interpolation for multi-scale scaling, detecting the face with an optimized multi-task cascaded convolutional neural network algorithm while locating the pupil centers, and extracting human eye feature information;
S2, directly estimating the head pose from the face image;
S3, automatically fusing the head pose from step S2 and the human eye feature map from step S1 in the fully connected layer of a convolutional neural network to perform a preliminary gaze estimation;
S4, predicting the gaze difference of the eyes by training a differential convolutional network;
and S5, calibrating the preliminary gaze estimation result with the obtained gaze difference, and outputting the final gaze estimation result.
2. The adaptive sight line estimation method based on differential convolution in human-computer interaction according to claim 1, wherein in step S1 the optimized multi-task cascaded convolutional neural network algorithm outputs 5 facial feature points, so that pupil center localization is completed at the same time as face detection; the output of the algorithm includes the pupil center positions.
3. The adaptive sight line estimation method based on differential convolution in human-computer interaction according to claim 2, wherein the step S2 of estimating the head pose directly from the face image specifically includes: locating the head position and orientation with a real-time head pose estimation system based on random regression forests; using T_t = [T_x, T_y, T_z] to denote the head position information at time t and R_t = [R_y, R_p, R_r] the head rotation angle information at time t, the head deflection parameters at time t can be written as h_t = (T_t, R_t).
4. The adaptive sight line estimation method based on differential convolution in human-computer interaction according to claim 3, wherein the step S3 is to perform preliminary sight line estimation by automatically fusing a head pose and a human eye feature map by using a full connection layer of a convolutional neural network, and specifically includes:
adopting a method based on a convolutional neural network: an eye image I of size 3@48×72 is taken as input, where 3 is the number of channels of the eye image and 48×72 is its size; the image is preprocessed and passed through the convolutional layers, the obtained feature map is fed into the fully connected layers, and a preliminary gaze direction g_p(I) is finally obtained in the fully connected layer by training a linear regression; the loss function is:
L_p = (1/|D|) Σ_(I∈D) ||g_p(I) - g_gt(I)||    (9)
where g_gt(I) is the true gaze direction, D is the training data set, and |·| denotes the cardinality of a set.
5. The adaptive sight line estimation method based on differential convolution in human-computer interaction according to claim 4, wherein the step S4 of predicting the gaze difference of the eyes by training a differential convolutional network specifically includes:
differential convolution analyzes the pattern direction of a sample relative to its neighboring samples; the differential computation reflects the change across consecutive samples by computing the difference between sample activations;
the differential convolutional network adopts a parallel structure in which each branch consists of three convolutional layers, each followed by batch normalization and a ReLU unit; max pooling is applied after the first and second layers to reduce the image size; after the third layer, the feature maps of the two input images are normalized and concatenated into a new tensor, and two fully connected layers are then applied to this tensor to predict the gaze difference of the two input images.
6. The adaptive sight line estimation method based on differential convolution in human-computer interaction according to claim 5, characterized in that the differential convolution network selects a ReLU function as an activation function of a convolution layer and a full link layer, and the formula is as follows:
f(x)=max(0,x) (10)
where x is the input, f (x) is the output after the ReLU unit;
training a gaze estimation model using a loss function; with d_p(I, F) denoting the gaze difference predicted by the differential network, the loss function L_d is:
L_d = (1/|D|) Σ_k Σ_(I,F∈D_k) ||d_p(I, F) - (g_gt(I) - g_gt(F))||    (11)
where I is a test image, F is a reference image, and D_k is the subset of the training set D containing only the images of one eye of the k-th individual.
7. The adaptive sight line estimation method based on differential convolution in human-computer interaction according to claim 6, wherein the step S5 of calibrating the preliminary gaze estimation result with the obtained gaze difference and outputting the final gaze estimation result is specifically: the differential convolutional network predicts the difference d_p(I, F) between a test image I and a reference image F, which is combined with the true gaze value g_gt(F) to predict the final gaze direction g_gt(F) + d_p(I, F), by the formula:
g(I) = Σ_(F∈D_c) w(I, F)·(g_gt(F) + d_p(I, F)) / Σ_(F∈D_c) w(I, F)    (12)
where D_c is the calibration set of reference images and w(·) weights the importance of each prediction.
8. A storage medium being a computer readable storage medium storing one or more programs which, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the method of any of claims 1-7 above.
CN202010647088.1A 2020-07-07 2020-07-07 Self-adaptive sight estimation method and medium based on differential convolution in man-machine interaction Pending CN111796681A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010647088.1A CN111796681A (en) 2020-07-07 2020-07-07 Self-adaptive sight estimation method and medium based on differential convolution in man-machine interaction

Publications (1)

Publication Number Publication Date
CN111796681A 2020-10-20

Family

ID=72809704

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010647088.1A Pending CN111796681A (en) 2020-07-07 2020-07-07 Self-adaptive sight estimation method and medium based on differential convolution in man-machine interaction

Country Status (1)

Country Link
CN (1) CN111796681A (en)


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046734A (en) * 2019-11-12 2020-04-21 重庆邮电大学 Multi-modal fusion sight line estimation method based on expansion convolution

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GANG LIU et al.: "A Differential Approach for Gaze Estimation", arXiv:1904.09459v3 *
GANG LIU et al.: "A Differential Approach for Gaze Estimation", IEEE *
CHEN Xuefeng (陈雪峰): "Research on Visual Attention Detection with Behavior Feature Fusion" (行为特征融合的视觉注意力检测技术研究), China Master's Theses Full-text Database (Information Science and Technology) *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418074A (en) * 2020-11-20 2021-02-26 重庆邮电大学 Coupled posture face recognition method based on self-attention
CN112418074B (en) * 2020-11-20 2022-08-23 重庆邮电大学 Coupled posture face recognition method based on self-attention
CN112597823A (en) * 2020-12-07 2021-04-02 深延科技(北京)有限公司 Attention recognition method and device, electronic equipment and storage medium
CN112711984B (en) * 2020-12-09 2022-04-12 北京航空航天大学 Fixation point positioning method and device and electronic equipment
CN112711984A (en) * 2020-12-09 2021-04-27 北京航空航天大学 Fixation point positioning method and device and electronic equipment
CN113642393A (en) * 2021-07-07 2021-11-12 重庆邮电大学 Attention mechanism-based multi-feature fusion sight line estimation method
CN113642393B (en) * 2021-07-07 2024-03-22 重庆邮电大学 Attention mechanism-based multi-feature fusion sight estimation method
CN113705349A (en) * 2021-07-26 2021-11-26 电子科技大学 Attention power analysis method and system based on sight estimation neural network
CN113705349B (en) * 2021-07-26 2023-06-06 电子科技大学 Attention quantitative analysis method and system based on line-of-sight estimation neural network
CN113838135A (en) * 2021-10-11 2021-12-24 重庆邮电大学 Pose estimation method, system and medium based on LSTM double-current convolution neural network
CN113838135B (en) * 2021-10-11 2024-03-19 重庆邮电大学 Pose estimation method, system and medium based on LSTM double-flow convolutional neural network
CN113807330A (en) * 2021-11-19 2021-12-17 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Three-dimensional sight estimation method and device for resource-constrained scene
CN114898453A (en) * 2022-05-23 2022-08-12 重庆邮电大学 Cooperative network based sight line estimation method
CN116226712A (en) * 2023-03-03 2023-06-06 湖北商贸学院 Online learner concentration monitoring method, system and readable storage medium

Similar Documents

Publication Publication Date Title
CN111796681A (en) Self-adaptive sight estimation method and medium based on differential convolution in man-machine interaction
CN109117731B (en) Classroom teaching cognitive load measurement system
JP7273157B2 (en) Model training method, device, terminal and program
US10614289B2 (en) Facial tracking with classifiers
US20230377190A1 (en) Method and device for training models, method and device for detecting body postures, and storage medium
CN111325726A (en) Model training method, image processing method, device, equipment and storage medium
CN109359512A (en) Eyeball position method for tracing, device, terminal and computer readable storage medium
CN111046734B (en) Multi-modal fusion sight line estimation method based on expansion convolution
CN111598168B (en) Image classification method, device, computer equipment and medium
CN112330684B (en) Object segmentation method and device, computer equipment and storage medium
CN110555426A (en) Sight line detection method, device, equipment and storage medium
CN114120432A (en) Online learning attention tracking method based on sight estimation and application thereof
Aslan et al. Multimodal video-based apparent personality recognition using long short-term memory and convolutional neural networks
CN110222780A (en) Object detecting method, device, equipment and storage medium
CN111754546A (en) Target tracking method, system and storage medium based on multi-feature map fusion
Sumer et al. Attention flow: End-to-end joint attention estimation
CN113177559A (en) Image recognition method, system, device and medium combining breadth and dense convolutional neural network
Sun et al. Personality assessment based on multimodal attention network learning with category-based mean square error
Duraisamy et al. Classroom engagement evaluation using computer vision techniques
CN111339878B (en) Correction type real-time emotion recognition method and system based on eye movement data
Lee et al. Automatic facial recognition system assisted-facial asymmetry scale using facial landmarks
CN110889393A (en) Human body posture estimation method and device
CN115116117A (en) Learning input data acquisition method based on multi-mode fusion network
CN113805695A (en) Reading understanding level prediction method and device, electronic equipment and storage medium
Elahi et al. Webcam-based accurate eye-central localization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20201020)