CN114005167A - Remote sight estimation method and device based on human skeleton key points - Google Patents

Remote sight estimation method and device based on human skeleton key points

Info

Publication number
CN114005167A
Authority
CN
China
Prior art keywords
human body
human
key points
orientation angle
face orientation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111473575.1A
Other languages
Chinese (zh)
Inventor
赵思源
彭春蕾
胡瑞敏
刘德成
苗紫民
万爽
孙飞洋
郭荟青
罗肖怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202111473575.1A
Publication of CN114005167A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sight-line estimation method and device based on human skeleton key points. The method comprises the following steps: separating the pedestrians in an image to be detected and cropping out a human body bounding-box image; inputting the human body bounding-box image into a pre-trained human keypoint detection network model to obtain the position coordinates of a plurality of human keypoints in the bounding-box image, wherein the human keypoints comprise at least the left eye, right eye, left ear, right ear, nose, left shoulder and right shoulder; obtaining an initial face orientation angle from the position coordinates of the plurality of human keypoints; and obtaining the sight-line estimation landing-point coordinates using the initial face orientation angle. The invention can satisfactorily recognize and estimate the sight line of distant pedestrians in both real scenes and game scenes.

Description

Remote sight estimation method and device based on human skeleton key points
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a remote sight estimation method and device based on human skeleton key points.
Background
Humans can quickly and easily interpret the orientation and movement of another person's head, and use this important body language to infer the intentions of people nearby. With the continuous development of society, multi-person scenes are increasingly common and machines record ever more data; to address the safety problems raised by large volumes of images and video, the ability to rapidly infer the sight line of pedestrians in images or video is becoming more and more important.
Current sight-line detection methods can be roughly divided into two categories according to the features used for estimation. The first is based on facial keypoints: it aligns detected keypoints with a 3D head model and recovers the 3D head pose from the correspondence, so its accuracy depends on whether enough facial keypoints can be detected and matched to the corresponding 3D head model. The second is based on head features: it extracts relevant head texture features for analysis, for example using a CNN (Convolutional Neural Network) model to extract and detect head features at each pose angle, and its accuracy depends on the features the network extracts. Head pose estimation is inherently linked to gaze estimation in that it characterizes the direction of a person's gaze focus; accordingly, when the eyes are not visible (e.g., at low resolution or under occlusion), head pose provides only a coarse characterization. In computer vision, head pose estimation is most often understood as inferring the orientation of a person's head relative to the camera. The human head is generally modeled as a disembodied rigid object, under the assumption that its pose is limited to three degrees of freedom, typically pitch, roll and yaw. Under deep learning methods, the head pose angles are predicted directly from image features using a multi-loss network with one loss per angle, i.e., three separate losses, each with two components: pose-bin classification and regression. After training, the network predicts the head pose angles directly from the input image features, which can then be visualized as a head pose estimate.
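The patent text gives no code for this prior-art multi-loss network; the following minimal PyTorch sketch, in the style of the Hopenet-like architecture the paragraph appears to describe, shows one loss per angle with a classification component over angle bins plus a regression component on the expected angle. The 66 bins of 3 degrees, the feature dimension and the weight alpha are illustrative assumptions, not values from the patent.

```python
# Illustrative sketch only, not from the patent: Hopenet-style multi-loss
# head-pose prediction with one output branch per angle (yaw, pitch, roll).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLossHead(nn.Module):
    def __init__(self, feat_dim: int = 2048, num_bins: int = 66):  # assumed sizes
        super().__init__()
        self.fc_yaw = nn.Linear(feat_dim, num_bins)
        self.fc_pitch = nn.Linear(feat_dim, num_bins)
        self.fc_roll = nn.Linear(feat_dim, num_bins)

    def forward(self, feats):
        # feats: (batch, feat_dim) image features from a CNN backbone
        return self.fc_yaw(feats), self.fc_pitch(feats), self.fc_roll(feats)

def angle_loss(logits, bin_labels, angle_targets, alpha: float = 0.001):
    """Per-angle loss: cross-entropy on the binned pose plus MSE on the
    expected angle recovered from the bin probabilities (66 bins of 3 deg)."""
    ce = F.cross_entropy(logits, bin_labels)
    probs = F.softmax(logits, dim=1)
    idx = torch.arange(logits.size(1), dtype=probs.dtype, device=probs.device)
    pred_angle = torch.sum(probs * idx, dim=1) * 3 - 99  # bin index -> degrees
    return ce + alpha * F.mse_loss(pred_angle, angle_targets)
```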
However, when environmental conditions or video quality degrade, the accuracy of both the facial-keypoint and head-feature methods drops. For example, when the face appears blurred at long distance and low resolution, the first method cannot detect the face effectively: far too few keypoints are found to align with the 3D head model, so the gaze focus of the head cannot be recovered. Moreover, the 3D head model is a fixed template, and when the detected keypoints deviate strongly from it, the accuracy of head gaze estimation also drops. For the second method, facial features become severely deficient: head pose estimation fails for pedestrians in long-distance, low-resolution scenes because part of the head's features disappear and the features of different pose angles can no longer be distinguished, resulting in low overall accuracy.
Disclosure of Invention
In order to solve the above problems in the prior art, the invention provides a remote sight-line estimation method and device based on human skeleton key points. The technical problem to be solved by the invention is realized by the following technical scheme:
One aspect of the present invention provides a remote sight-line estimation method based on human skeleton key points, comprising:
S1: separating the pedestrians in an image to be detected and cropping out a human body bounding-box image;
S2: inputting the human body bounding-box image into a pre-trained human keypoint detection network model to obtain the position coordinates of a plurality of human keypoints in the bounding-box image, wherein the human keypoints comprise at least the left eye, right eye, left ear, right ear, nose, left shoulder and right shoulder;
S3: obtaining an initial face orientation angle according to the position coordinates of the plurality of human keypoints;
S4: obtaining the sight-line estimation landing-point coordinates using the initial face orientation angle.
In an embodiment of the present invention, S1 comprises:
separating all pedestrians in the image to be detected using a human body detector, cropping out the human body bounding-box image of each pedestrian, and obtaining the upper-left and lower-right corner coordinates of the bounding-box image.
In an embodiment of the present invention, S3 comprises:
S31: extracting the position coordinates of the human keypoints, comprising: left eye (x1, y1), right eye (x2, y2), left ear (x3, y3), right ear (x4, y4), nose (x5, y5), left shoulder (x6, y6) and right shoulder (x7, y7);
S32: selecting a coordinate origin (x0, y0) in the human body bounding-box image according to the position coordinates of the human keypoints;
S33: obtaining the initial face orientation angle using the coordinate origin (x0, y0):
headangle1 = arctan(y0 / x0).
In an embodiment of the present invention, S32 comprises:
S321: calculating the distance l_dist between the left ear and the left eye, the distance r_dist between the right ear and the right eye, and the midpoint coordinates ms of the left and right shoulders;
S322: comparing l_dist and r_dist and choosing the coordinate origin (x0, y0): if l_dist > r_dist, then x0 = ms_x - x5 and y0 = y3 - y5; if l_dist < r_dist, then x0 = ms_x - x5 and y0 = y4 - y5, where ms_x denotes the abscissa of the midpoint of the left and right shoulders.
In an embodiment of the present invention, S4 comprises:
S41: equally dividing the face turning range into a plurality of angle ranges;
S42: determining which angle range the initial face orientation angle headangle1 falls into, and selecting the median of that range as the final face orientation angle headangle2;
S43: obtaining the sight-line estimation landing-point coordinates of the human body according to the selected final face orientation angle.
In an embodiment of the present invention, S43 comprises:
obtaining, from the selected final face orientation angle headangle2 and a preset visual sight-line length L, the relative coordinates (x8, y8) of the sight-line estimation landing point with the nose as origin, calculated as:
x8 = L·cos(headangle2), y8 = L·sin(headangle2);
and calculating the sight-line estimation landing-point coordinates (x, y) from the relative coordinates (x8, y8):
x = x5 + x8, y = y5 + y8.
Another aspect of the present invention provides a remote sight-line estimation device based on human skeleton key points, comprising:
a human body detector for separating the pedestrians in an image to be detected and cropping out a human body bounding-box image;
a human keypoint detection module for inputting the human body bounding-box image into a pre-trained human keypoint detection network model to obtain the position coordinates of a plurality of human keypoints in the bounding-box image, wherein the human keypoints comprise at least the left eye, right eye, left ear, right ear, nose, left shoulder and right shoulder;
a face orientation angle acquisition module for obtaining an initial face orientation angle according to the position coordinates of the plurality of human keypoints; and
a remote sight-line estimation module for obtaining the sight-line estimation landing-point coordinates using the initial face orientation angle.
In an embodiment of the present invention, the human keypoint detection module includes the pre-trained human keypoint detection network model.
In an embodiment of the present invention, the face orientation angle acquisition module is specifically configured to:
extract the position coordinates of the human keypoints; select a coordinate origin in the human body bounding-box image according to those position coordinates; and obtain the initial face orientation angle using the coordinate origin.
In an embodiment of the present invention, the remote sight-line estimation module is specifically configured to:
equally divide the face turning range into a plurality of angle ranges; determine which angle range the initial face orientation angle falls into, and select the median of that range as the final face orientation angle; and obtain the sight-line estimation landing-point coordinates of the human body according to the selected final face orientation angle.
Compared with the prior art, the invention has the following beneficial effects:
The remote sight-line estimation device based on human skeleton key points can analyze distant pedestrians by adding operations on skeleton keypoints: for example, a pedestrian's trajectory can be inferred from different head poses, and the pedestrian's attention to different areas can be judged from the head orientation, so the sight line of distant pedestrians can be satisfactorily recognized and estimated in both real scenes and game scenes. Secondly, because a pre-trained human keypoint detection network model is used, the method involves no network training during execution; it therefore consumes few computing resources, is suitable for a wide range of computers, and achieves real-time performance on test videos.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Drawings
FIG. 1 is a flowchart of a remote sight-line estimation method based on human skeleton key points according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a human body detector separating all pedestrians in an image to be detected according to an embodiment of the present invention;
FIG. 3 is a diagram of a cropped human body bounding-box image according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of human skeleton key points according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of detecting human skeleton key points in a human body bounding-box image according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the face orientation angle calculation process according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the gaze focus calculation process provided by an embodiment of the invention;
FIG. 8 is a block diagram of a remote sight-line estimation device based on human skeleton key points according to an embodiment of the present invention;
FIG. 9 is a violin plot of the results on the three video sources provided by an embodiment of the present invention.
Detailed Description
In order to further explain the technical means and effects adopted by the present invention to achieve its intended purpose, the remote sight-line estimation method and device based on human skeleton key points are described in detail below with reference to the accompanying drawings and specific embodiments.
The foregoing and other technical matters, features and effects of the present invention will be apparent from the following detailed description of the embodiments, read in conjunction with the accompanying drawings. The specific embodiments allow the technical means and effects of the invention to be understood more deeply and concretely; however, the attached drawings are provided for reference and illustration only and are not intended to limit the technical scheme of the present invention.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that an article or device that comprises a list of elements includes not only those elements but may also include other elements not expressly listed. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of additional identical elements in the article or device that comprises it.
Example one
Referring to fig. 1, fig. 1 is a flowchart of a remote sight-line estimation method based on human skeleton key points according to an embodiment of the present invention. As shown in the figure, the method of this embodiment includes:
S1: separating the pedestrians in the image to be detected and cropping out the human body bounding-box image.
Specifically, all pedestrians in the image to be detected are separated by a human body detector, yielding the bounding box of each pedestrian together with its upper-left and lower-right corner coordinates; as shown in fig. 2, these two corner coordinates delimit the top, left, bottom and right boundaries of the bounding box. The human body bounding-box image of each pedestrian is then cropped out of the image using the upper-left and lower-right corner coordinates, as shown in fig. 3. The position coordinates in this embodiment are the pixel positions of the respective points.
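The patent does not name a specific human body detector. As a hedged illustration of step S1, the sketch below uses torchvision's off-the-shelf Faster R-CNN as a stand-in detector and crops one bounding-box image per detected person; the detector choice and score threshold are assumptions, not part of the patent.

```python
# Illustrative sketch of step S1; detector and threshold are assumptions.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def crop_pedestrians(image: Image.Image, score_thresh: float = 0.8):
    """Return (top-left, bottom-right, cropped image) per detected pedestrian."""
    with torch.no_grad():
        pred = detector([to_tensor(image)])[0]
    crops = []
    for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
        if label.item() == 1 and score.item() >= score_thresh:  # COCO class 1: person
            x1, y1, x2, y2 = (int(v) for v in box.tolist())
            crops.append(((x1, y1), (x2, y2), image.crop((x1, y1, x2, y2))))
    return crops
```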
S2: inputting the human body bounding-box image into a pre-trained human keypoint detection network model to obtain the position coordinates of a plurality of human keypoints in the bounding-box image, wherein the human keypoints comprise at least the left eye, right eye, left ear, right ear, nose, left shoulder and right shoulder.
In this embodiment, the AlphaPose network is selected as the human keypoint detection network model. AlphaPose is an existing neural network for multi-person pose recognition that predicts the positions of human keypoints; it can predict multiple groups of keypoints, each group with a score, and the highest-scoring group is taken as the final keypoint coordinate output, so a detailed description is omitted here. It should be noted that the model outputs the 17 human skeleton keypoints shown in fig. 4, namely: left and right eyes, nose, left and right ears, left and right shoulders, left and right elbows, left and right hands, left and right hips, left and right knees, and left and right ankles. The keypoints used in this embodiment of the invention are the left eye, right eye, left ear, right ear, nose, left shoulder and right shoulder. When some keypoints are unclear or occluded, the model can also estimate the position coordinates of the missing keypoints as needed from the positions and coordinates of the known ones. In other embodiments, any other detection network capable of producing the human keypoints may be used, which is not limited herein.
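Since AlphaPose follows the standard 17-keypoint COCO convention, the seven keypoints used by the method can be picked out by index. The helper below is a small sketch under that assumption; `select_keypoints` is a name introduced here for illustration and is not part of the AlphaPose API.

```python
# Standard COCO keypoint indices (AlphaPose follows this convention).
COCO_IDX = {"nose": 0, "left_eye": 1, "right_eye": 2, "left_ear": 3,
            "right_ear": 4, "left_shoulder": 5, "right_shoulder": 6}

def select_keypoints(keypoints):
    """keypoints: (17, 2) array of (x, y) pixel coordinates for the
    highest-scoring pose group; returns the seven points used by the method."""
    return {name: (float(keypoints[i][0]), float(keypoints[i][1]))
            for name, i in COCO_IDX.items()}
```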
S3: obtaining an initial face orientation angle from the position coordinates of the plurality of human keypoints.
In this embodiment, step S3 specifically includes:
S31: extracting from the human keypoint detection network model the position coordinates of the human keypoints in the current bounding-box image, comprising: left eye (x1, y1), right eye (x2, y2), left ear (x3, y3), right ear (x4, y4), nose (x5, y5), left shoulder (x6, y6) and right shoulder (x7, y7);
S32: selecting a coordinate origin (x0, y0) in the human body bounding-box image according to the position coordinates of the human keypoints.
Specifically, S32 includes:
S321: calculating the distance l_dist between the left ear and the left eye, the distance r_dist between the right ear and the right eye (both computed as squared Euclidean distances, which suffices since they are only compared with each other), and the midpoint coordinates ms of the left and right shoulders:
l_dist = (x1 - x3)^2 + (y1 - y3)^2
r_dist = (x2 - x4)^2 + (y2 - y4)^2
ms = ((x6 + x7)/2, (y6 + y7)/2)
S322: comparing l_dist and r_dist and choosing the coordinate origin (x0, y0), where x0 is selected from the shoulder-midpoint and nose abscissas, and y0 from the ear and nose ordinates: if l_dist > r_dist, then x0 = ms_x - x5 and y0 = y3 - y5; if l_dist < r_dist, then x0 = ms_x - x5 and y0 = y4 - y5, where ms_x denotes the abscissa of the shoulder midpoint. This yields the coordinate origin O(x0, y0), as shown in fig. 6.
S33: obtaining the initial face orientation angle using the coordinate origin (x0, y0):
headangle1 = arctan(y0 / x0).
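Steps S31 to S33 translate directly into code. The sketch below follows the formulas above; the only deliberate deviation is the use of `atan2` instead of the text's arctan(y0/x0), an assumption made so the angle covers the full [-180, 180] degree range used in step S41. The `kp` dict matches the hypothetical `select_keypoints` helper shown earlier.

```python
import math

def initial_face_angle(kp) -> float:
    """Sketch of S31-S33: coordinate origin selection and initial angle."""
    x1, y1 = kp["left_eye"];  x2, y2 = kp["right_eye"]
    x3, y3 = kp["left_ear"];  x4, y4 = kp["right_ear"]
    x5, y5 = kp["nose"]
    x6, _ = kp["left_shoulder"]; x7, _ = kp["right_shoulder"]

    # S321: squared ear-to-eye distances (only compared, so no sqrt needed)
    # and the abscissa of the shoulder midpoint.
    l_dist = (x1 - x3) ** 2 + (y1 - y3) ** 2
    r_dist = (x2 - x4) ** 2 + (y2 - y4) ** 2
    ms_x = (x6 + x7) / 2

    # S322: pick the origin using the more visible ear side.
    x0 = ms_x - x5
    y0 = (y3 - y5) if l_dist > r_dist else (y4 - y5)

    # S33: initial face orientation angle in degrees (atan2 is an assumption
    # replacing arctan(y0/x0) to cover the full angular range).
    return math.degrees(math.atan2(y0, x0))
```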
S4: obtaining the sight-line estimation landing-point coordinates using the initial face orientation angle.
In this embodiment, S4 includes:
S41: equally dividing the face turning range into a plurality of angle ranges.
Specifically, the 360-degree head turning range is equally divided into 12 intervals of 30 degrees each: [-180, -150], [-150, -120], [-120, -90], [-90, -60], [-60, -30], [-30, 0], [0, 30], [30, 60], [60, 90], [90, 120], [120, 150] and [150, 180]; it is then determined which of these ranges the initial face orientation angle falls into.
S42: determining which angle range the initial face orientation angle headangle1 falls into, and selecting the median of that range as the final face orientation angle headangle2.
If the pedestrian in the image to be detected is far away, the irregular rotation of the head becomes a significant factor: using the computed initial face orientation angle headangle1 directly would make the orientation angle change too much between video frames, which harms the visual impression. In addition, very high angular accuracy is not required for distant pedestrians, so this embodiment adjusts the face orientation angle with a median method.
Specifically, after dividing the sight range, the two-dimensional coordinates of a specific gaze focus are obtained: if the initial face orientation angle headangle1 falls into a certain range, the middle angle of that range is selected as the final face orientation angle headangle2. For example, if headangle1 computed in step S33 is 12 degrees, it falls into the [0, 30] degree range, and the middle angle of that range, namely 15 degrees, is taken as the head orientation angle.
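As a sketch of steps S41 and S42, the quantization can be written as snapping the angle to the midpoint of its 30-degree bin; this reproduces the 12-degree-to-15-degree example above.

```python
import math

def quantize_angle(angle_deg: float, bin_width: float = 30.0) -> float:
    """S41-S42 sketch: return the midpoint of the bin containing angle_deg.
    Bins tile [-180, 180); an input of exactly 180 would need special-casing."""
    k = math.floor(angle_deg / bin_width)       # index of bin [k*w, (k+1)*w)
    return k * bin_width + bin_width / 2.0      # bin midpoint

# quantize_angle(12.0) -> 15.0, quantize_angle(-37.0) -> -45.0
```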
S43: obtaining the sight-line estimation landing-point coordinates of the human body according to the selected final face orientation angle.
From the selected final face orientation angle headangle2 and the preset visual sight-line length L, the relative coordinates (x8, y8) of the sight-line estimation landing point with the nose as origin are obtained, as shown in fig. 7, calculated as:
x8 = L·cos(headangle2), y8 = L·sin(headangle2)
Preferably, L = 20. Then the sight-line estimation landing-point coordinates (x, y) are calculated from the relative coordinates (x8, y8):
x = x5 + x8, y = y5 + y8
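Step S43 then reduces to projecting a segment of length L from the nose. The sketch below assumes the (L·cos, L·sin) reconstruction of the patent's image-only formula, with L in image pixels.

```python
import math

def gaze_landing_point(nose_xy, headangle2_deg: float, L: float = 20.0):
    """S43 sketch: landing point = nose + L * (cos, sin) of the final angle.
    L = 20 follows the 'preferably, L = 20' setting in the text."""
    x5, y5 = nose_xy
    x8 = L * math.cos(math.radians(headangle2_deg))
    y8 = L * math.sin(math.radians(headangle2_deg))
    return x5 + x8, y5 + y8
```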
the remote sight line estimation device based on the human skeleton key points can analyze the remote pedestrians due to the addition of the operation of the skeleton key points, for example, the action tracks of the pedestrians can be deduced through different head postures, the attention of the pedestrians to different areas can be judged due to the head orientations, and the remote sight line of the pedestrians can be satisfactorily recognized and estimated in real scenes and game scenes. Secondly, because the network model is detected by using the human body key points which are pre-trained, the method does not relate to the training of the network model in the execution process, has the advantages of low consumption of computing resources and suitability for various computers, and the test video can achieve the real-time effect.
Example two
On the basis of the above embodiment, this embodiment provides a remote sight-line estimation device based on human skeleton key points. As shown in fig. 8, the device comprises a human body detector 1, a human keypoint detection module 2, a face orientation angle acquisition module 3 and a remote sight-line estimation module 4. The human body detector 1 is used to separate the pedestrians in the image to be detected and crop out the human body bounding-box images. Specifically, the human body detector 1 separates all pedestrians in the image to be detected, obtains the bounding box of each pedestrian together with its upper-left and lower-right corner coordinates, and then crops the bounding-box image of each pedestrian out of the image using those corner coordinates.
The human keypoint detection module 2 is configured to input the human body bounding-box image into a pre-trained human keypoint detection network model to obtain the position coordinates of a plurality of human keypoints in the bounding-box image, the keypoints comprising at least the left eye, right eye, left ear, right ear, nose, left shoulder and right shoulder. The module includes the pre-trained human keypoint detection network model; in this embodiment, the model is the AlphaPose network.
The face orientation angle acquisition module 3 is configured to obtain an initial face orientation angle according to the position coordinates of the plurality of human keypoints.
Further, the face orientation angle acquisition module 3 is specifically configured to: extract the position coordinates of the human keypoints; select a coordinate origin in the human body bounding-box image according to those position coordinates; and obtain the initial face orientation angle using the coordinate origin.
The remote sight-line estimation module 4 is configured to obtain the sight-line estimation landing-point coordinates using the initial face orientation angle. Further, the remote sight-line estimation module 4 is specifically configured to: equally divide the face turning range into a plurality of angle ranges; determine which angle range the initial face orientation angle falls into, and select the median of that range as the final face orientation angle; and obtain the sight-line estimation landing-point coordinates of the human body according to the selected final face orientation angle.
The remote sight-line estimation method based on human skeleton key points provided by the present invention is verified and illustrated below through simulation experiments.
(1) Simulation conditions
To verify the effect of the above method, this embodiment uses datasets covering multiple viewing angles and multiple recording environments. The viewing angles include head-up and look-down views; the recording environments include a hand-held camera (the camera moves with a person), a vehicle-mounted camera (the camera moves with a car) and a fixed camera; and the scenes include manually collected real-scene data and game-scene data. The dataset contains M video frames, where M is a natural number greater than 0.
The simulation is implemented with PyTorch 1.7. The data comprise the real-scene source dataset MOT17, some scene videos from the in-game surveillance dataset MTA, and surveillance-view videos used in news reports published on Douyin.
The results of the simulation experiments were evaluated by questionnaire to obtain the final performance figures. Three test video sources were used in the questionnaire: Douyin, the MTA public dataset and the MOT public dataset, each contributing 5 short videos of approximately 10 seconds each. Evaluation index: the performance of distant-pedestrian sight-line estimation in each video is scored from 0 to 10, where a higher score indicates better performance and higher accuracy. The questionnaire respondents were experts and researchers with a scientific background in the field.
Referring to table 1, the average scores of the three video sources are 7.38, 7.12 and 7.51, the standard deviations are 1.27, 1.36 and 1.06, and the variances are 1.59, 1.82 and 1.11, respectively.
TABLE 1 Scoring results for the three video sources

Video source    Average score    Standard deviation    Variance
Douyin          7.38             1.27                  1.59
MTA             7.12             1.36                  1.82
MOT             7.51             1.06                  1.11
Referring to fig. 9, the results for the three video sources are visualized as violin plots for analysis; the median, confidence interval and interquartile range all reach excellent levels.
The experimental results show that the remote sight-line estimation method based on human skeleton key points achieves satisfactory recognition performance for distant-pedestrian sight-line estimation in both real scenes and game scenes.
In the embodiments provided by the present invention, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into modules is only a logical division, and in actual implementation there may be other divisions; multiple modules or components may be combined or integrated into another system, or some features may be omitted or not implemented.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, or each module may exist physically alone, or two or more modules may be integrated into one module. The integrated module may be implemented in hardware, or in a combination of hardware and software functional modules.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments, and the specific implementation of the invention is not to be considered limited to these descriptions. Those skilled in the art to which the invention pertains may make several simple deductions or substitutions without departing from the concept of the invention, and all of these shall be considered to fall within the protection scope of the invention.

Claims (10)

1. A remote sight-line estimation method based on human skeleton key points, characterized by comprising the following steps:
S1: separating the pedestrians in an image to be detected and cropping out a human body bounding-box image;
S2: inputting the human body bounding-box image into a pre-trained human keypoint detection network model to obtain the position coordinates of a plurality of human keypoints in the bounding-box image, wherein the human keypoints comprise at least the left eye, right eye, left ear, right ear, nose, left shoulder and right shoulder;
S3: obtaining an initial face orientation angle according to the position coordinates of the plurality of human keypoints;
S4: obtaining the sight-line estimation landing-point coordinates using the initial face orientation angle.
2. The remote sight-line estimation method based on human skeleton key points according to claim 1, wherein S1 comprises:
separating all pedestrians in the image to be detected using a human body detector, cropping out the human body bounding-box image of each pedestrian, and obtaining the upper-left and lower-right corner coordinates of the bounding-box image.
3. The remote sight-line estimation method based on human skeleton key points according to claim 1, wherein S3 comprises:
S31: extracting the position coordinates of the human keypoints, comprising: left eye (x1, y1), right eye (x2, y2), left ear (x3, y3), right ear (x4, y4), nose (x5, y5), left shoulder (x6, y6) and right shoulder (x7, y7);
S32: selecting a coordinate origin (x0, y0) in the human body bounding-box image according to the position coordinates of the human keypoints;
S33: obtaining the initial face orientation angle using the coordinate origin (x0, y0):
headangle1 = arctan(y0 / x0).
4. The remote sight-line estimation method based on human skeleton key points according to claim 3, wherein S32 comprises:
S321: calculating the distance l_dist between the left ear and the left eye, the distance r_dist between the right ear and the right eye, and the midpoint coordinates ms of the left and right shoulders;
S322: comparing l_dist and r_dist and choosing the coordinate origin (x0, y0): if l_dist > r_dist, then x0 = ms_x - x5 and y0 = y3 - y5; if l_dist < r_dist, then x0 = ms_x - x5 and y0 = y4 - y5, where ms_x denotes the abscissa of the midpoint of the left and right shoulders.
5. The remote sight-line estimation method based on human skeleton key points according to claim 3, wherein S4 comprises:
S41: equally dividing the face turning range into a plurality of angle ranges;
S42: determining which angle range the initial face orientation angle headangle1 falls into, and selecting the median of that range as the final face orientation angle headangle2;
S43: obtaining the sight-line estimation landing-point coordinates of the human body according to the selected final face orientation angle.
6. The remote sight-line estimation method based on human skeleton key points according to claim 5, wherein S43 comprises:
obtaining, from the selected final face orientation angle headangle2 and a preset visual sight-line length L, the relative coordinates (x8, y8) of the sight-line estimation landing point with the nose as origin, calculated as:
x8 = L·cos(headangle2), y8 = L·sin(headangle2);
and calculating the sight-line estimation landing-point coordinates (x, y) from the relative coordinates (x8, y8):
x = x5 + x8, y = y5 + y8.
7. A remote sight-line estimation device based on human skeleton key points, characterized by comprising:
a human body detector for separating the pedestrians in an image to be detected and cropping out a human body bounding-box image;
a human keypoint detection module for inputting the human body bounding-box image into a pre-trained human keypoint detection network model to obtain the position coordinates of a plurality of human keypoints in the bounding-box image, wherein the human keypoints comprise at least the left eye, right eye, left ear, right ear, nose, left shoulder and right shoulder;
a face orientation angle acquisition module for obtaining an initial face orientation angle according to the position coordinates of the plurality of human keypoints; and
a remote sight-line estimation module for obtaining the sight-line estimation landing-point coordinates using the initial face orientation angle.
8. The remote sight-line estimation device based on human skeleton key points according to claim 7, wherein the human keypoint detection module includes a pre-trained human keypoint detection network model.
9. The remote sight-line estimation device based on human skeleton key points according to claim 7, wherein the face orientation angle acquisition module is specifically configured to:
extract the position coordinates of the human keypoints; select a coordinate origin in the human body bounding-box image according to those position coordinates; and obtain the initial face orientation angle using the coordinate origin.
10. The remote sight-line estimation device based on human skeleton key points according to any one of claims 7 to 9, wherein the remote sight-line estimation module is specifically configured to:
equally divide the face turning range into a plurality of angle ranges; determine which angle range the initial face orientation angle falls into, and select the median of that range as the final face orientation angle; and obtain the sight-line estimation landing-point coordinates of the human body according to the selected final face orientation angle.
CN202111473575.1A 2021-11-29 2021-11-29 Remote sight estimation method and device based on human skeleton key points Pending CN114005167A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111473575.1A CN114005167A (en) 2021-11-29 2021-11-29 Remote sight estimation method and device based on human skeleton key points

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111473575.1A CN114005167A (en) 2021-11-29 2021-11-29 Remote sight estimation method and device based on human skeleton key points

Publications (1)

Publication Number Publication Date
CN114005167A true CN114005167A (en) 2022-02-01

Family

ID=79931267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111473575.1A Pending CN114005167A (en) 2021-11-29 2021-11-29 Remote sight estimation method and device based on human skeleton key points

Country Status (1)

Country Link
CN (1) CN114005167A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114542874A (en) * 2022-02-23 2022-05-27 常州工业职业技术学院 Device for automatically adjusting photographing height and angle and control system thereof
CN114529715A (en) * 2022-04-22 2022-05-24 中科南京智能技术研究院 Image identification method and system based on edge extraction
CN114529715B (en) * 2022-04-22 2022-07-19 中科南京智能技术研究院 Image identification method and system based on edge extraction
CN115631464A (en) * 2022-11-17 2023-01-20 北京航空航天大学 Pedestrian three-dimensional representation method oriented to large space-time target association
CN117238039A (en) * 2023-11-16 2023-12-15 暗物智能科技(广州)有限公司 Multitasking human behavior analysis method and system based on top view angle
CN117238039B (en) * 2023-11-16 2024-03-19 暗物智能科技(广州)有限公司 Multitasking human behavior analysis method and system based on top view angle

Similar Documents

Publication Publication Date Title
CN114005167A (en) Remote sight estimation method and device based on human skeleton key points
Jain et al. Real-time upper-body human pose estimation using a depth camera
Harville et al. Fast, integrated person tracking and activity recognition with plan-view templates from a single stereo camera
CN102831439B (en) Gesture tracking method and system
EP1320830B1 (en) Facial image processing system
US8462996B2 (en) Method and system for measuring human response to visual stimulus based on changes in facial expression
MX2013002904A (en) Person image processing apparatus and person image processing method.
CN109766796B (en) Deep pedestrian detection method for dense crowd
CN105809144A (en) Gesture recognition system and method adopting action segmentation
CN110837784A (en) Examination room peeping cheating detection system based on human head characteristics
CN111460976B (en) Data-driven real-time hand motion assessment method based on RGB video
CN111209811B (en) Method and system for detecting eyeball attention position in real time
WO2021068781A1 (en) Fatigue state identification method, apparatus and device
JP2010104754A (en) Emotion analyzer
CN111881749A (en) Bidirectional pedestrian flow statistical method based on RGB-D multi-modal data
CN112417142A (en) Auxiliary method and system for generating word meaning and abstract based on eye movement tracking
CN112926522A (en) Behavior identification method based on skeleton attitude and space-time diagram convolutional network
US20220036056A1 (en) Image processing apparatus and method for recognizing state of subject
Gu et al. Hand gesture interface based on improved adaptive hand area detection and contour signature
CN117593792A (en) Abnormal gesture detection method and device based on video frame
CN115841497B (en) Boundary detection method and escalator area intrusion detection method and system
CN111694980A (en) Robust family child learning state visual supervision method and device
JPH0991432A (en) Method for extracting doubtful person
CN111507192A (en) Appearance instrument monitoring method and device
Miyoshi et al. Detection of Dangerous Behavior by Estimation of Head Pose and Moving Direction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination