CN116246345A - Demonstration control method and device based on dynamic gesture recognition

Demonstration control method and device based on dynamic gesture recognition

Info

Publication number
CN116246345A
Authority
CN
China
Prior art keywords
hand
demonstration
image
determining
node
Prior art date
Legal status
Pending
Application number
CN202310219341.7A
Other languages
Chinese (zh)
Inventor
沈来信
邵岭
郑小林
Current Assignee
Light Controls Tesilian Shanghai Information Technology Co ltd
Original Assignee
Light Controls Tesilian Shanghai Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Light Controls Tesilian Shanghai Information Technology Co ltd filed Critical Light Controls Tesilian Shanghai Information Technology Co ltd
Priority to CN202310219341.7A
Publication of CN116246345A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Psychiatry (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the disclosure relate to the field of intelligent office and provide a demonstration control method and device based on dynamic gesture recognition. The method comprises the following steps: performing human body detection on an acquired original image to obtain a human body image; performing hand key point detection with a preset key point detection model to obtain human hand joint point information; determining a demonstration hand that meets preset characteristic conditions based on the joint point information, and giving a control focus to the demonstration hand when its action meets preset activation conditions; determining a motion history map of the demonstration hand based on continuously acquired multi-frame images; determining an action recognition result through an action fusion algorithm based on the motion history map; determining, based on a preset relation between actions and control instructions, the actual control instruction for the demonstration control object indicated by the control focus; and controlling the demonstration control object to demonstrate through the actual control instruction. The embodiments of the disclosure have low latency, recognize dynamic gestures more accurately, and can effectively realize lightweight tracking, judgment, and control of the current demonstration hand.

Description

Demonstration control method and device based on dynamic gesture recognition
Technical Field
The disclosure relates to the technical field of intelligent office, in particular to a demonstration control method and device based on dynamic gesture recognition.
Background
With the development of camera image acquisition and video motion recognition technology, hand actions such as the numbers 1-9, thumb up/down/left/right, the V sign, OK, and the like can be recognized with a gesture recognition algorithm based on hand images. Gesture-based remote control built on hand action recognition can be applied to intelligent large screens and presentations, for example double-click opening, region enlargement, region reduction, and right-clicking of content displayed on an intelligent large screen, or page-down, page-up, double-click enlargement, and region reduction of a presentation.
Currently, in schemes for remote action control of an intelligent large screen, a camera typically captures video frame images of a human body, a human body detection network completes human body detection on these frames, and a convolutional neural network (Convolutional Neural Network, CNN) such as ResNet then classifies the action in the human body image. However, such a network classifies the actions contained in the whole human body image, which requires collecting a large number of images and labeling the corresponding actions. Moreover, the hand occupies only a small area of the whole figure, so the classification network often cannot focus on detecting small changes of the hand. Therefore, directly classifying the action of the human body image is not well suited to the task of gesture-controlling a large screen.
In addition, the prior art includes algorithms that detect gestures directly from hand joint points. Most of them are based on static images: they extract hand joint point information and use a convolutional neural network classifier to recognize the gesture. However, a gesture operation is usually a dynamic process; action recognition based on a single static frame cannot accurately determine the action sequence, while action recognition based on multiple frames suffers from delay, action fluctuation, interference, and similar problems.
The Chinese patent application with publication number CN114926898A, entitled "Method, apparatus, device, and medium for training gesture recognition model", constructs a convolutional neural network structure and trains it with multiple gesture segmentation images to obtain a gesture recognition model. The model can adapt to its training data, but because it cannot focus on the details of gestures, it has robustness problems when recognizing gestures in scenes that differ from the training data.
The Chinese patent application with publication number CN114944013A, entitled "A gesture recognition model training method and gesture recognition method based on improved yolov5", builds a gesture recognition model based on an improved yolov5: a feature extraction module is added to the neck network of the original yolov5 model, and dilated convolutions of different rates provide receptive fields of different sizes for the input features, improving the model's gesture recognition accuracy. However, the model can only perform gesture recognition on static images.
The Chinese patent application with publication number CN111950341A, entitled "A real-time gesture recognition method and gesture recognition system based on machine vision", completes gesture and hand joint recognition through four steps: human body joint recognition, hand positioning, hand joint recognition, and gesture recognition. However, these steps still mainly use a single static frame to recognize a static gesture.
Disclosure of Invention
The disclosure aims to solve at least one of the problems in the prior art and provides a demonstration control method and device based on dynamic gesture recognition.
In one aspect of the present disclosure, a presentation control method based on dynamic gesture recognition is provided, the presentation control method including:
acquiring an original image, and performing human body detection based on the original image to obtain a corresponding human body image;
performing hand key point detection on the human body image by using a preset key point detection model to obtain human body hand joint point information;
determining a demonstration hand meeting preset characteristic conditions based on the joint point information of the human hand, and giving a control focus to the demonstration hand when the action of the demonstration hand meets preset activation conditions;
Continuously acquiring multi-frame images, and determining a motion history diagram of the demonstration hand in the current frame image based on the multi-frame images;
determining an action recognition result of the demonstration hand in the current frame image through an action fusion algorithm based on the motion history image;
determining, based on a preset relation between demonstration hand actions and control instructions, the actual control instruction corresponding to the action recognition result for the demonstration control object indicated by the control focus;
and controlling the demonstration control object to demonstrate through the actual control instruction.
Optionally, the determining, based on the multi-frame image, a motion history of the demonstration hand in the current frame image includes:
determining joint point information of the demonstration hand in each frame of image based on the multi-frame images to obtain motion history joint point information of the demonstration hand;
and projecting the motion history joint point information onto a current frame image where the demonstration hand is located to obtain the motion history image.
Optionally, the determining, based on the multi-frame images, joint point information of the demonstration hand in each frame of image respectively, to obtain motion history joint point information of the demonstration hand includes:
Based on the multi-frame images, a convex hull detection algorithm is utilized to generate convex hulls of the demonstration hands in each frame of images and joint point information outside the convex hulls;
and determining gesture actions of the demonstration hand in each frame of image by utilizing a gesture recognition algorithm based on the position and the angle based on the convex hull and the joint point information outside the convex hull, so as to obtain the motion history joint point information.
Optionally, the generating, based on the multi-frame image, a convex hull of the demonstration hand in each frame image and joint point information outside the convex hull by using a convex hull detection algorithm includes:
based on the multi-frame images, respectively constructing a convex hull curve of a palm part outline corresponding to the demonstration hand in each frame of image, and obtaining convex hulls of the demonstration hand in each frame of image respectively;
based on the multi-frame images, determining the distance between each joint point of the demonstration hand and the convex hull in each frame of image;
and determining the joint points outside the convex hull according to a preset distance corresponding relation based on the distance between each joint point of the demonstration hand and the convex hull.
Optionally, the determining, based on the convex hull and the joint point information outside the convex hull, the gesture action of the demonstration hand in each frame of image by using a gesture recognition algorithm based on the position and the angle includes:
Respectively calculating the number of the joint points outside the convex hull corresponding to the demonstration hand, the distance between any two joint points and the included angle between the edge formed by any two joint points and the edge formed by any two other joint points;
and determining the gesture action of the demonstration hand in each frame of image based on a preset gesture action judgment relation according to the number of the joint points outside the convex hull, the distance between any two joint points and the included angle.
Optionally, the distance between any two joint points is represented by the following formula (1):
dist(i, j) = sqrt((i_x - j_x)² + (i_y - j_y)²)   (1)
the included angle between the edge formed by any two joint points and the edge formed by another two joint points is expressed as the following formula (2):
v(i-j, k-m) = arctan((i_y - j_y)/(i_x - j_x)) - arctan((k_y - m_y)/(k_x - m_x))   (2)
wherein i ≠ j ≠ k ≠ m, dist(i, j) represents the distance between joint point i and joint point j, i_x and i_y represent the x and y coordinates of joint point i, j_x and j_y represent the x and y coordinates of joint point j, sqrt() represents the square root, v(i-j, k-m) represents the included angle between edge i-j and edge k-m, edge i-j is the edge formed by joint points i and j, edge k-m is the edge formed by joint points k and m, and k_x, k_y, m_x, m_y represent the x and y coordinates of joint points k and m respectively.
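For reference, formulas (1) and (2) can be written as the following minimal Python sketch, assuming each joint point is given as an (x, y) coordinate pair; the epsilon guard against vertical edges is an addition for numerical safety and is not part of the formulas.

```python
import math

def joint_distance(p_i, p_j):
    """Formula (1): Euclidean distance between joint points i and j, given as (x, y) tuples."""
    return math.sqrt((p_i[0] - p_j[0]) ** 2 + (p_i[1] - p_j[1]) ** 2)

def edge_angle(p_i, p_j, p_k, p_m):
    """Formula (2): angle between edge i-j and edge k-m, computed as the difference
    of the arctangents of their slopes (radians)."""
    eps = 1e-9  # avoid division by zero for vertical edges (assumption, not in the formula)
    a1 = math.atan((p_i[1] - p_j[1]) / ((p_i[0] - p_j[0]) or eps))
    a2 = math.atan((p_k[1] - p_m[1]) / ((p_k[0] - p_m[0]) or eps))
    return a1 - a2
```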
Optionally, the determining, based on the motion history map, a motion recognition result of the demonstration hand in the current frame image through a motion fusion algorithm includes:
determining a plurality of historical action recognition results of the demonstration hand in the current frame image based on the motion history map;
and respectively carrying out similarity calculation on the plurality of historical action recognition results and a preset action label sequence set, and fusing the similarity corresponding to the plurality of historical action recognition results to obtain the action recognition results.
Optionally, after the demonstration control object is controlled to carry out demonstration through the actual control instruction, the demonstration control method further includes:
determining joint point information of the demonstration hand in front and back frame images of the current frame image;
based on the joint point information in the front and back frame images, respectively calculating the average distance from the joint point on each finger of the demonstration hand to the palm center point to obtain a feature vector corresponding to each finger of the demonstration hand;
calculating finger similarity of the demonstration hand in the front and rear frame images based on the feature vector;
respectively calculating the distance between the palm center of the demonstration hand in the current frame image and the palm center of the demonstration hand in the front and rear frame images, and determining the palm similarity of the front and rear frame images according to the distance;
Fusing the finger similarity and the palm similarity to obtain the hand similarity of the demonstration hand;
and tracking the demonstration hand based on the hand similarity.
Optionally, after the tracking and determining the demonstration hand based on the hand similarity, the demonstration control method further includes:
judging whether the demonstration hand meets a preset failure condition or not based on the hand similarity;
and if the demonstration hand meets the preset failure condition, the control focus is retracted from the demonstration hand.
In another aspect of the present disclosure, there is provided a presentation control device based on dynamic gesture recognition, the presentation control device including:
the acquisition module is used for acquiring an original image, and performing human body detection based on the original image to obtain a corresponding human body image;
the detection module is used for detecting the key points of the hand of the human body image by using a preset key point detection model to obtain joint point information of the hand of the human body;
the activation module is used for determining a demonstration hand meeting preset characteristic conditions based on the joint point information of the human hand, and giving a control focus to the demonstration hand when the action of the demonstration hand meets the preset activation conditions;
The first determining module is used for continuously acquiring multi-frame images and determining a motion history diagram of the demonstration hand in the current frame image based on the multi-frame images;
the second determining module is used for determining the action recognition result of the demonstration hand in the current frame image through an action fusion algorithm based on the motion history image;
the third determining module is used for determining an actual control instruction corresponding to the action recognition result and aiming at the demonstration control object indicated by the control focus based on a preset relation between the demonstration hand action and the control instruction;
and the control module is used for controlling the demonstration control object to carry out demonstration through the actual control instruction.
Compared with the prior art, the embodiments of the present disclosure use a lightweight dynamic gesture recognition algorithm to recognize the gesture actions used for demonstration control. It can reach 25 fps on a CPU, so the delay is small, and because motion history image information is used during gesture action recognition, dynamic gesture actions are recognized more accurately. Meanwhile, the control focus is given to the demonstration hand only when its action meets the preset activation condition, which effectively eliminates interference from other hands, keeps the focus on the demonstration hand, and maintains lightweight tracking, judgment, and control-focus keeping for the current demonstration hand.
Drawings
One or more embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which the figures are not drawn to scale unless expressly stated otherwise.
FIG. 1 is a flow chart of a method for demonstration control based on dynamic gesture recognition according to an embodiment of the present disclosure;
FIG. 2 is a schematic view of a hand joint provided in another embodiment of the present disclosure;
fig. 3 is a flowchart illustrating step S140 in the demonstration control method S100 according to another embodiment of the present disclosure;
fig. 4 is a flowchart of step S141 in the demonstration control method S100 according to another embodiment of the present disclosure;
FIG. 5 is a flow chart of a presentation control method based on dynamic gesture recognition according to another embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a presentation control device based on dynamic gesture recognition according to another embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that numerous technical details are set forth in the various embodiments to provide a better understanding of the present disclosure; however, the technical solutions claimed in the present disclosure can be implemented without these details, and with various changes and modifications based on the following embodiments. The following division into embodiments is for convenience of description and should not be construed as limiting the specific implementations of the disclosure; the embodiments may be combined and cross-referenced with one another where no contradiction arises.
One embodiment of the present disclosure relates to a presentation control method S100 based on dynamic gesture recognition, the flow of which is shown in fig. 1, including:
step S110, acquiring an original image, and performing human body detection based on the original image to obtain a corresponding human body image.
Specifically, the demonstration control method S100 may be used to control information displayed on a large screen or to control the playback of a presentation. The original image may be an image of the human body in front of the large screen or presentation to be controlled, and may be acquired by an image capture device such as a camera placed in front of the large screen or presentation.
In particular, when a controller performs remote demonstration control (for example, at a distance of more than 3 meters) in front of a large screen, directly detecting the hand in the original image captured by the camera usually works poorly: small hand changes (on the order of 6 px) are hard to observe in a complex, full-size capture (for example, a 1920 px x 1080 px image). Therefore, this step may first detect the human body in the original image with a human body detection network such as MediaPipe or yolov5, then expand the length and width of the detected human body region by 20%, and crop and save it as the human body image corresponding to the original image, so that the resulting human body image can reflect small changes of the hand. MediaPipe is an open-source, cross-platform machine learning framework from Google; it is in effect an integrated library of machine vision algorithms, including models for face detection, face key point recognition, gesture recognition, image segmentation, human pose recognition, and more.
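By way of illustration only, the body-crop step described above might be sketched as follows in Python; the use of the off-the-shelf yolov5s model from torch.hub is an assumption (the text equally allows MediaPipe), and the helper name crop_body is hypothetical.

```python
import cv2
import torch

# Assumed detector: the off-the-shelf YOLOv5s model published via torch.hub.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

def crop_body(frame_bgr, expand=0.20):
    """Detect the largest person box, expand its width/height by 20%, and crop it."""
    h, w = frame_bgr.shape[:2]
    results = model(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))  # detector expects RGB
    persons = [d for d in results.xyxy[0].tolist() if int(d[5]) == 0]  # COCO class 0 = person
    if not persons:
        return None
    x1, y1, x2, y2, conf, cls = max(persons, key=lambda d: (d[2] - d[0]) * (d[3] - d[1]))
    dx, dy = (x2 - x1) * expand / 2, (y2 - y1) * expand / 2    # 10% on each side = 20% total
    x1, y1 = max(0, int(x1 - dx)), max(0, int(y1 - dy))
    x2, y2 = min(w, int(x2 + dx)), min(h, int(y2 + dy))
    return frame_bgr[y1:y2, x1:x2].copy()
```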
Step S120, performing hand key point detection on the human body image by using a preset key point detection model to obtain human body hand joint point information.
Specifically, the preset key point detection model may be the Holistic detection model of the lightweight human key point detection package MediaPipe. Applying this model to the human body image for hand key point detection yields human hand joint point information consisting of 21 three-dimensional (3D) hand joint points. As shown in fig. 2, the 21 hand joint points may be numbered 0 to 20: joint 0 is WRIST (the wrist joint); joints 1-4 are THUMB_CMC, THUMB_MCP, THUMB_IP, and THUMB_TIP (the thumb carpometacarpal joint, metacarpophalangeal joint, interphalangeal joint, and tip); joints 5-8 are INDEX_FINGER_MCP, INDEX_FINGER_PIP, INDEX_FINGER_DIP, and INDEX_FINGER_TIP (the index finger metacarpophalangeal joint, proximal interphalangeal joint, distal interphalangeal joint, and tip); joints 9-12 are MIDDLE_FINGER_MCP, MIDDLE_FINGER_PIP, MIDDLE_FINGER_DIP, and MIDDLE_FINGER_TIP (the corresponding middle finger joints and tip); joints 13-16 are RING_FINGER_MCP, RING_FINGER_PIP, RING_FINGER_DIP, and RING_FINGER_TIP (the corresponding ring finger joints and tip); and joints 17-20 are PINKY_MCP, PINKY_PIP, PINKY_DIP, and PINKY_TIP (the corresponding little finger joints and tip). The position of each joint point is represented by its (x, y, z) coordinates.
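For illustration, a minimal Python sketch of this detection step using the MediaPipe Holistic solution is given below; the preference for the right hand and the helper name detect_hand_joints are assumptions, not part of the disclosure.

```python
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

def detect_hand_joints(body_image_bgr):
    """Return the 21 (x, y, z) hand joint coordinates (normalized, as MediaPipe reports
    them), preferring the right hand if both are detected (assumption)."""
    with mp_holistic.Holistic(static_image_mode=True) as holistic:
        results = holistic.process(cv2.cvtColor(body_image_bgr, cv2.COLOR_BGR2RGB))
    hand = results.right_hand_landmarks or results.left_hand_landmarks
    if hand is None:
        return None
    return [(lm.x, lm.y, lm.z) for lm in hand.landmark]  # indices 0..20, WRIST..PINKY_TIP
```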
Step S130, determining a demonstration hand meeting preset characteristic conditions based on the joint point information of the human hand, and giving a control focus to the demonstration hand when the action of the demonstration hand meets preset activation conditions.
Specifically, the preset characteristic condition may be that a hand is lifted and its open palm stays still for at least a preset duration, such as 60 ms or 65 ms. If a hand in the human body image is lifted and keeps the open-palm action for the preset duration (for example, 60 ms) or longer, the recognized hand is taken as the demonstration hand.
The preset activation condition may be that the gesture recognition results of a preset number of consecutive frames are all an open palm; the preset number of frames may be 3 frames, 4 frames, and so on. If the gesture recognition results of, for example, 3 consecutive frames are all an open palm, the demonstration hand is considered ready, and the control focus of the large screen or presentation is given to it, completing the activation of the demonstration hand.
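A minimal sketch of this activation logic is shown below, assuming per-frame gesture labels and dwell times are already available from the preceding steps; the class name and the label string "palm" are hypothetical.

```python
PALM_HOLD_MS = 60        # minimum dwell time of the raised open palm (from the text)
ACTIVATE_FRAMES = 3      # consecutive "palm" frames required for activation (from the text)

class PresentationHandActivator:
    """Tracks recent per-frame gesture labels of the candidate hand and grants the
    control focus once an open palm is seen for enough consecutive frames."""
    def __init__(self):
        self.palm_streak = 0
        self.focus_given = False

    def update(self, gesture_label, dwell_ms):
        if gesture_label == "palm" and dwell_ms >= PALM_HOLD_MS:
            self.palm_streak += 1
        else:
            self.palm_streak = 0
        if not self.focus_given and self.palm_streak >= ACTIVATE_FRAMES:
            self.focus_given = True   # give the control focus to this hand
        return self.focus_given
```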
Step S140, continuously acquiring multi-frame images, and determining a motion history of the demonstration hand in the current frame image based on the multi-frame images.
Specifically, this embodiment may continuously acquire multiple frames of images through the image capture device (such as a camera) in front of the large screen or presentation to be controlled; each of these frames should contain the demonstration hand, so that a motion history map of the demonstration hand in the current frame can be determined from them. The motion history map (Motion History Image, MHI) reflects the motion history of the demonstration hand over the multiple frames according to the pixel changes across consecutive frames.
For example, in combination with fig. 3, in step S140, determining a motion history of the demonstration hand in the current frame image based on the multi-frame image may include step S141 and step S142.
Step S141, based on the multi-frame images, determining joint point information of the demonstration hand in each frame of images respectively, and obtaining motion history joint point information of the demonstration hand.
Specifically, in this step, the activated demonstration hand may be used as the reference and any other hand treated as an interference hand, so as to determine the joint point information of the demonstration hand in the continuously acquired multi-frame images and obtain the motion history joint point information list of the current demonstration hand.
For example, in conjunction with fig. 4, step S141 may include step S141a and step S141b.
Step S141a, based on the multi-frame images, generating convex hulls and joint point information outside the convex hulls of the demonstration hands in each frame of images respectively by utilizing a convex hull detection algorithm.
Specifically, the convex hull detection algorithm first performs contour detection of the demonstration hand on each frame, then uses the points in the contour as a point set and finds, for each frame, the convex hull of the demonstration hand and the joint points outside it, thereby generating the convex hull and the joint point information outside the convex hull for each frame.
Illustratively, step S141a includes: based on multiple frames of images, respectively constructing a convex hull curve of a palm part outline corresponding to the demonstration hand in each frame of image, and obtaining convex hulls of the demonstration hand in each frame of image; based on the multi-frame images, the distance between each joint point of the demonstration hand in each frame of image and the convex hull is respectively determined; and determining the articulation points outside the convex hull according to a preset distance corresponding relation based on the distance between each articulation point of the demonstration hand and the convex hull.
Specifically, this step may use the convexHull method in OpenCV to construct the contour convex hull curve of the palm portion of the demonstration hand in each frame. OpenCV is a cross-platform computer vision and machine learning software library that runs on Linux, Windows, Android, and Mac OS operating systems and implements many general-purpose algorithms for image processing and computer vision. The convexHull function computes the convex hull of the palm portion corresponding to the demonstration hand: from the contour points of the palm in the image, it produces the point coordinates of the convex hull, from which the convex hull of the palm portion can be drawn and the contour convex hull curve of the palm obtained.
For example, this step may use the pointPolygonTest method in OpenCV to compute the value pt(i) describing the relation between each hand joint point i of the demonstration hand and the convex hull. pointPolygonTest determines whether joint point i lies inside or outside the contour of the palm portion: pt(i) = 0 means joint point i is on the convex hull curve of the palm contour, pt(i) = 1 means it is inside the curve, and pt(i) = -1 means it is outside. Since the finger joints are outside the palm, this embodiment treats the joint points with pt(i) = -1 as finger joint points and records them in a fingers array; all joint points outside the palm convex hull, i.e. all finger joint points, are thus recorded, and the fingers array then contains all the finger joint point information.
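The convex hull test described above might be sketched as follows with OpenCV; the set of joints treated as the palm contour (PALM_IDS) and the use of pixel coordinates are assumptions for illustration.

```python
import numpy as np
import cv2

PALM_IDS = [0, 1, 2, 5, 9, 13, 17]   # assumed palm joints: wrist, thumb CMC/MCP, finger MCPs

def finger_joints_outside_palm(joints_xy):
    """joints_xy: list of 21 (x, y) pixel coordinates of the hand joint points.
    Returns the indices of joint points outside the palm convex hull (pointPolygonTest == -1)."""
    palm_pts = np.array([joints_xy[i] for i in PALM_IDS], dtype=np.float32)
    hull = cv2.convexHull(palm_pts)                  # convex hull of the palm contour points
    fingers = []
    for i, (x, y) in enumerate(joints_xy):
        pt = cv2.pointPolygonTest(hull, (float(x), float(y)), False)  # 1 inside, 0 on, -1 outside
        if pt < 0:
            fingers.append(i)
    return fingers
```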
Step S141b, determining gesture actions of the demonstration hand in each frame of image by utilizing a gesture recognition algorithm based on the position and the angle based on the convex hull and the joint point information outside the convex hull, and obtaining the movement history joint point information.
Specifically, this step may use the positions and angles in the finger joint point information contained in the fingers array, together with a position-and-angle-based gesture recognition algorithm, to determine the gesture action of the demonstration hand in each frame (fist, the V sign, numbers 1-9, and so on), thereby obtaining the motion history joint point information of the demonstration hand over the continuously acquired multi-frame images.
Illustratively, step S141b includes: respectively calculating the number of articulation points outside the convex hull corresponding to the demonstration hand, the distance between any two articulation points, and the included angle between the edge formed by any two articulation points and the edge formed by any two other articulation points; and determining the gesture action of the demonstration hand in each frame of image based on the preset gesture action judgment relation according to the number of the joint points outside the convex hull, the distance between any two joint points and the included angle.
Specifically, the step can calculate the number of the nodes outside the convex hull, the distance between any two nodes and the included angle between the edges formed by the nodes based on the finger node information contained in the finger array.
Based on the number of joint points outside the convex hull, the number of fingers outside the convex hull, len(fingers), can be obtained; its value ranges from 0 to 5, representing 0 to 5 fingers outside the convex hull.
Illustratively, the distance between any two joint points can be expressed as the following formula (1):
dist(i, j) = sqrt((i_x - j_x)² + (i_y - j_y)²)   (1)
where i and j are joint point numbers and i ≠ j; dist(i, j) represents the distance between joint point i and joint point j, i_x and i_y are the x and y coordinates of joint point i, j_x and j_y are the x and y coordinates of joint point j, and sqrt() denotes the square root.
The included angle between the edge formed by any two joint points and the edge formed by another two joint points can be expressed as the following formula (2):
v(i-j, k-m) = arctan((i_y - j_y)/(i_x - j_x)) - arctan((k_y - m_y)/(k_x - m_x))   (2)
where k and m are joint point numbers and i ≠ j ≠ k ≠ m; v(i-j, k-m) represents the included angle between edge i-j and edge k-m, edge i-j is the edge formed by joint points i and j, edge k-m is the edge formed by joint points k and m, and k_x, k_y, m_x, m_y are the x and y coordinates of joint points k and m respectively.
Based on the number of joint points outside the convex hull and the included angles between edges formed by the joint points, a preset gesture action judgment relation can be established, and the gesture action of the demonstration hand can then be judged with this relation.
For example, in the preset gesture action judgment relation, the gesture action fist may correspond to len(fingers) == 0, i.e. the number of fingers (and joint points) outside the convex hull is 0. The V sign may correspond to len(fingers) == 2 and fingers[0] >= 3 and fingers[1] >= 3 and v(fingers[0], fingers[1]) >= 10, where len(fingers) == 2 indicates that two fingers are outside the convex hull, fingers[0] >= 3 and fingers[1] >= 3 indicate that finger 0 and finger 1 each contain no fewer than 3 joint points, and v(fingers[0], fingers[1]) >= 10 indicates that the angle between finger 0 and finger 1 is no less than 10. The number 1 may correspond to len(fingers) == 1 and fingers[0] == 8, where len(fingers) == 1 indicates that one finger is outside the convex hull and fingers[0] == 8 indicates that the highest joint point of finger 0, i.e. the one farthest from the convex hull, is joint point 8; as shown in fig. 2, joint point 8 is the index fingertip. The number 2 may correspond to len(fingers) == 2 and fingers[0] >= 3 and fingers[1] >= 3 and v(fingers[0], fingers[1]) < 3, where v(fingers[0], fingers[1]) < 3 indicates that the angle between finger 0 and finger 1 is less than 3, so the number 2 is distinguished from the V sign by the angle between finger 0 and finger 1. The number 4 may correspond to len(fingers) == 4 and fingers[0] >= 3 and fingers[1] >= 3 and fingers[2] >= 3 and fingers[3] >= 3, where len(fingers) == 4 indicates that four fingers are outside the convex hull and fingers[2] >= 3 and fingers[3] >= 3 indicate that finger 2 and finger 3 each contain no fewer than 3 joint points. The number 5 may correspond to len(fingers) == 5, i.e. five fingers are outside the convex hull.
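A simplified sketch of such a judgment relation is shown below; it treats each entry of fingers as one extended finger and takes the thresholds directly from the text, so it is an illustrative reading rather than the exact rule set of the disclosure.

```python
def classify_gesture(fingers, angle_0_1=None):
    """fingers: joint indices outside the palm convex hull (one entry per extended finger
    in this simplified sketch); angle_0_1: angle between the first two extended fingers."""
    n = len(fingers)
    if n == 0:
        return "fist"
    if n == 1 and fingers[0] == 8:          # only the index fingertip (joint 8) is outside
        return "1"
    if n == 2 and fingers[0] >= 3 and fingers[1] >= 3 and angle_0_1 is not None:
        if angle_0_1 >= 10:
            return "V"
        if angle_0_1 < 3:
            return "2"
    if n == 4 and all(f >= 3 for f in fingers[:4]):
        return "4"
    if n == 5:
        return "5"
    return "unknown"
```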
And step S142, projecting the motion history node information onto the current frame image where the demonstration hand is located, and obtaining a motion history image.
Specifically, in this step, the motion history joint point information carried by the motion history joint point information list of the current demonstration hand may be projected onto the current frame image where the current demonstration hand is located, so as to form a motion history map of the current frame of the current demonstration hand.
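As an illustration, the motion history bookkeeping might be sketched as follows; the window length of 15 frames and the fading-circle rendering are assumptions, not part of the disclosure.

```python
from collections import deque
import cv2

class MotionHistory:
    """Keeps the demonstration hand's joint information for the last N frames and
    projects it onto the current frame as a simple motion history image (MHI)."""
    def __init__(self, max_frames=15):
        self.history = deque(maxlen=max_frames)    # each entry: (gesture_label, joints_xy)

    def push(self, gesture_label, joints_xy):
        self.history.append((gesture_label, joints_xy))

    def render(self, current_frame):
        mhi = current_frame.copy()
        for age, (_, joints) in enumerate(reversed(self.history)):
            fade = max(0, 255 - age * 15)          # older joint points are drawn darker
            for (x, y) in joints:
                cv2.circle(mhi, (int(x), int(y)), 3, (0, fade, 0), -1)
        return mhi
```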
Step S150, determining the action recognition result of the demonstration hand on the current frame image through an action fusion algorithm based on the motion history map.
Specifically, this step may first determine the gesture recognition result of the demonstration hand for each frame involved in the motion history map, then fuse all of these recognized gestures with an action fusion algorithm and take the fusion result as the action recognition result of the demonstration hand in the current frame image.
Illustratively, step S150 includes: determining a plurality of historical action recognition results of the demonstration hand in the current frame image based on the motion history image; and respectively carrying out similarity calculation on the plurality of historical action recognition results and a preset action tag sequence set, and fusing the similarity corresponding to the plurality of historical action recognition results to obtain an action recognition result.
Specifically, the preset action tag sequence set is the set of all possible action sequences of the preset types of gesture actions. When computing the similarity between the multiple historical action recognition results and the preset action tag sequence set, the historical action recognition results are compared with the action sequence of each type of gesture action in the set to obtain the corresponding similarities; the fusion of the historical action recognition results is then completed based on these similarities, producing the action fusion result corresponding to the motion history map, which is taken as the action recognition result of the demonstration hand in the current frame image.
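A minimal sketch of this fusion idea, assuming the action tag sequence set is a small dictionary of reference label sequences and using a simple match-ratio similarity; both the reference sequences and the similarity measure are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Assumed label-sequence set: each action class maps to the per-frame label sequence
# a complete action of that class is expected to produce (names are hypothetical).
ACTION_SEQUENCES = {
    "swipe_left": ["palm", "palm", "thumb_left", "thumb_left"],
    "close":      ["palm", "9", "9", "9"],
}

def fuse_actions(history_labels):
    """Compare the recent per-frame recognition results against every reference sequence
    and return the best-matching action together with its similarity score."""
    best_action, best_score = None, 0.0
    for action, ref_seq in ACTION_SEQUENCES.items():
        score = SequenceMatcher(None, history_labels, ref_seq).ratio()
        if score > best_score:
            best_action, best_score = action, score
    return best_action, best_score
```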
Step S160, determining an actual control instruction of the demonstration control object indicated by the control focus corresponding to the action recognition result based on the preset relationship between the demonstration hand action and the control instruction.
Specifically, a preset relation between demonstration hand actions and control instructions can be set in advance for each demonstration control object.
For example, when the demonstration control object is an intelligent large screen, the preset relation between demonstration hand actions and control instructions may map the numbers 1, 2, 3, 4, 5, 6, and 7 to the CTRL key, ALT key, Del key, mouse movement, right click, left click, and double click respectively, map the thumb pointing left, right, up, and down to moving the window on the intelligent large screen left, right, up, and down, and map the numbers 8 and 9 to zooming and closing the window. In that case, if the demonstration control object indicated by the control focus of the demonstration hand is the intelligent large screen and the gesture corresponding to the action recognition result is 9, the corresponding actual control instruction is closing the window of the intelligent large screen.
For another example, when the demonstration control object is a presentation, the numbers 1, 2, 3, 4, 5, 6, and 7 may control the CTRL key, ALT key, Del key, mouse movement, right click, left click, and double click; the thumb pointing left, right, up, and down may control page down, page up, selection, and enlargement of the presentation; and the numbers 8 and 9 may control shrinking and closing of the presentation. In that case, if the demonstration control object indicated by the control focus of the demonstration hand is a presentation and the gesture corresponding to the action recognition result is 9, the corresponding actual control instruction is closing the presentation.
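The preset relation can be represented as a simple lookup table per control object, as sketched below; the key and command names are illustrative only and do not prescribe the disclosure's actual instruction encoding.

```python
# Assumed mapping tables reflecting the examples above; command names are illustrative.
COMMANDS = {
    "smart_screen": {
        "1": "CTRL", "2": "ALT", "3": "DEL", "4": "MOVE_MOUSE",
        "5": "RIGHT_CLICK", "6": "LEFT_CLICK", "7": "DOUBLE_CLICK",
        "thumb_left": "WINDOW_LEFT", "thumb_right": "WINDOW_RIGHT",
        "thumb_up": "WINDOW_UP", "thumb_down": "WINDOW_DOWN",
        "8": "ZOOM", "9": "CLOSE_WINDOW",
    },
    "presentation": {
        "1": "CTRL", "2": "ALT", "3": "DEL", "4": "MOVE_MOUSE",
        "5": "RIGHT_CLICK", "6": "LEFT_CLICK", "7": "DOUBLE_CLICK",
        "thumb_left": "NEXT_PAGE", "thumb_right": "PREV_PAGE",
        "thumb_up": "SELECT", "thumb_down": "ZOOM_IN",
        "8": "ZOOM_OUT", "9": "CLOSE",
    },
}

def to_control_instruction(focus_target, gesture):
    """Map a recognized gesture to the actual control instruction of the object
    currently indicated by the control focus; returns None for unmapped gestures."""
    return COMMANDS.get(focus_target, {}).get(gesture)
```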
Step S170, the demonstration control object is controlled to carry out demonstration through the actual control instruction.
For example, when the actual control instruction is closing the window of the intelligent large screen, this step controls the intelligent large screen indicated by the control focus of the current demonstration hand to close the window according to that instruction. Likewise, when the actual control instruction is closing the presentation, this step controls the presentation indicated by the control focus of the current demonstration hand to close, and so on.
Compared with the prior art, the embodiments of the present disclosure adopt a lightweight dynamic gesture recognition algorithm to recognize the gesture actions used for demonstration control; it can reach 25 fps on a CPU, so the delay is small, and because motion history image information is used during gesture action recognition, dynamic gesture actions are recognized more accurately. Meanwhile, the control focus is given to the demonstration hand only when its action meets the preset activation condition, which effectively eliminates interference from other hands, keeps the focus on the demonstration hand, and maintains lightweight tracking, judgment, and control-focus keeping for the current demonstration hand.
Illustratively, following step S170, the demonstration control method S100 may further include related steps of gesture tracking, which may specifically include: determining the joint point information of the demonstration hand in the frames before and after the current frame; calculating, based on that joint point information, the average distance from the joint points on each finger of the demonstration hand to the palm center point, so as to obtain a feature vector for each finger of the demonstration hand; calculating the finger similarity of the demonstration hand between those frames from the feature vectors; calculating the distance between the palm center of the demonstration hand in the current frame and in the preceding and following frames, and determining the palm similarity of those frames from that distance; fusing the finger similarity and the palm similarity into the hand similarity of the demonstration hand; and tracking the demonstration hand based on the hand similarity.
Specifically, during gesture tracking, the joint point information in the preceding and following frames can be used to calculate the average distance between the joint points fj (fj = 1-4, the joint points on each finger of the demonstration hand) and the palm center point O. Here fi takes the values 1-5, representing the five fingers of the demonstration hand, and the average distance is taken as the feature vector value Vi of the corresponding finger fi, giving the feature vector V = (V1, V2, V3, V4, V5) of the five fingers of the demonstration hand, where V1, V2, V3, V4, and V5 are the feature vector values of fingers 1, 2, 3, 4, and 5 respectively.
After the finger feature vectors of the demonstration hand in the preceding and following frames have been obtained, they can be used to calculate the finger similarity s1 of the demonstration hand between those frames. At the same time, the palm-center coordinates of the demonstration hand in the current frame and in the preceding and following frames are used to calculate the distance between the palm centers, and the palm similarity s2 of the frames is determined from that distance. The finger similarity s1 and the palm similarity s2 are then fused to obtain the hand similarity s of the demonstration hand, from which the demonstration hand holding the control focus can be tracked and determined. Note that when calculating the finger similarity s1 or the palm similarity s2, the reciprocal of the corresponding distance may be used as the similarity, which helps avoid interference from other gestures and achieves the goal of tracking the demonstration hand.
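A sketch of the tracking similarity described above, assuming pixel-space joint coordinates; the reciprocal-of-distance similarity follows the text, while the equal-weight fusion of s1 and s2 is an assumption.

```python
import math

# Fingers fi = 1..5 and the joint indices fj on each finger (thumb through little finger).
FINGER_JOINTS = {1: [1, 2, 3, 4], 2: [5, 6, 7, 8], 3: [9, 10, 11, 12],
                 4: [13, 14, 15, 16], 5: [17, 18, 19, 20]}

def finger_feature_vector(joints_xy, palm_center):
    """Vi = average distance from the joint points of finger fi to the palm center O."""
    return [sum(math.dist(joints_xy[j], palm_center) for j in ids) / len(ids)
            for ids in FINGER_JOINTS.values()]

def hand_similarity(prev_joints, prev_center, cur_joints, cur_center, w1=0.5, w2=0.5):
    """Fuse finger similarity s1 and palm similarity s2 (both reciprocals of distances)."""
    v_prev = finger_feature_vector(prev_joints, prev_center)
    v_cur = finger_feature_vector(cur_joints, cur_center)
    s1 = 1.0 / (1e-6 + sum(abs(a - b) for a, b in zip(v_prev, v_cur)))
    s2 = 1.0 / (1e-6 + math.dist(prev_center, cur_center))
    return w1 * s1 + w2 * s2    # assumed equal-weight fusion
```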
In this embodiment, tracking the current demonstration hand with the gesture tracking algorithm further eliminates interference from other hands, keeps the focus on the demonstration hand, and maintains lightweight tracking, judgment, and focus keeping for the current demonstration hand.
Illustratively, after tracking and determining the demonstration hands based on the hand similarity, the demonstration control method S100 further includes: judging whether the demonstration hand meets a preset failure condition or not based on the hand similarity; and if the demonstration hand meets the preset failure condition, the control focus is retracted from the demonstration hand.
Specifically, the preset failure condition can be set according to actual needs. For example, it may be that the gesture of the demonstration hand leaves the control range of the demonstration control object, such as moving off the screen of the intelligent large screen; or that the gesture remains within the control range but stays unchanged for a preset duration, for example the demonstration hand stays on the screen but keeps a vertically dropped posture for a preset duration such as 100 ms.
When the current demonstration hand that holds the control focus meets the preset failure condition, it is considered to no longer be performing demonstration control, so the control focus can be withdrawn from it, invalidating the current demonstration hand.
For example, after the current demonstration hand becomes invalid, the image capture device (such as a camera) may capture a new open palm, from which a new demonstration hand is determined and activated.
In order to enable a person skilled in the art to better understand the above embodiments, a specific example will be described below.
Referring to fig. 5, a presentation control method based on dynamic gesture recognition includes the following steps:
1) Human body detection: when the demonstration is performed at a distance (more than 3 meters) in front of a large screen, directly detecting the hand in the camera image works poorly, and small hand changes (about 6 px) are hard to observe in the complex, large captured image (such as 1920 px x 1080 px); therefore the human body is first detected with the human body detection network MediaPipe, then the length and width of the human body image are expanded by 20%, and the image is cropped and saved;
2) Hand joint point detection: hand key point detection is performed on the human body image using the detection model of the lightweight human key point detection package MediaPipe, yielding 21 3D joint points of the human hand, numbered 0 to 20, each with coordinates (x, y, z);
3) Demonstration hand activation: when a hand is lifted and its open palm stays still for more than 60 ms, the detection algorithm recognizes the palm and takes the hand it belongs to as the demonstration hand. If the recognition results of 3 consecutive frames are all the palm of the demonstration hand, the current demonstration hand is considered ready for demonstration control and the control focus is given to it, completing the activation of the demonstration hand.
4) Convex hull gesture recognition: based on a convex hull detection algorithm, the convex hull generation of the hand joint points in the image is completed, and meanwhile joint point information outside the convex hull is obtained. The convex hull detection algorithm comprises the following steps:
4.1) Construct the convex hull curve of the palm portion contour using the convexHull method of OpenCV.
4.2) Calculate the value pt(i) between each hand joint point i and the convex hull using the pointPolygonTest method of OpenCV.
4.3) pt(i) = 0 indicates that joint point i is on the convex hull curve, pt(i) = 1 indicates that it is inside the curve, and pt(i) = -1 indicates that it is outside; the joint points with pt(i) = -1 are recorded in the fingers array, which therefore contains the joint point information outside the convex hull.
4.4) Output the fingers array containing the joint point information outside the convex hull.
5) Motion history map MHI generation: taking the currently activated demonstration hand as the reference and treating other hands as interference hands, the joint point information of the demonstration hand is continuously collected over multiple frames to obtain the motion history joint point information list of the current demonstration hand, which is then projected onto the current frame image to form the motion history map MHI of the current frame of the current demonstration hand.
6) Dynamic gesture recognition: based on the joint point positions and angles contained in the joint point information outside the convex hull, the different gesture actions of the current demonstration hand are judged with the position-and-angle-based gesture recognition algorithm, including fist, numbers 1-9, thumb up/down/left/right, the V sign, OK, and so on. Then, using the motion history map MHI of the current frame of the current demonstration hand, multiple historical action recognition results of the demonstration hand are obtained and fused by the action fusion algorithm into the action recognition result of the demonstration hand in the current frame, completing dynamic gesture recognition.
The gesture recognition algorithm based on the position and the angle comprises the following steps:
6.1.1 Obtaining a finger array containing convex hull external joint point information.
6.1.2) Based on the joint points outside the convex hull, calculate the number of fingers outside the convex hull len(fingers), the distance dist(i, j) between joint points i and j, the included angle v(i-j, k-m) between edge i-j (formed by joint points i and j) and edge k-m (formed by joint points k and m), and so on, where dist(i, j) is given by formula (1) and v(i-j, k-m) by formula (2):
dist(i, j) = sqrt((i_x - j_x)² + (i_y - j_y)²)   (1)
v(i-j, k-m) = arctan((i_y - j_y)/(i_x - j_x)) - arctan((k_y - m_y)/(k_x - m_x))   (2)
6.1.3) From the number of joint points outside the convex hull and the included angles between the edges they form, the judgment relations for gesture actions such as fist, numbers 1-9, thumb up/down/left/right, the V sign, and OK can be obtained. For example, fist corresponds to len(fingers) == 0. The number 1 corresponds to len(fingers) == 1 and fingers[0] == 8. The V sign corresponds to len(fingers) == 2 and fingers[0] >= 3 and fingers[1] >= 3 and v(fingers[0], fingers[1]) >= 10. The number 2 corresponds to len(fingers) == 2 and fingers[0] >= 3 and fingers[1] >= 3 and v(fingers[0], fingers[1]) < 3. The number 4 corresponds to len(fingers) == 4 and fingers[0] >= 3 and fingers[1] >= 3 and fingers[2] >= 3 and fingers[3] >= 3. The number 5 corresponds to len(fingers) == 5.
7) Controlling the large screen and the presentation file: based on the recognized gesture action of the current demonstration hand, remote demonstration control of the intelligent large screen and remote playback control of the presentation file can be realized respectively.
7.1 Intelligent large screen control: the correspondence between demonstration hand actions and intelligent large screen control instructions can be set as follows: the numbers 1, 2, 3, 4, 5, 6, 7, thumb left, thumb right, thumb up, thumb down, 8 and 9 respectively control the CTRL key, ALT key, Del key, mouse movement, right mouse click, left mouse click, mouse double-click, window movement left, right, up and down, zoom and close on the intelligent large screen. On this basis, when the control focus of the demonstration hand indicates the intelligent large screen, if the recognized gesture of the current demonstration hand is the number 1, the corresponding actual control instruction is CTRL key control; if the recognized gesture of the current demonstration hand is thumb left, the corresponding actual control instruction is window-left control; and so on.
7.2 Presentation file control: the correspondence between demonstration hand actions and presentation file control instructions can be set as follows: the numbers 1, 2, 3, 4, 5, 6, 7, thumb left, thumb right, thumb up, thumb down, 8 and 9 respectively control the CTRL key, ALT key, Del key, mouse movement, right mouse click, left mouse click, mouse double-click, next page, previous page, selection, zoom-in, zoom-out and closing of the presentation file. On this basis, when the control focus of the demonstration hand indicates the presentation file, if the recognized gesture of the current demonstration hand is thumb left, the corresponding actual control instruction is turning to the next page of the presentation file; if the recognized gesture of the current demonstration hand is the number 9, the corresponding actual control instruction is closing the presentation file; and so on.
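A minimal sketch of the correspondences in 7.1-7.2, expressed as lookup tables selected by the control focus; the instruction names and the execute() callback are placeholders introduced for illustration, not identifiers from the patent:

# Illustrative sketch of steps 7.1-7.2: gesture-to-instruction tables and a
# dispatcher keyed on the control focus.
SCREEN_COMMANDS = {
    "1": "ctrl_key", "2": "alt_key", "3": "del_key", "4": "mouse_move",
    "5": "right_click", "6": "left_click", "7": "double_click",
    "thumb_left": "window_left", "thumb_right": "window_right",
    "thumb_up": "window_up", "thumb_down": "window_down",
    "8": "zoom", "9": "close",
}
SLIDE_COMMANDS = {
    "1": "ctrl_key", "2": "alt_key", "3": "del_key", "4": "mouse_move",
    "5": "right_click", "6": "left_click", "7": "double_click",
    "thumb_left": "next_page", "thumb_right": "previous_page",
    "thumb_up": "select", "thumb_down": "zoom_in",
    "8": "zoom_out", "9": "close",
}

def dispatch(gesture, focus_target, execute):
    table = SCREEN_COMMANDS if focus_target == "large_screen" else SLIDE_COMMANDS
    command = table.get(gesture)
    if command is not None:
        execute(command)        # forward to the large screen / presentation player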
8) Gesture tracking: using the joint point information of the preceding and following frame images, the average distance between the joint points fj (1-4) on each finger fi (1-5) and the palm center point O is calculated as the feature vector value Vi of finger fi, forming the five-finger feature vector V = (V1, V2, V3, V4, V5); the finger similarity s1 between the preceding and following frame images is then computed from these finger feature vectors to support gesture tracking. At the same time, using the coordinate position of the palm center point O of each hand in the current frame image and in the preceding and following frame images, the distance between the palm center points across frames is calculated and used as the palm similarity s2 of the preceding and following frame images. s1 and s2 are fused into the hand similarity s of the preceding and following frame images, completing the tracking and judgment of the demonstration hand that holds the control focus. When computing the finger similarity s1 or the palm similarity s2, the reciprocal of the corresponding distance is used as the similarity, which suppresses interference from other gestures and achieves the goal of demonstration hand tracking.
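A minimal sketch of this step: the reciprocal-of-distance conversion follows the description, while the assumed 21-point finger layout, the small epsilon and the equal-weight fusion of s1 and s2 are assumptions of the sketch:

# Illustrative sketch of step 8: finger feature vectors, finger similarity s1,
# palm similarity s2, and their fusion into the hand similarity s.
import numpy as np

FINGER_JOINTS = {1: [1, 2, 3, 4], 2: [5, 6, 7, 8], 3: [9, 10, 11, 12],
                 4: [13, 14, 15, 16], 5: [17, 18, 19, 20]}   # assumed 21-point layout

def finger_features(joints, palm_center):
    # Vi = average distance from the joints of finger fi to the palm center O
    return np.array([np.mean([np.linalg.norm(joints[j] - palm_center)
                              for j in idx]) for idx in FINGER_JOINTS.values()])

def hand_similarity(prev_joints, prev_center, cur_joints, cur_center):
    v_prev = finger_features(prev_joints, prev_center)
    v_cur = finger_features(cur_joints, cur_center)
    s1 = 1.0 / (np.linalg.norm(v_prev - v_cur) + 1e-6)            # finger similarity
    s2 = 1.0 / (np.linalg.norm(prev_center - cur_center) + 1e-6)  # palm similarity
    return 0.5 * s1 + 0.5 * s2                                     # fused hand similarity s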
9) Demonstration hand failure: if the current demonstration hand meets a preset failure condition, for example its gesture moves out of the screen, or the hand remains in the screen but is detected to be hanging vertically down for a period of time (for example, 100 ms), the current demonstration hand is considered to have stopped demonstration control activity; the control focus is withdrawn from the demonstration hand, the demonstration hand is invalidated, and the process proceeds to step 10. If the current demonstration hand does not meet the preset failure condition, it is considered to continue the demonstration control activity; the process returns to step 4 and continues to recognize the gesture actions of the demonstration hand, so that gesture control of the demonstration control object, such as the intelligent large screen or the presentation file, continues.
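One possible sketch of the failure check: the 100 ms threshold follows the text, while testing "hanging vertically down" by comparing the wrist and middle fingertip positions is an assumption made for illustration:

# Illustrative sketch of step 9: decide whether the demonstration hand has failed.
import time

DOWN_THRESHOLD_S = 0.1          # 100 ms, per the example in the text
_down_since = None

def demo_hand_failed(joints, frame_w, frame_h):
    global _down_since
    wrist, middle_tip = joints[0], joints[12]
    out_of_screen = not (0 <= wrist[0] < frame_w and 0 <= wrist[1] < frame_h)
    hanging_down = middle_tip[1] > wrist[1]          # fingertip below wrist in image coords
    if hanging_down:
        _down_since = _down_since or time.monotonic()
    else:
        _down_since = None
    timed_out = _down_since is not None and time.monotonic() - _down_since >= DOWN_THRESHOLD_S
    return out_of_screen or timed_out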
10) Ending the current demonstration and detecting a new activated hand: after the demonstration hand fails, the corresponding gesture-controlled demonstration ends; the camera captures a new palm, a new demonstration hand is determined based on that palm, and whether the demonstration hand is activated is judged.
Another embodiment of the present disclosure relates to a demonstration control device based on dynamic gesture recognition, as shown in Fig. 6, including:
the acquiring module 601 is configured to acquire an original image, and perform human body detection based on the original image to obtain a corresponding human body image;
the detection module 602 is configured to detect a hand key point of the human body image by using a preset key point detection model, so as to obtain joint point information of the human body hand;
an activation module 603, configured to determine a demonstration hand that meets a preset feature condition based on the joint point information of the human hand, and assign a control focus to the demonstration hand when the action of the demonstration hand meets the preset activation condition;
a first determining module 604, configured to continuously acquire multiple frame images, and determine a motion history of the demonstration hand in the current frame image based on the multiple frame images;
a second determining module 605, configured to determine, based on the motion history map, a motion recognition result of the demonstration hand in the current frame image through a motion fusion algorithm;
a third determining module 606, configured to determine, based on a preset relationship between the demonstration hand action and the control instruction, the actual control instruction corresponding to the action recognition result for the demonstration control object indicated by the control focus;
the control module 607 is configured to control the demonstration control object to demonstrate through the actual control instruction.
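As a rough sketch only, the device of Fig. 6 can be read as a per-frame pipeline chaining modules 601-607; the module interfaces and attribute names below are assumptions introduced for illustration:

# Illustrative sketch of the device of Fig. 6: modules 601-607 as a pipeline.
class PresentationController:
    def __init__(self, acquire, detect, activate, build_mhi, recognize,
                 resolve_command, control):
        # 601-607: acquisition, keypoint detection, activation, MHI building,
        # fused recognition, command lookup and demonstration control
        self.acquire, self.detect, self.activate = acquire, detect, activate
        self.build_mhi, self.recognize = build_mhi, recognize
        self.resolve_command, self.control = resolve_command, control

    def step(self, raw_frame):
        body = self.acquire(raw_frame)                   # human body image
        joints = self.detect(body)                       # hand joint point information
        demo_hand = self.activate(joints)                # demonstration hand with control focus
        if demo_hand is None:
            return
        mhi = self.build_mhi(demo_hand)                  # motion history image
        action = self.recognize(mhi)                     # fused action recognition result
        command = self.resolve_command(action, demo_hand.focus)
        self.control(command)                            # drive the demonstration control object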
For the specific implementation of the demonstration control device based on dynamic gesture recognition provided in the embodiment of the present disclosure, reference may be made to the description of the demonstration control method based on dynamic gesture recognition provided in the embodiment of the present disclosure, and details are not repeated here.
Compared with the prior art, the embodiment of the present disclosure adopts a lightweight dynamic gesture recognition algorithm to recognize gesture actions for demonstration control; it can reach 25 fps on a CPU with little delay, and motion history image information of the gesture actions is used during recognition, so that dynamic gesture actions are recognized more accurately. Meanwhile, a control focus is given to the demonstration hand only when its action meets the preset activation condition, which effectively eliminates the interference of interfering hands, keeps attention on the demonstration hand, and maintains lightweight tracking, judgment and control-focus maintenance of the current demonstration hand.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific embodiments for carrying out the present disclosure, and that various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure.

Claims (10)

1. A presentation control method based on dynamic gesture recognition, characterized in that the presentation control method comprises:
acquiring an original image, and performing human body detection based on the original image to obtain a corresponding human body image;
performing hand key point detection on the human body image by using a preset key point detection model to obtain human body hand joint point information;
determining a demonstration hand meeting preset characteristic conditions based on the joint point information of the human hand, and giving a control focus to the demonstration hand when the action of the demonstration hand meets preset activation conditions;
continuously acquiring multi-frame images, and determining a motion history diagram of the demonstration hand in the current frame image based on the multi-frame images;
determining an action recognition result of the demonstration hand in the current frame image through an action fusion algorithm based on the motion history image;
determining an actual control instruction of the demonstration control object indicated by the control focus corresponding to the action recognition result based on a preset relation between the demonstration hand action and the control instruction;
And controlling the demonstration control object to demonstrate through the actual control instruction.
2. The presentation control method according to claim 1, wherein the determining a motion history of the presentation hand in a current frame image based on the multi-frame image includes:
determining joint point information of the demonstration hand in each frame of image based on the multi-frame images to obtain motion history joint point information of the demonstration hand;
and projecting the motion history joint point information onto a current frame image where the demonstration hand is located to obtain the motion history image.
3. The presentation control method according to claim 2, wherein the determining, based on the multi-frame images, the joint point information of the presentation hand in each frame image, respectively, to obtain the motion history joint point information of the presentation hand, includes:
based on the multi-frame images, a convex hull detection algorithm is utilized to generate convex hulls of the demonstration hands in each frame of images and joint point information outside the convex hulls;
and determining gesture actions of the demonstration hand in each frame of image by utilizing a gesture recognition algorithm based on the position and the angle based on the convex hull and the joint point information outside the convex hull, so as to obtain the motion history joint point information.
4. The presentation control method according to claim 3, wherein the generating, based on the multi-frame image, the convex hull of the presentation hand in each frame image and the joint point information outside the convex hull by using a convex hull detection algorithm includes:
based on the multi-frame images, respectively constructing a convex hull curve of a palm part outline corresponding to the demonstration hand in each frame of image, and obtaining convex hulls of the demonstration hand in each frame of image respectively;
based on the multi-frame images, determining the distance between each joint point of the demonstration hand and the convex hull in each frame of image;
and determining the joint points outside the convex hull according to a preset distance corresponding relation based on the distance between each joint point of the demonstration hand and the convex hull.
5. The presentation control method according to claim 4, wherein the determining, based on the convex hull and the joint point information outside the convex hull, the gesture action of the presentation hand in each frame of image using a gesture recognition algorithm based on position and angle includes:
respectively calculating the number of the joint points outside the convex hull corresponding to the demonstration hand, the distance between any two joint points and the included angle between the edge formed by any two joint points and the edge formed by any two other joint points;
And determining the gesture action of the demonstration hand in each frame of image based on a preset gesture action judgment relation according to the number of the joint points outside the convex hull, the distance between any two joint points and the included angle.
6. The presentation control method according to claim 5, wherein,
the distance between any two joint points is expressed as the following formula (1):
dist(i,j) = sqrt((i_x - j_x)² + (i_y - j_y)²)    (1)
the included angle between the edge formed by any two joint points and the edge formed by other two joint points is expressed as the following formula (2):
v(i-j,k-m) = arctan((i_y - j_y)/(i_x - j_x)) - arctan((k_y - m_y)/(k_x - m_x))    (2)
wherein i ≠ j ≠ k ≠ m, dist(i, j) represents the distance between joint point i and joint point j, i_x represents the x coordinate of joint point i, j_x represents the x coordinate of joint point j, i_y represents the y coordinate of joint point i, j_y represents the y coordinate of joint point j, sqrt() represents the square root, v(i-j, k-m) represents the included angle between edge i-j and edge k-m, edge i-j is the edge formed by joint point i and joint point j, edge k-m is the edge formed by joint point k and joint point m, k_y represents the y coordinate of joint point k, m_y represents the y coordinate of joint point m, k_x represents the x coordinate of joint point k, and m_x represents the x coordinate of joint point m.
7. The presentation control method according to claim 6, wherein the determining, based on the motion history map, a motion recognition result of the presentation hand at the current frame image by a motion fusion algorithm includes:
Determining a plurality of historical action recognition results of the demonstration hand in the current frame image based on the motion history map;
and respectively carrying out similarity calculation on the plurality of historical action recognition results and a preset action label sequence set, and fusing the similarity corresponding to the plurality of historical action recognition results to obtain the action recognition results.
8. The presentation control method according to any one of claims 1 to 7, characterized in that after the presentation control object is controlled to perform a presentation by the actual control instruction, the presentation control method further comprises:
determining joint point information of the demonstration hand in front and back frame images of the current frame image;
based on the joint point information in the front and back frame images, respectively calculating the average distance from the joint point on each finger of the demonstration hand to the palm center point to obtain a feature vector corresponding to each finger of the demonstration hand;
calculating finger similarity of the demonstration hand in the front and rear frame images based on the feature vector;
respectively calculating the distance between the palm center of the demonstration hand in the current frame image and the palm center of the demonstration hand in the front and rear frame images, and determining the palm similarity of the front and rear frame images according to the distance;
Fusing the finger similarity and the palm similarity to obtain the hand similarity of the demonstration hand;
and tracking the demonstration hand based on the hand similarity.
9. The presentation control method according to claim 8, wherein after the tracking and deciding of the presentation hand based on the hand similarity, the presentation control method further comprises:
judging whether the demonstration hand meets a preset failure condition or not based on the hand similarity;
and if the demonstration hand meets the preset failure condition, the control focus is retracted from the demonstration hand.
10. A presentation control device based on dynamic gesture recognition, the presentation control device comprising:
the acquisition module is used for acquiring an original image, and performing human body detection based on the original image to obtain a corresponding human body image;
the detection module is used for detecting the key points of the hand of the human body image by using a preset key point detection model to obtain joint point information of the hand of the human body;
the activation module is used for determining a demonstration hand meeting preset characteristic conditions based on the joint point information of the human hand, and giving a control focus to the demonstration hand when the action of the demonstration hand meets the preset activation conditions;
The first determining module is used for continuously acquiring multi-frame images and determining a motion history diagram of the demonstration hand in the current frame image based on the multi-frame images;
the second determining module is used for determining the action recognition result of the demonstration hand in the current frame image through an action fusion algorithm based on the motion history image;
the third determining module is used for determining an actual control instruction corresponding to the action recognition result and aiming at the demonstration control object indicated by the control focus based on a preset relation between the demonstration hand action and the control instruction;
and the control module is used for controlling the demonstration control object to carry out demonstration through the actual control instruction.
CN202310219341.7A 2023-03-07 2023-03-07 Demonstration control method and device based on dynamic gesture recognition Pending CN116246345A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310219341.7A CN116246345A (en) 2023-03-07 2023-03-07 Demonstration control method and device based on dynamic gesture recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310219341.7A CN116246345A (en) 2023-03-07 2023-03-07 Demonstration control method and device based on dynamic gesture recognition

Publications (1)

Publication Number Publication Date
CN116246345A true CN116246345A (en) 2023-06-09

Family

ID=86632938

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310219341.7A Pending CN116246345A (en) 2023-03-07 2023-03-07 Demonstration control method and device based on dynamic gesture recognition

Country Status (1)

Country Link
CN (1) CN116246345A (en)

Similar Documents

Publication Publication Date Title
Zhou et al. A novel finger and hand pose estimation technique for real-time hand gesture recognition
Erol et al. Vision-based hand pose estimation: A review
JP5887775B2 (en) Human computer interaction system, hand-to-hand pointing point positioning method, and finger gesture determination method
US10108270B2 (en) Real-time 3D gesture recognition and tracking system for mobile devices
US20110115892A1 (en) Real-time embedded visible spectrum light vision-based human finger detection and tracking method
CN102073414B (en) Multi-touch tracking method based on machine vision
CN112506340B (en) Equipment control method, device, electronic equipment and storage medium
CN110210426B (en) Method for estimating hand posture from single color image based on attention mechanism
CN111414837A (en) Gesture recognition method and device, computer equipment and storage medium
Yin et al. Toward natural interaction in the real world: Real-time gesture recognition
Wang et al. Immersive human–computer interactive virtual environment using large-scale display system
WO2023273372A1 (en) Gesture recognition object determination method and apparatus
CN113961067B (en) Non-contact doodling drawing method and recognition interaction system based on deep learning
Michel et al. Gesture recognition supporting the interaction of humans with socially assistive robots
Wei et al. Dual regression for efficient hand pose estimation
CN114332927A (en) Classroom hand-raising behavior detection method, system, computer equipment and storage medium
JP2003256850A (en) Movement recognizing device and image processor and its program
John et al. Hand gesture identification using deep learning and artificial neural networks: A review
CN116543452A (en) Gesture recognition and gesture interaction method and device
Tsai et al. Reverse time ordered stroke context for air-writing recognition
CN116246345A (en) Demonstration control method and device based on dynamic gesture recognition
US20220050528A1 (en) Electronic device for simulating a mouse
Dhamanskar et al. Human computer interaction using hand gestures and voice
JP2832333B2 (en) Object shape / posture detection device
Zhang et al. A Non-parametric RDP Algorithm Based on Leap Motion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination