CN110717385A - Dynamic gesture recognition method - Google Patents
- Publication number
- CN110717385A (application CN201910816005.4A)
- Authority
- CN
- China
- Prior art keywords
- gesture
- node
- palm
- dynamic
- distance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/59—Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
- G06V20/597—Recognising the driver's state or behaviour, e.g. attention or drowsiness
Abstract
The invention provides a method for recognizing dynamic gestures. Palm position information is extracted by combining human skeleton node information with depth image information, and the palm image is separated from the background; the segmented gesture image is recognized with a support vector machine (SVM) algorithm; finally, optimal matching of dynamic gestures is performed with the dynamic time warping (DTW) algorithm over the motion sequences of the arm skeleton nodes.
Description
Technical Field
The invention relates to intelligent recognition technology, and is particularly applicable to gesture recognition in the field of transportation.
Background
Train driver gesture recognition is an important component of intelligent traffic management systems. As a non-contact, computer-vision-based gesture acquisition approach, it has low equipment cost, better meets the naturalness and comfort required by human-computer interaction, and is a current research hotspot.
Compared with static gestures, dynamic gestures are intuitive and convenient and are better suited to flexible human-machine interaction; however, because dynamic gestures are numerous in type, complex in characteristics and fast-changing, research on dynamic gesture recognition is difficult. In the prior art, one approach performs a distance transformation on the binarized image to generate a hand-region image with a skeleton-extraction effect and connects the central points to obtain a hand skeleton, thereby recognizing and classifying gestures with an accuracy of almost 100%. Some methods can only recognize left-hand gestures and the human trunk, and lack recognition of the palm skeleton. Still other methods use the Kinect to recognize hand arithmetic gestures (Arabic numerals and operator signs) and rock-paper-scissors: an accurate hand-region image is obtained by depth-threshold segmentation, and the Finger-Earth Mover's Distance (FEMD) metric is used to measure the difference between hand shapes for recognition and classification; the highest recognition rate of this method reaches 93.9%. However, during gesture recognition the black wrist strap worn by the tester influences the recognition result, and the recognition accuracy is low when the wrist strap is not worn.
Most existing methods target static, general-purpose gesture recognition in common scenes; they recognize dynamic professional gestures in certain specific scenarios poorly and cannot effectively judge and recognize the gestures in such scenes.
Disclosure of Invention
Based on the above problems, the invention provides a dynamic gesture recognition method, in particular a machine-vision-based dynamic gesture recognition method for train drivers. The Kinect is used to obtain human skeleton node information; a distance difference threshold is set to determine the approximate palm node position and obtain a gesture segmentation image; a support vector machine (SVM) then performs gesture recognition and evaluation; and, combined with the motion sequences of the skeleton nodes, a dynamic time warping (DTW) algorithm recognizes and detects the arm actions of the train driver, finally yielding effective gesture information.
The invention provides a gesture recognition method, which specifically comprises the following steps: step S1, determining the position of the palm node: the position coordinates of all white pixel points in a circle centred on the palm node, with radius equal to the distance r between the palm node and the wrist node, are averaged, and this average represents the palm node position, so that the palm node position (x_p, y_p) is:

x_p = (1/T) Σ_{i=1}^{T} x_i,   y_p = (1/T) Σ_{i=1}^{T} y_i

In the formula, T represents the number of white pixel points in the circle, x_i represents the abscissa of the i-th white pixel point, and y_i represents the ordinate of the i-th white pixel point;
step S2, after finding the position of the palm node, searching gesture pixel points, and segmenting the gesture by judging the difference between the distance from the palm node to the Kinect and the distance from pixel points in the surrounding area to the Kinect;
when the gesture recognition object is a palm gesture,
recognizing the segmented gesture image by adopting an SVM algorithm and evaluating the gesture standardness; the SVM classification result is the confidence between the test gesture and the standard gesture, and can be used as an evaluation criterion of palm gesture standardness, as shown in the formula:

Score = round( (1/T) Σ_{i=1}^{T} s_i )

In the formula, round(·) represents rounding to an integer, T is the total number of frames in the dynamic gesture sequence, and s_i represents the SVM output result for the gesture image of the i-th frame; the palm gesture score is the average of the gesture scores over the whole sequence.
Further, the gesture recognition also comprises recognition of arm actions, which specifically comprises: extracting key skeleton node coordinate data from the data acquired by the Kinect sensor; the coordinate of the key skeleton node is P_s = (x_s, y_s, z_s), and the coordinates of the remaining arm skeleton nodes are P_i = (x_i, y_i, z_i), i = 1, 2, 3, 4, so the distance between node P_i and key skeleton node P_s is:

D_si = sqrt( (x_i − x_s)² + (y_i − y_s)² + (z_i − z_s)² )
Within a certain time T, the motion sequence of an arm skeleton node is represented as (D_si^1, D_si^2, ..., D_si^T), i = 1, 2, 3, 4, and according to the arm skeleton node motion sequences, optimal matching of dynamic gestures can be performed with a DTW algorithm.
Further, wherein the key bone node is specifically one of: palm, wrist, elbow, shoulder center.
Further, searching for a gesture pixel in S2 specifically includes:
the method comprises the following steps of searching gesture pixel points in a large rectangular area with a palm node as a center, and the method comprises the following steps:
Let the distance from the skeleton palm node extracted by the Kinect to the Kinect camera be d_p, the position of the palm node be (x_p, y_p, d_p), and the position of the wrist node be (x_r, y_r, d_r); the gesture pixel point search is performed within a rectangular pixel area centred on the palm node with width w and height h,
Let S_0 denote the initial set of gesture pixel points and d_ij denote the distance from the pixel g_ij in row i, column j of the rectangular area to the Kinect camera; then:

S_k = S_{k−1} ∪ { g_ij | abs(d_p − d_ij) < threshold }

where k represents the number of search iterations, threshold represents the threshold on the difference between the distance from the palm node to the Kinect and the distance from gesture pixel points in the rectangular area to the Kinect, abs(d_p − d_ij) represents the absolute value of the distance difference between the palm node and the gesture pixel area, and S_k represents the finally detected set of gesture pixel points.
Further, the DTW algorithm specifically comprises: the time sequence of the standard dynamic gesture sample is X = (D_s^1, D_s^2, ..., D_s^m) and the test gesture time sequence is Y = (D_t^1, D_t^2, ..., D_t^n); let the point-pair relationship between the two sequences be φ(k) = (φ_s(k), φ_t(k)), where 1 ≤ φ_s(k) ≤ m, 1 ≤ φ_t(k) ≤ n, max(m, n) ≤ k ≤ m + n; find the optimal point-pair relation φ(k) between the two sequences such that the sum of the distances between corresponding points is minimum, expressed as:

DTW(X, Y) = min_φ Σ_k d( D_s^{φ_s(k)}, D_t^{φ_t(k)} )
and recognizing the input dynamic gesture according to the obtained DTW distance to obtain a gesture recognition result.
Further wherein the size of the rectangular area is greater than 3 times the distance between the palm node and the wrist node.
Further, the gesture recognition result is the sample class in the standard dynamic gesture library with the minimum DTW distance, represented as:

O = argmin_i DTW(X_i, Y)

In the formula, X_i is a standard dynamic gesture sample, Y is the input dynamic gesture, i is the dynamic gesture sample category, and O is the finally recognized dynamic gesture category;
The arm dynamic gesture score is measured by the DTW distance: the closer the test gesture sequence is to the standard gesture sample sequence, the smaller the DTW distance. In the score formula, X_i represents a standard dynamic gesture sequence, Y is the test gesture sequence, N is the number of standard gesture sequence samples, and α is the average DTW distance between the standard gesture sequence samples.
A computer storage medium having a computer program stored thereon, the computer program being executable by a processor to implement the method described above.
Drawings
The features and advantages of the present disclosure will be more clearly understood by reference to the accompanying drawings, which are illustrative and should not be construed as limiting the disclosure in any way, in which
FIG. 1 is a schematic diagram of a gesture image segmentation method
FIG. 2 is a diagram illustrating a method for determining a palm node position
FIG. 3 is a flow chart of palm gesture recognition
FIG. 4 is a flow chart of train driver arm action recognition
Detailed Description
These and other features and characteristics of the present disclosure, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will be better understood by reference to the following description and drawings, which form a part of this specification. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the disclosure. It will be understood that the figures are not drawn to scale. Various block diagrams are used in this disclosure to illustrate various variations of embodiments according to the disclosure.
Example 1
The gesture detection method combines human skeleton node information with depth image information to segment the gesture image: human skeleton node data are obtained through the Kinect, the position of the palm node is found, and the gesture is searched within the range of the palm node. Since all pixel points on the palm are at similar distances from the camera, the gesture information can be separated from the background by setting a distance difference threshold. The flow of the method is shown in fig. 1.
Because node drift easily occurs when the Kinect tracks human skeleton nodes, the reported distance from the palm node to the Kinect may differ from the actual distance, and segmentation with the distance difference threshold then fails. Therefore, an approximate palm node position determination method is adopted, as shown in fig. 2: the position coordinates of all white pixel points in a circle centred on the tracked palm node, with radius equal to the distance r between the palm node and the wrist node, are averaged, and this average represents the palm node position, so that the palm node position (x_p, y_p) is:

x_p = (1/T) Σ_{i=1}^{T} x_i,   y_p = (1/T) Σ_{i=1}^{T} y_i

In the formula, T represents the number of white pixel points in the circle, x_i represents the abscissa of the i-th white pixel point, and y_i represents the ordinate of the i-th white pixel point. After the position of the palm node is found, the gesture is segmented by judging the difference between the distance from the palm node to the Kinect and the distance from pixel points in the surrounding area to the Kinect.
After the palm node is found, gesture pixel points need to be searched around it. To prevent deviations in the gesture pixel search caused by palm node drift, the search is carried out in a large rectangular area centred on the palm node. The algorithm proceeds as follows:
Let the distance from the skeleton palm node extracted by the Kinect to the Kinect camera be d_p, the position of the palm node be (x_p, y_p, d_p), and the position of the wrist node be (x_r, y_r, d_r); the gesture pixel point search is performed within a rectangular pixel area centred on the palm node with width w and height h,
Let S_0 denote the initial set of gesture pixel points and d_ij denote the distance from the pixel g_ij in row i, column j of the rectangular area to the Kinect camera; then:

S_k = S_{k−1} ∪ { g_ij | abs(d_p − d_ij) < threshold }

where k represents the number of search iterations, threshold represents the threshold on the difference between the distance from the palm node to the Kinect and the distance from gesture pixel points in the rectangular area to the Kinect, abs(d_p − d_ij) represents the absolute value of the distance difference between the palm node and the gesture pixel area, and S_k represents the finally detected set of gesture pixel points.
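A minimal sketch of the rectangular depth-threshold search, assuming a NumPy depth image in millimetres. For simplicity it tests every pixel of the rectangle in a single pass rather than iterating k times; the function and parameter names are assumptions:

```python
import numpy as np

def segment_gesture(depth, px, py, dp, w, h, threshold=20):
    """Collect the pixels in the w x h rectangle centred on the palm node
    (px, py) whose depth differs from the palm-node depth dp by less than
    `threshold` (i.e. abs(d_p - d_ij) < threshold). Returns (row, col) pairs."""
    rows = slice(max(py - h // 2, 0), py + h // 2 + 1)
    cols = slice(max(px - w // 2, 0), px + w // 2 + 1)
    window = depth[rows, cols]
    hit = np.abs(window - dp) < threshold              # the distance-difference test
    ii, jj = np.nonzero(hit)
    return set(zip((ii + rows.start).tolist(), (jj + cols.start).tolist()))
```

Restricting the test to the rectangle keeps background pixels that happen to share the palm's depth (e.g. the other arm) out of the segmented gesture.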
The gesture action is then recognized: the segmented gesture image is recognized with the SVM algorithm and the gesture standardness is evaluated. The SVM classification result is the confidence between the test gesture and the standard gesture, and can be used as an evaluation criterion of palm gesture standardness, as shown in formula (3):

Score = round( (1/T) Σ_{i=1}^{T} s_i )    (3)

In the formula, round(·) represents rounding to an integer, T is the total number of frames in the dynamic gesture sequence, and s_i represents the SVM output result for the gesture image of the i-th frame, so the palm gesture score is the average of the gesture scores over the whole sequence.
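The scoring step reduces to averaging and rounding the per-frame SVM scores. A sketch, assuming the per-frame scores are already scaled to 0–100 (the scaling is an assumption, not stated in the text):

```python
def palm_gesture_score(frame_scores):
    """Score = round((1/T) * sum of per-frame SVM scores), T = sequence length."""
    T = len(frame_scores)
    return round(sum(frame_scores) / T)
```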
Gesture detection first processes the depth image collected by the Kinect sensor: on the one hand, pixel points are searched in the palm region; on the other hand, the human skeleton node image is used to judge the palm node position. Because the Kinect detects skeleton nodes with low precision and the nodes drift easily, the size threshold of the rectangular search region cannot be set too small, or palm detection may be incomplete. In the experiments, when t ≥ 3, i.e. the size of the rectangular search region is greater than 3 times the distance between the palm node and the wrist node, the gesture search effect is ideal. The setting of the distance difference threshold between the palm node and the gesture pixel region is also important to the segmentation result: when the threshold is too small, the gesture segmentation is incomplete; when the threshold is too large, the wrist and other regions are easily included. The experimental results show that when threshold ∈ (15, 25) mm, the gesture detection effect is ideal.
The Kinect sensor acquires image depth data as well as driver skeletal data. The driver completes a whole set of gesture actions that includes not only the palm gesture but also the arm action, so the data acquired by the Kinect sensor should include the coordinate data of several key skeleton nodes such as the palm, wrist, elbow and shoulder centre. When the driver performs different gesture actions, the relative position of the shoulder centre node generally remains basically unchanged. Let the coordinate of the shoulder centre node be P_s = (x_s, y_s, z_s) and the coordinates of the remaining arm skeleton nodes be P_i = (x_i, y_i, z_i), i = 1, 2, 3, 4; the distance between node P_i and the shoulder centre node P_s is then:

D_si = sqrt( (x_i − x_s)² + (y_i − y_s)² + (z_i − z_s)² )
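The per-frame distance D_si can be computed as below (a sketch; representing each frame as an (x, y, z) tuple per joint is an assumption):

```python
import math

def arm_distance_sequence(shoulder_frames, node_frames):
    """Per-frame Euclidean distance D_si between one arm node and the
    shoulder-centre node, yielding the motion sequence that is fed to DTW."""
    return [math.dist(ps, pi) for ps, pi in zip(shoulder_frames, node_frames)]
```

Using distances to the (nearly stationary) shoulder centre, rather than raw coordinates, makes the feature sequence invariant to the driver's position in front of the sensor.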
Within a certain time T, the motion sequence of an arm skeleton node is represented as (D_si^1, D_si^2, ..., D_si^T), i = 1, 2, 3, 4. Based on the arm skeleton node motion sequences, the DTW algorithm can perform optimal matching of dynamic gestures. Because DTW handles two time sequences of different lengths, it is well suited to train driver dynamic gesture recognition. Let the time sequence of a standard driver dynamic gesture sample be X = (D_s^1, D_s^2, ..., D_s^m) and the test gesture time sequence be Y = (D_t^1, D_t^2, ..., D_t^n), and let the point-pair relationship between the two sequences be φ(k) = (φ_s(k), φ_t(k)), where 1 ≤ φ_s(k) ≤ m, 1 ≤ φ_t(k) ≤ n, max(m, n) ≤ k ≤ m + n. The aim of the DTW algorithm is to find the optimal point-pair relation φ(k) between the two sequences such that the sum of the distances between corresponding points is minimum, expressed as:

DTW(X, Y) = min_φ Σ_k d( D_s^{φ_s(k)}, D_t^{φ_t(k)} )    (5)

In the formula, d(·, ·) denotes the distance between a pair of corresponding points. Equation (5) is usually solved by dynamic programming to reduce the algorithm complexity.
The input dynamic gesture is recognized and evaluated according to the obtained DTW distance. The recognition result of the driver's dynamic gesture is the sample class in the standard dynamic gesture library with the minimum DTW distance, represented as:

O = argmin_i DTW(X_i, Y)

In the formula, X_i is a standard dynamic gesture sample, Y is the input dynamic gesture, i is the dynamic gesture sample category, and O is the finally recognized dynamic gesture category.
The arm dynamic gesture score is measured by the DTW distance: the closer the test gesture sequence is to the standard gesture sample sequence, the smaller the DTW distance. In the score formula, X_i represents a standard dynamic gesture sequence, Y is the test gesture sequence, N is the number of standard gesture sequence samples, and α is the average DTW distance between the standard gesture sequence samples.
Detection of the train driver's palm-area actions is taken as an example. The palm-area actions generally comprise four gestures: (a) a fist with the index and middle fingers extended, (b) a fist with the thumb raised, (c) a fist with the thumb and little finger extended, and (d) all five fingers extended and held together.
When a train driver performs a gesture action, the palm usually undergoes corresponding deformation and rotation. Therefore, during gesture recognition the detected gesture must be size-normalized. To reduce the influence of gesture rotation within a sequence on the recognition effect, rotated samples participate in the offline training of the SVM classifier, which increases the robustness of the classifier. Table 1 shows the classification recognition rates of the 4 palm gestures when the train driver performs a hand-raising action.
TABLE 1
As can be seen from Table 1, the detection rate is high when the method provided by the present application is used to detect the palm and the gesture. First, by determining the palm centre node and searching gesture pixel points around it, missed detections can be effectively avoided, and thresholding the difference between the distance from surrounding gesture pixels to the Kinect and that from the palm centre pixel reduces the possibility that the algorithm detects the wrist. For recognition, palm gesture images from multiple gesture sequences are trained with the SVM algorithm, reducing misrecognition caused by image rotation; the average recognition rate of the 4 palm gestures of the train driver reaches over 88%. For the gesture score, the average of the confidences over the recognized gesture sequence gives the final score, which effectively judges how standard the train driver's palm gesture is.
The arm actions of the train driver comprise a number of gestures with different meanings, for example 4 dynamic gestures: arm raising, arm forward, elbow bending, and left-right arm swinging.
TABLE 2
Table 2 shows the recognition effect of the proposed gesture recognition method on 4 train driver arm actions: the average recognition rate of the 4 common dynamic gestures exceeds 85%, and the DTW algorithm adopted by the invention is well suited to dynamic arm action recognition for train drivers. The arm action score, obtained by analysing the DTW distance between the test gesture sequence and the standard gesture sequences in the sample library, effectively improves the recognition degree and accuracy of the algorithm.
Table 3 compares the classical HMM algorithm with the method of the present invention on the 4 arm dynamic gestures of the train driver. As can be seen from Table 3, the average recognition rate of the DTW algorithm is 4.3% higher than that of the HMM algorithm. Because the length of the driver's dynamic gesture sequence changes continually, the DTW algorithm of the invention solves the matching problem of motion sequences of different lengths by dynamic programming. Therefore, compared with the HMM algorithm, the DTW algorithm is better suited to the dynamic arm action recognition problem of train drivers.
TABLE 3
Meanwhile, according to the simulation experiment results, the gesture action recognition system provided by the application is reliable and stable across multiple tests, runs at up to 25 frames per second, and is well suited to gesture recognition and standardness evaluation for train drivers.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD) or a Solid State Drive (SSD), etc.; the storage medium may also comprise a combination of memories of the kind described above.
Claims (8)
1. A dynamic gesture recognition method, characterized by comprising: step S1, determining the position of the palm node: the position coordinates of all white pixel points in a circle centred on the palm node, with radius equal to the distance r between the palm node and the wrist node, are averaged, and this average represents the palm node position, so that the palm node position (x_p, y_p) is:

x_p = (1/T) Σ_{i=1}^{T} x_i,   y_p = (1/T) Σ_{i=1}^{T} y_i

In the formula, T represents the number of white pixel points in the circle, x_i represents the abscissa of the i-th white pixel point, and y_i represents the ordinate of the i-th white pixel point;
step S2, after finding the position of the palm node, searching gesture pixel points, and segmenting the gesture by judging the difference between the distance from the palm node to the Kinect and the distance from pixel points in the surrounding area to the Kinect;
when the dynamic gesture recognition object is a palm gesture,
recognizing the segmented gesture image by adopting a support vector machine (SVM) algorithm and evaluating the gesture standardness; the SVM classification result is the confidence between the test gesture and the standard gesture, and can be used as an evaluation criterion of palm gesture standardness, as shown in the formula:

Score = round( (1/T) Σ_{i=1}^{T} s_i )

In the formula, round(·) represents rounding to an integer, T is the total number of frames in the dynamic gesture sequence, and s_i represents the SVM output result for the gesture image of the i-th frame.
2. The method of claim 1, wherein the dynamic gesture recognition further comprises recognition of arm actions, specifically: extracting key skeleton node coordinate data from the data acquired by the Kinect sensor; the coordinate of the key skeleton node is P_s = (x_s, y_s, z_s), and the coordinates of the remaining arm skeleton nodes are P_i = (x_i, y_i, z_i), i = 1, 2, 3, 4, so the distance between node P_i and key skeleton node P_s is:

D_si = sqrt( (x_i − x_s)² + (y_i − y_s)² + (z_i − z_s)² )

Within a certain time T, the motion sequence of an arm skeleton node is represented as (D_si^1, D_si^2, ..., D_si^T), i = 1, 2, 3, 4, and according to the arm skeleton node motion sequences, optimal matching of dynamic gestures can be performed with the DTW dynamic time warping algorithm.
3. The method of claim 2, wherein the key bone node is specifically one of: palm, wrist, elbow, shoulder center.
4. The method according to any one of claims 1-3, wherein the searching for gesture pixels in S2 specifically comprises:
the method comprises the following steps of searching gesture pixel points in a large rectangular area with a palm node as a center, and the method comprises the following steps:
Let the distance from the skeleton palm node extracted by the Kinect to the Kinect camera be d_p, the position of the palm node be (x_p, y_p, d_p), and the position of the wrist node be (x_r, y_r, d_r); the gesture pixel point search is performed within a rectangular pixel area centred on the palm node with width w and height h,
Let S_0 denote the initial set of gesture pixel points and d_ij denote the distance from the pixel g_ij in row i, column j of the rectangular area to the Kinect camera; then:

S_k = S_{k−1} ∪ { g_ij | abs(d_p − d_ij) < threshold }

where k represents the number of search iterations, threshold represents the threshold on the difference between the distance from the palm node to the Kinect and the distance from gesture pixel points in the rectangular area to the Kinect, abs(d_p − d_ij) represents the absolute value of the distance difference between the palm node and the gesture pixel area, and S_k represents the finally detected set of gesture pixel points.
5. The method of claim 3, wherein the DTW algorithm specifically comprises: the time sequence of the standard dynamic gesture sample is X = (D_s^1, D_s^2, ..., D_s^m) and the test gesture time sequence is Y = (D_t^1, D_t^2, ..., D_t^n); let the point-pair relationship between the two sequences be φ(k) = (φ_s(k), φ_t(k)), where 1 ≤ φ_s(k) ≤ m, 1 ≤ φ_t(k) ≤ n, max(m, n) ≤ k ≤ m + n; find the optimal point-pair relation φ(k) between the two sequences such that the sum of the distances between corresponding points is minimum, expressed as:

DTW(X, Y) = min_φ Σ_k d( D_s^{φ_s(k)}, D_t^{φ_t(k)} )
and recognizing the input dynamic gesture according to the obtained DTW distance to obtain a gesture recognition result.
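The optimal point-pair matching of claim 5 is the standard DTW dynamic program. Below is a minimal Python sketch; the function name and the Euclidean local distance d(·,·) are illustrative assumptions, since the patent does not fix the local metric in this text.

```python
import numpy as np

def dtw_distance(X, Y):
    """DTW distance between sequences X (length m) and Y (length n).

    D[i, j] holds the minimum cumulative cost of aligning the first i
    points of X with the first j points of Y; the answer is D[m, n].
    """
    m, n = len(X), len(Y)
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Local cost: Euclidean distance between the two frames.
            cost = np.linalg.norm(
                np.asarray(X[i - 1], dtype=float) - np.asarray(Y[j - 1], dtype=float)
            )
            # Extend the cheapest of the three admissible predecessors.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[m, n]
```

The quadratic table makes the m + n bound on the warping-path length from claim 5 concrete: any monotone path from D[0, 0] to D[m, n] takes at most m + n steps.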
6. The method of claim 5, wherein the size of the rectangular area is greater than 3 times the distance between the palm node and the wrist node.
7. The method of claim 6, wherein the gesture recognition result corresponds to the sample class with the smallest DTW distance in the standard dynamic gesture library, expressed as:
O = argmin_i D_DTW(X_i, Y)
where X_i is a standard dynamic gesture sample, Y is the input dynamic gesture, i is the dynamic gesture sample category, and O is the finally recognized dynamic gesture category;
the arm dynamic gesture score is measured by the DTW distance: the closer the test gesture sequence is to the standard gesture sample sequence, the smaller the DTW distance, and the arm dynamic gesture score is expressed as:
score = (1 - (1/(N·α)) Σ_{i=1}^{N} D_DTW(X_i, Y)) × 100
where X_i denotes a standard dynamic gesture sequence, Y is the test gesture sequence, N is the number of standard gesture sequence samples, and α is the average DTW distance between the standard gesture sequence samples.
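Claim 7's classification rule (nearest template by DTW distance) plus a distance-based score can be sketched as below. The score normalization by alpha, the mean inter-template DTW distance, is an assumption: the patent's exact score formula is not fully recoverable from this text. `dtw_distance` is repeated here so the sketch is self-contained.

```python
import numpy as np

def dtw_distance(X, Y):
    # Standard DTW dynamic program with Euclidean local cost.
    m, n = len(X), len(Y)
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = np.linalg.norm(
                np.asarray(X[i - 1], dtype=float) - np.asarray(Y[j - 1], dtype=float)
            )
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[m, n]

def classify_and_score(templates, Y, alpha):
    """Recognize Y as the template class with minimum DTW distance
    (the argmin of claim 7) and attach a score in [0, 100] that shrinks
    as the DTW distance grows.  `alpha` normalizes the distance; using
    the mean inter-template DTW distance for it is an assumption."""
    dists = {label: dtw_distance(X, Y) for label, X in templates.items()}
    best = min(dists, key=dists.get)
    score = max(0.0, 1.0 - dists[best] / alpha) * 100.0
    return best, score
```

An exact match gets score 100; a test sequence whose best DTW distance reaches alpha (or beyond) gets score 0.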
8. A computer storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910816005.4A CN110717385A (en) | 2019-08-30 | 2019-08-30 | Dynamic gesture recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110717385A true CN110717385A (en) | 2020-01-21 |
Family
ID=69210196
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910816005.4A Pending CN110717385A (en) | 2019-08-30 | 2019-08-30 | Dynamic gesture recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110717385A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120068917A1 (en) * | 2010-09-17 | 2012-03-22 | Sony Corporation | System and method for dynamic gesture recognition using geometric classification |
CN103455794A (en) * | 2013-08-23 | 2013-12-18 | 济南大学 | Dynamic gesture recognition method based on frame fusion technology |
CN107169411A (en) * | 2017-04-07 | 2017-09-15 | 南京邮电大学 | A kind of real-time dynamic gesture identification method based on key frame and boundary constraint DTW |
CN107463873A (en) * | 2017-06-30 | 2017-12-12 | 长安大学 | A kind of real-time gesture analysis and evaluation methods and system based on RGBD depth transducers |
CN108664877A (en) * | 2018-03-09 | 2018-10-16 | 北京理工大学 | A kind of dynamic gesture identification method based on range data |
Non-Patent Citations (2)
Title |
---|
HE CHAO; HU ZHANGFANG; WANG YAN: "A dynamic gesture recognition method based on an improved DTW algorithm", Digital Communication, no. 03, 25 June 2013 (2013-06-25) *
GUO XIAOLI; YANG TINGTING; ZHANG YACHAO: "Dynamic gesture recognition based on Kinect depth information", Journal of Northeast Electric Power University, no. 02, 15 April 2016 (2016-04-15) *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111401166A (en) * | 2020-03-06 | 2020-07-10 | 中国科学技术大学 | Robust gesture recognition method based on electromyographic information decoding |
CN111723688A (en) * | 2020-06-02 | 2020-09-29 | 北京的卢深视科技有限公司 | Human body action recognition result evaluation method and device and electronic equipment |
CN111723688B (en) * | 2020-06-02 | 2024-03-12 | 合肥的卢深视科技有限公司 | Human body action recognition result evaluation method and device and electronic equipment |
CN113283314A (en) * | 2021-05-11 | 2021-08-20 | 桂林电子科技大学 | Unmanned aerial vehicle night search and rescue method based on YOLOv3 and gesture recognition |
CN114705236A (en) * | 2022-02-11 | 2022-07-05 | 清华大学深圳国际研究生院 | Thermal comfort degree measuring method and device, air conditioner control method and device and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5845365B2 (en) | Improvements in or related to 3D proximity interaction | |
CN110717385A (en) | Dynamic gesture recognition method | |
Cai et al. | A scalable approach for understanding the visual structures of hand grasps | |
CN108268838B (en) | Facial expression recognition method and facial expression recognition system | |
KR20200111617A (en) | Gesture recognition method, device, electronic device, and storage medium | |
US20160171293A1 (en) | Gesture tracking and classification | |
TW201926140A (en) | Method, electronic device and non-transitory computer readable storage medium for image annotation | |
Wachs et al. | A real-time hand gesture interface for medical visualization applications | |
CN106648078B (en) | Multi-mode interaction method and system applied to intelligent robot | |
Kalsh et al. | Sign language recognition system | |
CN112114675B (en) | Gesture control-based non-contact elevator keyboard using method | |
KR101559502B1 (en) | Method and recording medium for contactless input interface with real-time hand pose recognition | |
JP2016014954A (en) | Method for detecting finger shape, program thereof, storage medium of program thereof, and system for detecting finger shape | |
TW202201275A (en) | Device and method for scoring hand work motion and storage medium | |
Weber et al. | Distilling location proposals of unknown objects through gaze information for human-robot interaction | |
KR20120089948A (en) | Real-time gesture recognition using mhi shape information | |
Ghadhban et al. | Segments interpolation extractor for finding the best fit line in Arabic offline handwriting recognition words | |
Huang et al. | Real-time automated detection of older adults' hand gestures in home and clinical settings | |
Li et al. | Recognizing hand gestures using the weighted elastic graph matching (WEGM) method | |
Tara et al. | Sign language recognition in robot teleoperation using centroid distance Fourier descriptors | |
Oszust et al. | Isolated sign language recognition with depth cameras | |
Abdulghani et al. | Discover human poses similarity and action recognition based on machine learning | |
Li et al. | A novel art gesture recognition model based on two channel region-based convolution neural network for explainable human-computer interaction understanding | |
Rubin Bose et al. | In-situ identification and recognition of multi-hand gestures using optimized deep residual network | |
Shitole et al. | Dynamic hand gesture recognition using PCA, Pruning and ANN |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||