CN109376730B - Gesture recognition method and device


Info

Publication number
CN109376730B
CN109376730B (application CN201811642237.4A)
Authority
CN
China
Prior art keywords
image
information
foreground
area
gesture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201811642237.4A
Other languages
Chinese (zh)
Other versions
CN109376730A (en)
Inventor
曾宪威
谢语谦
夏至贤
张凌
郭华龙
吴柏翰
李永文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Longyan University
Original Assignee
Longyan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Longyan University
Priority to CN201811642237.4A
Publication of CN109376730A
Application granted
Publication of CN109376730B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a gesture recognition method and a gesture recognition device, wherein the method comprises the following steps: reducing the resolution of an image to be recognized to obtain a first image; performing skin color detection on the first image to obtain a second image with a skin color area; performing a morphological operation on the second image to obtain a filtered third image; obtaining foreground information of the third image, and obtaining a fourth image with foreground skin color blocks according to the foreground information and the third image; acquiring the image area where the arm part is located according to the horizontal and vertical texture information of the fourth image, and cropping out the arm image accordingly; and acquiring the corresponding gesture information according to the finger image information of the arm image. The technical scheme reduces the system computation load by lowering the resolution, and then quickly separates the face area from the hand area through texture operations, which reduces memory usage and also lowers the difficulty of implementation on an embedded system.

Description

Gesture recognition method and device
Technical Field
The invention relates to the technical field of gesture recognition, in particular to a gesture recognition method and device.
Background
With the development of science and technology, motion sensing detection has been widely applied to various devices in recent years: images are acquired through a camera lens and then analyzed to infer human actions, so as to operate a system, namely man-machine interaction. Among such actions, gestures are the most convenient and most widely used way to control machines. However, existing gesture recognition methods spend a large amount of computation time and memory, so the algorithms are not easy to implement on an embedded system.
Disclosure of Invention
Therefore, it is desirable to provide a gesture recognition method and device that address the long computation time and large memory usage of existing gesture recognition methods.
To achieve the above object, the inventors provide a gesture recognition method, comprising the steps of:
reducing the resolution of the image to be recognized to obtain a first image;
performing skin color detection processing according to the first image to obtain a second image with a skin color area;
performing morphological operation on the second image to obtain a third image after filtering processing;
obtaining foreground information of the third image, and obtaining a fourth image with a foreground skin color block according to the foreground information and the third image;
acquiring an image area where the arm part is located according to the horizontal and vertical texture information of the fourth image, and intercepting an arm part image according to the image area where the arm part is located;
and acquiring corresponding gesture information according to the finger image information of the arm partial image.
Further, the step of reducing the resolution of the image to be recognized to obtain the first image comprises the following step:
reducing the resolution of the image to be recognized to obtain the first image through a discrete wavelet transform.
Further, the skin color detection processing according to the first image to obtain a second image with a skin color area includes the following steps:
detecting skin color blocks in the HSV color space: the original RGB color space of the first image is converted into the HSV color space, and skin color is marked according to the ranges of hue H and saturation S corresponding to skin color in the HSV color space, the range formula being as follows:
Skin(x, y) = 255, if Hmin ≤ H(x, y) ≤ Hmax and Smin ≤ S(x, y) ≤ Smax; Skin(x, y) = 0, otherwise
wherein x and y respectively represent the abscissa and the ordinate of the pixel point, Skin represents the skin color value, and Hmin, Hmax, Smin and Smax are the set hue and saturation bounds of skin color.
Further, the performing of the morphological operation on the second image to obtain the third image after filtering processing includes:
performing contraction (erosion) and expansion (dilation) processing on the second image to obtain the third image.
Further, the acquiring of the foreground information of the third image includes: after the background of the third image is obtained through a Codebook algorithm, separating out the foreground information of the third image.
Further, the step of obtaining a fourth image with a foreground skin color block according to the foreground information and the third image comprises:
performing a pixel-wise logical AND operation on the foreground information and the third image to obtain the fourth image of the foreground skin color blocks.
Further, the step of obtaining the image area where the arm part is located according to the horizontal and vertical texture information of the fourth image comprises the steps of:
counting the number of robust texture points in each connected area of the fourth image;
taking a connected area in which the number of robust texture points is larger than a threshold value as a face area;
taking a connected area in which the number of robust texture points is less than or equal to the threshold value as a hand area.
Further, the robust texture point is obtained by the following formula:
Label(x, y) = 1, if S(x, y) = 1 and (S(x, y−1) = 1 or S(x, y+1) = 1); Label(x, y) = 0, otherwise
wherein Label(x, y) is the robust texture point mark, S(x, y) is the texture point mark, and x and y respectively represent the abscissa and the ordinate of the pixel point;
the texture point mark is obtained by the following formula:
S(x, y) = 1, if HL(x, y) = 255 and |g(x, y) ⊗ G| > Th1; S(x, y) = 0, otherwise
wherein G is the horizontal Sobel operator mask, HL(x, y) is the high-low frequency information after the two-dimensional discrete wavelet transform, g(x, y) is the gray value of the pixel point, and the formulas are as follows:
HL(x, y) = 255, if |HL(x, y)| ≥ Th0; HL(x, y) = 0, otherwise
G = [−1 −2 −1; 0 0 0; 1 2 1]
Th0 and Th1 are set thresholds.
Further, the step of acquiring corresponding gesture information according to the finger image information of the arm partial image includes:
acquiring a palm image of the arm partial image;
acquiring the length of a finger and the position of a fingertip;
and acquiring gesture information according to the length of the finger and the position of the fingertip.
The invention also provides a gesture recognition device, which comprises a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, implements the steps of any one of the methods described above.
Compared with the prior art, the technical scheme reduces the system computation load by lowering the resolution, and then quickly separates the face area from the hand area through texture operations, so that memory usage is reduced and the difficulty of implementation on an embedded system is also lowered.
Drawings
FIG. 1A is a diagram of the low-low frequency (LL) image information of an image subjected to discrete wavelet transform processing according to an embodiment;
FIG. 1B is a diagram of the low-high frequency (LH) image information of an image subjected to discrete wavelet transform processing according to an embodiment;
FIG. 1C is a diagram of the high-low frequency (HL) image information of an image subjected to discrete wavelet transform processing according to an embodiment;
FIG. 2A is a reduced resolution image according to an embodiment;
FIG. 2B is an image marked according to skin color according to an embodiment;
FIG. 3 is an image after expansion and contraction processing;
fig. 4 is an image after the foreground information acquisition process;
FIG. 5 is the image obtained by a logical AND of the foreground-processed image and the expansion/contraction-processed image;
FIG. 6 is a schematic view of a captured arm image being rotated;
FIG. 7A is a schematic view of a hand feature;
FIG. 7B is another schematic view of a hand feature;
FIG. 7C is a schematic view of finding the end of a finger;
FIG. 7D is a cut-away view of a hand;
FIG. 8A is a schematic view of a wrist and a wrist center point;
FIG. 8B is a schematic view of a semicircle drawn by the center point of the wrist;
FIG. 8C is a schematic view of semicircles drawn from points offset to the left and right of the wrist center point by 1/4 of the wrist width;
FIG. 9A is a schematic diagram of gesture 2;
FIG. 9B is a schematic diagram of gesture 6;
FIG. 9C is a schematic diagram of gesture 7;
FIG. 9D is a schematic diagram of gesture 3;
FIG. 9E is a schematic diagram of gesture 8;
FIG. 9F is a schematic diagram of gesture 4;
FIG. 9G is a schematic diagram of gesture 9.
Detailed Description
To explain technical contents, structural features, and objects and effects of the technical solutions in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.
The present embodiment provides a gesture recognition method, which first reduces the resolution of the image to be recognized to obtain a first image. To increase the image processing speed, the image is usually reduced so as to shorten the subsequent processing time. Three methods are commonly used for this: down-sampling, a 2 × 2 averaging filter, and the Discrete Wavelet Transform (DWT). All three can reduce the resolution and thereby the amount of computation, but the discrete wavelet transform is widely used because of its excellent energy-concentration and multi-resolution analysis characteristics: it can process different components separately, retains the energy characteristics of the original image even after a large reduction in resolution, and, besides reducing the image resolution, also yields the high and low frequency information of the image.
After the discrete wavelet transform processing, a downscaled image is obtained as shown in fig. 1A, where fig. 1A is the low-low frequency (LL) information, fig. 1B is the low-high frequency (LH) information, that is, the vertical texture, and fig. 1C is the high-low frequency (HL) information, that is, the horizontal texture.
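As an illustration of this step, the sketch below performs one level of 2-D discrete wavelet decomposition with the PyWavelets library (an assumed toolchain; the patent names no implementation), yielding the half-resolution LL band used by the later stages and the detail bands that carry the texture information:

```python
import cv2
import numpy as np
import pywt

def dwt_reduce(gray: np.ndarray):
    """One-level 2-D Haar DWT: returns the half-resolution approximation band
    plus the two detail bands used later for texture analysis (band naming
    conventions vary; pywt orders details as horizontal, vertical, diagonal)."""
    LL, (detail_h, detail_v, detail_d) = pywt.dwt2(gray.astype(np.float32), "haar")
    return LL, detail_h, detail_v

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input frame
LL, detail_h, detail_v = dwt_reduce(img)             # LL feeds the skin-colour stage
```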
Then, skin color detection is performed on the first image to obtain a second image with a skin color area. Skin color is a very obvious human feature and makes it easy to distinguish the human body from other objects. There are many methods for skin color identification; the main difference between the skin colors of different ethnic groups is brightness, so if brightness is omitted, the skin color clusters of all ethnic groups show considerable clustering and a rather high identification rate can be achieved. This embodiment therefore identifies skin color blocks in the HSV color space. The HSV color space separates a color into three dimensions, H (hue), S (saturation) and V (value, i.e. lightness); by converting the original RGB color space into the HSV color space, the factor with the greatest influence, lightness (V), is isolated, and skin color is marked using the ranges of H (hue) and S (saturation) corresponding to skin color, as in formula (1).
Skin(x, y) = 255, if Hmin ≤ H(x, y) ≤ Hmax and Smin ≤ S(x, y) ≤ Smax; Skin(x, y) = 0, otherwise (1)
wherein Hmin, Hmax, Smin and Smax are the set hue and saturation bounds of skin color.
As shown in fig. 2A and 2B, fig. 2A is the image with reduced resolution and fig. 2B is the image marked according to skin color, where white is the preliminarily obtained skin color region. It can be seen that, although skin color is a representative feature of the human body, there are many skin-colored objects in everyday surroundings, and these objects and noise would affect the results of this embodiment, so the following steps refine the identified skin color blocks.
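A minimal sketch of the HSV skin-colour marking of formula (1) might look as follows; the numeric hue/saturation bounds below are placeholder values drawn from the skin-detection literature, since the patent's exact bounds appear only in the equation image:

```python
import cv2
import numpy as np

H_RANGE = (0, 25)    # assumed hue bounds (OpenCV hue runs 0..179)
S_RANGE = (50, 174)  # assumed saturation bounds (0..255)

def skin_mask(bgr: np.ndarray) -> np.ndarray:
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)   # isolates lightness in channel V
    h, s, _v = cv2.split(hsv)
    mask = ((h >= H_RANGE[0]) & (h <= H_RANGE[1]) &
            (s >= S_RANGE[0]) & (s <= S_RANGE[1]))
    return mask.astype(np.uint8) * 255           # white = candidate skin pixel
```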
Then, a morphological operation is performed on the second image to obtain a filtered third image. Morphological operations apply set operations between the image and a structuring element to generate a new, filtered image; their advantage is that, after the morphological expansion and contraction operations, unnecessary stray points can be filtered out and objects that should not be connected can be separated. Dilation merges all background points in contact with an object into the object, expanding the boundary outward; it can be used to fill voids in objects. The dilation algorithm: scan each pixel of the image with a 3 × 3 structuring element and AND the structuring element with the binary image it covers; if all covered pixels are 0, the resulting pixel is 0, otherwise it is 1. The result is that the binary object grows by one ring. Contraction (erosion) eliminates boundary points and shrinks the boundary inward; it can be used to eliminate small, meaningless objects. The erosion algorithm: scan each pixel of the image with a 3 × 3 structuring element and AND the structuring element with the binary image it covers; if all covered pixels are 1, the resulting pixel is 1, otherwise it is 0. The result is that the binary object shrinks by one ring. As shown in fig. 3, after the filtering by expansion and contraction, the noise is greatly reduced.
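The contraction-then-expansion filtering described above corresponds to a morphological opening with the 3 × 3 structuring element the text specifies; a sketch:

```python
import cv2
import numpy as np

kernel = np.ones((3, 3), np.uint8)   # the 3x3 structuring element from the text

def morph_filter(binary: np.ndarray) -> np.ndarray:
    eroded = cv2.erode(binary, kernel)    # contraction: removes small stray points
    opened = cv2.dilate(eroded, kernel)   # expansion: restores object size, fills small voids
    return opened
```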
Next, the foreground information of the third image is obtained, and a fourth image with foreground skin color blocks is obtained from the foreground information and the third image. To capture the foreground, this embodiment needs a background model as a reference, and the background model is built here by Codebook background modeling. Unlike earlier background construction methods, the Codebook algorithm models the background of a complex scene using quantization and clustering: working on each pixel of the image sequence independently, it represents every pixel by a codebook, samples the color and brightness of each pixel, and compares the color distance and the brightness against the color model to decide whether the pixel belongs to the background. If a pixel is judged to be background, it is quantized into codewords, and the background features are stored per pixel; the codebooks of all the pixels form the complete background. Unlike a Gaussian mixture method, which computes probability distributions, the Codebook only needs to compute the color distance and the brightness range of the pixels, so the method has low computational complexity and a small memory footprint, can quickly extract the background from an image sequence containing moving foreground objects during initialization, adaptively updates and compresses the background model, and can handle local or global illumination changes, making it very suitable for real-time image processing; the processed image is shown in fig. 4. After the foreground information is obtained, the skin-like color blocks belonging to the background are filtered out: the foreground part separated by the above Codebook algorithm and the skin color blocks of the third image are combined with a pixel-by-pixel logical AND operation to obtain the foreground skin color blocks, as shown in fig. 5.
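The Codebook model itself is lengthy, so the sketch below substitutes OpenCV's MOG2 background subtractor purely to show the data flow of this step — the pixel-wise AND of the foreground mask with the skin mask; the substitution is an assumption, not the patent's algorithm:

```python
import cv2

# Stand-in for the Codebook background model (not available in modern OpenCV).
subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

def foreground_skin(frame, skin):
    fg = subtractor.apply(frame)          # foreground mask, 0/255 per pixel
    return cv2.bitwise_and(fg, skin)      # pixel-wise logical AND with the skin mask
```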
Then, the image area where the arm part is located is acquired according to the horizontal and vertical texture information of the fourth image, and the arm image is cropped out accordingly. From the experimental results, it can be found that, apart from the edges, almost no texture is visible inside the palm, so the texture characteristics of the human face and of the palm interior differ considerably. Since the horizontal texture information inside the face is quite rich, this embodiment computes the horizontal texture intensity inside each connected block. After the discrete wavelet transform performed in the resolution-reduction step, the low-high frequency and the high-low frequency information is available, representing the vertical texture and the horizontal texture respectively; because the face contains rich horizontal texture and very little vertical texture, the subsequent processing analyzes the horizontal texture. First, to avoid interference from texture of weak intensity, this embodiment checks the high-low frequency information after the discrete wavelet transform using formula (3): if the magnitude of HL(x, y) is smaller than the threshold Th0, the horizontal energy at that point is weak and is treated as negligible; otherwise the point is marked by setting HL(x, y) = 255. For positions where HL(x, y) = 255, further confirmation is made on the grayscale image using the horizontal Sobel operator mask G of formula (2), as in formula (4): if the magnitude of the value convolved with the mask G is greater than the set threshold Th1, the point is marked as a texture point, S(x, y) = 1. Finally, because horizontal texture should be continuous rather than isolated, a point with S(x, y) = 1 is kept as a strong horizontal texture point, Label(x, y) = 1, only if one of its neighbours S(x, y ± 1) also meets the above condition, as in formula (5). In this way the method of this embodiment discards many isolated single texture points and obtains more accurate horizontal texture information.
G = [−1 −2 −1; 0 0 0; 1 2 1] (2)
HL(x, y) = 255, if |HL(x, y)| ≥ Th0; HL(x, y) = 0, otherwise (3)
S(x, y) = 1, if HL(x, y) = 255 and |g(x, y) ⊗ G| > Th1; S(x, y) = 0, otherwise (4)
Label(x, y) = 1, if S(x, y) = 1 and (S(x, y−1) = 1 or S(x, y+1) = 1); Label(x, y) = 0, otherwise (5)
wherein g(x, y) is the grayscale image; HL(x, y) is the high-low frequency coefficient of each pixel of the transformed original image; S(x, y) is the texture point mark; Label(x, y) is the robust texture point mark.
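A sketch of formulas (3)-(5), assuming the grayscale image and the HL band have been brought to the same resolution; Th0 and Th1 below are example values, since the patent leaves them as settings:

```python
import cv2
import numpy as np

TH0, TH1 = 10.0, 100.0   # assumed thresholds; Th0/Th1 are tuning parameters in the text

def robust_texture(gray: np.ndarray, HL: np.ndarray) -> np.ndarray:
    strong = np.abs(HL) >= TH0                                   # formula (3): keep strong HL energy
    sobel = cv2.Sobel(gray.astype(np.float32), cv2.CV_32F, 0, 1, ksize=3)
    S = strong & (np.abs(sobel) > TH1)                           # formula (4): Sobel confirmation
    up = np.roll(S, 1, axis=0)                                   # S(x, y-1)
    down = np.roll(S, -1, axis=0)                                # S(x, y+1)
    return S & (up | down)                                       # formula (5): continuity check
```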
Because the surface of the palm is smooth, it carries less texture information than the face, and the texture characteristics of the face and the hand differ considerably. This embodiment therefore counts the number of robust texture points in each connected block and uses the difference in texture point counts to distinguish the hand from the face. To make the recognition rate more accurate, a look-up curve relating the size of a face-sized skin color block to the number of robust texture points is built and used as the threshold: a block whose number of texture points is larger than the set threshold is judged to be a face.
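The counting step could be sketched as follows; the fixed area-proportional threshold stands in for the patent's look-up curve and is an assumption:

```python
import cv2
import numpy as np

def split_face_hand(fg_skin: np.ndarray, texture: np.ndarray, ratio: float = 0.02):
    """fg_skin: 0/255 foreground-skin mask; texture: boolean robust-texture map."""
    n, labels, stats, _ = cv2.connectedComponentsWithStats(fg_skin, connectivity=8)
    hands = np.zeros_like(fg_skin)
    for i in range(1, n):                                    # label 0 is the background
        block = labels == i
        if texture[block].sum() <= ratio * stats[i, cv2.CC_STAT_AREA]:
            hands[block] = 255                               # few texture points -> hand
    return hands
```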
Finally, the corresponding gesture information is acquired according to the finger image information of the arm image. In the hand recognition part, the input hand image may be skewed, which hinders the judgment of this embodiment. The gesture is therefore rotated first; its features are then analyzed to find the wrist and the joints between the fingers and the palm, the parts other than the palm are cut off, and finally the fingers are identified using the feature data obtained from the analysis to produce the gesture recognition result. Rotating the arm requires knowing its skew angle first. This embodiment uses the least squares method (least squares estimation): a regression line is assumed and the sum of squared errors between all the discrete points and the line is minimized; once the regression line is found, the slope and thus the rotation angle are known. The rotated image is then obtained by substituting the rotation angle into formula (6); as shown in fig. 6, the left half is the image before rotation and the right half is the rotated image.
x′ = x·cos θ − y·sin θ, y′ = x·sin θ + y·cos θ (6)
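A sketch of the de-skew step: fit the regression line through the mask pixels by least squares, convert its slope to an angle, and apply the rotation of formula (6) with an affine warp:

```python
import cv2
import numpy as np

def deskew(mask: np.ndarray) -> np.ndarray:
    ys, xs = np.nonzero(mask)
    slope, _ = np.polyfit(xs, ys, 1)             # least-squares regression line
    angle = np.degrees(np.arctan(slope))         # skew angle from the slope
    h, w = mask.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(mask, M, (w, h))       # rotation per formula (6)
```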
Most gestures are formed by the cooperation of the palm and the fingers, so the skin color zone below the wrist is unnecessary. To reduce the computational complexity, this embodiment cuts off the part below the wrist and keeps the palm. Figs. 7A and 7B are schematic diagrams of the human palm: from fig. 7A it can be found that the palm is nearly square, a property that helps the next step of separating the palm. Extending fig. 7A to the fingers, as shown in fig. 7B, the finger length and the palm length are very similar, with a ratio roughly between 1 and 1.4. To cut off the portion below the palm, this embodiment therefore needs the finger length, i.e. the length marked in fig. 7B. To find it, the search proceeds downward from any finger in fig. 7C; because the transverse width increases suddenly at the junction between the finger and the palm, this characteristic determines the position of the finger end and gives the finger length, as shown by the second line in fig. 7D. Since the ratio of the finger length to the wrist length is approximately 1 to 1.4, this embodiment places the wrist position (finger length × 1.2) pixels below the finger end, as shown by the third line in fig. 7D.
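The width-jump search and the (finger length × 1.2) cut could be sketched as below; the jump factor is an assumed tuning value:

```python
import numpy as np

def cut_at_wrist(hand: np.ndarray, jump: float = 2.0) -> np.ndarray:
    """hand: 0/255 de-skewed hand mask; returns the image cropped at the wrist."""
    widths = (hand > 0).sum(axis=1).astype(np.float32)   # skin run-length per row
    top = int(np.argmax(widths > 0))                     # first row containing skin
    finger_end = top
    for r in range(top + 1, hand.shape[0] - 1):
        if widths[r + 1] > jump * max(widths[r], 1.0):   # sudden transverse widening
            finger_end = r
            break
    finger_len = finger_end - top                        # finger length per fig. 7B
    wrist_row = min(hand.shape[0], finger_end + int(finger_len * 1.2))
    return hand[:wrist_row]                              # keep the palm, drop the forearm
```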
After the above steps, the image with the forearm removed has been obtained, and information such as the wrist width and the wrist center point is known. From the schematic of fig. 8B it can be found that the palm can be separated by drawing, upward with the wrist center point as the circle center, a circle whose radius is 1.2 times the palm length (the finger length); in this embodiment the radius is increased to 1.35 times the palm length, and the fingers can then be separated, as shown by the arc in fig. 8C, but the thumb may be missed this way. Therefore, in this embodiment, semicircles are also drawn from points offset to the left and right of the wrist center point by 1/4 of the wrist width (i.e. the width at the base of the wrist), so as to avoid missing the thumb; as shown by the two arcs in fig. 8C, the two inner semicircular arcs are taken.
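A sketch of the arc test, reading the text's "two inner semi-arc lines" as the inner envelope (intersection) of the offset circles, which is an interpretation:

```python
import numpy as np

def finger_pixels(hand, wrist_cx, wrist_cy, finger_len, wrist_width):
    """hand: 0/255 palm mask; returns a boolean map of finger pixels."""
    ys, xs = np.indices(hand.shape)                      # row (y) and column (x) grids
    r = 1.35 * finger_len                                # radius per the embodiment
    palm = np.ones(hand.shape, dtype=bool)
    for cx in (wrist_cx, wrist_cx - wrist_width / 4.0, wrist_cx + wrist_width / 4.0):
        palm &= (xs - cx) ** 2 + (ys - wrist_cy) ** 2 <= r ** 2  # inner envelope of the arcs
    return (hand > 0) & ~palm & (ys < wrist_cy)          # skin above the wrist, outside the arcs
```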
Finally, the gesture is recognized according to the hand index (the number of separated fingers). Different numbers of fingers represent different gestures: a hand index of 1 can only be gesture 1; a hand index of 2 may be gesture 2, 6 or 7; a hand index of 3 may be gesture 3 or 8; a hand index of 4 may be gesture 4 or 9; a hand index of 5 can only be gesture 5. This leaves the samples with hand indices of 2, 3 and 4 to be disambiguated. First, referring to fig. 9B, the blocks above the arc are the separated fingers; for each block the coordinates of its highest point are recorded, and the highest and lowest of these points are then found. The horizontal line in the figures uses the vertex of the semicircle as the height limit, and the vertical division uses the wrist center point as the basis for splitting left and right. For a hand index of 2: fig. 9A is a schematic diagram of gesture 2, in which both the highest point and the lowest point lie above the horizontal line; fig. 9B shows gesture 6, in which the lowest point must lie below the horizontal line and the highest and lowest points must lie on opposite sides of the palm center, i.e. on the left and right of the vertical division; fig. 9C shows gesture 7, in which the lowest point must lie below the horizontal line and the highest and lowest points must lie on the same side. A hand index of 3 may be gesture 3 or gesture 8: gesture 3 is shown in fig. 9D, where both the highest and the lowest points lie above the horizontal line; gesture 8 is shown in fig. 9E, where the highest and lowest points lie on opposite sides of the horizontal line. A hand index of 4 may be gesture 4 or gesture 9: gesture 4 is shown in fig. 9F, where both points lie above the horizontal line; gesture 9 is shown in fig. 9G, where the points lie on opposite sides of the horizontal line.
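As an illustration, the rules for a hand index of 2 reduce to a few comparisons against the horizontal line and the vertical division:

```python
def classify_two_fingers(high, low, line_y, center_x):
    """high/low: (x, y) fingertip extremes in image coordinates (smaller y = higher);
    line_y: the horizontal line through the semicircle vertex; center_x: the wrist
    center's vertical division."""
    if high[1] < line_y and low[1] < line_y:
        return 2                                          # both tips above the line
    same_side = (high[0] - center_x) * (low[0] - center_x) > 0
    return 7 if same_side else 6                          # below the line: side test
```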
In this embodiment, the marked fingertip coordinates of the highest and lowest points are compared against the wrist center point, the circle radius and the other information found in the previous steps. Because the judgment uses only relative positions, the result does not change with distance, which avoids judgment errors caused by different distances between the hand and the lens.
Finally, experiments were performed using the method of the present invention. The resolution of the test images is 640 × 480; 500 test images were used to test the face and hand recognition rate, and 9000 test images were used to test the accuracy for gestures 1-9. The experimental results are shown in Tables 1 and 2. From the data in Table 2 it can be seen that face recognition achieves a relatively high accuracy, and Table 1 shows that the recognition rate for gestures 1-9 reaches well above ninety percent. Table 3 details the execution speed of each stage of the system; the overall system runs at about 30 fps, which reaches real-time performance.
Gesture  Total images  Successful  Failed  Recognition rate
1        1000          983         17      98.3%
2        1000          991         9       99.1%
3        1000          992         8       99.2%
4        1000          971         29      97.1%
5        1000          973         27      97.3%
6        1000          961         39      96.1%
7        1000          947         53      94.7%
8        1000          980         20      98.0%
9        1000          981         19      98.1%
Table 1: Gesture recognition results
Total images  Successful  Failed  Recognition rate
500           481         19      96.2%
Table 2: Face recognition results
Table 3: Execution time (the per-stage timing figures appear only as an image in the original; overall, the system runs at about 30 fps)
The invention provides a method for quickly distinguishing hand and face blocks by means of the discrete wavelet transform, together with a stable gesture recognition scheme. The experimental results show that the method of this embodiment greatly reduces the system's computation and accurately separates the face and hand blocks. For the hand recognition part, this embodiment provides a simple and stable way of distinguishing the fingers and the palm, and the experimental results show a recognition rate above ninety percent. For subsequent development, the method of this embodiment uses no complex sensing components, and because its operations are simple, the discrete wavelet transform combined with this embodiment's system is easier to implement in hardware.
The invention provides a gesture recognition device, which comprises a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, implements the steps of any one of the methods described above.
It should be noted that, although the above embodiments have been described herein, the invention is not limited thereto. Changes and modifications made to the embodiments described herein based on the innovative concepts of the present invention, or equivalent structures or equivalent process transformations made using the content of the specification and the drawings, applied directly or indirectly to other related technical fields, are all included in the scope of protection of the present invention.

Claims (7)

1. A gesture recognition method, characterized by:
reducing the resolution of the image to be recognized to obtain a first image;
performing skin color detection processing according to the first image to obtain a second image with a skin color area;
performing morphological operation on the second image to obtain a third image after filtering processing;
obtaining foreground information of the third image, and obtaining a fourth image with a foreground skin color block according to the foreground information and the third image;
acquiring an image area where the arm part is located according to the horizontal and vertical texture information of the fourth image, and intercepting an arm part image according to the image area where the arm part is located;
acquiring corresponding gesture information according to the finger image information of the arm partial image;
the step of obtaining the image area where the arm part is located according to the horizontal and vertical texture information of the fourth image comprises the following steps:
counting the number of robust texture points in each connected area of the fourth image;
taking a connected area in which the number of robust texture points is larger than a threshold value as a face area;
taking a connected area in which the number of robust texture points is less than or equal to the threshold value as a hand area;
the robust texture point is obtained by the following formula:
Label(x, y) = 1, if S(x, y) = 1 and (S(x, y−1) = 1 or S(x, y+1) = 1); Label(x, y) = 0, otherwise
wherein Label(x, y) is the robust texture point mark, S(x, y) is the texture point mark, and x and y respectively represent the abscissa and the ordinate of the pixel point;
the texture point mark is obtained by the following formula:
S(x, y) = 1, if HL(x, y) = 255 and |g(x, y) ⊗ G| > Th1; S(x, y) = 0, otherwise
wherein G is the horizontal Sobel operator mask, HL(x, y) is the high-low frequency information after the two-dimensional discrete wavelet transform, g(x, y) is the gray value of the pixel point, and the formulas are as follows:
HL(x, y) = 255, if |HL(x, y)| ≥ Th0; HL(x, y) = 0, otherwise
G = [−1 −2 −1; 0 0 0; 1 2 1]
Th0 and Th1 are set thresholds.
2. The gesture recognition method according to claim 1, wherein the step of reducing the resolution of the image to be recognized to obtain the first image comprises the following step:
reducing the resolution of the image to be recognized to obtain the first image through a discrete wavelet transform.
3. The gesture recognition method according to claim 1, wherein the performing morphological operation on the second image to obtain the third image after filtering comprises:
and performing contraction and expansion processing on the second image to obtain a third image.
4. The gesture recognition method according to claim 1, wherein the step of obtaining foreground information of the third image comprises the steps of: and after the background of the third image is obtained through a Codebook algorithm, separating to obtain foreground information of the third image.
5. The gesture recognition method according to claim 1, wherein the step of obtaining the fourth image with the foreground skin color block according to the foreground information and the third image comprises the steps of:
performing a pixel-wise logical AND operation on the foreground information and the third image to obtain the fourth image of the foreground skin color blocks.
6. The gesture recognition method according to claim 1, wherein the step of obtaining corresponding gesture information according to the finger image information of the arm partial image comprises the steps of:
acquiring a palm image of the arm partial image;
acquiring the length of a finger and the position of a fingertip;
and acquiring gesture information according to the length of the finger and the position of the fingertip.
7. A gesture recognition apparatus, characterized in that: comprising a memory, a processor, said memory having stored thereon a computer program which, when being executed by the processor, carries out the steps of the method according to any one of claims 1 to 6.
CN201811642237.4A 2018-12-29 2018-12-29 Gesture recognition method and device Expired - Fee Related CN109376730B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811642237.4A CN109376730B (en) 2018-12-29 2018-12-29 Gesture recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811642237.4A CN109376730B (en) 2018-12-29 2018-12-29 Gesture recognition method and device

Publications (2)

Publication Number Publication Date
CN109376730A CN109376730A (en) 2019-02-22
CN109376730B (en) 2021-07-16

Family

ID=65372103

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811642237.4A Expired - Fee Related CN109376730B (en) 2018-12-29 2018-12-29 Gesture recognition method and device

Country Status (1)

Country Link
CN (1) CN109376730B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2022136829A (en) * 2021-03-08 2022-09-21 本田技研工業株式会社 Processing device, mobile object, processing method, and program

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102142084A (en) * 2011-05-06 2011-08-03 北京网尚数字电影院线有限公司 Method for gesture recognition
CN102930270A (en) * 2012-09-19 2013-02-13 东莞中山大学研究院 Method and system for identifying hands based on complexion detection and background elimination
CN104318558A (en) * 2014-10-17 2015-01-28 浙江大学 Multi-information fusion based gesture segmentation method under complex scenarios
CN105138990A (en) * 2015-08-27 2015-12-09 湖北师范学院 Single-camera-based gesture convex hull detection and palm positioning method
CN108647654A (en) * 2018-05-15 2018-10-12 合肥岚钊岚传媒有限公司 The gesture video image identification system and method for view-based access control model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10474297B2 (en) * 2016-07-20 2019-11-12 Ams Sensors Singapore Pte. Ltd. Projecting a structured light pattern onto a surface and detecting and responding to interactions with the same

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102142084A (en) * 2011-05-06 2011-08-03 北京网尚数字电影院线有限公司 Method for gesture recognition
CN102930270A (en) * 2012-09-19 2013-02-13 东莞中山大学研究院 Method and system for identifying hands based on complexion detection and background elimination
CN104318558A (en) * 2014-10-17 2015-01-28 浙江大学 Multi-information fusion based gesture segmentation method under complex scenarios
CN105138990A (en) * 2015-08-27 2015-12-09 湖北师范学院 Single-camera-based gesture convex hull detection and palm positioning method
CN108647654A (en) * 2018-05-15 2018-10-12 合肥岚钊岚传媒有限公司 The gesture video image identification system and method for view-based access control model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Vision Based Hand Gesture Recognition; Yanmin Zhu et al.; 2013 International Conference on Service Sciences (ICSS); 2013-05-27; pp. 260-265 *
A hand image segmentation method under complex backgrounds; Liu Changsheng et al.; Journal of Hebei Normal University of Science & Technology; 2007-09-30; Vol. 21, No. 3; pp. 46-49 *

Also Published As

Publication number Publication date
CN109376730A (en) 2019-02-22

Similar Documents

Publication Publication Date Title
CN105046252B (en) A kind of RMB prefix code recognition methods
Hafiane et al. Joint adaptive median binary patterns for texture classification
CN113592845A (en) Defect detection method and device for battery coating and storage medium
CN106981077B (en) Infrared image and visible light image registration method based on DCE and LSS
CN109919039B (en) Static gesture recognition method based on palm and finger characteristics
CN106846339A (en) Image detection method and device
CN102324099B (en) Step edge detection method oriented to humanoid robot
CN107330354B (en) Natural gesture recognition method
CN110472625B (en) Chinese chess piece visual identification method based on Fourier descriptor
CN104809446A (en) Palm direction correction-based method for quickly extracting region of interest in palmprint
CN109409356B (en) Multi-direction Chinese print font character detection method based on SWT
CN110032932B (en) Human body posture identification method based on video processing and decision tree set threshold
CN109558855B (en) A kind of space gesture recognition methods combined based on palm contour feature with stencil matching method
CN111161281A (en) Face region identification method and device and storage medium
Donoser et al. Robust planar target tracking and pose estimation from a single concavity
CN109376730B (en) Gesture recognition method and device
US10115195B2 (en) Method and apparatus for processing block to be processed of urine sediment image
CN109753912B (en) Multispectral palm print matching method based on tensor
CN116486092A (en) Electromagnetic probe calibration piece identification method based on improved Hu invariant moment
CN116051869A (en) Image tag matching method and system integrating OVR-SVM and PSNR similarity
Wu et al. Face detection based on YCbCr Gaussian model and KL transform
CN114549649A (en) Feature matching-based rapid identification method for scanned map point symbols
Nguyen et al. LAWNet: A lightweight attention-based deep learning model for wrist vein verification in smartphones using RGB images
CN107424172A (en) Motion target tracking method with circle search method is differentiated based on prospect
CN116109661A (en) Image edge detection and tracking method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210716