CN113705511A - Gesture recognition method and device - Google Patents

Gesture recognition method and device

Info

Publication number
CN113705511A
Authority
CN
China
Prior art keywords
hand
gesture
convolution
feature
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111028923.4A
Other languages
Chinese (zh)
Inventor
关本立
欧俊文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ava Electronic Technology Co Ltd
Original Assignee
Ava Electronic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ava Electronic Technology Co Ltd filed Critical Ava Electronic Technology Co Ltd
Priority to CN202111028923.4A
Publication of CN113705511A
Pending legal-status Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a gesture recognition method and device. After a captured hand image is obtained, it is input into a convolutional feature extraction network, from which a plurality of convolution features are obtained. Each convolution feature is then reduced to a vector of a corresponding dimension, the vectors are concatenated into a feature vector to be compared, and this vector is finally matched against a preset gesture library to obtain the recognition result. The preset gesture library comprises gesture actions and the multi-dimensional feature vectors corresponding to them. By comparing the feature vector to be compared with these multi-dimensional feature vectors, the gesture action shown in the captured hand image is determined, so gestures are recognized accurately. At the same time, because the convolutional feature extraction network can be trained in advance, the detection accuracy of gesture recognition is guaranteed.

Description

Gesture recognition method and device
Technical Field
The invention relates to the technical field of image recognition, and in particular to a gesture recognition method and device.
Background
Gesture recognition is a topic in computer science and language technology that aims to interpret human gestures through mathematical algorithms. To implement gesture recognition, an image of the gesture is acquired, hand detection and gesture segmentation are performed on the image, and static or dynamic gesture recognition is then carried out.
In video teaching scenarios, the gestures of teachers or students also need to be detected and recognized. Two detection approaches are common. The first is single-stage detection and recognition, which takes a static picture of the gesture and locates and classifies the gesture by feature regression. The second is cascaded detection and recognition, which first locates candidate target regions from the image information and then classifies them according to the information in each candidate region. However, single-stage detection and recognition can only detect a closed set of gesture categories and its detection accuracy is low, while cascaded detection and recognition is computationally expensive and therefore slow.
Conventional gesture recognition methods based on static images therefore have clear shortcomings.
Disclosure of Invention
Accordingly, it is necessary to provide a gesture recognition method and device that overcome the defects of conventional gesture recognition methods based on static images.
A gesture recognition method comprises the following steps:
acquiring a captured hand image;
inputting the captured hand image into a convolutional feature extraction network to obtain a plurality of convolution features from the network, wherein each convolution feature corresponds to a different downsampling factor;
reducing each convolution feature to a vector of a corresponding dimension and concatenating the vectors to obtain a feature vector to be compared;
comparing the feature vector to be compared with a preset gesture library to obtain a recognition result, wherein the preset gesture library comprises gesture actions and the multi-dimensional feature vectors corresponding to the gesture actions.
According to this gesture recognition method, the captured hand image is input into the convolutional feature extraction network, from which a plurality of convolution features are obtained. Each convolution feature is reduced to a vector of a corresponding dimension, the vectors are concatenated into a feature vector to be compared, and this vector is matched against the preset gesture library to obtain the recognition result. By comparing the feature vector to be compared with the multi-dimensional feature vectors in the library, the gesture action shown in the captured hand image is determined, so gestures are recognized accurately. At the same time, because the convolutional feature extraction network can be trained in advance, the detection accuracy is guaranteed while the amount of computation is kept low.
In one embodiment, the process of obtaining the captured hand image further comprises the step of:
performing image preprocessing on the captured hand image.
In one embodiment, the process of inputting the captured hand image into the convolutional feature extraction network comprises the steps of:
inputting the captured hand image into a hand detection network to obtain the coordinates of the region box containing the hand and the hand classification confidence;
determining a hand detection region of the captured hand image according to the region box coordinates and the hand classification confidence;
inputting the hand detection region into the convolutional feature extraction network.
In one embodiment, the hand detection network includes a convolutional feature extraction sub-network and a multi-size feature fusion sub-network.
In one embodiment, the downsampling factor is the Nth power of 2, where N is a natural number greater than 1.
In one embodiment, the corresponding dimension is the Mth power of 2, where M is a natural number not less than 6.
In one embodiment, the feature vector to be compared is a 256-dimensional vector.
A gesture recognition apparatus comprises:
a picture acquisition module for acquiring a captured hand image;
a picture transmission module for inputting the captured hand image into a convolutional feature extraction network to obtain a plurality of convolution features, wherein each convolution feature corresponds to a different downsampling factor;
a vector acquisition module for reducing each convolution feature to a vector of a corresponding dimension and concatenating the vectors to obtain a feature vector to be compared;
a result comparison module for comparing the feature vector to be compared with a preset gesture library to obtain a recognition result, wherein the preset gesture library comprises gesture actions and the multi-dimensional feature vectors corresponding to the gesture actions.
The gesture recognition apparatus obtains the recognition result in the same way: the captured hand image is input into the convolutional feature extraction network, the resulting convolution features are reduced and concatenated into a feature vector to be compared, and this vector is matched against the preset gesture library. Because the convolutional feature extraction network can be trained in advance, the detection accuracy of gesture recognition is likewise guaranteed.
A computer storage medium has computer instructions stored thereon; when executed by a processor, the instructions implement the gesture recognition method of any of the above embodiments.
A computer device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor; the processor implements the gesture recognition method of any of the above embodiments when executing the program.
Drawings
FIG. 1 is a flow diagram of a gesture recognition method according to an embodiment;
FIG. 2 is a flow chart of a gesture recognition method according to another embodiment;
FIG. 3 is a schematic diagram of a convolutional feature extraction network structure according to an embodiment;
FIG. 4 is a block diagram of a gesture recognition apparatus according to an embodiment;
FIG. 5 is a schematic diagram of an internal structure of a computer according to an embodiment.
Detailed Description
For a better understanding of the objects, technical solutions, and effects of the present invention, the invention is further explained below with reference to the accompanying drawings and embodiments. The examples described below serve only to explain the present invention and are not intended to limit it.
The embodiment of the invention provides a gesture recognition method.
Fig. 1 is a flowchart of a gesture recognition method according to an embodiment. As shown in fig. 1, the gesture recognition method includes steps S100 to S103:
S100, acquiring a captured hand image;
S101, inputting the captured hand image into a convolutional feature extraction network to obtain a plurality of convolution features, wherein each convolution feature corresponds to a different downsampling factor;
S102, reducing each convolution feature to a vector of a corresponding dimension and concatenating the vectors to obtain a feature vector to be compared;
S103, comparing the feature vector to be compared with a preset gesture library to obtain a recognition result, wherein the preset gesture library comprises gesture actions and the multi-dimensional feature vectors corresponding to the gesture actions.
The captured hand image is obtained by photographing the subject's hand. In practice, it may be acquired from a camera, a storage device, or a similar source, and it contains the gesture action to be recognized.
In one embodiment, fig. 2 is a flowchart of a gesture recognition method according to another embodiment. As shown in fig. 2, the process of acquiring the captured hand image in step S100 further includes step S200:
S200, performing image preprocessing on the captured hand image.
Preprocessing the captured hand image facilitates the subsequent image feature extraction. Image preprocessing includes image cropping, size scaling, noise filtering, and the like. In one embodiment, the image preprocessing in step S200 includes resizing the captured hand image to a set size, which is consistent with the picture size used when the convolutional feature extraction network was trained. In a preferred embodiment, the captured hand image is resized to a set size of 128 x 128.
In one embodiment, the image preprocessing further includes data normalization.
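As an illustration, a minimal sketch of this preprocessing step follows, assuming OpenCV is used; the text does not specify the normalization constants, so scaling to [0, 1] is an assumption.

import cv2
import numpy as np

SET_SIZE = 128  # matches the set size used when training the feature extraction network

def preprocess(image: np.ndarray) -> np.ndarray:
    """Resize a captured hand image to the set size and normalize pixel values."""
    resized = cv2.resize(image, (SET_SIZE, SET_SIZE))
    # The text only states that data normalization is applied;
    # scaling to [0, 1] is an assumed, common choice.
    return resized.astype(np.float32) / 255.0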
In step S101, the captured hand image is input into the convolutional feature extraction network for convolution feature extraction. The network is trained in advance on gesture image samples prepared for a variety of gestures. For example, 20,000 gesture image samples are prepared, covering a number of human gestures captured in various environments; the images are then grouped according to the different gesture actions, with 50 to 200 pictures per gesture serving as samples. During training, a fully connected layer is appended after the network's output feature vector, and its output dimension is set to the number of classes in the dataset. The convolutional feature extraction network can thus be regarded as a classification network: the gesture image samples are input into the model, and the weight parameters of each layer are adjusted continuously according to the model's outputs and the sample labels, so that the outputs steadily approach the labels.
After training is finished, the final fully connected layer is removed. Inputting a captured hand image into the convolutional feature extraction network then completes the convolution feature extraction and yields the gesture features, after which the vector acquisition of step S102 is performed.
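A minimal sketch of this train-then-remove-the-head arrangement, assuming a PyTorch-style implementation; the names backbone, feature_dim, and num_classes are illustrative:

import torch.nn as nn

class FeatureExtractorWithHead(nn.Module):
    """Feature network plus a temporary fully connected head used only during training."""
    def __init__(self, backbone: nn.Module, feature_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone  # outputs the gesture feature vector
        self.head = nn.Linear(feature_dim, num_classes)  # output dimension = number of classes

    def forward(self, x):
        return self.head(self.backbone(x))

# After training, the head is discarded and only the backbone is kept:
# feature_extractor = model.backbone  # now outputs feature vectors rather than class scores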
In one embodiment, fig. 3 is a schematic structural diagram of the convolutional feature extraction network. As shown in fig. 3, the network includes a feature extraction backbone network, a spatial attention residual extraction module, and a spatial channel attention module. The backbone network extracts the convolution features, the spatial attention residual extraction module raises the dimension of the feature extraction, and the spatial channel attention module adjusts the contribution weights of the different channels of the convolution features.
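The text does not give the internals of the spatial channel attention module; a squeeze-and-excitation style channel re-weighting block is one common realization, sketched below under that assumption:

import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style re-weighting of channel contributions."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # one summary value per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel weight in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # channels that contribute more are weighted more heavily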
In one embodiment, as shown in fig. 2, the process of inputting the captured hand image into the convolutional feature extraction network in step S101 includes steps S201 to S203:
S201, inputting the captured hand image into a hand detection network to obtain the coordinates of the region box containing the hand and the hand classification confidence;
S202, determining a hand detection region of the captured hand image according to the region box coordinates and the hand classification confidence;
S203, inputting the hand detection region into the convolutional feature extraction network.
The captured hand image is input into the hand detection network, which extracts the coordinates of the region box containing the hand and the hand classification confidence, so that a hand detection region can be extracted from the image. In one embodiment, the hand detection network includes a convolutional feature extraction sub-network and a multi-size feature fusion sub-network.
Because the hand detection network fuses convolution features of different sizes, it can locate hands of different sizes at any position in the input image; it detects hand targets of different sizes anywhere in the captured hand image, without restricting the distance between the hand and the camera.
The output hand classification confidence is used to filter out regions that merely resemble hand features, reducing the computation of the subsequent gesture feature extraction. The filtering threshold is set according to the actual use case: to detect as many hand regions in the image as possible, the confidence threshold can be lowered; to reduce the number of reported regions, it can be raised, or only the region with the highest confidence can be output.
Likewise, before the hand detection region is input into the convolutional feature extraction network, it is resized to the set size. Extracting the hand detection region removes irrelevant background information, so the extracted feature vector represents the hand region data more strongly; the uniform size lets the convolutional feature extraction network compute feature vectors of identical dimension for the subsequent gesture category comparison.
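For illustration, the confidence filtering and region extraction might look like the following sketch; the threshold value and the helper name select_hand_regions are assumptions, not part of the original:

import cv2
import numpy as np

def select_hand_regions(image, boxes, scores, conf_threshold=0.5, top1_only=False, set_size=128):
    """Filter detected hand boxes by classification confidence and crop the regions.

    boxes: (N, 4) array of [x1, y1, x2, y2] region box coordinates;
    scores: (N,) hand classification confidences.
    """
    keep = scores >= conf_threshold
    boxes, scores = boxes[keep], scores[keep]
    if top1_only and len(scores) > 0:
        boxes = boxes[[int(scores.argmax())]]  # keep only the most confident region
    crops = []
    for x1, y1, x2, y2 in boxes.astype(int):
        crop = image[y1:y2, x1:x2]  # remove irrelevant background
        crops.append(cv2.resize(crop, (set_size, set_size)))  # unify to the set size
    return crops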
As shown in fig. 3, each layer of the convolutional feature extraction network extracts convolution features at a different downsampling factor. In one embodiment, the downsampling factor is the Nth power of 2, where N is a natural number greater than 1. As shown in fig. 3, N is chosen as 3, 4, and 5, giving 8x, 16x, and 32x convolution features.
After the convolution features for each downsampling factor are determined, each is converted into a vector of the corresponding dimension. In one embodiment, the corresponding dimension is the Mth power of 2, where M is a natural number not less than 6. As shown in fig. 3, M takes the values 6 and 7: the 8x and 16x convolution features are each converted into a 64-dimensional vector, and the 32x convolution features are converted into a 128-dimensional vector.
Once the corresponding dimension vectors are determined, they are concatenated to obtain the feature vector to be compared. As shown in fig. 3, the two 64-dimensional vectors and the 128-dimensional vector are concatenated into a 256-dimensional vector.
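A minimal sketch of this reduce-and-concatenate step follows; the text does not specify how the dimension reduction is performed, so global average pooling followed by a linear projection is an assumption:

import torch
import torch.nn as nn

class MultiScaleEmbedding(nn.Module):
    """Reduce the 8x, 16x and 32x convolution features and concatenate them
    into the 256-dimensional feature vector to be compared."""
    def __init__(self, c8: int, c16: int, c32: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # assumed reduction: global average pooling
        self.fc8 = nn.Linear(c8, 64)     # 8x features  -> 64-dimensional vector
        self.fc16 = nn.Linear(c16, 64)   # 16x features -> 64-dimensional vector
        self.fc32 = nn.Linear(c32, 128)  # 32x features -> 128-dimensional vector

    def forward(self, f8, f16, f32):
        v8 = self.fc8(self.pool(f8).flatten(1))
        v16 = self.fc16(self.pool(f16).flatten(1))
        v32 = self.fc32(self.pool(f32).flatten(1))
        return torch.cat([v8, v16, v32], dim=1)  # 64 + 64 + 128 = 256 dimensions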
After the vector concatenation of step S102 is completed, the feature vector to be compared is matched against the preset gesture library, which comprises gesture actions and the multi-dimensional feature vectors corresponding to them.
The multi-dimensional feature vector corresponding to each gesture action has the same dimension as the feature vector to be compared. The comparison with the preset gesture library is completed by matching the feature vector to be compared against the multi-dimensional feature vectors: the gesture action whose multi-dimensional feature vector has the highest similarity to the feature vector to be compared is taken as the recognition result for the captured hand image.
The gesture actions and multi-dimensional feature vectors of the preset gesture library, and the mapping between them, can be determined by pre-training. In one embodiment, the preset gesture library comprises a gesture library feature matrix composed of the gesture actions and their multi-dimensional feature vectors.
The gesture library feature matrix can be built from a number of predefined gesture actions: several standard action pictures of the set size are prepared for each gesture, passed through the convolutional feature extraction network for convolution feature extraction, and converted into the corresponding multi-dimensional feature vectors, which together form the gesture library feature matrix.
Based on the above, convolution features extracted at different downsampling factors have different receptive fields. Shallow features have small receptive fields and are used to distinguish small differences in local areas of the captured hand image; deep features have large receptive fields and are used to distinguish differences in the overall gesture contour. In this embodiment, the convolutional feature extraction network outputs three layers of convolution features with different receptive field sizes.
Different gesture actions require attention to different regions. Because shallow features have small receptive fields, giving every region the same attention could make the differences between classes too small to separate them. A spatial channel attention module is therefore added when the shallow features are extracted; during training it re-adjusts the output weight of each region, so that regions contributing more to class separation receive larger weights and regions contributing less receive smaller ones.
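The spatial part of this region re-weighting could be realized as below; this CBAM-style block is an assumed illustration, since the text does not give the module's exact structure:

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Re-weighting of spatial regions in a feature map."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Summarize each spatial location by its channel-wise mean and maximum.
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        weights = self.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * weights  # regions contributing more to class separation get larger weights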
In step S103, comparing the feature vector to be compared with the gesture library feature matrix includes the following steps:
computing the similarity between the feature vector to be compared and each row vector of the gesture library feature matrix; selecting the maximum similarity and the index of the corresponding row; and judging whether the maximum similarity exceeds a set recognition threshold. If the maximum similarity exceeds the recognition threshold, the index of the corresponding row is output and the corresponding gesture action is determined from it; otherwise, a "gesture not recognized" result is output.
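A minimal sketch of this matching step follows; cosine similarity and the threshold value of 0.8 are assumptions, since the text specifies only "similarity" and a set recognition threshold:

import numpy as np

def match_gesture(query, library, labels, threshold=0.8):
    """Match a 256-dimensional feature vector against the gesture library feature matrix.

    library: (K, 256) matrix with one row per registered gesture sample;
    labels: the gesture action corresponding to each row.
    """
    q = query / np.linalg.norm(query)
    lib = library / np.linalg.norm(library, axis=1, keepdims=True)
    sims = lib @ q  # cosine similarity against every row
    best = int(sims.argmax())
    if sims[best] > threshold:
        return labels[best]  # gesture action of the best-matching row
    return None  # gesture not recognized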
In the gesture recognition method of any of the above embodiments, the captured hand image is input into the convolutional feature extraction network, the resulting convolution features are reduced and concatenated into a feature vector to be compared, and this vector is matched against the preset gesture library to obtain the recognition result. Gestures are therefore recognized accurately, and the pre-trained convolutional feature extraction network guarantees the detection accuracy.
The embodiment of the invention also provides a gesture recognition device.
Fig. 4 is a block diagram of a gesture recognition apparatus according to an embodiment. As shown in fig. 4, the apparatus includes modules 100 to 103:
a picture acquisition module 100 for acquiring a captured hand image;
a picture transmission module 101 for inputting the captured hand image into the convolutional feature extraction network to obtain a plurality of convolution features, wherein each convolution feature corresponds to a different downsampling factor;
a vector acquisition module 102 for reducing each convolution feature to a vector of a corresponding dimension and concatenating the vectors to obtain a feature vector to be compared;
a result comparison module 103 for comparing the feature vector to be compared with a preset gesture library to obtain a recognition result, wherein the preset gesture library comprises gesture actions and the multi-dimensional feature vectors corresponding to the gesture actions.
The gesture recognition apparatus works in the same way as the method above: the captured hand image is input into the convolutional feature extraction network, the resulting convolution features are reduced and concatenated into a feature vector to be compared, and this vector is matched against the preset gesture library to determine the gesture action. The pre-trained convolutional feature extraction network likewise guarantees the detection accuracy of gesture recognition.
The embodiment of the invention also provides a computer storage medium on which computer instructions are stored; when executed by a processor, the instructions implement the gesture recognition method of any one of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware instructed by a computer program, which can be stored in a non-volatile computer-readable storage medium; when executed, the program can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the methods of the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a RAM, a ROM, a magnetic or optical disk, or various other media that can store program code.
Corresponding to the computer storage medium, in one embodiment, a computer device is further provided, where the computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor executes the computer program to implement any one of the gesture recognition methods in the embodiments.
The computer device may be a terminal, and its internal structure diagram may be as shown in fig. 5. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a gesture recognition method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations are described; nevertheless, any combination of these technical features should be considered within the scope of this specification as long as it contains no contradiction.
The above examples show only some embodiments of the present invention and are described in specific detail, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art could make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A gesture recognition method, comprising the steps of:
acquiring a captured hand image;
inputting the captured hand image into a convolutional feature extraction network to obtain a plurality of convolution features from the network, wherein each convolution feature corresponds to a different downsampling factor;
reducing each convolution feature to a vector of a corresponding dimension and concatenating the vectors to obtain a feature vector to be compared;
comparing the feature vector to be compared with a preset gesture library to obtain a recognition result, wherein the preset gesture library comprises gesture actions and the multi-dimensional feature vectors corresponding to the gesture actions.
2. The gesture recognition method according to claim 1, wherein the process of obtaining the captured hand image further comprises the step of:
performing image preprocessing on the captured hand image.
3. The gesture recognition method according to claim 1, wherein the process of inputting the captured hand image into the convolutional feature extraction network comprises the steps of:
inputting the captured hand image into a hand detection network to obtain the coordinates of the region box containing the hand and the hand classification confidence;
determining a hand detection region of the captured hand image according to the region box coordinates and the hand classification confidence;
inputting the hand detection region into the convolutional feature extraction network.
4. The gesture recognition method according to claim 3, wherein the hand detection network comprises a convolutional feature extraction sub-network and a multi-size feature fusion sub-network.
5. The gesture recognition method according to claim 1, wherein the downsampling factor is the Nth power of 2, where N is a natural number greater than 1.
6. The gesture recognition method according to claim 1, wherein the corresponding dimension is the Mth power of 2, where M is a natural number not less than 6.
7. The gesture recognition method according to any one of claims 1 to 6, wherein the feature vector to be compared is a 256-dimensional vector.
8. A gesture recognition apparatus, comprising:
a picture acquisition module for acquiring a captured hand image;
a picture transmission module for inputting the captured hand image into a convolutional feature extraction network to obtain a plurality of convolution features, wherein each convolution feature corresponds to a different downsampling factor;
a vector acquisition module for reducing each convolution feature to a vector of a corresponding dimension and concatenating the vectors to obtain a feature vector to be compared;
a result comparison module for comparing the feature vector to be compared with a preset gesture library to obtain a recognition result, wherein the preset gesture library comprises gesture actions and the multi-dimensional feature vectors corresponding to the gesture actions.
9. A computer storage medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, implement a gesture recognition method according to any one of claims 1 to 7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the gesture recognition method according to any one of claims 1 to 7 when executing the program.
CN202111028923.4A 2021-09-02 2021-09-02 Gesture recognition method and device Pending CN113705511A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111028923.4A CN113705511A (en) 2021-09-02 2021-09-02 Gesture recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111028923.4A CN113705511A (en) 2021-09-02 2021-09-02 Gesture recognition method and device

Publications (1)

Publication Number Publication Date
CN113705511A 2021-11-26

Family

ID=78657797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111028923.4A Pending CN113705511A (en) 2021-09-02 2021-09-02 Gesture recognition method and device

Country Status (1)

Country Link
CN (1) CN113705511A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2980728A1 (en) * 2014-08-01 2016-02-03 Imersivo, S.L. Procedure for identifying a hand gesture
CN105893959A (en) * 2016-03-30 2016-08-24 北京奇艺世纪科技有限公司 Gesture identifying method and device
CN109214250A (en) * 2017-07-05 2019-01-15 中南大学 A kind of static gesture identification method based on multiple dimensioned convolutional neural networks
WO2019201035A1 (en) * 2018-04-16 2019-10-24 腾讯科技(深圳)有限公司 Method and device for identifying object node in image, terminal and computer readable storage medium
CN109359538A (en) * 2018-09-14 2019-02-19 广州杰赛科技股份有限公司 Training method, gesture identification method, device and the equipment of convolutional neural networks
CN109902577A (en) * 2019-01-25 2019-06-18 华中科技大学 A kind of construction method of lightweight gestures detection convolutional neural networks model and application
CN112766028A (en) * 2019-11-06 2021-05-07 深圳云天励飞技术有限公司 Face fuzzy processing method and device, electronic equipment and storage medium
CN111950460A (en) * 2020-08-13 2020-11-17 电子科技大学 Muscle strength self-adaptive stroke patient hand rehabilitation training action recognition method
CN112464860A (en) * 2020-12-10 2021-03-09 深圳市优必选科技股份有限公司 Gesture recognition method and device, computer equipment and storage medium
CN113033398A (en) * 2021-03-25 2021-06-25 深圳市康冠商用科技有限公司 Gesture recognition method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
冯家文 (FENG Jiawen): "Application of a dual-channel convolutional neural network in static gesture recognition", 《计算机工程与应用》 (Computer Engineering and Applications), no. 2018, 24 August 2017 (2017-08-24), pages 148-152 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117612247A (en) * 2023-11-03 2024-02-27 重庆利龙中宝智能技术有限公司 Dynamic and static gesture recognition method based on knowledge distillation

Similar Documents

Publication Publication Date Title
US10467459B2 (en) Object detection based on joint feature extraction
CN110909651B (en) Method, device and equipment for identifying video main body characters and readable storage medium
CN109165589B (en) Vehicle weight recognition method and device based on deep learning
KR101896357B1 (en) Method, device and program for detecting an object
CN111709313B (en) Pedestrian re-identification method based on local and channel combination characteristics
CN107784288A (en) A kind of iteration positioning formula method for detecting human face based on deep neural network
CN111612822B (en) Object tracking method, device, computer equipment and storage medium
CN111832581B (en) Lung feature recognition method and device, computer equipment and storage medium
JP6756406B2 (en) Image processing equipment, image processing method and image processing program
CN113255557B (en) Deep learning-based video crowd emotion analysis method and system
CN112861785B (en) Instance segmentation and image restoration-based pedestrian re-identification method with shielding function
CN112668374A (en) Image processing method and device, re-recognition network training method and electronic equipment
CN111259823A (en) Pornographic image identification method based on convolutional neural network
CN111951283A (en) Medical image identification method and system based on deep learning
CN112541394A (en) Black eye and rhinitis identification method, system and computer medium
CN112668462A (en) Vehicle loss detection model training method, vehicle loss detection device, vehicle loss detection equipment and vehicle loss detection medium
Lahiani et al. Hand pose estimation system based on Viola-Jones algorithm for android devices
CN110717407A (en) Human face recognition method, device and storage medium based on lip language password
Andiani et al. Face recognition for work attendance using multitask convolutional neural network (MTCNN) and pre-trained facenet
CN113705511A (en) Gesture recognition method and device
CN113378852A (en) Key point detection method and device, electronic equipment and storage medium
Patil et al. Techniques of deep learning for image recognition
CN117152625A (en) Remote sensing small target identification method, system, equipment and medium based on CoordConv and Yolov5
Dong et al. A supervised dictionary learning and discriminative weighting model for action recognition
CN113705489B (en) Remote sensing image fine-granularity airplane identification method based on priori regional knowledge guidance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination