CN117523604A - Gesture recognition method, gesture recognition device, electronic equipment and computer readable storage medium - Google Patents

Gesture recognition method, gesture recognition device, electronic equipment and computer readable storage medium

Info

Publication number
CN117523604A
CN117523604A · CN202311369182.5A
Authority
CN
China
Prior art keywords
image
gesture
recognition model
image recognition
recognition
Prior art date
Legal status
Pending
Application number
CN202311369182.5A
Other languages
Chinese (zh)
Inventor
黄乐
董培
庞建新
谭欢
Current Assignee
Shenzhen Ubtech Technology Co ltd
Original Assignee
Shenzhen Ubtech Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Ubtech Technology Co ltd filed Critical Shenzhen Ubtech Technology Co ltd
Priority to CN202311369182.5A
Publication of CN117523604A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107: Static hand or arm
    • G06V40/113: Recognition of static hand signs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017: Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application is applicable to the technical field of terminals, and particularly relates to a gesture recognition method, a gesture recognition device, an electronic device, and a computer-readable storage medium. In the method, the electronic device can acquire a first image containing a gesture to be recognized and input the first image into a first image recognition model for processing, obtaining a target gesture category output by the first image recognition model. The electronic device can then determine a second image recognition model according to the target gesture category and input the first image into the second image recognition model for processing, obtaining a gesture recognition result output by the second image recognition model. That is, the first image recognition model first performs coarse gesture recognition on the first image to determine the target gesture category corresponding to the first image; the second image recognition model corresponding to the target gesture category then accurately determines the gesture recognition result corresponding to the first image. This improves the accuracy of gesture recognition, reduces gesture misrecognition, and improves the user experience.

Description

Gesture recognition method, gesture recognition device, electronic equipment and computer readable storage medium
Technical Field
The application belongs to the technical field of terminals, and particularly relates to a gesture recognition method, a gesture recognition device, electronic equipment and a computer readable storage medium.
Background
Human-computer interaction through gestures is an important interaction mode: an electronic device can be driven to execute corresponding actions through gesture recognition. Currently, gesture recognition is generally performed by detecting the key points of each finger joint and using the position and angle relations between the key points. This key-point-based gesture recognition method has poor recognition accuracy, easily causes misrecognition, and results in a poor user experience.
Disclosure of Invention
The embodiment of the application provides a gesture recognition method, a gesture recognition device, an electronic device, and a computer-readable storage medium, which can solve the problems that gesture recognition methods in the prior art have poor recognition accuracy, easily cause misrecognition, and provide a poor user experience.
In a first aspect, an embodiment of the present application provides a gesture recognition method, including:
acquiring a first image containing a gesture to be recognized;
inputting the first image into a first image recognition model for processing to obtain a target gesture category output by the first image recognition model;
determining a second image recognition model according to the target gesture category;
and inputting the first image into the second image recognition model for processing to obtain a gesture recognition result output by the second image recognition model.
In the gesture recognition method provided by the embodiment of the application, when gesture recognition is required, the electronic device can acquire the first image containing the gesture to be recognized and input the first image into the first image recognition model for processing, obtaining the target gesture category output by the first image recognition model. Then, the electronic device can determine the second image recognition model according to the target gesture category and input the first image into the second image recognition model for processing, obtaining the gesture recognition result output by the second image recognition model. In other words, in the embodiment of the present application, the electronic device may first perform coarse gesture recognition on the first image through the first image recognition model to determine the target gesture category corresponding to the first image, and may then accurately determine the gesture recognition result corresponding to the first image through the second image recognition model corresponding to the target gesture category. In this way, the accuracy of gesture recognition can be effectively improved, gesture misrecognition can be reduced, and the user experience can be improved.
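The two-stage flow described above (coarse category first, then a category-specific fine model) can be sketched end to end as follows. This is a hypothetical illustration: the stub classifiers and all names are assumptions for demonstration, not from the patent, and a real implementation would replace them with trained neural networks.

```python
# Hypothetical sketch of the two-stage pipeline. The stub "models"
# stand in for the trained first and second image recognition models;
# the fake "image" is a dict whose tag encodes category and gesture.

def first_model(image):
    """Coarse classifier: map an image to a target gesture category."""
    return image["tag"].split(":")[0]

# One fine-grained second model per gesture category; the stubs simply
# read the gesture name back out of the fake image.
second_models = {
    "A": lambda image: image["tag"].split(":")[1],
    "B": lambda image: image["tag"].split(":")[1],
    "C": lambda image: image["tag"].split(":")[1],
}

def recognize(image):
    target_category = first_model(image)           # coarse recognition
    second_model = second_models[target_category]  # model selection
    return second_model(image)                     # fine recognition

assert recognize({"tag": "A:thumbs-up"}) == "thumbs-up"
assert recognize({"tag": "B:OK"}) == "OK"
```

The key design point is that the second stage is chosen at run time from the first stage's output, so each fine model only ever has to separate the similar gestures within one category.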
In one example, after the acquiring the first image containing the gesture to be recognized, the method may further include:
acquiring an image area corresponding to the gesture to be recognized in the first image, wherein the image area is a partial area in the first image;
the step of inputting the first image into a first image recognition model for processing to obtain a target gesture category output by the first image recognition model comprises the following steps:
inputting the image area into the first image recognition model for processing to obtain the target gesture category output by the first image recognition model;
the step of inputting the first image into the second image recognition model for processing to obtain a gesture recognition result output by the second image recognition model comprises the following steps:
and inputting the image area into the second image recognition model for processing to obtain a gesture recognition result output by the second image recognition model.
In the gesture recognition method provided by this example, the first image recognition model and the second image recognition model need only process the partial image area corresponding to the gesture to be recognized. This effectively reduces the amount of computation of the two models and improves their recognition speed, thereby improving the gesture recognition speed and the user experience.
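As a minimal illustration of this region-based variant, the sketch below crops a hand bounding box out of a full image before any model sees it. The box coordinates and the toy pixel grid are assumptions; in practice the box would come from a hand detector.

```python
# Crop the hand region from a 2-D pixel grid so that downstream models
# only process the partial image area (illustrative helper, not from
# the patent text).

def crop_region(image, box):
    """Crop a (top, left, bottom, right) box, with exclusive ends."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]

# A 4x4 "image" whose hand occupies the lower-right 2x2 block.
image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 7, 7],
    [0, 0, 7, 7],
]
roi = crop_region(image, (2, 2, 4, 4))
assert roi == [[7, 7], [7, 7]]
```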
Illustratively, before the inputting the image region into the first image recognition model for processing, the method may further include:
and adjusting the size of the image area to a preset size.
Optionally, the preset size is 320×320.
In the gesture recognition method provided by this example, adjusting the size of the image area reduces the amount of computation of the first image recognition model and the second image recognition model and effectively improves their recognition speed, thereby improving the gesture recognition speed and the user experience.
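The resize step to the preset size can be sketched in pure Python with nearest-neighbour sampling. A deployed system would use an image library's interpolated resize instead; this version only shows the size adjustment to 320×320.

```python
# Nearest-neighbour resize to the preset size mentioned above (320x320).
# Illustrative sketch; real systems would use library interpolation.

def resize(image, out_h, out_w):
    """Resize a 2-D pixel grid by nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

small = [[1, 2], [3, 4]]
big = resize(small, 320, 320)
assert len(big) == 320 and len(big[0]) == 320
assert big[0][0] == 1 and big[319][319] == 4
```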
In one example, the first image recognition model and the second image recognition model may be recognition models based on a MobileNet network or a ResNet network.
Optionally, the first image recognition model is obtained by training with gesture images corresponding to different gesture categories, the second image recognition model corresponding to the target gesture category is obtained by training with gesture images corresponding to the target gesture category, and each gesture category includes one or more gestures.
In a second aspect, embodiments of the present application provide a gesture recognition apparatus, including:
the first image acquisition module is used for acquiring a first image containing a gesture to be recognized;
the first recognition module is used for inputting the first image into a first image recognition model for processing to obtain a target gesture category output by the first image recognition model;
the second recognition model determining module is used for determining a second image recognition model according to the target gesture category;
and the second recognition module is used for inputting the first image into the second image recognition model for processing to obtain a gesture recognition result output by the second image recognition model.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the gesture recognition method of any one of the first aspects when the processor executes the computer program.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the gesture recognition method of any one of the first aspects above.
In a fifth aspect, embodiments of the present application provide a computer program product, which when run on an electronic device, causes the electronic device to perform the gesture recognition method of any one of the first aspects above.
It will be appreciated that the advantages of the second to fifth aspects may be found in the relevant description of the first aspect, and are not described here again.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the following description briefly introduces the drawings needed in the embodiments or the description of the prior art. It is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained from these drawings without inventive effort by a person skilled in the art.
FIG. 1 is an example diagram of key points in key-point-based gesture recognition;
FIG. 2 is an exemplary diagram of a gesture;
FIG. 3 is a schematic flow chart of a gesture recognition method provided in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a gesture recognition apparatus according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted as "when," "upon," "in response to a determination," or "in response to detection," depending on the context. Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as meaning "upon determining," "in response to determining," "upon detecting [the described condition or event]," or "in response to detecting [the described condition or event]."
In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Human-computer interaction through gestures is an important interaction mode: an electronic device can be driven to execute corresponding actions through gesture recognition. Currently, gesture recognition is generally performed by detecting the key points of each finger joint and using the position and angle relations between the key points.
For example, referring to FIG. 1, FIG. 1 illustrates an exemplary plot of keypoints in a keypoint-based gesture recognition.
In key-point-based gesture recognition, recognition is generally performed by detecting the key point of each joint, that is, detecting key point 0 to key point 20 shown in FIG. 1. Therefore, if a gesture recognition model is used for key-point-based gesture recognition, 21 key points need to be annotated on the hand in each training image when training the model; since training a gesture recognition model generally requires a large number of training images, this results in a large annotation workload.
In addition, in gesture recognition based on a keypoint, the result of gesture recognition is strongly correlated with the detection of the keypoint.
For example, referring to fig. 2, fig. 2 shows an exemplary diagram of a gesture.
As shown in (a) of FIG. 2, when part of a gesture is blocked, only some of the key points corresponding to the gesture can be detected, and it is difficult to determine an accurate gesture recognition result from those partial key points alone, so gesture misrecognition is easily caused and the user experience is poor.
Similar gestures are also problematic. For example, as shown in (b) and (c) of FIG. 2, the "open palm" gesture and the "number 4" gesture are similar; because the key points of the two gestures are similar, recognizing the two gestures from the detected key points easily causes misrecognition, resulting in a poor user experience.
In summary, the gesture recognition method based on the key points of each joint has poor recognition accuracy and easily causes misrecognition, so the user experience is poor.
In order to solve the above problems, embodiments of the present application provide a gesture recognition method, a gesture recognition device, an electronic device, and a computer-readable storage medium. In the method, when gesture recognition is required, the electronic device can acquire a first image containing a gesture to be recognized and input the first image into a first image recognition model for processing to obtain a target gesture category output by the first image recognition model. Then, the electronic device can determine a second image recognition model according to the target gesture category and input the first image into the second image recognition model for processing to obtain a gesture recognition result output by the second image recognition model. In other words, in the embodiment of the present application, coarse gesture recognition may first be performed on the first image by the first image recognition model to determine the target gesture category corresponding to the first image, and the gesture recognition result corresponding to the first image may then be accurately determined by the second image recognition model corresponding to the target gesture category. Thus, the accuracy of gesture recognition can be effectively improved, gesture misrecognition can be reduced, and the user experience can be improved; the gesture recognition method therefore has strong usability and practicability.
The gesture recognition method provided by the embodiment of the application can be applied to electronic devices such as smart home devices, robots, mobile phones, tablet computers, wearable devices, vehicle-mounted devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, and personal digital assistants (PDA); the specific type of the electronic device is not limited.
The gesture recognition method provided in the embodiments of the present application is described in detail below with reference to the accompanying drawings.
Referring to fig. 3, fig. 3 is a schematic flowchart of a gesture recognition method according to an embodiment of the present application, where the method may be applied to an electronic device. As shown in fig. 3, the method may include:
s301, acquiring a first image containing a gesture to be recognized.
In one example, the gesture to be recognized may be a gesture performed by a user while the user interacts with the electronic device. Optionally, an image acquisition device such as a camera may be disposed in the electronic device. When the electronic device interacts with the user, the electronic device may activate an image capturing device such as a camera, and may capture an image (hereinafter may be referred to as a first image) including a gesture performed by the user through the image capturing device such as the camera.
It should be understood that the gesture included in the first image may be a gesture to be recognized that is required to be recognized by the electronic device.
In another example, the first image may be an image acquired by another electronic device, for example, an image including a gesture acquired by another electronic device through an image acquisition device such as a camera, or an image including a gesture acquired by another electronic device from a network, or the like.
It should be noted that acquiring the first image through an image capturing device such as a camera of the electronic device, or acquiring the first image from another electronic device, is only an exemplary explanation and should not be construed as limiting the embodiment of the present application; the manner of acquiring the first image is not limited here and may be determined according to the actual scene.
S302, inputting the first image into a first image recognition model for processing to obtain a target gesture category output by the first image recognition model.
In the embodiment of the application, the gestures to be recognized by the electronic device may be divided into one or more gesture categories according to the differences between the gestures. Each gesture category may include one or more gestures. The differences between gestures in the same gesture category are small, that is, gestures in the same gesture category may be similar, while the differences between gestures in different gesture categories are large.
For example, suppose the electronic device needs to recognize 8 gestures: "fist," "thumbs-up," "finger heart," "OK," "V-sign," "stop" (a vertically open palm), "palm-up" (a horizontally open palm), and "number 4." The "fist," "thumbs-up," and "finger heart" gestures are similar in that all or some of the fingers are curled. The "OK" and "V-sign" gestures are similar in that some fingers are extended. The "stop," "palm-up," and "number 4" gestures, all based on an open palm, are relatively similar. These 8 gestures can therefore be divided into 3 gesture categories: the first gesture category (hereinafter referred to as gesture category A) may include "fist," "thumbs-up," and "finger heart"; the second gesture category (hereinafter referred to as gesture category B) may include "OK" and "V-sign"; and the third gesture category (hereinafter referred to as gesture category C) may include "stop," "palm-up," and "number 4."
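The grouping above can be expressed as a simple lookup table. The category names A/B/C follow the text; the gesture labels are illustrative English renderings of the machine-translated names, not identifiers from the patent.

```python
# Illustrative coarse-category lookup for the 8 example gestures.
GESTURE_CATEGORIES = {
    "A": ["fist", "thumbs-up", "finger heart"],  # fingers mostly curled
    "B": ["OK", "V-sign"],                       # some fingers extended
    "C": ["stop", "palm-up", "number 4"],        # open-palm variants
}

def category_of(gesture):
    """Return the coarse category a gesture belongs to, or None."""
    for category, gestures in GESTURE_CATEGORIES.items():
        if gesture in gestures:
            return category
    return None

assert category_of("fist") == "A"
assert category_of("OK") == "B"
assert category_of("number 4") == "C"
```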
Optionally, after the first image is acquired, the electronic device may first perform rough recognition on the first image through the first image recognition model to determine a target gesture category corresponding to the first image, that is, the electronic device may first determine, through the first image recognition model, a gesture category to which a gesture included in the first image approximately belongs.
For example, when the gesture category includes gesture category a, gesture category B, and gesture category C, the electronic device may first determine, through the first image recognition model, which of gesture category a, gesture category B, and gesture category C the target gesture category corresponding to the first image is.
The first image recognition model may be, for example, a recognition model using a MobileNet network or a ResNet network as a backbone network. The first image recognition model can be obtained by training with gesture images corresponding to different gesture categories.
For example, when the gesture category includes gesture category a, gesture category B, and gesture category C, the first image recognition model may be trained together by the gesture image corresponding to gesture category a, the gesture image corresponding to gesture category B, and the gesture image corresponding to gesture category C. The trained first image recognition model can recognize an image containing gestures, and determine which gesture type A, gesture type B and gesture type C the gestures contained in the image belong to.
It should be noted that, in the embodiment of the present application, the execution body for training the first image recognition model is not specifically limited, and may be an electronic device for executing the gesture recognition method, or may be a cloud or other electronic devices, which may be specifically determined according to an actual application scenario.
S303, determining a second image recognition model according to the target gesture category.
In this embodiment of the present application, after the gestures to be recognized by the electronic device are divided into one or more gesture categories, in order to accurately recognize the gestures in each gesture category, an image recognition model corresponding to that gesture category may be trained with the gesture images corresponding to that category.
For example, when the gesture category includes gesture category a, gesture category B, and gesture category C, the image recognition model a corresponding to gesture category a may be trained by the gesture image corresponding to gesture category a, the image recognition model B corresponding to gesture category B may be trained by the gesture image corresponding to gesture category B, and the image recognition model C corresponding to gesture category C may be trained by the gesture image corresponding to gesture category C. The trained image recognition model A can accurately recognize various gestures in the gesture category A, the trained image recognition model B can accurately recognize various gestures in the gesture category B, and the trained image recognition model C can accurately recognize various gestures in the gesture category C.
Optionally, the image recognition model corresponding to each gesture category may be a recognition model using a MobileNet network or a ResNet network as a backbone network.
It should be noted that, in the embodiment of the present application, the execution subject for training the image recognition model corresponding to each gesture category is not specifically limited, and may be an electronic device for executing the gesture recognition method, or may be a cloud or other electronic devices, and may be specifically determined according to an actual application scenario.
Thus, after determining the target gesture category corresponding to the first image, the electronic device may determine an image recognition model (hereinafter may be referred to as a second image recognition model) corresponding to the target gesture category to accurately recognize the gesture of the first image through the second image recognition model.
S304, inputting the first image into the second image recognition model for processing, and obtaining a gesture recognition result output by the second image recognition model.
In this embodiment of the present application, after determining the second image recognition model according to the target gesture category corresponding to the first image, the electronic device may input the first image into the second image recognition model. The second image recognition model can recognize the first image and thereby accurately determine the gesture recognition result corresponding to the gesture to be recognized in the first image. By performing gesture recognition through the first image recognition model and the second image recognition model, gestures whose hand key points are partially blocked, as well as similar gestures, can be accurately recognized, improving the recognition accuracy.
The gesture recognition method provided in the embodiment of the present application is described below by taking a specific application scenario as an example.
For example, when the gesture categories include gesture category A, gesture category B, and gesture category C, and the gesture to be recognized contained in the first image is the "thumbs-up" gesture, the electronic device may determine, through the first image recognition model, that the target gesture category corresponding to the first image is gesture category A.
Then, the electronic device may determine, according to gesture category A, that the second image recognition model is image recognition model A corresponding to gesture category A. Thus, the electronic device can input the first image into image recognition model A for processing, and image recognition model A can accurately determine from the first image that the gesture to be recognized is "thumbs-up."
In one possible implementation, after the first image is acquired, the electronic device may first resize the first image to a preset size, for example 320×320, that is, 320 pixels in the horizontal direction and 320 pixels in the vertical direction. Then, the electronic device can perform gesture recognition on the first image adjusted to the preset size: the resized first image can be input into the first image recognition model for processing to obtain the target gesture category corresponding to the first image, and after the second image recognition model is determined according to the target gesture category, the resized first image can be input into the second image recognition model for processing to obtain the final gesture recognition result. Adjusting the size of the first image in this way reduces the amount of computation of the first image recognition model and the second image recognition model and improves their recognition speed, thereby improving the gesture recognition speed and the user experience.
In another possible implementation, after acquiring the first image, the electronic device may first detect a hand region in it, that is, the image region corresponding to the gesture to be recognized, which may be a partial region of the first image, and then perform gesture recognition on that region alone. After the image region is obtained, it is input into the first image recognition model to obtain the target gesture category; once the second image recognition model is determined according to that category, the image region is input into the second image recognition model to obtain the final gesture recognition result. Because the first and second image recognition models then only need to process the partial image region corresponding to the gesture, their amount of computation is effectively reduced and their recognition speed is increased, thereby speeding up gesture recognition and improving the user experience.
Alternatively, the electronic device may detect the hand region in the first image using an object detection method such as YOLOv3 or YOLOv5. The embodiments of the present application do not limit the specific manner in which the hand region is detected; it may be determined according to the actual application scenario.
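Once a detector such as the YOLOv3/YOLOv5 mentioned above has produced a hand bounding box, cropping the corresponding image region could be done as in the sketch below; the detector itself is assumed and not shown, and the box format is an illustrative choice.

```python
def crop_hand_region(first_image, box):
    """Crop the image region corresponding to the gesture to be
    recognized, given an (x1, y1, x2, y2) hand box from a detector."""
    x1, y1, x2, y2 = box
    h, w = first_image.shape[:2]
    # Clamp to image bounds so a slightly off detection still crops safely.
    x1, x2 = max(0, x1), min(w, x2)
    y1, y2 = max(0, y1), min(h, y2)
    return first_image[y1:y2, x1:x2]
```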
In one example, after determining the image region corresponding to the gesture to be recognized, the electronic device may first resize the image region to a preset size, for example 320×320. The electronic device may then perform gesture recognition on the resized image region; resizing the region reduces the amount of computation performed by the first and second image recognition models and increases their recognition speed, thereby speeding up gesture recognition and improving the user experience.
It should be understood that gesture recognition on the image region resized to the preset size proceeds in the same way as gesture recognition on the first image resized to the preset size, and is not described again here.
In this embodiment, when gesture recognition is required, the electronic device may acquire a first image containing the gesture to be recognized and input it into the first image recognition model, obtaining the target gesture category output by that model. The electronic device may then determine the second image recognition model according to the target gesture category and input the first image into it, obtaining the gesture recognition result output by the second image recognition model. In other words, in the embodiments of the present application, the first image recognition model may first perform coarse gesture recognition on the first image to determine its target gesture category, and the second image recognition model corresponding to that category may then accurately determine the gesture recognition result corresponding to the first image. This effectively improves the accuracy of gesture recognition, reduces gesture misrecognition, and improves the user experience.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not limit the implementation of the embodiments of the present application in any way.
Corresponding to the gesture recognition method described in the above embodiments, fig. 4 shows a block diagram of the gesture recognition apparatus provided in the embodiments of the present application, and for convenience of explanation, only the portions related to the embodiments of the present application are shown.
Referring to fig. 4, the apparatus may include:
a first image acquisition module 401, configured to acquire a first image including a gesture to be recognized;
the first recognition module 402 is configured to input the first image to a first image recognition model for processing, so as to obtain a target gesture class output by the first image recognition model;
a second recognition model determining module 403, configured to determine a second image recognition model according to the target gesture category;
and the second recognition module 404 is configured to input the first image to the second image recognition model for processing, so as to obtain a gesture recognition result output by the second image recognition model.
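The four modules above can be mirrored by a small class like the following. This is only an illustrative sketch of the apparatus of fig. 4, with the models taken as arbitrary callables rather than any particular network implementation.

```python
class GestureRecognitionDevice:
    """Sketch of the apparatus: acquisition (401), first recognition
    (402), second-model selection (403), and second recognition (404)."""

    def __init__(self, first_model, second_models):
        self.first_model = first_model      # returns a target gesture category
        self.second_models = second_models  # one model per gesture category

    def acquire(self, image):
        # First image acquisition module (401); a simple passthrough here.
        return image

    def recognize(self, image):
        first_image = self.acquire(image)
        category = self.first_model(first_image)     # module 402
        second_model = self.second_models[category]  # module 403
        return second_model(first_image)             # module 404
```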
Optionally, the apparatus may further include:
the image area acquisition module is used for acquiring an image area corresponding to the gesture to be recognized in the first image, wherein the image area is a partial area of the first image;
the first recognition module 402 is further configured to input the image area to the first image recognition model for processing, so as to obtain the target gesture class output by the first image recognition model;
the second recognition module 404 is further configured to input the image area to the second image recognition model for processing, so as to obtain a gesture recognition result output by the second image recognition model.
Illustratively, the apparatus may further include:
and the image adjusting module is used for adjusting the size of the image area to a preset size.
Optionally, the preset size is 320×320.
In one example, the first image recognition model and the second image recognition model are recognition models based on a MobileNet network or a ResNet network.
Optionally, the first image recognition model is trained by using gesture images corresponding to different gesture categories, and the second image recognition model is trained by using gesture images corresponding to the target gesture categories, where each gesture category includes one or more gestures.
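The training-data split described above (coarse category labels for the first model, per-category fine labels for each second model) can be sketched as follows; `category_of` and the sample layout are assumptions introduced for illustration.

```python
def split_training_labels(samples, category_of):
    """Organize labels for the two-stage scheme: the first model is
    trained on coarse category labels over all samples, while each
    second model is trained only on its own category's gesture labels.
    `samples` is a list of (image, gesture_label) pairs and
    `category_of` maps a gesture label to its gesture category."""
    coarse_set = [(img, category_of[g]) for img, g in samples]
    fine_sets = {}
    for img, g in samples:
        fine_sets.setdefault(category_of[g], []).append((img, g))
    return coarse_set, fine_sets
```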
It should be noted that, because the information interaction and execution processes between the above devices/units are based on the same concept as the method embodiments of the present application, reference may be made to the method embodiment section for their specific functions and technical effects, which are not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional units and modules is illustrated; in practical applications, the above functions may be distributed among different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for convenience of distinguishing them from each other and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, which is not described herein again.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 5, the electronic apparatus 5 of this embodiment includes: at least one processor 50 (only one is shown in fig. 5), a memory 51, and a computer program 52 stored in the memory 51 and executable on the at least one processor 50, the processor 50 implementing the steps in any of the various gesture recognition method embodiments described above when executing the computer program 52.
The electronic device 5 may be, for example, a mobile phone, a robot, a smart home device, a desktop computer, a notebook computer, or a palmtop computer. The electronic device may include, but is not limited to, the processor 50 and the memory 51. It will be appreciated by those skilled in the art that fig. 5 is merely an example of the electronic device 5 and does not limit the electronic device 5, which may include more or fewer components than shown, combine certain components, or use different components; for example, it may also include input/output devices, network access devices, and the like.
The processor 50 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 51 may in some embodiments be an internal storage unit of the electronic device 5, such as a hard disk or a memory of the electronic device 5. The memory 51 may in other embodiments also be an external storage device of the electronic device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card (flash card) or the like, which are provided on the electronic device 5. Further, the memory 51 may also include both an internal storage unit and an external storage device of the electronic device 5. The memory 51 is used for storing an operating system, application programs, boot loader (BootLoader), data, other programs, etc., such as program codes of the computer program. The memory 51 may also be used to temporarily store data that has been output or is to be output.
Embodiments of the present application also provide a computer readable storage medium storing a computer program, which when executed by a processor, may implement the steps in the above-described embodiments of the gesture recognition method.
Embodiments of the present application provide a computer program product that, when run on an electronic device, causes the electronic device to perform the steps of the various gesture recognition method embodiments described above.
If the integrated units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on this understanding, the present application implements all or part of the flow of the methods of the above embodiments through a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable storage medium may include at least: any entity or device capable of carrying the computer program code to the apparatus/electronic device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer-readable storage media may not include electrical carrier signals and telecommunications signals.
Each of the foregoing embodiments is described with its own emphasis; for parts that are not detailed or described in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/electronic device and method may be implemented in other manners. For example, the apparatus/electronic device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A method of gesture recognition, comprising:
acquiring a first image containing a gesture to be recognized;
inputting the first image into a first image recognition model for processing to obtain a target gesture class output by the first image recognition model;
determining a second image recognition model according to the target gesture category;
and inputting the first image into the second image recognition model for processing to obtain a gesture recognition result output by the second image recognition model.
2. The method of claim 1, wherein after the acquiring the first image containing the gesture to be recognized, the method further comprises:
acquiring an image area corresponding to the gesture to be recognized in the first image, wherein the image area is a partial area in the first image;
the step of inputting the first image into a first image recognition model for processing to obtain a target gesture class output by the first image recognition model comprises the following steps:
inputting the image area into the first image recognition model for processing to obtain the target gesture category output by the first image recognition model;
the step of inputting the first image into the second image recognition model for processing to obtain a gesture recognition result output by the second image recognition model comprises the following steps:
and inputting the image area into the second image recognition model for processing to obtain a gesture recognition result output by the second image recognition model.
3. The method of claim 2, wherein prior to said inputting said image region into said first image recognition model for processing, said method further comprises:
and adjusting the size of the image area to a preset size.
4. The method according to claim 3, wherein the preset size is 320×320.
5. The method according to any one of claims 1 to 4, wherein the first image recognition model and the second image recognition model are recognition models based on a MobileNet network or a ResNet network.
6. The method of claim 5, wherein the first image recognition model is trained using gesture images corresponding to different gesture categories, and the second image recognition model corresponding to the target gesture category is trained using gesture images corresponding to the target gesture category, each of the gesture categories including one or more gestures.
7. A gesture recognition apparatus, comprising:
a first image acquisition module, configured to acquire a first image containing a gesture to be recognized;
the first recognition module is used for inputting the first image into a first image recognition model for processing to obtain a target gesture class output by the first image recognition model;
the second recognition model determining module is used for determining a second image recognition model according to the target gesture category;
and the second recognition module is used for inputting the first image into the second image recognition model for processing to obtain a gesture recognition result output by the second image recognition model.
8. The apparatus of claim 7, wherein the apparatus further comprises:
the image area acquisition module is used for acquiring an image area corresponding to the gesture to be recognized in the first image, wherein the image area is a partial area of the first image;
the first recognition module is further used for inputting the image area into the first image recognition model for processing, and obtaining the target gesture category output by the first image recognition model;
the second recognition module is further configured to input the image area to the second image recognition model for processing, so as to obtain a gesture recognition result output by the second image recognition model.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the gesture recognition method of any one of claims 1 to 6 when executing the computer program.
10. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the gesture recognition method according to any one of claims 1 to 6.
CN202311369182.5A 2023-10-20 2023-10-20 Gesture recognition method, gesture recognition device, electronic equipment and computer readable storage medium Pending CN117523604A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311369182.5A CN117523604A (en) 2023-10-20 2023-10-20 Gesture recognition method, gesture recognition device, electronic equipment and computer readable storage medium


Publications (1)

Publication Number Publication Date
CN117523604A true CN117523604A (en) 2024-02-06



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination